Hey everyone out there,
Has anybody ever tried to crawl or scrape web pages to find sensitive information, or to flag a potentially malicious URL that is infected and serving a phishing attack?
I have been thinking about this a lot for a few days. I checked some websites that maintain huge databases of infected or malicious links, including phishing links submitted by users or found using automated scanners. The question I am most interested in is: how do these automated scanners work? Are they web crawlers underneath, and if so, what is their starting point? By starting point I mean: do we first provide some seed URLs, from which the crawler scrapes all the hyperlinks on each page and then recursively visits the next URLs, repeating the same process? (A minimal sketch of that loop follows below.)
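For intuition, here is a minimal sketch of that loop using only the Python standard library: seed URLs go into a queue, each page is fetched, hyperlinks are extracted and resolved, and newly discovered URLs are enqueued up to a depth limit. The seed list, user-agent string, and depth limit are illustrative assumptions, not how any particular scanner actually works.

```python
# Minimal breadth-first crawler sketch (stdlib only). The seed list,
# user agent, and depth limit are illustrative assumptions.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_depth=2):
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)  # (url, depth) pairs
    while queue:
        url, depth = queue.popleft()
        try:
            req = Request(url, headers={"User-Agent": "research-crawler/0.1"})
            html = urlopen(req, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue  # skip unreachable or non-decodable pages
        yield url
        if depth >= max_depth:
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))

if __name__ == "__main__":
    for page in crawl(["https://example.com"]):
        print(page)
```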
Some useful online platforms for finding malicious URLs:
PhishTank
OpenPhish
urlscan.io
Hunchly
MalwareBazaar
(Please add more similar platforms in the comments if you know of any.)
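As a hedged sketch of how one of these databases could be consumed programmatically: OpenPhish publishes a plain-text community feed (one URL per line) that can be loaded into a set for lookups. Treat the exact endpoint and its availability as assumptions to verify against the current OpenPhish documentation.

```python
# Sketch: load the OpenPhish community feed into a set and check URLs
# against it. The feed endpoint is an assumption to verify against the
# current OpenPhish documentation.
from urllib.request import urlopen

FEED = "https://openphish.com/feed.txt"  # plain text, one phishing URL per line

def load_blocklist():
    with urlopen(FEED, timeout=30) as resp:
        lines = resp.read().decode("utf-8", "replace").splitlines()
    return {line.strip() for line in lines if line.strip()}

def is_known_phish(url, blocklist):
    # Exact string match only; real scanners normalize URLs first
    # (scheme, host case, trailing slash), or this misses trivial variants.
    return url in blocklist

if __name__ == "__main__":
    blocklist = load_blocklist()
    print(len(blocklist), "URLs currently in the feed")
    print(is_known_phish("http://example.com/login", blocklist))
```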
One thing I'd like to mention about Hunchly: their daily dark-web report subscription is what I am interested in, though not only the dark web; surface-web sites would be relevant too.
I was curious about how crawlers, botnets, and similar things work, so I wanted to do a basic project to learn more about the domain of "phishing and threat intelligence". The idea was to scrape websites across the internet, check whether each one is legitimate or infected, and, if infected, add it to a spreadsheet or CSV file (sketched below).
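Reusing the crawl() and load_blocklist() sketches from above, that project idea could be wired together in a few lines; the output filename and CSV columns here are arbitrary choices:

```python
# Sketch of the pipeline: crawl pages, flag any URL that appears in the
# blocklist, and append hits to a CSV for later triage. Reuses crawl()
# and load_blocklist() from the sketches above; the filename is arbitrary.
import csv
from datetime import datetime, timezone

def run(seeds, out_path="suspected_phish.csv"):
    blocklist = load_blocklist()
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        for url in crawl(seeds):
            if url in blocklist:
                # timestamp each hit so the CSV doubles as a simple log
                writer.writerow([datetime.now(timezone.utc).isoformat(), url])
```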
I do know that many websites no longer allow bots to scrape their data, and that doing so is against the ToS of most sites. To prevent it, many websites have already set up defense mechanisms, either detecting such crawlers with ML models trained on huge datasets, or rate limiting (a very basic defense).
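On that note, a crawler that wants to stay polite (and unblocked) usually honors robots.txt and rate-limits itself per domain. A minimal sketch with the standard library's urllib.robotparser; the 5-second default delay is an arbitrary choice:

```python
# Sketch of two basic courtesies before the defenses above kick in:
# honoring robots.txt and rate-limiting per domain (stdlib only).
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

_robots = {}    # cached robots.txt parser per domain
_last_hit = {}  # monotonic timestamp of the last request per domain

def allowed(url, agent="research-crawler/0.1"):
    domain = urlparse(url).netloc
    if domain not in _robots:
        parser = RobotFileParser(f"https://{domain}/robots.txt")
        try:
            parser.read()
        except Exception:
            parser = None  # unreadable robots.txt: failing open is a judgment call
        _robots[domain] = parser
    parser = _robots[domain]
    return parser is None or parser.can_fetch(agent, url)

def throttle(url, delay=5.0):
    """Block so the same domain is hit at most once per `delay` seconds."""
    domain = urlparse(url).netloc
    wait = _last_hit.get(domain, 0.0) + delay - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _last_hit[domain] = time.monotonic()
```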
Please let me know if you have any ideas about this. Also, what would be the best programming language for designing a web crawler, keeping in mind the speed and concurrency model each language offers? So far I have heard about Go and JS; I'm not sure about Python, though, although I have seen many crawlers written in Python as well.
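On the language question: crawling is I/O-bound, so Go's goroutines and Node's event loop shine, but Python handles it reasonably well too, because threads release the GIL while waiting on the network. A minimal thread-pool sketch (worker count and URLs are arbitrary):

```python
# Crawling is I/O-bound, so a thread pool parallelizes fetches well in
# Python despite the GIL. The worker count and URLs are arbitrary.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

def fetch(url):
    req = Request(url, headers={"User-Agent": "research-crawler/0.1"})
    try:
        return url, urlopen(req, timeout=10).status
    except Exception as exc:
        return url, repr(exc)

urls = ["https://example.com", "https://example.org", "https://example.net"]
with ThreadPoolExecutor(max_workers=20) as pool:
    for url, result in pool.map(fetch, urls):
        print(url, result)
```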
submitted by /u/RoninPark