Open source intelligence (OSINT) from the surface, deep or dark web is invaluable to threat intelligence investigations. Find the shortcuts to improve your research.
“Open source intelligence” doesn’t just refer to the accessibility of information. OSINT is the practice of collecting information from publicly available sources.
OSINT grew out of spycraft as it shifted away from clandestine methods of information gathering (think phone tapping, tails) and toward scouring publicly available information like newspapers and files or databases open to the public.
With the advent of the internet, vastly more information became publicly available, and OSINT became increasingly useful not just to sophisticated government agencies and law enforcement but also to financial crime analysts, fraud and brand misuse investigators and, in particular, cybersecurity teams.
Cybersecurity teams frequently use OSINT for OPSEC (operational security), to understand which of their company’s information is publicly available. This information may sit on assets they control, either because those assets are designed to be public-facing or because they became public through error, or on assets outside the company perimeter, like social media or third-party websites that may accidentally leak information.
These are examples of OSINT performed on the surface web (i.e., the part of the internet most of us use every day). But OSINT can also be conducted on the deep or dark web.
The deep web is a layer below the surface web that requires login or subscription services. These sites can include academic journals, court record databases or even services like Netflix. OSINT can still be applied even to sites requiring login or subscription — as long as analysts can access the information legally, without hacking.
And that extends to the dark web.
While the surface and deep web can be accessed by any common browser, the dark web requires specific software, like Tor (The Onion Router). Once inside, there’s lots of information that can be beneficial to threat intelligence gathering and other investigations.
If you’re using the dark web for OSINT, it’s important to remember:
As discussed above, OSINT is a valuable technique for OPSEC, but it can also be used to gather threat intelligence to proactively reduce cyber risks.
OSINT is used to analyze, monitor and track cyberthreats from targeted or indiscriminate attacks against an organization by malware and bad actors. Typically, one of two sources triggers a cyber OSINT investigation:
When a threat intelligence platform (TIP) catches an issue, the initial indicator is valuable, but it often lacks the detail and organization-specific context needed to judge its significance. Conducting OSINT across the surface, deep and dark web can enrich the indicator so that its urgency and scope are clear. For example, a TIP may flag that email addresses and passwords appear in a breach package, on a forum or on a dark web site. An analyst will want to review the full breach package to identify potential high-ranking targets for phishing attacks.
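As a rough illustration of that enrichment step, the sketch below cross-references a breach package against an organization's own domain and a list of high-ranking staff. The domain, the addresses and the `triage_breach` helper are all hypothetical; a real breach package would first need parsing appropriate to its format.

```python
def triage_breach(breached_emails, org_domain, high_value_staff):
    """Split breached addresses into org-related hits and priority targets."""
    suffix = "@" + org_domain.lower()
    vip = {addr.lower() for addr in high_value_staff}
    # Normalize case and whitespace, then de-duplicate with a set.
    org_hits = sorted(
        {e.strip().lower() for e in breached_emails
         if e.strip().lower().endswith(suffix)}
    )
    priority = [e for e in org_hits if e in vip]
    return {"org_hits": org_hits, "priority": priority}


# Hypothetical data standing in for a breach package and a staff list.
breach = ["CFO@example.com", "intern@example.com ", "someone@other.org", "cfo@example.com"]
report = triage_breach(breach, "example.com", ["cfo@example.com", "ceo@example.com"])
print(report["org_hits"])   # addresses belonging to the organization
print(report["priority"])   # the subset that are likely phishing targets
```

The priority list is only as good as the staff list fed into it, so this is a starting point for triage rather than a verdict.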
Additionally, the analyst can provide more detail about the breached information, including who at the organization may be impacted and how the breach occurred.
When a threat hunter identifies an anomaly on the internal network, they need to understand whether it is malicious. This often requires extensive research into current attacker tactics, techniques and procedures (TTPs), which may mean collecting information in places where attackers congregate, such as forums.
When a news outlet or a cybersecurity research organization reports a new threat or vulnerability, the analyst needs to confirm the reports. That means not only searching the surface and deep web for additional reporting and details, but possibly also searching the dark web for information on where the threat or vulnerability has been, or will be, exploited. This is where the knowledge and ability to access the deep and dark web become important for a cyberthreat or cybersecurity analyst.
When searching for information on the surface web, the websites themselves hold several clues about who might be behind the content. (On the dark web, you won’t be so lucky, as site operators and owners are anonymous.) Lookup services provide user-friendly ways to retrieve that information from the databases that house domain data.
WHOIS records provide registration information for a domain name (the part of the URL that ends in a top-level domain such as .com or .org). This includes the addresses, names and phone numbers used to register the domain, the date of registration and details about where it is hosted.
By combining WHOIS query and response protocols with additional search tools, investigators can uncover more information.
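To make the query and response protocol concrete, here is a minimal sketch of a raw WHOIS lookup as described in RFC 3912: open a TCP connection to port 43, send the domain followed by CRLF and read until the server closes the connection. The default server shown is the registry WHOIS server for .com; other top-level domains use different servers. The sample record and the `parse_whois` helper are illustrative assumptions, and real responses vary in layout and are often partly redacted.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a raw WHOIS query (RFC 3912) and return the text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'Key: value' lines into a dict of lists (keys can repeat)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key and value.strip() and not key.startswith(("%", "#", ">>>")):
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Made-up sample in the style of a registry response (not a real record).
SAMPLE = """\
   Domain Name: EXAMPLE.COM
   Registrar: Example Registrar, Inc.
   Creation Date: 1995-08-14T04:00:00Z
   Name Server: A.IANA-SERVERS.NET
   Name Server: B.IANA-SERVERS.NET
"""
record = parse_whois(SAMPLE)
print(record["Creation Date"], record["Name Server"])
```

In practice, `parse_whois(whois_query("example.com"))` would chain the two, but the parser is kept separate so it can be run on saved responses without touching the network.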
URLscan.io is a service that provides the end user with analysis of the IP address information and HTTP connections made during the site’s retrieval. The result panels include a top-level survey of what country the site is hosted in, what links are included on the main page and the IP location details. Details about how many subdomains it contains and what external links it contains can be found as well.
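For analysts who prefer scripting to the web interface, the sketch below queries the urlscan.io search endpoint for past scans of a domain and pulls out the hosting details mentioned above. The field names read from each result (`page.url`, `page.ip`, `page.country`, `page.server`) are assumptions about the response shape and should be checked against the current API documentation; the sample response is hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def build_search_url(domain, size=10):
    """Build a search URL for existing scans of a domain."""
    query = urllib.parse.urlencode({"q": f"domain:{domain}", "size": size})
    return f"{SEARCH_ENDPOINT}?{query}"


def fetch_scans(domain):
    """Network call, shown for completeness; not executed in this sketch."""
    with urllib.request.urlopen(build_search_url(domain), timeout=15) as resp:
        return json.load(resp)


def summarize(search_response):
    """Reduce each result to the hosting fields an investigator scans first."""
    rows = []
    for result in search_response.get("results", []):
        page = result.get("page", {})  # assumed field layout
        rows.append({
            "url": page.get("url"),
            "ip": page.get("ip"),
            "country": page.get("country"),
            "server": page.get("server"),
        })
    return rows


# Hypothetical response fragment (192.0.2.0/24 is a documentation range).
sample = {"results": [{"page": {"url": "https://example.com/", "ip": "192.0.2.10",
                                "country": "US", "server": "nginx"}}]}
summary = summarize(sample)
print(build_search_url("example.com"))
print(summary)
```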
Through WHOIS analysis, hosting details can also be discovered. This can help lead investigators to find servers that host multiple sites or share webmasters, as well as valuable owner information.
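One simple way to spot servers that host multiple sites is to resolve each domain of interest and group the results by IP address. The sketch below does that with the standard library; the domain names and addresses are hypothetical. Shared hosting and CDNs put many unrelated sites on one IP, so a shared address is a lead to check, not proof of common ownership.

```python
import socket
from collections import defaultdict


def resolve(domain):
    """Return the IPv4 address for a domain, or None if lookup fails."""
    try:
        return socket.gethostbyname(domain)
    except OSError:
        return None


def group_by_ip(domain_to_ip):
    """Return only the IPs that serve more than one of the given domains."""
    groups = defaultdict(list)
    for domain, ip in domain_to_ip.items():
        if ip:
            groups[ip].append(domain)
    return {ip: sorted(names) for ip, names in groups.items() if len(names) > 1}


# Hypothetical, pre-resolved data so the example runs offline.
# Live use: {d: resolve(d) for d in domains}
observed = {
    "shop-one.example": "192.0.2.10",
    "shop-two.example": "192.0.2.10",
    "unrelated.example": "198.51.100.7",
    "dead.example": None,
}
shared = group_by_ip(observed)
print(shared)
```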
DomainIQ operates similarly to URLscan.io and can provide identifying details about the site owner, host and what other pages they may be operating.
By using advanced search engine techniques, we can search the identifying data from WHOIS records (such as emails, names, servers or IP addresses) and find additional clues or information that may be lurking on other sites.
Carbon Date uses the advanced search engine technique of “carbon dating” that analyzes a website and gives the earliest known creation date of the page. You can also view previous versions of the page, including the first known scrape through archive.org.
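One input to this kind of dating is the earliest capture held by archive.org. The sketch below is not the Carbon Date tool itself; it only shows how the first known Wayback Machine capture could be looked up through the archive's CDX search endpoint and how the returned timestamp could be parsed. The sample rows are illustrative, and an early capture is a lower bound on a page's age, not its true creation date.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url):
    """Ask for the single oldest capture (results are sorted oldest first)."""
    params = {"url": page_url, "output": "json", "limit": 1,
              "fl": "timestamp,original"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_cdx(page_url):
    """Network call, shown for completeness; not executed in this sketch."""
    with urllib.request.urlopen(build_cdx_url(page_url), timeout=15) as resp:
        return json.load(resp)


def earliest_capture(cdx_rows):
    """Parse JSON rows (header row first) into (datetime, original_url)."""
    if len(cdx_rows) < 2:
        return None  # no captures recorded
    row = dict(zip(cdx_rows[0], cdx_rows[1]))
    return datetime.strptime(row["timestamp"], "%Y%m%d%H%M%S"), row["original"]


# Illustrative rows in the shape the endpoint returns with output=json.
sample_rows = [["timestamp", "original"],
               ["20020120142510", "http://example.com:80/"]]
first_seen = earliest_capture(sample_rows)
print(first_seen)
```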
“Google Dorking” is the process of using advanced search parameters on Google. The techniques range from simple to more advanced. Some of the most common search operators are quotes, which search for exact phrasing, and the dash symbol (-), which excludes specific words. You can also use Google to search for specific file types or to restrict results to a specific site.
These techniques can be used to find identifying information about moderators or search a site for identifying pieces. It can also be used to string together sites sharing specific information.
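The operators described above (quotes for exact phrases, a dash to exclude words, plus `site:` and `filetype:` restrictions) compose into a single query string. The small helper below is a hypothetical convenience for building such queries from identifiers found in WHOIS records; the email address and domains are placeholders.

```python
def build_dork(exact=None, exclude=(), site=None, filetype=None, terms=()):
    """Compose a search query from common advanced-search operators."""
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')            # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # exclude these words
    if site:
        parts.append(f"site:{site}")          # restrict to one site
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file type
    return " ".join(parts)


query = build_dork(exact="admin@example.com", exclude=["jobs"],
                   site="example.org", filetype="pdf")
print(query)
# "admin@example.com" -jobs site:example.org filetype:pdf
```

Keeping each operator as a separate argument makes it easy to run the same identifier across several sites or file types in a loop.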
Common Google Dorking techniques include:
All of these tools can help investigate ownership and hosting information about the sites relevant to your research. Using WHOIS records and advanced search engine techniques can reveal identifying details on the host, moderator and IP, as well as what other sites might be sourced from the same owners.
Learn more about WHOIS records analysis, advanced search engine use and real-world examples of these techniques in action in our flash report, Investigating Site Ownership and History.
There are tons of tools available to aid OSINT for threat intelligence gathering, many of which are free to use. Below are some of our top go-to’s for conducting OSINT on the surface and dark web.
OSINT Framework indexes a multitude of connections to different URLs, recommending where to look next when conducting an investigation. It also provides suggestions on what services can help analysts find specific data that might aid in their research.
When you plug a piece of data (such as an email address, phone number, name, etc.) into the framework, it returns all known online sources that contain information relevant to that data. OSINT Framework also offers a list of potential resources where more information related to that particular source can be found.
Integrate data from public sources, commercial vendors and internal sources via the Maltego Transform Hub. All data comes pre-packaged as Transforms, ready to be used in investigations. Maltego takes one artifact and finds more.
A user feeds Maltego domain names, IP addresses, domain records, URLs or emails. The service finds connections and relationships within the data and allows users to create graphs with intuitive point-and-click logic.
Websites are just one part of the internet. Shodan allows analysts to discover which of their devices are connected to the internet, where they are located and who is using them.
Shodan helps researchers monitor all devices within their network that are directly accessible from the internet and therefore vulnerable to attacks.
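As a sketch of that kind of monitoring, the code below builds a Shodan REST search URL for a network range and reduces a search response to the exposed IP and port pairs. It assumes an API key and the `matches`, `ip_str` and `port` fields of the search response; the key, the network range and the sample matches are placeholders, and the request itself is not executed here.

```python
import urllib.parse
from collections import Counter

SHODAN_SEARCH = "https://api.shodan.io/shodan/host/search"


def build_shodan_search_url(api_key, query):
    """Build the search URL; fetch it with urllib.request to run the query."""
    return SHODAN_SEARCH + "?" + urllib.parse.urlencode(
        {"key": api_key, "query": query})


def exposed_services(search_response):
    """Return the unique (ip, port) pairs found in a search response."""
    matches = search_response.get("matches", [])
    return sorted({(m["ip_str"], m["port"]) for m in matches})


def port_counts(search_response):
    """Count how many matched services listen on each port."""
    return Counter(m["port"] for m in search_response.get("matches", []))


# Hypothetical response fragment using documentation address ranges.
sample = {"matches": [
    {"ip_str": "192.0.2.10", "port": 443},
    {"ip_str": "192.0.2.10", "port": 22},
    {"ip_str": "192.0.2.44", "port": 22},
]}
url = build_shodan_search_url("YOUR_API_KEY", "net:192.0.2.0/24")
services = exposed_services(sample)
counts = port_counts(sample)
print(services)
print(counts)
```

Comparing the resulting pairs against an inventory of services that are meant to be public is one way to surface devices exposed by mistake.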
ThreatMiner is a threat intelligence portal designed to let an analyst research indicators of compromise (IOCs) under a single interface. That interface not only supports IOC lookups but also provides the analyst with contextual information. With this context, the IOC is not just a data point but a useful piece of information and potentially intelligence.
Identify and enrich indicators of compromise to have a better understanding of attack origins.
Torch, or TorSearch, is a search engine designed to explore the hidden parts of the internet. Torch claims to have over a billion darknet pages indexed and allows users to browse the dark web uncensored and untracked.
Torch promises peace of mind to researchers who venture into the dark web to explore .onion sites. It also doesn't censor results — so investigators can find all types of information and join discussion forums to find out more about current malware, stolen data for sale or groups who might be planning a cyberattack.
Dark.fail has been crowned the new hidden wiki. It indexes every major darknet site and keeps track of all domains linked to a particular hidden service.
Tor admins rely on Dark.fail to disseminate links in the wake of takedowns of sites like DeepDotWeb. Researchers can use Dark.fail when exploring sites that correlate with the hidden service.