Open source intelligence (OSINT) from the surface, deep or dark web is invaluable to threat intelligence investigations. Find the shortcuts to improve your research.
What is OSINT?
“Open source intelligence” doesn’t just refer to the accessibility of information. OSINT is the practice of collecting information from publicly available sources.
OSINT grew out of spycraft as it shifted away from clandestine methods of information gathering (think phone tapping, tails) and toward scouring publicly available information like newspapers and files or databases open to the public.
With the advent of the internet, vastly more information became publicly available, and OSINT became increasingly useful not just to sophisticated government agencies and law enforcement, but also to financial crime analysts, fraud and brand misuse investigators and, particularly, cybersecurity teams.
Cybersecurity teams frequently use OSINT for OPSEC (operational security), to understand which of their company’s information is publicly available. That information may sit on assets the company controls, whether designed to be public-facing or exposed through error, or on assets outside the company perimeter, like social media or third-party websites that may accidentally leak information.
OSINT on the deep and dark web
The examples above describe OSINT on the surface web (i.e., the internet most of us use every day). But OSINT can also be conducted on the deep or dark web.
The deep web is a layer below the surface web, made up of sites that require a login or subscription. These sites can include academic journals, court record databases or even services like Netflix. OSINT can still be applied even to sites requiring login or subscription, as long as analysts can access the information legally, without hacking.
And that extends to the dark web.
While the surface and deep web can be accessed by any common browser, the dark web requires specific software, like Tor (The Onion Router). Once inside, there’s lots of information that can be beneficial to threat intelligence gathering and other investigations.
If you’re using the dark web for OSINT, it’s important to remember:
- Paying for hacked/stolen items can qualify as OSINT, but there are lots of practical, ethical and legal considerations to weigh before engaging in such a purchase (the DOJ CCIPS has good guidance on this topic)
- Any website could introduce malicious code to your computer, but this is especially true on the dark web, where site owners often set boobytraps to track potential adversaries
- There is some anonymity to using the dark web, but there are still lots of details given to site owners about your identity — you’ll need to control your digital fingerprint
How is OSINT used in threat intelligence gathering?
As discussed above, OSINT is a valuable technique for OPSEC, but it can also be used to gather threat intelligence to proactively reduce cyber risks.
OSINT is used to analyze, monitor and track cyberthreats, whether targeted or indiscriminate, that malware and bad actors direct against an organization. Typically, one of three sources triggers a cyber OSINT investigation:
- A flag or item of interest identified by a threat intelligence platform (TIP) or subscription service
- A new threat, vulnerability or data breach reported by an OSINT news source
- A potential advanced persistent threat (APT) identified within the network by a threat hunter
In the case of an issue caught by a TIP, the initial indicator is valuable, but it often lacks the detail and organization-specific context needed to judge how significant it is. Conducting OSINT across the surface, deep and dark web can enrich the indicator so the analyst understands its urgency and scope. For example, a TIP may flag that email addresses and passwords appear in a breach package or on a forum or dark web site. An analyst will want to review the full breach package to identify high-ranking employees who could become targets for phishing attacks.
Additionally, the analyst can report in more detail on the breached information, including who at the organization may be impacted and how the breach occurred.
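As a minimal sketch of that enrichment step, the Python below filters a breach package for addresses on the organization's domain and flags any that belong to high-profile staff. The function name, the `example.com` domain and the `high_value` lookup are hypothetical stand-ins for data an analyst would supply; real breach packages vary widely in format.

```python
from typing import Dict, List


def find_exposed_accounts(
    leaked_emails: List[str],
    org_domain: str,
    high_value: Dict[str, str],
) -> List[dict]:
    """Return leaked addresses on org_domain, listing high-value staff first.

    high_value maps a lowercase email address to a role label (hypothetical
    input, e.g. exported from an internal directory).
    """
    suffix = "@" + org_domain.lower()
    exposed = []
    for raw in leaked_emails:
        email = raw.strip().lower()
        if not email.endswith(suffix):
            continue  # address belongs to another organization
        exposed.append(
            {
                "email": email,
                "role": high_value.get(email),  # None when not a listed target
                "priority": email in high_value,
            }
        )
    # Likely phishing targets first, then alphabetical.
    exposed.sort(key=lambda rec: (not rec["priority"], rec["email"]))
    return exposed
```

Calling `find_exposed_accounts(leaked, "example.com", {"cfo@example.com": "CFO"})` would return only the organization's own addresses, with the CFO's entry at the top of the list.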
In the case where a threat hunter identifies an anomaly on the internal network, they need to understand if it’s malicious. This often requires a lot of research into current attacker tactics, techniques and procedures (TTPs). It may also require researching and collecting information in places where attackers gather, like forums.
When a new threat or vulnerability is reported by a news organization or a cybersecurity research organization, the analyst needs to confirm the reports. That means looking on the surface and deep web for additional reporting and details, and it may also mean looking on the dark web for information on where the new threat or vulnerability has been, or will be, exploited. This is where the knowledge and ability to access the deep and dark web become important for a cyberthreat or cybersecurity analyst.
When searching for information on the surface web, the websites themselves hold several keys about who might be behind the content. (On the dark web, you won’t be so lucky, as site operators and owners are anonymous.) WHOIS lookups and similar services provide user-friendly ways to retrieve that information from the databases that house domain data.
Identifying site owners through WHOIS
WHOIS records provide registration information for a domain name (the part of the URL that ends in a top-level domain like .com or .org). This includes addresses, names and phone numbers used to register the domain, the date of registration and details about where it is hosted.
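For readers who want to see what a WHOIS lookup involves under the hood, here is a rough Python sketch. The protocol itself (RFC 3912) is simple: open a TCP connection to port 43, send the query followed by a line break and read the reply. The default server `whois.iana.org` and the `parse_whois` helper are illustrative choices, not the method any particular lookup service uses, and many registries now redact registrant contact details, so expect sparse results for some domains.

```python
import socket
from collections import defaultdict
from typing import Dict, List


def whois_query(domain: str, server: str = "whois.iana.org",
                timeout: float = 10.0) -> str:
    """Send a raw WHOIS query: TCP port 43, query terminated by CRLF."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> Dict[str, List[str]]:
    """Collect 'Key: Value' lines into a dict of lists (keys lowercased)."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines and lines without a key/value separator.
        if not line or line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields[key.strip().lower()].append(value)
    return dict(fields)
```

A typical use would be `parse_whois(whois_query("example.com"))`, then reading fields such as the creation date or name servers if the responding server includes them. Field names differ between registries, so treat the keys as hints, not a fixed schema.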
By combining WHOIS query and response protocols with additional search tools, investigators can uncover more information.
URLscan.io is a service that provides the end user with analysis of the IP address information and HTTP connections made during the site’s retrieval. The result panels include a top-level survey of what country the site is hosted in, what links are included on the main page and the IP location details. Details about how many subdomains it contains and what external links it contains can be found as well.
Through WHOIS analysis, hosting details can also be discovered. This can help lead investigators to find servers that host multiple sites or share webmasters, as well as valuable owner information.
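One simple way to spot servers that host multiple sites is to group the domains in an investigation by the IP address they resolve to. The sketch below assumes the analyst has already collected a domain-to-IP mapping (the sample values use reserved documentation addresses, and `group_by_host` is a hypothetical helper name).

```python
from collections import defaultdict
from typing import Dict, List


def group_by_host(domain_to_ip: Dict[str, str]) -> Dict[str, List[str]]:
    """Return only the IPs that serve more than one domain."""
    hosts = defaultdict(list)
    for domain, ip in domain_to_ip.items():
        hosts[ip].append(domain)
    return {ip: sorted(names) for ip, names in hosts.items() if len(names) > 1}


observed = {
    "shop-one.example": "192.0.2.10",
    "shop-two.example": "192.0.2.10",
    "unrelated.example": "203.0.113.5",
}
shared = group_by_host(observed)
```

A shared IP is a lead, not proof of common ownership: shared hosting providers and content delivery networks routinely put unrelated sites on the same address.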
DomainIQ operates similarly to URLscan.io and can provide identifying details about the site owner, host and what other pages they may be operating.
Utilizing advanced search engine techniques
By using advanced search engine techniques, we can search the identifying data from WHOIS records (such as emails, names, servers or IP addresses) and find additional clues or information that may be lurking on other sites.
Carbon Date uses the advanced search engine technique of “carbon dating” that analyzes a website and gives the earliest known creation date of the page. You can also view previous versions of the page, including the first known scrape through archive.org.
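The archive.org part of that check can be approximated with the Wayback Machine's CDX search endpoint, which lists captures oldest first. This Python sketch is an illustration of that one lookup, not a description of how Carbon Date itself works; the helper names are made up for the example.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url: str) -> str:
    """Ask for the first capture only, with just the fields we need."""
    params = {"url": page_url, "output": "json", "limit": 1,
              "fl": "timestamp,original"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def earliest_capture(rows: list) -> Optional[dict]:
    """Interpret a CDX JSON reply: row 0 is the header, row 1 the capture."""
    if len(rows) < 2:
        return None  # no captures on record
    record = dict(zip(rows[0], rows[1]))
    captured = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
    snapshot = "https://web.archive.org/web/{}/{}".format(
        record["timestamp"], record["original"])
    return {"captured": captured, "snapshot": snapshot}


def fetch_earliest(page_url: str) -> Optional[dict]:
    """Network call: look up the earliest known capture of page_url."""
    with urlopen(build_cdx_url(page_url), timeout=30) as resp:
        return earliest_capture(json.load(resp))
```

Keep in mind that the first capture only shows the page existed by that date. The page may be older than its earliest archived copy.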
“Google Dorking” is the process of using advanced search parameters on Google. There are several techniques that can be used, ranging from simple to more advanced. Some of the most common Boolean logic search operators are quotes to search for exact phrasing and the dash symbol (-) to exclude specific words. You can also use Google to search specific file types or recent caches of a specific site.
These techniques can be used to find identifying information about moderators or to search a site for identifying details. They can also be used to string together sites sharing specific information.
Common Google Dorking techniques include:
- Intitle: identifies any mention of search text in the web page title
- Allintitle: only identifies pages with all of the search text in the web page title
- Inurl: identifies any mention of search text in the web page URL
- Allinurl: only identifies pages with all of the search text in the web page URL
- Intext: identifies any mention of search text in the body text of the web page
- Site: limits results to the specified website or domain
- Filetype: limits results to only the specified file type
- Cache: shows the most recent cache of a site specified
- AROUND(X): searches for two different words within X words of one another
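To keep dorks consistent across an investigation, it can help to assemble them programmatically. The short Python sketch below combines a few of the operators above into a single query string; `build_dork` and `search_url` are hypothetical helper names, and the example domain is a placeholder.

```python
from typing import Iterable, Optional
from urllib.parse import quote_plus


def _term(value: str) -> str:
    """Quote a value if it contains whitespace so it stays one phrase."""
    return '"{}"'.format(value) if " " in value else value


def build_dork(
    phrase: Optional[str] = None,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
    intitle: Optional[str] = None,
    inurl: Optional[str] = None,
    exclude: Iterable[str] = (),
) -> str:
    """Assemble a search query from common advanced operators."""
    parts = []
    if phrase:
        parts.append('"{}"'.format(phrase))  # exact-phrase match
    if site:
        parts.append("site:" + site)
    if filetype:
        parts.append("filetype:" + filetype)
    if intitle:
        parts.append("intitle:" + _term(intitle))
    if inurl:
        parts.append("inurl:" + _term(inurl))
    parts.extend("-" + word for word in exclude)  # dash excludes a word
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query for use in a browser address bar."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_dork(phrase="internal use only", site="example.com",
                   filetype="pdf", exclude=["draft"])
```

Here `query` holds `"internal use only" site:example.com filetype:pdf -draft`, which looks for PDFs on one site that contain the exact phrase and skips pages mentioning "draft".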
All of these tools can help investigate ownership and hosting information about the sites relevant to your research. Using WHOIS records and advanced search engine techniques can reveal identifying details on the host, moderator and IP, as well as what other sites might be sourced from the same owners.
Learn more about WHOIS records analysis, advanced search engine use and real-world examples of these techniques in action in our flash report, Investigating Site Ownership and History.
Top OSINT research tools
There are tons of tools available to aid OSINT for threat intelligence gathering, many of which are free to use. Below are some of our top go-to’s for conducting OSINT on the surface and dark web.
OSINT Framework: find free OSINT resources
WHAT IT IS
OSINT Framework indexes a multitude of connections to different URLs, recommending where to look next when conducting an investigation. It also provides suggestions on what services can help analysts find specific data that might aid in their research.
When you plug a piece of data (such as an email address, phone number, name, etc.) into the framework, it returns all known online sources that contain information relevant to that data. OSINT Framework also offers a list of potential resources where more information related to that particular source can be found.
Maltego Transform Hub: mine, merge and map information
WHAT IT IS
Integrate data from public sources, commercial vendors and internal sources via the Maltego Transform Hub. All data comes pre-packaged as Transforms, ready to be used in investigations. Maltego takes one artifact and finds more.
A user feeds Maltego domain names, IP addresses, domain records, URLs or emails. The service finds connections and relationships within the data and allows users to create graphs through an intuitive point-and-click interface.
Shodan: the search engine for the IoT
WHAT IT IS
Websites are just one part of the internet. Shodan allows analysts to discover which of their devices are connected to the internet, where they are located and who is using them.
Shodan helps researchers monitor all devices within their network that are directly accessible from the internet and therefore vulnerable to attacks.
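As a rough illustration of that monitoring use case, the Python sketch below queries Shodan's REST search endpoint for a network range using the `net:` filter and summarizes which ports are exposed on each address. It assumes you have a Shodan API key with search access; the helper names and the sample records are invented for the example, and only the `ip_str` and `port` fields of each match are used.

```python
import json
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

SHODAN_SEARCH = "https://api.shodan.io/shodan/host/search"


def exposed_services(matches: List[dict]) -> Dict[str, List[int]]:
    """Map each IP address to the sorted list of ports Shodan saw open."""
    ports = defaultdict(set)
    for match in matches:
        ports[match["ip_str"]].add(match["port"])
    return {ip: sorted(found) for ip, found in ports.items()}


def search_network(cidr: str, api_key: str) -> Dict[str, List[int]]:
    """Network call: list exposed services inside a CIDR range you own."""
    params = urlencode({"key": api_key, "query": "net:" + cidr})
    with urlopen(SHODAN_SEARCH + "?" + params, timeout=30) as resp:
        return exposed_services(json.load(resp).get("matches", []))


# Offline example with made-up records shaped like Shodan search matches.
sample_matches = [
    {"ip_str": "198.51.100.7", "port": 443},
    {"ip_str": "198.51.100.7", "port": 22},
    {"ip_str": "198.51.100.9", "port": 3389},
]
summary = exposed_services(sample_matches)
```

Comparing a summary like this against the services you expect to be public is a quick way to surface devices that were exposed by mistake.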
ThreatMiner: IOC lookup and contextualization
WHAT IT IS
ThreatMiner is a threat intelligence portal designed to enable an analyst to research indicators of compromise (IOCs) under a single interface. That interface not only allows IOC lookups but also provides the analyst with contextual information. With this context, the IOC is not just a data point but a useful piece of information and potentially intelligence.
Identify and enrich indicators of compromise to have a better understanding of attack origins.
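Before an indicator can be looked up anywhere, the analyst needs to know what kind of indicator it is. The Python sketch below is one hedged way to sort raw values into IP addresses, file hashes, URLs, emails and domains, including "defanged" values such as `hxxp://` or `[.]` that often appear in reports. It is a generic pre-processing step with hypothetical helper names, not part of ThreatMiner, and the domain pattern is deliberately simple.

```python
import ipaddress
import re

_HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
_HEX = re.compile(r"^[0-9a-fA-F]+$")
_DOMAIN = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def refang(value: str) -> str:
    """Undo common defanging so the value can be parsed normally."""
    return value.strip().replace("[.]", ".").replace("hxxp", "http")


def classify_ioc(value: str) -> str:
    """Return 'ip', 'md5', 'sha1', 'sha256', 'url', 'email', 'domain' or 'unknown'."""
    candidate = refang(value)
    try:
        ipaddress.ip_address(candidate)  # accepts IPv4 and IPv6
        return "ip"
    except ValueError:
        pass
    if _HEX.match(candidate) and len(candidate) in _HASH_LENGTHS:
        return _HASH_LENGTHS[len(candidate)]
    if candidate.lower().startswith(("http://", "https://")):
        return "url"
    if "@" in candidate and _DOMAIN.match(candidate.rsplit("@", 1)[1]):
        return "email"
    if _DOMAIN.match(candidate):
        return "domain"
    return "unknown"
```

With the type known, each indicator can be routed to the matching lookup and the results collected side by side for context.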
Torch search engine: explore the darknet
WHAT IT IS
Torch, or TorSearch, is a search engine designed to explore the hidden parts of the internet. Torch claims to have over a billion darknet pages indexed and allows users to browse the dark web uncensored and untracked.
Torch promises peace of mind to researchers who venture into the dark web to explore .onion sites. It also doesn't censor results — so investigators can find all types of information and join discussion forums to find out more about current malware, stolen data for sale or groups who might be planning a cyberattack.
Dark.fail: go deeper into the darknet
WHAT IT IS
Dark.fail has been crowned the new hidden wiki. It indexes every major darknet site and keeps track of all domains linked to a particular hidden service.
Tor admins rely on Dark.fail to disseminate links in the wake of takedowns of sites like DeepDotWeb. Researchers can use Dark.fail when exploring sites that correlate with the hidden service.