Open-source intelligence (OSINT) techniques are invaluable to threat intelligence investigations. Find the tips, tools and shortcuts to improve your research.
What is OSINT?
“Open-source intelligence” doesn’t just refer to the accessibility of information. OSINT is the practice of collecting information from publicly available sources.
OSINT grew out of spycraft as it shifted away from clandestine methods of information gathering (think phone tapping, tails) and toward scouring publicly available information like newspapers and files or databases open to the public.
With the advent of the internet, vastly more information became publicly available, and OSINT became increasingly useful not just to sophisticated government agencies and law enforcement, but also to financial crime analysts, fraud and brand misuse investigators and, in particular, cybersecurity teams.
Cybersecurity teams frequently use OSINT for OPSEC (operational security) to understand which of their company’s information is publicly available. This information may be on assets they control that are designed to be public-facing or become so through error, or on assets outside the company perimeter, like social media or third-party websites that may accidentally leak information.
OSINT on the deep and dark web
The examples given are where companies may perform OSINT on the surface web (i.e., the internet most of us use every day). But OSINT can also be conducted on the deep or dark web.
The deep web is a layer below the surface web that requires login or subscription services. These sites can include academic journals, court record databases or even services like Netflix. OSINT can still be applied even to sites requiring login or subscription — as long as analysts can access the information legally, without hacking.
That extends to the dark web as well.
While the surface and deep web can be accessed by any common browser, the dark web requires specific software, like Tor (The Onion Router). Once inside, there’s lots of information that can be beneficial to threat intelligence gathering and other investigations.
If you’re using the dark web for OSINT, it’s important to remember:
- Paying for hacked/stolen items can qualify as OSINT, but there are lots of practical, ethical and legal considerations one should make before engaging in such a purchase
- Any website could introduce malicious code to your computer, but this is especially true on the dark web, where site owners often set boobytraps to track potential adversaries
- There is some anonymity to using the dark web, but there are still lots of details given to site owners about your identity — you’ll need to control your digital fingerprint
Learn more: 3 things to consider before you start your dark web investigation >
How is OSINT used in threat intelligence gathering?
OSINT is a valuable technique for OPSEC, but it can also be used to gather threat intelligence to proactively reduce cyber risks.
OSINT is used to analyze, monitor and track cyberthreats, whether targeted or indiscriminate, that malware and bad actors direct against an organization. Typically, one of three sources triggers a cyber OSINT investigation:
- A flag or item of interest identified by a threat intelligence platform (TIP) or subscription service
- A new threat, vulnerability or data breach reported by an OSINT news source
- A potential advanced persistent threat (APT) identified within the network by a threat hunter
In the case of an issue caught by a TIP, while the initial indicator is valuable, the level of detail and specificity to the organization often will require enrichment to understand how significant it is. Conducting OSINT across the surface, deep and dark web can enrich the indicator to understand urgency and scope. For example, a TIP may flag that email addresses and passwords are in a breach package or on a forum or dark web site. An analyst will want to go and see the full breach package to understand potential high-ranking targets for phishing attacks.
The analyst can also add detail about the breached information, including who at their organization may be affected and how the breach occurred.
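The enrichment step described above can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any particular TIP: the `BreachRecord` type, the `triage_breach` function and the role keywords are all hypothetical names, and it assumes the breach package has already been obtained legally and parsed into email/source pairs.

```python
import re
from dataclasses import dataclass

# Hypothetical role keywords that suggest a high-ranking or sensitive mailbox.
# Tune these to the organization under investigation.
HIGH_VALUE_KEYWORDS = ("ceo", "cfo", "admin", "finance", "payroll")


@dataclass(frozen=True)
class BreachRecord:
    email: str
    source: str  # assumed label for where the record was seen, e.g. a forum name


def triage_breach(records, org_domain, keywords=HIGH_VALUE_KEYWORDS):
    """Return (affected, high_value) lists of emails for one organization."""
    suffix = "@" + org_domain.lower()
    # Deduplicate and keep only addresses on the organization's domain.
    affected = sorted(
        {r.email.lower() for r in records if r.email.lower().endswith(suffix)}
    )
    high_value = []
    for email in affected:
        # Compare whole tokens of the local part to avoid substring false hits.
        tokens = re.split(r"[._+\-]", email.split("@")[0])
        if any(token in keywords for token in tokens):
            high_value.append(email)
    return affected, high_value
```

The first list shows who at the organization is affected; the second is a rough shortlist of accounts that may be attractive phishing targets and deserve attention first.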
In the case where a threat hunter identifies an anomaly on the internal network, they need to understand if it’s malicious. This often requires a lot of research into current attacker tactics, techniques and procedures (TTPs), which may mean researching and collecting information in places where attackers gather, such as forums.
When a news organization or cybersecurity research organization reports a new threat or vulnerability, the analyst needs to confirm the reports. That means looking on the surface and deep web for additional reporting and details, and possibly on the dark web for information on where the threat or vulnerability has been or will be exploited. This is where having the knowledge and ability to access the deep and dark web becomes important for a cyberthreat or cybersecurity analyst.
OSINT techniques
When searching for information on the surface web, the websites themselves hold several keys about who might be behind the content. (On the dark web, you won’t be so lucky, as site operators and owners are anonymous.) The services below provide user-friendly ways to retrieve that information from the databases that house domain data.
Learn more: Essential tools for improving surface and dark web research >
Identifying site owners through WHOIS
WHOIS records provide registration information for a domain (the name registered under a top-level domain such as .com or .org). This includes addresses, names and phone numbers used to register the domain, the date of registration and details about where it is hosted.
By combining WHOIS query and response protocols with additional search tools, investigators can uncover more information.
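For readers who want to see the query and response protocol itself, here is a minimal Python sketch using only the standard library. WHOIS (RFC 3912) is a plain-text exchange over TCP port 43: the client sends the domain followed by a line break and reads until the server closes the connection. The function names are illustrative, the default server `whois.iana.org` is an assumption (it typically refers you on to the registry's own WHOIS server), and real responses vary widely by registry, so the parser is deliberately loose.

```python
import socket


def whois_query(domain, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into a dict of lists, skipping comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", ">>>")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            # Keys such as "Name Server" can repeat, so keep every value.
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields
```

Calling `parse_whois(whois_query("example.com"))` would return whatever fields the queried server exposes. Many registries now redact registrant names and contact details, so expect gaps.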
URLscan.io
URLscan.io is a service that provides the end user with analysis of the IP address information and HTTP connections made during the site’s retrieval. The result panels include a top-level survey of what country the site is hosted in, what links are included on the main page and the IP location details. Details about how many subdomains it contains and what external links it contains can be found as well.
Through WHOIS analysis, hosting details can also be discovered. This can help lead investigators to find servers that host multiple sites or share webmasters, as well as valuable owner information.
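Spotting servers that host multiple sites comes down to grouping domains by a shared attribute. The sketch below assumes the analyst has already collected a domain-to-IP mapping from lookups like those above. The function name and the input shape are illustrative, not part of any tool mentioned here.

```python
from collections import defaultdict


def shared_hosts(domain_to_ip):
    """Return {ip: [domains]} for every IP that serves more than one domain."""
    by_ip = defaultdict(list)
    for domain, ip in domain_to_ip.items():
        by_ip[ip].append(domain)
    # Keep only IPs with several domains; those are the pivot points.
    return {ip: sorted(domains) for ip, domains in by_ip.items() if len(domains) > 1}
```

The same grouping works for other pivots, such as registrant email or name server. Keep in mind that shared hosting providers put many unrelated sites on one IP, so a match is a lead to investigate rather than proof of common ownership.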
DomainIQ
DomainIQ operates similarly to URLscan.io and can provide identifying details about the site owner, host and what other pages they may be operating.
Utilizing advanced search engine techniques
By using advanced search engine techniques, we can search the identifying data from WHOIS records (such as emails, names, servers or IP addresses) and find additional clues or information that may be lurking on other sites.
Carbon Date
Carbon Date uses the advanced search engine technique of “carbon dating” that analyzes a website and gives the earliest known creation date of the page. You can also view previous versions of the page, including the first known scrape through archive.org.
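A rough version of the archive.org part of this check can be scripted against the Internet Archive's public availability endpoint, which returns the snapshot closest to a requested timestamp. This is a sketch and not how Carbon Date itself works: asking for a very early timestamp only approximates the first known capture, and the function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp="19960101"):
    """Build the query URL; an early timestamp biases toward the oldest capture."""
    return WAYBACK_API + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def earliest_known_snapshot(page_url):
    """Fetch and decode the availability response for one page (needs network)."""
    with urlopen(availability_url(page_url), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

The returned timestamp uses the archive's `YYYYMMDDhhmmss` format. A page may well have existed before its first capture, so treat the result as a "no later than" date.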
Google Dorking
“Google Dorking” is the process of using advanced search parameters on Google. There are several techniques that can be used, ranging from simple to more advanced. Some of the most common Boolean logic search operators are quotes, which search for exact phrasing, and the dash symbol (-), which excludes specific words. You can also use Google to search specific file types or recent caches of a specific site.
These techniques can be used to find identifying information about moderators or search a site for identifying pieces. It can also be used to string together sites sharing specific information.
Common Google Dorking techniques include:
- Intitle: identifies any mention of search text in the web page title
- Allintitle: only identifies pages with all of the search text in the web page title
- Inurl: identifies any mention of search text in the web page URL
- Intext: identifies any mention of search text in the body text of the web page
- Site: limits results to the specified site or domain
- Filetype: limits results to only the specified file type
- Cache: shows the most recent cache of a site specified
- AROUND(X): searches for two different words within X words of one another
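Operators like these are usually combined in a single query. The small Python helper below shows one way to assemble such a string. `build_dork` and its parameters are hypothetical names chosen for this sketch, and it only builds the text to paste into the search box; it does not query Google.

```python
def _term(operator, value):
    """Format operator:value, quoting values that contain spaces."""
    text = f'"{value}"' if " " in value else value
    return f"{operator}:{text}"


def build_dork(phrase=None, site=None, filetype=None, intitle=None,
               inurl=None, exclude=()):
    """Combine common search operators into one query string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')  # quotes force exact phrasing
    for operator, value in (("site", site), ("filetype", filetype),
                            ("intitle", intitle), ("inurl", inurl)):
        if value:
            parts.append(_term(operator, value))
    # A leading dash excludes a word from the results.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)
```

For example, `build_dork(phrase="admin login", site="example.com", filetype="pdf", exclude=["demo"])` returns `"admin login" site:example.com filetype:pdf -demo`.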
All of these tools can help investigate ownership and hosting information about the sites relevant to your research. Using WHOIS records and advanced search engine techniques can reveal identifying details on the host, moderator and IP, as well as what other sites might be sourced from the same owners.
Learn more about WHOIS records analysis, advanced search engine use and real-world examples of these techniques in action in our flash report, Investigating Site Ownership and History >
Top OSINT research tools
There are tons of tools available to aid OSINT for threat intelligence gathering, many of which are free to use. Below are some of our top go-to’s for conducting OSINT on the surface and dark web.
Learn more: 21 OSINT research tools for threat intelligence investigations >
ExploitAlert: Stay on top of exploits
WHAT IT IS
ExploitAlert is a site that archives exploits and ways to address them. The site provides historical exploit data going back to 2005.
USE CASE
Users search for exploits on the site and find available patches, mitigation measures, etc. Or users can integrate the API into their software to monitor exploits in real time.
ThreatMiner: IOC lookup and contextualization
WHAT IT IS
ThreatMiner is a threat intelligence portal designed to enable an analyst to research indicators of compromise (IOCs) under a single interface. That interface allows for not only looking up IOCs but also providing the analyst with contextual information. With this context, the IOC is not just a data point but a useful piece of information and potentially intelligence.
USE CASE
Identify and enrich indicators of compromise to have a better understanding of attack origins.
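Before an indicator can be looked up in any portal, it helps to know what kind of indicator it is. The following Python sketch sorts raw strings into common IOC types. The `classify_ioc` name, the type labels and the simplified domain pattern are assumptions made for illustration and are not tied to ThreatMiner's interface.

```python
import ipaddress
import re

# Hex digest lengths for the hash types most often seen in IOC feeds.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}

# Simplified hostname pattern; real-world validation is more involved.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def classify_ioc(value):
    """Return 'ip', 'md5', 'sha1', 'sha256', 'url', 'domain' or 'unknown'."""
    candidate = value.strip().lower()
    try:
        ipaddress.ip_address(candidate)  # accepts IPv4 and IPv6
        return "ip"
    except ValueError:
        pass
    if re.fullmatch(r"[0-9a-f]+", candidate) and len(candidate) in HASH_LENGTHS:
        return HASH_LENGTHS[len(candidate)]
    if candidate.startswith(("http://", "https://")):
        return "url"
    if DOMAIN_RE.match(candidate):
        return "domain"
    return "unknown"
```

Routing each indicator by type lets an analyst send it to the right lookup and keep the returned context alongside the original data point.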
GreyNoise: Cut through the noise and eliminate false positives
WHAT IT IS
GreyNoise Intelligence scours the internet for the IPs behind scan and attack traffic. During potential exploitation events, data from GreyNoise helps classify IP intent and accelerates alert triage.
USE CASE
Researchers can use the API to integrate GreyNoise with common security products to look up IPs quickly and sift through security alerts. It also offers a visualizer to give the full context behind scanner IP addresses under investigation, including: JA3 and HASSH fingerprints, web path data, malicious/benign classifications, “spoofability,” behavior tags and ports and protocols scanned.
Censys: Identify exposures attackers can exploit
WHAT IT IS
Censys is a platform that finds all devices connected to the internet (over 4 billion) and evaluates them every day, adding new IPs and removing old ones.
USE CASE
Security professionals can query hosts and certificates through Censys’ web interface or API to identify exposures that attackers are likely to exploit. Censys also provides the historical information to know when an infrastructure was weaponized.
AttackerKB: Focus your efforts with
WHAT IT IS
AttackerKB is a web portal that crowdsources vulnerability assessments and threat details. Given the high volume of incoming vulnerabilities, this forum provides critical information so that defenders can triage threats and make informed decisions.
USE CASE
Researchers get insights on which vulnerabilities they should address and mitigate, and which ones have a longer shelf life or are irrelevant to their business. AttackerKB shares real-time discussion about vulnerabilities that helps users cut through the chaos and noise of exploit alerts.
VirusTotal: Analyze suspicious files and URLs
WHAT IT IS
VirusTotal inspects items with over 70 antivirus scanners and URL/domain blacklisting services. Scanning reports produced by VirusTotal are shared with the public to raise the global IT security level and awareness about potentially harmful content.
USE CASE
Users can select a file from their computer using their browser and send it to VirusTotal. Results are shared with the submitter, and also between the examining partners, who use this data to improve their own systems.
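Because submitted files are shared, analysts handling sensitive samples often prefer to search by file hash first, which VirusTotal also supports. The Python sketch below computes a SHA-256 digest locally with the standard library. The function name is illustrative, and the lookup itself is left to the analyst.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Pasting the resulting digest into the search box shows whether a report already exists without uploading the file itself.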
Microsoft Defender Threat Intelligence: Uncover adversaries
WHAT IT IS
Microsoft Defender Threat Intelligence (formerly RiskIQ) is a threat hunting and investigation platform that streamlines triage, incident response, vulnerability management and cyberthreat intelligence analyst workflows.
USE CASE
Users can detect and respond to threats, prioritize incidents and proactively identify adversaries’ infrastructure associated with actor groups that can potentially target their organization.
DomainTools: Get the details on any domain
WHAT IT IS
DomainTools has tracked domain history since 1995 and makes it available to the public with an easy search tool.
USE CASE
Using the “Whois Lookup,” researchers can enter a domain name or IP address and get information about the registrant, registrar, dates (created, updated, expires), name servers, IP location, IP history and more.
DNSdumpster: Find and look up DNS records
WHAT IT IS
DNSdumpster is a domain research tool that can discover hosts related to a domain. Finding visible hosts from the attackers’ perspective is an important part of the security assessment process.
USE CASE
After a user enters a domain name, DNSdumpster identifies and displays all associated subdomains, helping map an organization’s entire attack surface based on DNS records.
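To give a feel for what host discovery involves, here is a very small Python sketch that tries a wordlist of common labels against a domain and keeps those that resolve. This is not how DNSdumpster works internally; the wordlist, function names and injectable `resolver` parameter are assumptions for illustration, and it should only be run against domains you are authorized to assess.

```python
import socket

# Illustrative wordlist; real enumeration uses far larger lists and other sources.
COMMON_LABELS = ("www", "mail", "vpn", "dev", "api")


def resolve_host(hostname):
    """Return the IPv4 address for a hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def find_subdomains(domain, labels=COMMON_LABELS, resolver=resolve_host):
    """Map each candidate subdomain that resolves to its address."""
    found = {}
    for label in labels:
        host = f"{label}.{domain}"
        address = resolver(host)
        if address is not None:
            found[host] = address
    return found
```

Passing a custom `resolver` makes the function easy to test offline or to point at a different lookup method.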
Shodan: The search engine for the IoT
WHAT IT IS
Websites are just one part of the internet. Shodan allows analysts to discover which of their devices are connected to the internet, where they are located and who is using them.
USE CASE
Shodan helps researchers monitor all devices within their network that are directly accessible from the internet, and therefore vulnerable to attacks.
OSINT Framework: Find more OSINT resources
WHAT IT IS
OSINT Framework indexes a multitude of connections to different URLs, recommending where to look next when conducting an investigation. It also provides suggestions on what services can help analysts find specific data that might aid in their research.
USE CASE
When you plug a piece of data (such as an email address, phone number, name, etc.) into the framework, it returns all known online sources that contain information relevant to that data. The OSINT Framework also offers a list of potential resources where more information related to that particular source can be found.