Understand the differences between the surface, deep, and dark web. Learn how investigators can safely access and analyze data across all layers of the internet.
The internet consists of three layers: the surface web, the deep web, and the dark web. Each layer contains distinct types of content, visibility levels, and security risks that OSINT investigators must understand to conduct safe and effective online research.
The three layers of the internet
Now, let’s take a look at each layer of the internet. Here are those three layers at a glance:
- Surface web: Publicly accessible and indexed by search engines like Google
- Deep web: Content not indexed by search engines, including databases and login-protected sites (think Netflix or academic journals)
- Dark web: Encrypted networks accessed via special software such as Tor Browser
The surface web
The surface web includes all web content accessible via search engines like Google, Yahoo, and Bing.
Search engines use automated programs called web crawlers, or spiders, to crawl the internet, discover new web pages, scan their content, and make them searchable. Spiders analyze a page's key text and its metadata, such as meta titles and meta descriptions, and store this information in a massive index database.
When a user queries a search engine using specific search keywords, the search engine will search within its index database and retrieve relevant web pages matching the user's query.
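The crawl, index, and query steps above can be sketched with a toy inverted index. This is a simplification under stated assumptions: the page texts are hypothetical stand-ins for what a spider fetched, and real search engines add ranking, stemming, and much more. It mainly shows why an indexed page can be retrieved by keyword, while a page the spider never reached cannot be found this way at all.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical text a spider extracted (title, meta description, body) from two pages.
pages = {
    "https://example.com/ships": "Vessel tracking guide. Track cargo ships by name.",
    "https://example.org/trains": "Train tracking map for US rail routes.",
}
index = build_index(pages)
print(sorted(search(index, "tracking ships")))  # ['https://example.com/ships']
```

The query never touches the pages themselves, only the index, which is why content a crawler cannot reach stays invisible to keyword search.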
There are various search engines and online services for locating information on the surface web. The most popular ones include:
General purpose search engines: Google, Yahoo, and Bing are the best-known examples.
Local or national search engines dedicated to searching within the websites of a particular country or language:
- Search: Switzerland
- Baidu: China
- Goo: Japan
- Google: Google can be configured to return results in a particular language only. To apply this filter, follow these steps:
  - On your computer, open Search settings.
  - On the left, click Languages.
  - Under Results Language Filter, click Edit.
  - Select your preferred languages.
Google advanced search operators, also known as Google dorks, can be used to uncover hard-to-find content on the surface web. AI tools such as ChatGPT can quickly create customized Google dorks that match our search needs, and DorkGPT is a free online service for generating Google dorks with AI.
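As a minimal sketch of how such dorks are assembled, the helper below combines a few standard Google operators (exact-phrase quotes, `site:`, `filetype:`, `intitle:`, and a leading minus sign to exclude a term) and URL-encodes the result into a search link. The function names and example values are hypothetical; the operators are real, but Google may change how it interprets them at any time.

```python
from typing import Optional
from urllib.parse import urlencode

def build_dork(keywords: str, site: Optional[str] = None, filetype: Optional[str] = None,
               intitle: Optional[str] = None, exclude: Optional[list[str]] = None) -> str:
    """Combine an exact phrase with optional Google operators into one query string."""
    parts = [f'"{keywords}"']                  # quotes force an exact-phrase match
    if site:
        parts.append(f"site:{site}")           # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"')   # phrase must appear in the page title
    for term in exclude or []:
        parts.append(f"-{term}")               # drop results containing this term
    return " ".join(parts)

def google_search_url(dork: str) -> str:
    """URL-encode the dork as the q parameter of a Google search link."""
    return "https://www.google.com/search?" + urlencode({"q": dork})

dork = build_dork("annual report", site="gov.uk", filetype="pdf", exclude=["draft"])
print(dork)                     # "annual report" site:gov.uk filetype:pdf -draft
print(google_search_url(dork))
```

Generating dorks programmatically (or with an AI assistant) is mostly about getting this operator syntax right; it is worth reviewing any generated query before running it.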
- Metasearch engines: These engines query multiple search engines and aggregate the results according to their relevance to the user's query. Metasearch saves the time needed to query several search engines one by one. MetaGer and Excite are two examples.
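The aggregation step can be sketched with reciprocal rank fusion (RRF), one common way to merge ranked lists. This is an assumption for illustration only: MetaGer and Excite do not necessarily rank results this way. Each engine's list adds 1 / (k + rank) to a URL's score, so results that rank well on several engines rise to the top and duplicates collapse into one entry. The value k = 60 is a commonly used default, not a requirement.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: each list adds 1 / (k + rank) to a URL's score."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists returned by two engines for the same query.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]
print(reciprocal_rank_fusion([engine_a, engine_b]))
# ['b.example', 'a.example', 'd.example', 'c.example']
```

RRF only needs rank positions, which is convenient because different engines do not expose comparable relevance scores.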
The deep web
The deep web is the largest portion of the web; some studies estimate it at around 96% of all web content. It contains all content that conventional search engines cannot index or discover.
Many people confuse the terms "deep web" and "dark web" and use them interchangeably, but they are distinct. Content on the dark web is intentionally hidden, for one reason or another, while content on the deep web is simply not indexed by search engines.
Deep web content is rarely as nefarious as it sounds. To reach it, a user typically needs to log in, run a query through the website's own search form, or enter the exact URL of the resource in the browser's address bar.
For example, when you access your social media account or check your online banking account, you are accessing deep web content. Deep web content also includes the following:
- Websites behind a paywall: premium streaming services such as Netflix, and commercial publications that require payment to access, such as Janes magazine
- Grey information: includes academic papers, preprints, proceedings, conference and discussion papers, research reports, unpublished research papers, marketing reports, newsletters, technical specifications and standards, dissertations, theses, trade publications, memoranda, government reports, documents not published commercially, translations, market surveys, and draft versions of books and articles
- Government databases: includes vital records (birth, marriage and death records), criminal and court records, property and tax records, voter election databases and immigration and customs records
- Private or closed online communities: includes discussion forums and closed Telegram groups
- Email and messaging services: mailboxes on services such as Gmail and Outlook, as well as messages on internet messaging and collaboration apps like Slack or Discord
- Cloud storage accounts and private intranets: both require a login
- Leaked information websites: sites that specialize in storing leaked information, such as Pastebin and some file-sharing websites
- Any content that a site's developer has excluded from search engine crawlers using the site's robots.txt file
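robots.txt is advisory: it asks well-behaved crawlers to skip certain paths, but it does not stop anyone from visiting those pages directly, which is why such content remains reachable. A minimal sketch using Python's standard-library `urllib.robotparser`, parsing a hypothetical robots.txt inline instead of fetching a real one; the site and the `ExampleBot` user agent are made up. Because the file itself is public, reading it can also reveal which paths a site would rather keep out of search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site; real files are served at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the lines directly instead of fetching over the network

for url in [
    "https://example.com/about",
    "https://example.com/members/profile?id=42",
    "https://example.com/search?q=ships",
]:
    # can_fetch() answers: may a compliant crawler with this user agent request the URL?
    print(url, parser.can_fetch("ExampleBot", url))
```

Only `/about` is crawlable here; the member pages and internal search results are excluded from indexing, yet a browser can still open them.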
Like surface web content, deep web content does not require any special software or configuration to access; we can reach it with a standard web browser over HTTP/HTTPS.
When searching a website that holds deep web content, a user typically uses its internal "search form" to query the information buried in its database. Below are some popular deep web resources and the type of information we can expect to find by searching them.
- Wayback Machine: use this service to find historical (or previous) versions of websites
- Academia: contains 47 million PDF files in different research areas
- OpenCorporates: search more than 222 million company records worldwide
- Genealogy Bank: search genealogy records found in newspapers
- MarineTraffic: tracks ship movements and provides maritime intelligence
- ASM: tracks trains across the U.S.
- Shodan: Internet of Things (IoT) search engine
- Annual reports: search 131,771 annual reports from 9,669 global companies
- Google Scholar: search for scholarly literature
- European Patent Office: search patents filed in Europe
- Google advanced patent search: search filed patents
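Many deep web resources expose their search form as a URL query that can also be called from a script. As one example, the Internet Archive offers a Wayback Machine availability API (`https://archive.org/wayback/available`) that returns the archived snapshot closest to a requested date. The sketch below assumes that endpoint and its documented JSON response shape; the helper names are hypothetical, and because the live lookup needs network access, the demo only parses a sample reply with illustrative values.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"

def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is YYYYMMDDhhmmss or a shorter prefix such as YYYYMMDD."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"

def parse_closest(payload: dict) -> Optional[dict]:
    """Extract the closest snapshot from the API's JSON reply, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return {"url": closest["url"], "timestamp": closest["timestamp"]}
    return None

def closest_snapshot(target: str, timestamp: Optional[str] = None) -> Optional[dict]:
    """Live lookup against the API (requires network access)."""
    with urlopen(availability_url(target, timestamp), timeout=30) as response:
        return parse_closest(json.load(response))

# Offline demo: a reply in the documented shape (sample values, not a live result).
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
}
print(availability_url("example.com", "20060101"))
print(parse_closest(sample))
```

With network access, `closest_snapshot("example.com", "20060101")` performs the same lookup against the live service.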
Virtually all internet users access deep web content as part of their daily internet routine. For instance, checking your email or accessing your Facebook, Twitter, or LinkedIn account gives you access to deep web content.
The dark web
The dark web, or darknet, is a small segment of the internet that requires specific software to access. It is estimated to constitute less than 1% of the entire web. Content on the darknet is deliberately hidden and is most often associated with contraband and illegal material.
The darknet has a reputation as a place where criminals exchange illegal products and services, largely because of underground marketplaces such as the infamous Silk Road and AlphaBay. The dark web does have many legitimate uses, such as bringing news to dissidents in authoritarian regimes, but its purported anonymity also makes it home to many illegal and dangerous things.
Illicit goods, services, and activities that law enforcement officials and other investigators commonly encounter on the darknet include:
- Fake official documents, such as passports and driving licenses
- Firearms
- Stolen credit cards and personal information
- Malware for sale and ready-to-launch cyberattacks, such as zero-day exploits, Distributed Denial of Service (DDoS) attacks, and ransomware
- Drugs
- Sex trafficking
- Terrorist recruitment: terrorist organizations also use the darknet to recruit and as a secure communication channel away from authorities
Despite the bad reputation of the darknet, there are many parties that use it for legal purposes, such as:
- Journalists
- Whistleblowers
- Privacy enthusiasts who want to prevent external observers from capturing their online activities and circumvent censorship
- Political activists in oppressive regimes who want to hide their online identity
The darknet is not one single network. Many people are familiar with The Onion Router (Tor) network, but it is not the only darknet. There are several darknets, each requiring its own software program or a particular web browser configuration to access it. You can think of a darknet as an isolated network running on top of the internet, which you can reach only after configuring your web browser or installing a specific application.
Here are the most well-known networks:
- Tor: This is the most widely known darknet and possibly the largest. Tor uses onion routing to conceal users' IP addresses. The best way to access Tor is with Tor Browser, which can be used to browse surface websites anonymously as well as to access Tor websites, known as onion services, which use the .onion top-level domain.
- The Invisible Internet Project (I2P): I2P is the second best-known darknet. I2P allows its users to browse the surface web anonymously. Users can also create websites (known as hidden services) and host online services on the I2P network without revealing their physical location. These hidden services are only accessible via the I2P network.
- Other darknets include Freenet and ZeroNet
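Because onion addresses are long, random-looking strings, mistyped or lookalike addresses are easy to miss. The sketch below checks the structure of a current (version 3) onion address as described in Tor's rendezvous specification: 56 base32 characters encoding a 32-byte ed25519 public key, a 2-byte SHA3-256 checksum, and a version byte of 3, followed by .onion. Older 16-character (version 2) addresses are no longer supported by current Tor releases. The helper names are hypothetical. A passing check only means the address is well formed; it does not prove the site is the one you intended to visit, and it does not verify that the key is a valid ed25519 point.

```python
import base64
import hashlib

ONION_SUFFIX = ".onion"
VERSION = b"\x03"

def _checksum(pubkey: bytes) -> bytes:
    """CHECKSUM = first two bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION)."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]

def encode_v3_onion(pubkey: bytes) -> str:
    """Build a v3 onion address from a 32-byte ed25519 public key."""
    if len(pubkey) != 32:
        raise ValueError("v3 onion public keys are 32 bytes")
    raw = pubkey + _checksum(pubkey) + VERSION  # 35 bytes -> 56 base32 characters
    return base64.b32encode(raw).decode("ascii").lower() + ONION_SUFFIX

def is_valid_v3_onion(address: str) -> bool:
    """Check suffix, length, base32 alphabet, version byte, and checksum."""
    host = address.strip().lower()
    if not host.endswith(ONION_SUFFIX):
        return False
    # Keep only the label just before ".onion" (a subdomain may precede it).
    label = host[: -len(ONION_SUFFIX)].rsplit(".", 1)[-1]
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:  # binascii.Error, a ValueError subclass, on non-base32 characters
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    return version == VERSION and checksum == _checksum(pubkey)

# All-zero key: a format demo only, not a real onion service.
address = encode_v3_onion(bytes(32))
print(address, is_valid_v3_onion(address))
print(is_valid_v3_onion("example.onion"))  # False
```

Changing any character of a valid address almost always breaks the checksum, which is what makes this check useful against typos.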
Tor is the best-known darknet among internet users and is relatively easy to access, although it comes with security caveats. The following onion services are good starting points for searching across the Tor network.
- Ahmia: Tor search engine
- TORCH: another Tor search engine that claims to index 1.1 million pages
- OnionLand: Tor search engine
- TorDex: Tor search engine and directory
- The Hidden Wiki: a directory of popular Tor websites
- Facebook: the social network's official onion site on Tor
How to safely access the dark web
The web is much broader than what typical search engines return. OSINT investigators should know the three layers that make up the web and understand how each layer can be searched. This knowledge is essential for planning their search activities and knowing which online services to use within each layer.
However, accessing the dark web can carry significant risks for the researcher. Although darknets attract users by promising anonymity and privacy, tracking is still possible and does happen. On Tor, the most popular darknet, traffic is relayed through multiple nodes and wrapped in multiple layers of encryption, with each node removing only one layer. The weak point is the exit node: traffic leaving it for a surface website is no longer protected by Tor's encryption (unless the connection itself uses HTTPS), which can expose investigators and present an attack surface to bad actors.
For researchers to safely access the dark and deep web without compromising their hardware, network, or personal safety, they should consider a managed attribution platform that protects their anonymity and isolates their browser instance.
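To make the layering concrete, here is a toy model of a three-hop circuit. It is deliberately simplified and is not Tor's actual cryptography or cell format: each hop's "encryption" is a hypothetical XOR keystream derived from SHA-256, and the per-hop keys are random rather than negotiated through a handshake with each relay. What it shows is structural: the client adds one layer per relay, each relay peels exactly one, and whatever leaves the exit node is the original request.

```python
import hashlib
import secrets

RELAYS = ["guard", "middle", "exit"]

def toy_layer(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream; applying it twice undoes it.

    TOY ONLY: not Tor's real cryptography and not secure (the keystream is reused per key).
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

# One key per relay (in real Tor, each hop's keys come from a key exchange with that hop).
keys = {name: secrets.token_bytes(32) for name in RELAYS}

def client_wrap(payload: bytes) -> bytes:
    """Encrypt for the exit first, then middle, then guard, so the guard's layer is outermost."""
    cell = payload
    for name in reversed(RELAYS):
        cell = toy_layer(keys[name], cell)
    return cell

def relay_through_circuit(cell: bytes) -> dict[str, bytes]:
    """Each relay removes only its own layer; record what each one can see."""
    seen = {}
    for name in RELAYS:
        cell = toy_layer(keys[name], cell)
        seen[name] = cell
    return seen

request = b"GET /login HTTP/1.1\r\nHost: example.com\r\n\r\n"
seen = relay_through_circuit(client_wrap(request))
for name in RELAYS:
    print(f"{name:>6}: plaintext visible = {seen[name] == request}")
```

Run as is, the guard and middle relays see only ciphertext while the exit sees the plaintext request. Had the request been sent over HTTPS, the exit would see only TLS-encrypted data, and connections to onion services never leave the Tor network through an exit node at all.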
Surface, deep, and dark web FAQs
What is the difference between the surface, deep, and dark web?
The surface web is what anyone can find using Google. The deep web includes private or password-protected pages not indexed by search engines. The dark web exists on encrypted networks and requires special software, like Tor, to access. Each layer offers different visibility and security risks.
Is it legal to access the dark web?
Yes, accessing the dark web is legal in most countries, but engaging with illegal content or services hosted there is not. Investigators should use secure browsers and managed attribution tools to protect anonymity.
How can investigators safely browse the dark web?
Researchers should use isolated browser environments or managed attribution platforms (like Silo) that protect against malware, leaks, and exposure when exploring dark web content.
Why is the deep web important for OSINT investigations?
The deep web holds valuable non-indexed data—like government databases and academic resources—essential for investigative research and threat intelligence analysis.