Inside the Dark Web
Introduction
The Deep Web is any Internet content that, for various reasons, cannot be or
is not indexed by search engines like Google. This definition thus includes
dynamic web pages, blocked sites (like those where you need to answer a
CAPTCHA to access), unlinked sites, private sites (like those that require
login credentials), non-HTML/contextual/scripted content, and limited-access
networks.
Limited-access networks cover sites with domain names that have been
registered on Domain Name System (DNS) roots that are not managed by the
Internet Corporation for Assigned Names and Numbers (ICANN), like .BIT
domains, sites that are running on standard DNS but have non-standard top-
level domains, and finally, darknets. Darknets are sites hosted on
infrastructure that requires specific software like Tor before it can be
accessed. Much of the public interest in the Deep Web lies in the activities
that happen inside darknets.
There are many other reasons apart from buying drugs why people would
want to remain anonymous, or to set up sites that could not be traced back to
a physical location or entity. People who want to shield their communications
from government surveillance may require the cover of darknets.
Whistleblowers may want to share vast amounts of insider information with
journalists without leaving a paper trail. Dissidents in restrictive regimes
may need anonymity in order to safely let the world know what is happening
in their country.
But on the other side of the coin, people who want to plot an assassination
against a high-profile target will want a method that is guaranteed to be
untraceable. Other illegal services such as the selling of documents like
passports and credit cards will also require an infrastructure that will
guarantee anonymity. The same could be said for people who leak other
people’s personal information like addresses and contact details.
1. A Data Collection module, responsible for finding and storing new URLs
from multiple sources
2. A Universal Gateway, which allows access to the hidden resources in
darknets like TOR and I2P, and resolves custom DNS addresses
3. A Page Scouting module, responsible for crawling the new URLs collected
4. A Data Enrichment module that takes care of integrating the scouted
information with other sources
5. A Storage and Indexing module, which makes the data available for further
analysis
6. Visualization and analytic tools
System Overview
Data Collection
The first DeWA module is the data collection module, where the data
consists of fresh URLs related to either:
without installing TOR, and keep publicly available statistics about what
domains are accessed the most on a daily basis;
Page scouting
For every collected URL, we perform what we call “scouting”, i.e. we try to
connect to the URL and save the response data. In case of error, the full error
message is stored, so that we can tell whether the connection failed due to a
domain resolution error, server-side error, transport error, etc. In case of
HTTP errors, the full HTTP headers are stored, a practice that has already
proven successful in identifying malware-related hosts, which are known to
answer only specific types of HTTP requests and fail otherwise.
• We perform the full rendering of the page’s DOM (in order to get dynamic
javascript pages out of the way);
• We take a page’s screenshot;
• We compute the page’s size and md5;
• We extract the page’s metadata: title, meta tags, resources, keywords;
• We extract the text stripped of all the HTML;
• We extract all the links from the page;
• We collect the email addresses found in the page.
• The extracted URLs are “back-fed” to the data collection module and
indexed as an additional data source.
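The extraction steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual DeWA implementation: it works on the raw response body, so it does not render the DOM or take a screenshot, and the names `PageExtractor`, `scout_record` and `EMAIL_RE` are assumptions introduced here.

```python
import hashlib
import re
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the title, link targets, and visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


# Deliberately simple pattern; real-world email matching is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scout_record(body: bytes) -> dict:
    """Builds one record (hypothetical layout) from a fetched response body."""
    html = body.decode("utf-8", errors="replace")
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "size": len(body),                      # page size in bytes
        "md5": hashlib.md5(body).hexdigest(),   # content fingerprint
        "title": parser.title.strip(),
        "text": " ".join(parser.text_parts),    # text stripped of HTML
        "links": parser.links,                  # candidates to back-feed
        "emails": sorted(set(EMAIL_RE.findall(html))),
    }
```

In such a setup, the `links` field would be what gets back-fed to the data collection module.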
Data Enrichment
Data enrichment of the scouted data consists, for every successfully scouted
page, of the following operations:
1. The page text is tokenized into its individual words, and the number of
occurrences of each word is counted;
2. Words are filtered: only substantives are kept, while other elements such as
verbs, adjectives etc. are discarded. Substantives are normalized so as to keep
only the singular form;
3. The semantic distance matrix is computed: this is a matrix containing how
“close” each word is to every other word, using a so-called WordNet metric. The
WordNet metric works by measuring the taxonomical distance between words
in the general language. As an example, words like “baseball” and
“basketball” will score fairly close to one another since both are “sports”. In
the same way, “dog” and “cat” will be considered close since they are both
“animals”. On the other hand, “dog” and “baseball” will be considered pretty
far from each other;
4. Once we have the distance of every word pair, words are clustered together
starting from the closest pair, in order of increasing distance. This way we
create groups of words with similar meaning;
5. Clusters are labeled using the first word in alphabetical order as the label,
and scored by summing up the occurrences of every word in the cluster;
6. Using the labels and scores of the top 20 clusters, a word cloud is
generated and drawn. This gives an analyst a quick glance at the
main topics of a page.
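The six steps above can be illustrated with a small sketch. It makes several assumptions that are not in the original system: a tiny hand-made `TAXONOMY` stands in for WordNet, membership in that taxonomy stands in for part-of-speech filtering, singular forms are obtained by naively dropping a trailing "s", and clustering is single-linkage with a hypothetical `max_distance` cut-off. It returns the labels and scores that would feed a word cloud rather than drawing one.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy taxonomy standing in for WordNet: word -> chain of ancestors.
TAXONOMY = {
    "baseball": ["entity", "activity", "sport"],
    "basketball": ["entity", "activity", "sport"],
    "dog": ["entity", "organism", "animal"],
    "cat": ["entity", "organism", "animal"],
}


def distance(a, b):
    """Number of taxonomy edges between two words via their deepest shared ancestor."""
    pa, pb = TAXONOMY[a] + [a], TAXONOMY[b] + [b]
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return (len(pa) - shared) + (len(pb) - shared)


def normalize(word):
    """Naive singularization: drop a trailing 's' when the stem is a known noun."""
    if word not in TAXONOMY and word.endswith("s") and word[:-1] in TAXONOMY:
        return word[:-1]
    return word


def count_nouns(text):
    """Steps 1-2: tokenize, normalize, keep known nouns, count occurrences."""
    tokens = (normalize(t) for t in re.findall(r"[a-z]+", text.lower()))
    return Counter(t for t in tokens if t in TAXONOMY)


def cluster(words, max_distance):
    """Steps 3-4: merge the closest pairs first, stopping at max_distance."""
    cluster_of = {w: {w} for w in words}
    pairs = sorted(combinations(sorted(words), 2), key=lambda p: distance(*p))
    for a, b in pairs:
        if distance(a, b) > max_distance:
            break
        if cluster_of[a] is not cluster_of[b]:
            merged = cluster_of[a] | cluster_of[b]
            for w in merged:
                cluster_of[w] = merged
    unique = {id(c): c for c in cluster_of.values()}
    return list(unique.values())


def label_and_score(text, max_distance=2, top=20):
    """Steps 5-6: label by alphabetically first word, score by summed counts."""
    counts = count_nouns(text)
    scored = [
        (min(c), sum(counts[w] for w in c))
        for c in cluster(counts.keys(), max_distance)
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top]
```

With the toy taxonomy, “baseball” and “basketball” are 2 edges apart while “dog” and “baseball” are 6 apart, so a cut-off of 2 keeps sports and animals in separate clusters.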
UI and visualization
In order to access and manipulate the data, we rely on three different front-
end systems:
The following figure shows the language popularity according to the number
of domains containing pages in said language. In computing the statistics, we
filtered out pages smaller than 1 KB (since they would not contain enough
data to perform a reliable detection) and all pages classified as "unknown".
In terms of the raw number of domains (which, except in the case of a page
hosting provider like a "Geocities in the Deep Web", almost always
corresponds to the actual number of different sites), we see that English is the
language of choice here, with more than 75% of domains. Second for variety
comes Russian, followed by French (which might include, of course, both
French and French Canadian sites).
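The counting behind such a breakdown can be sketched as below. The input layout (tuples of URL, page size and detected language) and the function name `domains_per_language` are assumptions made for illustration; language detection itself is taken as already done.

```python
from collections import defaultdict
from urllib.parse import urlsplit

MIN_PAGE_BYTES = 1024  # pages below 1 KB are skipped, as described in the text


def domains_per_language(pages):
    """pages: iterable of (url, size_in_bytes, detected_language) tuples."""
    domains = defaultdict(set)
    for url, size, language in pages:
        if size < MIN_PAGE_BYTES or language == "unknown":
            continue
        # Count each domain once per language, however many pages it hosts.
        domains[language].add(urlsplit(url).hostname)
    return {lang: len(hosts) for lang, hosts in domains.items()}
```

Note that in this sketch a domain hosting pages in two languages is counted once under each of them.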
In the next example, we have grouped two years of data according to the URLs’
scheme (e.g. http, https, ftp…). Of all the collected domains, almost 22,000
are (predictably) associated with the HTTP(S) protocol, data hosting being the
principal activity. But if we filter out those domains, the remainder shows
some interesting data, as portrayed by the figure:
More than 100 domains are in fact hosting IRC(S): these are normally chat
servers that can either be used as a rendezvous point for malicious actors to
trade goods, or as a communication channel for botnets. The same
applies to the 7 XMPP domains (i.e., Jabber-like IMs), representing another
protocol for chat servers running in TOR.
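Grouping by scheme, as described above, can be sketched with the standard library alone. The function names and the example hostnames used to exercise them are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domains_by_scheme(urls):
    """Counts distinct hostnames per URL scheme (http, https, irc, xmpp, ...)."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme and parts.hostname:
            groups[parts.scheme.lower()].add(parts.hostname)
    return {scheme: len(hosts) for scheme, hosts in groups.items()}


def non_web(counts):
    """Drops the dominant http/https groups to expose the remaining protocols."""
    return {s: n for s, n in counts.items() if s not in ("http", "https")}
```

Filtering out the web schemes is what makes the smaller IRC(S) and XMPP groups visible.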
We can’t vouch for the authenticity of the goods and services discussed here,
only for the fact that the sites advertising them do exist. We weren’t able to
cover all of the possible goods and services offered, but included several of
the major categories that should give a clear idea of the nature of the
transactions that go on in the Deep Web.
As is the case on the Clear Web, prices vary a lot among different sites – but
more mature offerings (such as stolen PayPal accounts below) will tend to
reach a generally accepted pricing norm. Accounts such as these are sold in one
of two ways: either as “high quality”, verified accounts, where the exact
current balance is known, or as bulk amounts of unverified accounts, normally
with a guarantee that at least a certain percentage will be valid. The
first of these two categories is normally a higher-cost item, but
with a greater likelihood of return on investment for a buyer, whereas the
bulk account sales will be significantly cheaper.
Unverified accounts sold in bulk – 80% valid or replacement offered
http://3dbr5t4pygahedms.onion/
One offering that can be found quite readily on the Deep Web that is more
unusual to find on the Clear Web is actual physical credit cards being sold.
That is not to say these do not exist on the Clear Web criminal forums – they
most certainly do – however the sites on the Deep Web seem a bit more
professional in their approach.
Replica credit cards created with stolen details
http://ccccrckysxxm6avu.onion/
References:
[1] http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/white-papers/wp-russianunderground-101.pdf
[2] http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/white-papers/wp-russianunderground-revisited.pdf
[3] http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/white-papers/wp-thechinese-underground-in-2013.pdf
[4] http://paypal4ecnf7eyqa.onion - Stolen Paypal accounts
[5] http://3dbr5t4pygahedms.onion/ - Unverified stolen accounts
[6] http://ccccrckysxxm6avu.onion/ - Replica stolen credit cards
Assassination Services
Perhaps one of the most worrying services on the Deep Web – and definitely
one that would be very foolish to advertise on the Clear Web – is the service
of Hitman for Hire, or Assassination. Several such services exist on the Deep
Web. Even the sites themselves acknowledge the highly secret nature of how
they have to conduct their business – one site clearly states that as all
contracts are private they cannot offer proof of past work, give feedback from
previous clients or show any other proof of past success. Instead they ask the
person to prove upfront that they have enough Bitcoin available for the job by
placing the bitcoin with a reputable (by criminal standards) escrow service.
Only when the hitman has carried out the assassination and provided proof
will the funds be released.
C’thulu Resume – Assassination Services for Hire
http://cthulhuuap7ch47k.onion
As can be seen in the screenshot above, pricing varies based on the manner of
death or injury, but also by the status of the target. In fact Ross Ulbricht, the
man recently convicted of running the infamous Silk Road forum for illegal
drugs, attempted to order 5 assassinations of partners and others that he had
fallen out with [1].
A different take on such services, and one that we hope is not actually meant
as a real service, is “crowdsourced assassination”. One site, Deadpool,
operates by users putting forward potential targets. Others can then contribute
funds via bitcoin to the “dead pool”. Assassins can then anonymously
“predict” when and how the person will die. If the person does actually die,
all the predictions are revealed and, if there is an exact match, the assassin
who put it forward will claim the money. To date 4 names have been put
forward, but no money has been entered into the pools, leading us to believe
that this is a hoax site.
Deadpool – Crowd Sourced Assassination
http://deadpool4x4a25ys.onion
References:
[1] http://www.wired.com/2015/02/read-transcript-silk-roads-boss-ordering-5-assassinations/
[2] http://cthulhuuap7ch47k.onion/ - Contract Killers (C’thulu Resume)
[3] http://deadpool4x4a25ys.onion/ - Crowdsourced assassination
In the case of a site like WeBuyBitcoins, they offer to exchange real cash for
Bitcoins at a competitive exchange rate compared to equivalent non-
anonymous services that exist in the Clear Web. However for criminals
willing to take on more risk for potentially more reward, another option is
available – buying counterfeit currency using Bitcoin.
Buying counterfeit 20 USD for approximately half the price of face value
http://usjudr3c6ez6tesi.onion
References:
[1] Bitcoin used to buy a Tesla Model S http://www.wired.com/2013/12/tesla-bitcoin/
[2] http://easycoinsayj7p5l.onion – EasyCoin – Bitcoin Wallet with free
Bitcoin Mixer / Laundery
[3] http://ow24et3tetp6tvmk.onion – OnionWallet – Bitcoin Wallet with free
Bitcoin Mixer / Laundery
[4] http://jzn5w5pac26sqef4.onion – WeBuyBitcoins – Sell Bitcoins for Cash
(USD), ACH, WU/MG, LR, PayPal and others
[5] http://usjudr3c6ez6tesi.onion - Counterfeit $20 USD / Euro Bills
[6] http://y3fpieiezy2sin4a.onion/ - Counterfeit $50 Euro Bills
[7] http://qkj4drtgvpm7eecl.onion/ - Counterfeit $50 USD Bills
Cloudnine Doxing site – note it requests SSN, medical & financial info and
more http://cloudninetve7kme.onion
It’s very hard to know if these details are actually correct or not – but in
many cases the supplied leaked details include DOB, SSN, personal email
addresses, phone numbers, physical addresses and more. For example one
site, Cloud Nine, lists possible “dox” for public figures such as:
- Several FBI agents
- Political figures like Bill & Hillary Clinton, Barack & Michelle Obama,
Sarah Palin, US Senators and others
- Celebrities such as Angelina Jolie, Bill Gates, Tom Cruise, Lady Gaga,
Beyonce, Dennis Rodman and more
Drugs
As we mentioned, it is common for just about every report on the Deep Web
to talk about how freely available illegal drugs, and weapons, are. In this
report we do not intend to go into major detail on this – as it has been covered
by others. But we did want to briefly highlight the fact that even after the
conviction of individuals like Ross Ulbricht – who was recently sentenced [1]
to life with no chance of parole for running the infamous drugs forum “The
Silk Road” – procuring drugs on the Deep Web is still relatively trivial.
The availability of illegal narcotics varies a lot on the Deep Web, with sites
selling everything from the relatively tame (such as contraband Tobacco[2]),
to Cannabis[3], Psychedelics[4], Cocaine and so on.
The Peoples Drug Store – selling Heroin, Cocaine, Ecstasy and more
http://newpdsuslmzqazvr.onion
Grams – the Deep Web’s search engine for drugs
http://grams7enufi7jmdl.onion
We’ve even found TOR sites that offer live information of an active Cannabis
grow house – showing live stats for temperature, moisture and a live camera
showing the plants growing over time.
References:
[1] http://www.forbes.com/sites/katevinton/2015/05/29/ulbricht-sentencing-silk-road/
[2] http://cigs7cviqbi4bvuy.onion/ - Contraband Tobacco
[3] http://smoker32pk4qt3mx.onion - Cannabis
[4] http://ll6lardicrvrljvq.onion - Psychedelics
[5] http://newpdsuslmzqazvr.onion - Heroin, Cocaine and others
[6] http://grams7enufi7jmdl.onion - Grams – Deep Web drug search engine
[7] http://growboxoo2uacpkh.onion/ - Live feed from a Cannabis Growhouse
[8] http://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digital-threats/the-deep-webanonymizing-technology-good-and-bad - Expert
Insight video Series – The Deep Web
Malware
In many ways, the Deep Web and malware are perfectly suited for each other,
especially when it comes to hosting command-and-control (C&C)
infrastructure. It is the nature of hidden services and sites like TOR and I2P
to hide the location of servers using strong cryptography. This makes it very
difficult for forensic researchers to investigate using traditional means like
examining a server’s IP address, checking registration details, and so on. In
addition, using these sites and services isn’t particularly difficult. It is then
not surprising to see a number of cybercriminals use TOR for C&C. We’ve
seen the operators behind prevalent malware families use TOR for some parts
of their setup. They simply bundle the legitimate TOR client with their
installation package. Trend Micro first wrote about this trend back in 2013,
when the MEVADE malware caused a noticeable spike in TOR traffic after it
switched to TOR hidden services for C&C. Other malware families like
ZBOT followed suit in 2014.
Based on the presence of this favicon.ico file and the web-server setup of the
C&C (many of which run openresty/1.7.2.1), we are able to search in our
system for complete lists of such sites and download the latest C&C each
day.
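A search of this kind over stored scouting results might look like the sketch below. The record layout (`domain`, `headers`, `resources`) and both function names are assumptions, and requiring the favicon and the server banner together is a simplification: the text only says that many of the C&C servers run that banner, so a real query would likely be looser.

```python
def looks_like_candidate(record, server_banner="openresty/1.7.2.1"):
    """True when a scouted record shows both the favicon and the server banner."""
    # Header names are case-insensitive, so normalize them before the lookup.
    headers = {k.lower(): v for k, v in record.get("headers", {}).items()}
    has_banner = headers.get("server", "").strip().lower() == server_banner
    has_favicon = "/favicon.ico" in record.get("resources", [])
    return has_banner and has_favicon


def candidate_domains(records):
    """Returns the sorted, de-duplicated domains of all matching records."""
    return sorted({r["domain"] for r in records if looks_like_candidate(r)})
```

Matches are only candidates; each one would still need to be confirmed as a C&C by other means.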
Example of fetched HTTP headers from C&Cs
Identified TOR-based C&Cs (1)
Identified TOR-based C&Cs (2)
Another major malware family that uses the Deep Web is CryptoLocker.
CryptoLocker refers to a ransomware variant that encrypts victims’ personal
documents before redirecting them to a site where they can pay to regain
access to their files. CryptoLocker is also smart enough to automatically
adjust the payment page to account for a victim’s local language and payment
means. TorrentLocker, a CryptoLocker variant, makes use of TOR to host
payment sites in addition to employing Bitcoin as a form of payment. It shows
why the Deep Web appeals to cybercriminals who want to make their
infrastructures more robust against possible takedowns. The following screenshots
are payment pages that the Deep Web Analyzer captured. Both are rendered
in different languages, giving us an idea of their intended victims and origin.
Cryptolocker C&C automatically formatted for a victim in Taiwan and Italy
http://ndvgtf27xkhdvezr.onion
Breakdown by Victims and Countries
The following example is related to malware that steals confidential
information. In our search methodology, we look for prevalent query-string
parameters in a short and recent time window, allowing us to identify new
threats as soon as they appear in the Deep Web.
[REDACTED]2xx.onion:80/si.php?xd={"f155":"MACHINE IP","f4336":"MACHINE NAME","f7035":"5.9.1.1","f1121":"windows","f6463":"","f2015":"1"}
By counting the queries associated with the registration, we were able to
build a profile of the number of new victims per day, along with the amount
of data leaked.
Automated Analysis on Prevalent Query-String Parameters
Number of new Infections (and Leaked data, in bytes) per day.
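The idea of looking for prevalent query-string parameters in a recent window, and then counting registration requests per day, can be sketched as follows. The input format (pairs of timestamp and URL), the function names and the default one-day window are assumptions for illustration only.

```python
from collections import Counter
from datetime import timedelta
from urllib.parse import parse_qsl, urlsplit


def prevalent_parameters(requests, now, window=timedelta(days=1), top=10):
    """requests: iterable of (timestamp, url); returns the most common parameter names."""
    counts = Counter()
    for seen_at, url in requests:
        if now - seen_at > window:
            continue  # outside the short, recent time window
        for name, _value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            counts[name] += 1
    return counts.most_common(top)


def registrations_per_day(requests, parameter):
    """Counts, per calendar day, the requests that carry the given parameter."""
    per_day = Counter()
    for seen_at, url in requests:
        names = {n for n, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
        if parameter in names:
            per_day[seen_at.date()] += 1
    return dict(per_day)
```

A parameter name that suddenly dominates the recent window (such as `xd` in the request shown earlier) is the kind of signal that would prompt a closer look.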
Finally, worth mentioning is a banking Trojan called Dyre that uses I2P as a
backup option for its C&C infrastructure, which normally runs using DGA on the
Clear Web. This malware acts as a BHO that MiTMs online banking pages at the
browser level. This allows the code to back-connect from the victim to the
attacker (similar to a reverse-shell approach), with the goal of granting the
attacker access to the banking portal of its victims. According to
DeWA, this malware campaign introduced 2 new operating servers over the last
6 months, and the number of infected victims using I2P has currently
increased.
Traffic to Dyre’s I2P infrastructure.
References:
[1] http://blog.trendmicro.com/trendlabs-security-intelligence/the-mysterious-mevade-malware/
[2] http://blog.trendmicro.com/trendlabs-security-intelligence/defending-against-tor-using-malwarepart-1/
[3] http://blog.trendmicro.com/trendlabs-security-intelligence/defending-against-tor-using-malwarepart-2/
[4] http://blog.trendmicro.com/trendlabs-security-intelligence/steganography-and-malware-why-andhow/
[5] http://4bpthx5z4e7n6gnb.onion/favicon.ico - Vawtrak / Neverquest C&C
[6] http://ndvgtf27xkhdvezr.onion - Cryptolocker C&C