Trawling Tor Hidden Service – Mapping the DHT

Update 2013-08-15: I have been really enthused by reactions I received to this blog post. It has been referenced from Forbes, Gawker and the Daily Mail and a number of people have been in contact about tracking the DHT for themselves. I would recommend the IEEE S&P paper, “Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization” which presents the same issues allowing the DHT to be trawled. They also present some very serious attacks allowing an adversary to locate hidden services with practical resources. It is well worth checking out if you have an interest in Tor.

Tor hidden services have got more media attention lately as a result of some notorious sites like the Silk Road marketplace, an online black market. On a basic level, Tor hidden services allow you to make TCP services available while keeping your server’s physical location hidden via the Tor anonymity network.

TL;DR#

Tor hidden service directories (HSDir’s) receive a subset of hidden service look-ups from users, allowing them to map relative popularity/usage of hidden service.
An adversary with minimal resources can carry out complete DoS attacks of Tor hidden services by running malicious Tor hidden service directories and positioning them in a particular part of the router list.
Many look-ups for Tor hidden services go to the incorrect hidden service directories which negatively affects the initial time to access the site.
Hidden services such as are popular, sites such as the Silk Road marketplace receive more than 60,000 unique user sessions a day.

Introduction#

For users to access a hidden service they must first retrieve a hidden service descriptor. This is a short signed message created by the hidden service approximately every hour contain a list of introduction nodes and some other identifying data such as the descriptor id (desc id). The desc ID is based on a hash of some hidden service information and it changes every 24 hours. This calculation is outlined later in the post. The hidden service then publishes its updated descriptor to a set of 6 responsible hidden service directories (HSDir’s) every hour. These responsible HSDir’s are regular node on the Tor network which have up-time longer than 24 hours and which have received the HSDir flag from the directory authorities. The set of responsible HSDir’s is based on their position of the current descriptor id in a list of all current HSDir’s ordered by their node fingerprint. This is an implementation of a simple DHT (Distributed Hash Table).

A client who would like to retrieve the full HS descriptor will calculate the time based descriptor id and will request it from the responsible HSDir’s directly.

Descriptor is published to a set of HSDir’s

Once the client has a copy of the hidden service descriptor they can attempt to connect to one the introduction point and create a complete 7 hop circuit to the hidden service.

This is a very brief explanation which the Tor Project has outlined much more clearly on their hidden service protocol page but it should provide enough information to understand the following information.

The Problems

Any one can set up a Tor node (HSDir) and begin logging all hidden service descriptors published to their node. They will also receive all client requests allowing them to observe the number of look-ups for particular hidden services.
The list of responsible HSDir’s for a hidden service is based on the calculated descriptor ID and an ordered list of HSDir fingerprints. As the descriptor ID’s are predictable, and the node fingerprint is controlled by an adversary. The can position themselves to be the responsible directories for a targeted hidden service and subsequently perform a DoS attack by not returning the hidden service descriptor to clients. There is nothing the hidden service can do about this attack. They must rely on the Tor Project authority directories to remove the malicious HSDir’s from the consensus.

Information Gathering from the DHT#

Hidden services must publish their descriptors to allow clients to reach them. These descriptor id’s are deterministic but over time any HSDir should have an equal chance of receiving the descriptor for any particular hidden service as part of the DHT. As the descriptor is replicated across 6 HSDir’s, any single responsible HSDir should receive up to 1/6 of all client look-ups for that hidden service providing a good means of estimating hidden service popularity.

This data can be made more accurate by running more Tor HSDir’s which have identity digest‘s in the responsible range, to the point where all the responsible HSDir’s are controlled by the observer.

There are currently about 1400 nodes with the ‘HSDir’ flag running. For the past 2 months I have run 4 Tor nodes and logged all hidden service requests that they have received. Each hidden service publishes to 6 HSDir’s every day, and will publish to 360 or approximately 25% of HSDir’s in a 2 month period. As I am running 4 nodes I statistically should have received a copy of every hidden service that was online for those 60 days. While the distribution will not be perfect, my data should contain a representative set of hidden service activity. Here are some stats from those 4 nodes:

Received 1.3 million requests for 3815 unique hidden services my nodes were responsible for.
Got 16.03 million requests for descriptors my nodes were not currently responsible for.
Received published descriptors for 40,500 unique hidden services
Received requests from clients for 25,600 unique descriptor id’s.

Even these basic stats point out a number of issues with the way hidden services currently function. 16.03 million or 92.5% of all descriptor requests my nodes received were for hidden services I did not currently have descriptors for. This could be a result of the clients having an out-of-date network consensus and not choosing the correct HSDir’s as responsible. It could also be a result of an out of sync time which causes the clients to look for the wrong/old descriptor id’s. Either way, a significant amount of time is being wasted on descriptor look-up which slows down the time when first accessing a hidden service.

Some More Stats..#

The following table contains data on requests my nodes received for some well known Tor based marketplaces. There are also requests for related phishing sites. I have confirmed that some users were directed to these phishing pages from links on the “The Hidden Wiki” (.onion). The number of requests is what my nodes observed. Descriptor look-ups from clients will be divided at random between the 6 responsible HSDir’s and clients will keep trying the remaining HSDir’s until the descriptor is found or all HSDir’s have been checked. As a result the total number of requests received by the following sites per day may be up to 6 times more than the figures below, but these still offer a relative guide to popularity.

Onion Address    Descriptor ID                    Requests
----------------------------------------------------------------
silkroadvb5piz3r cjzls3i2mbj4hjnquqmuvznihues4xh4 16387
silkroadvb5piz3r m6yz6gqrmu35twduuiixzr2mqtxdo3er 10891
5onwnspjvuk7cwvk 6t44eim223ypmb2ueokcsfco5vzvryfm 1413
silkroadvb5piz3r hadco5o7rmh2vcamg7mdzqklprqffyyh 558
silkroadxmx45vk4 6tyqo2bf7xclfbmrtrxwm7mgb3z4s5ui 197
atlantisrky4es5q hdj7wkuaigt7iicqf77gyzbo7zyvq7wf 165
atlantisrky4es5q m6y4s2utv4kxgdczv7t3gbmoloezblzf 161
atlantisrky4es5q 6r3z4tlr2vvl5z34v5lcuaqckgjvtr7s 129
silkroadxmx45vk4 m6eczdbpjse3jdfw54cv4nxcc6s34eku 107
sheep5u64fi457aw ci7dpz6emlh2pxpshovnxjbyjqnp3luo 59
silkrovafuce2ur2 6r6w7ncln5mo4hdwq6yh7f2hf3hlag2u 14
sheep5u64fi457aw rzq5pd4ehayz4yker7dhmvpubm4we7om 9
silkroadopn752dl cja5ppzzmkrzvr2pgp2c2z6mtuc7m7yk 4
silkroadr5cd6wbz m4n2afulpiln4n42l7wlg36wy6okkqrh 3
silkroadfqmteec4 cjcyh7mm6lzzuqka3vlq67upyt5zwd7b 2

Phishing site with a directory listing via forced browsing

Text file containing phished credentials

It seems a lot of people, especially scammer’s running phishing sites on Tor hidden services don’t know a lot about web security which leads to sites like this.

Please users, if you enter your credentials on a phishing page, and it doesn’t log you into the site. Don’t try again!

I also observed a large number of requests to the command and control servers of the “Skynet” Tor based botnet which got attention after an AMA with the botnet owner on Reddit. Contact him @skynetbnet on Twitter for more info.

Onion Address     Descriptor ID                    Requests
----------------------------------------------------------------
gpt2u5hhaqvmnwhr  m5t2jamzi4fht3hicqadzd3rkl57lyjj 9792
x3wyzqg6cfbqrwht  m5doen5pidde5wshaormqh6l2c4bljd5 7162
gpt2u5hhaqvmnwhr  hdjeqyqaq344rbb6vxndliobueh3v2u5 6641
4bx2tfgsctov65ch  6twxygfbtb2haivmixjqx5ag35dg72tk 6485
owbm3sjqdnndmydf  hd2j5xswo5ddvxpa2rahkg24vwhqo77y 6471
niazgxzlrbpevgvq  m63z25bfydkc6nfko4b4kz44u3jsh67k 6334
6ceyqong6nxy7hwp  m6czfa7ra6qvrfvbmg6zlsimi4braizy 2691
6tkpktox73usm5vq  6ruzifokbb6ez2qbnion7deylz4jjmq3 1827
uzvyltfdj37rhqfy  6tvqgoeu4piyu3x6dsyerhhhnko6iumy 1735
6m7m4bsdbzsflego  m5zclz4icakymh2d6fbdkpq2t3n7efql 1681
6ceyqong6nxy7hwp  hds46ckhghbg5gvfwnsgwvhvdildn2e5 1605
jr6t4gi4k2vpry5c  m5zfpvht47zgiujyjkf5en2vu6g7ndzm 1579
xvauhzlpkirnzghg  6ugswkgt5pdmhlyzjgj5bhtnclophhwa 1556
jr6t4gi4k2vpry5c  m5576obc535dffll4qwq6xh37u4dz7y2 1422
ceif2rmdoput3wjh  6s544wf5d6kuyhtiqq7wo37oisi6n5ce 1397
f2ylgv2jochpzm4c  cjore7wxv2x46qrlcvmiyctm4szckpo3 1380
uy5t7cus7dptkchs  6ttlcxahq4obesp5gze4rkjjinivgoyl 1370
7wuwk3aybq5z73m7  6urq4w6to5fmjf3hyqitzlcscxszzbrd 1199
742yhnr32ntzhx3f  t5z3anxrp3e5w5z2s5kqfftuxlw4v6ng 1178
7wuwk3aybq5z73m7  m4ejch7su2xm6zrzu5gn3cgrjsg43f6y 235
6m7m4bsdbzsflego  7lsph6grzr76j27fx4r7ir3ipbexidkt 134
6tkpktox73usm5vq  hday5dik3mzlwkpvemrorfmirfdnepnp 102
6ceyqong6nxy7hwp  lzgpbrgh6ims4fgru6ojdsea7hbruh6s 25
x3wyzqg6cfbqrwht  ciq4shgjmutzgmz2t346rvkwtzuobfqe 24
owbm3sjqdnndmydf  ciebkgj3gbl4egtwqvwssc6kvtt4x6t5 15
4bx2tfgsctov65ch  6rnq6z57hvjrf4a5yktfd7qavuvlvqjb 6

Many of the “Skynet” onion addresses above and other popular addresses are running Bitcoin mining proxies. They generally responded with a basic authentication request for “bitcoin-mining-proxy”. The other services may be Tor based bitcoin mining pools or part of Skynet and/or other botnets. It should be straight forward to find these sites by scanning the service id’s from the raw data on Github.

DoS Attacks on Tor Hidden Services#

Tor hidden service desc_id‘s are calculated deterministically and if there is no ‘descriptor cookie’ set in the hidden service Tor config anyone can determine the desc id‘s for any hidden service at any point in time.This is a requirement for the current hidden service protocol as clients must calculate the current descriptor id to request hidden service descriptors from the HSDir’s. The descriptor ID’s are calculated as follows:

descriptor-id = H(permanent-id | H(time-period | descriptor-cookie | replica))

The replica is an integer, currently either 0 or 1 which will generate two separate descriptor ID’s, distributing the descriptor to two sets of 3 consecutive nodes in the DHT. The permanent-id is derived from the service public key. The hash function is SHA1.

time-period = (current-time + permanent-id-byte * 86400 / 256) / 86400

The time-period changes every 24 hours. The first byte of the permanent_id is added to make sure the hidden services do not all try to update their descriptors at the same time.

identity-digest = H(server-identity-key)

The identity-digest is the SHA1 hash of the public key generated from the secret_id_key file in Tor’s keys directory. Normally it should never change for a node as it is used for to determine the router’s long-term fingerprint, but the key is completely user controlled.

A HSDir is responsible if it is one of the three HSDir’s after the calculated desc id in a descending lists of all nodes in the Tor consensus with the HSDir flag, sorted by their identity digest. The HS descriptor is published to two replica‘s (two set’s of 3 HSDir’s at different points of the router list) based on the two descriptor id’s generated as a result of the ‘0’ or ‘1’ replica value in the descriptor id hash calculation.

I have implemented a script calculating the descriptor ID’s for a particular hidden service at an arbitrary time and it is available on my Github account. I have also created a modified version of ‘Shallot‘ which can be used to generate keys with an identity key in a specified range. The is more usage information on its Github page.

The Attack#

The code listed above could be used to generate identity keys and identity digests for an adversary’s HSDir nodes so that they’ll be selected as 6 of the HSDir’s for a targeted hidden service. These adversary controlled hidden service directories could simple return no data (404 Response) to a client requesting the targeted hidden service’s descriptor and in turn prevent them from finding introduction nodes. As there are no other sources for this hidden service descriptor it would be impossible for a user to set up a circuit a complete circuit to the hidden service and there would be a complete denial of service until the descriptor id changes.

For an adversary to continue this attack over a longer time-frame, they would need to set up their nodes for the upcoming desc id’s of the targeted hidden service more than 24 hours in advance to make sure they will have received the HSDir flag.

An adversary would need to run 12-18 nodes to keep up a complete, persistent DoS on the targeted hidden service. Six nodes would be the “responsible HSDir’s” and the other nodes would be running with identity digests in the range of the upcoming desc id’s, to gain the HSDir flags after 24 hours of up-time. An adversary can cut the resources needed by running two Tor instances/nodes per IPv4 IP or by running the Tor nodes on compromised servers on high, unprivileged ports.

These attacks a quite a real, practical threat against the availability of Tor hidden services. For whatever reason (extortion, censorship etc.) adversary can perform complete DoS attacks with minimal resources and there are no actions hidden service owner can do to mitigate, besides switching to descriptor cookie based authentication or multiple private address. The Tor project can try deal with these attacks by removing known malicious HSDir’s from the network consensus but I don’t see a straight forward way to identify these malicious nodes.

Unfortunately there are no easy solutions to this problem at the moment. I can foresee adversaries employing these attacks in the wild against popular hidden services.

Conclusions#

Tor hidden services were originally implemented as some a simple feature on top of the Tor network and unfortunately they haven’t received the attention and love the deserve for such a popular feature. There are discussions under way to re-implement hidden services to alllow them to scale more efficiently. There are also people looking at reducing the ability of HSDir’s to sit and gather data on onion address and look-ups like I have done, by implementing a PIR protocol. A good summary of work that needs to be done is available in a blog post on the Tor Project’s website. I’d urge any developers with an interest to join the tor-dev mailing list and see if there is something you can contribute! A lot of work is needed.

Anyone interested in learning more about the issues with the current Tor hidden service implementation please check out the presentation “Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization” at the IEEE S&P conference this Monday by Alex Biryukov, Ivan Pustogarov and RalfPhilipp Weinmann will probably provide a much more in-depth, formal investigation and I too look forward to reading it. We have been researching similar areas so I’m very interested in their approach and results.

All raw data and my modified Tor clients are available from my Github repo. This also contains scripts for calculating hidden service descriptors and generating OR private keys with fingerprints in a particular range. This data includes all descriptor requests and hidden service ID’s my nodes observed. Please check it out if you are interested in analyzing a random subset of services on the Tor network.

I’d like to thank @mikesligo for having a heated debate in the pub with me about hidden services and getting me interested in how they work. I’d also like to thank @CiaranmaK for reading a draft of this post and pointing out some corrections.

Thank you for reading my first blog post. I’ll have to work on presenting things better as the information in this post is a bit all over the place. Please let me know if you have any questions or feedback in the comments below.