Peeking into p2p world
We are familar with the the client server model. A single server listens for requests and serves the content corresponding to the request. It could be a http server, ftp server etc.These servers are powerful and can handle a large amount of requests.
In Peer to Peer(P2P) systems there is no central server serving the data. The peers provide each other with the data. The p2p system architecture can be partially decentralized or completely decentralized. When we look for p2p the first words that we come across are Bittorrent, Gnutella, Bitcoin. so let’s have a look at Bittorrent protocol which is most widely used for file sharing.
Bittorrent
The detailed Bittorrent specification can be found here[1]. I don’t want to create another same copy but I will provide some key visualizations to understand better.
Some terms to know beforehand
peer : A machine that participates in file sharing can upload and download.
client : A user agent that acts as a peer on behalf of a user.
tracker : Holds information about peers in a swarm. It is a HTTP server responds to get requests.
swarm : A group of peers involved with a particular torrent.
I am considering that the readers are familiar with steps in downloading torrent.Let’s start with the torrent file. I had a torrent file of Big Bang theory episode and I opened it in a text editor.The file was partially readable with some text like ‘announce’ and links , remaining part was just weird ASCII characters.You can also have a look at any torrent file you have.The file is encoded using bencoding.
Bencoding has four datatypes string,integer and two compound types list and dicitionary.
length:string example 4:bird
iintegere example i7e
l<any datatype>e example l3:foo2:ati91ee
This list is [“foo”, “at”, 91]
A list can have elements integer,string,dict,list.
d<string which is key><any datatype which is value>e
example d7:Algebrai45e3:Engli25ei67eee
This dict is { “Algebra” : 45 , “Eng” : [25, 67] }
dictionary must be sorted by keys.
The torrent file is bencode dictionary with the keys as follows
- announce
- announce-list
- comment
- created by
- creation date
- info
info’s value is a dictionary with keys
- for single file
- name
- piece length
- pieces
- length
- for multiple files
- name
- piece length
- pieces
- files list of dictionaries
- length
- path
A file is divided into number of pieces where each piece is of length:piece length except for last piece.pieces is concatenation 20byte SHA1 hash value of all pieces as a single string and not list.For multiple files piece boundary may overlap files.Here is the big bang theory torrent printed using libtorrent library. The second image is torrent file info with some infered information like number of pieces and info hash which is hash of the info value in the dictionary.
Trackers
tracker is an HTTP service which enables peer to join a swarm and locate other peers.It does not provide the data. It accepts a GET request with following parameters
Request
- info_hash
20-byte SHA1 hash value of the “info” key in metainfo file(.torrent) - peer_id
20-byte peer ID. - port
port number used by the torrent client - uploaded
Total amount of bytes peer has uploaded - downloaded
Total amount of bytes peer has downloaded - left
Amount of bytes needed to complete - ip
- numwant
Number of peers it wants. - event
started,stopped,completed
Last three are optional. info_hash and peer_id have to be url encoded.
Response
- interval
A peer must send regular GET requests to the tracker.time in seconds - leechers
number of peers downloading and uploading i.e incomplete files - seeders
number of peers with entire file - peers
List of dictionaries with- peer id
- peer ip
- peer port
At this point I opened the torrent in transmission app and also started wireshark to sniff packets with filter http.request.method == "GET"
but there was no activity.Later searching on Internet I found that the udp tracker protocol is different.The abbove image shows that the trackers are udp service. According to the specification[2]
UDP trackers
There are four messages
- connect request
- connect response
- announce request
- announce response
Timeout’s are handled with resend of request after waiting for incremental amount of seconds.As it is possible to spoof the source ip in udp a connection id is used by tracker and given to client and for further request it is checked to verify.
connect request
offset | size | Name | value |
---|---|---|---|
0 | 64-bit integer | connection_id | |
8 | 32-bit integer | action | 0 //connect |
12 | 32-bit integer | transaction_id | |
16 |
connect response
offset | size | Name | value |
---|---|---|---|
0 | 32-bit integer | action | 0 //connect |
4 | 32-bit integer | transaction_id | |
8 | 64-bit integer | connection_id | |
16 |
announce request
offset | size | Name | value |
---|---|---|---|
0 | 64-bit integer | connection_id | |
8 | 32-bit integer | action | 1 //announce |
12 | 32-bit integer | transaction_id | |
16 | 20-byte string | info_hash | |
36 | 20-byte string | peer_id | |
56 | 64-bit integer | downloaded | |
64 | 64-bit integer | left | |
72 | 64-bit integer | uploaded | |
80 | 32-bit integer | event | 0// 0:None 1:completed 2:started 3:stopped |
84 | 32-bit integer | IP address | 0 //default |
88 | 32-bit integer | key | |
92 | 32-bit integer | num_want | -1 //default |
96 | 16-bit integer | port | |
98 |
announce response:
offset | size | Name | value |
---|---|---|---|
0 | 32-bit integer | action | 1 //announce |
4 | 32-bit integer | transaction_id | |
8 | 32-bit integer | interval | |
12 | 32-bit integer | leechers | |
16 | 32-bit integer | seeders | |
20 + 6 * n | 32-bit integer | Ip address | |
20 + 6 * n | 16-bit integer | port |
Here is the video to demostrate.
References
1] https://wiki.theory.org/BitTorrentSpecification 2] http://bittorrent.org/beps/bep_0015.html