BitTorrent Demystified: What Torrents Are and How They Work
In today’s world where every third person is capable of accessing the internet, who does not know about BitTorrent? Almost everyone does. Everybody here has downloaded stuff through BitTorrent sites.
You have hunted for invites to get on famous private trackers, devised new ways for trumping your office/school/college firewall to allow open ports and played around with your internet connections to get better speed.
In this article, I will try to explain what a torrent is and how exactly the whole thing works.
The Torrent File
Basically, BitTorrent works on the divide-and-conquer principle. Instead of downloading the complete file in one go, BitTorrent divides the target into small, convenient pieces distributed over a network of peers. The torrent file simply tells us how many pieces the file is divided into and where those pieces are to be found. Once all the pieces have been downloaded, they are assembled and the file is ready for use.
Technically, a torrent file is a bencoded dictionary. Bencode (pronounced “B-encode”) is the encoding format BitTorrent uses to store, transmit and organize data in a terse form (the .torrent file). Bencode supports four types: byte strings, integers, lists and dictionaries (associative arrays). It uses ASCII characters as delimiters and digits.
Strings
Bencoded strings are encoded as follows: <string length>:<string data>, where the length is the number of bytes in the string, written in base-ten ASCII digits.
Note that there is no constant beginning delimiter, and no ending delimiter.
Example: 4:blog represents the string “blog”
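To make the string rule concrete, here is a minimal Python sketch. The function names encode_string and decode_string are my own for illustration and are not part of any BitTorrent library; the sketch assumes the input is already a bytes object.

```python
from typing import Tuple


def encode_string(data: bytes) -> bytes:
    """Bencode a byte string as <length>:<data>."""
    return str(len(data)).encode("ascii") + b":" + data


def decode_string(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Decode the byte string that starts at buf[pos].

    Returns the string and the index just past it, so the caller
    can continue reading the next value.
    """
    colon = buf.index(b":", pos)      # the length prefix ends at the first colon
    digits = buf[pos:colon]
    if not digits.isdigit():          # only ASCII digits are allowed here
        raise ValueError("invalid string length")
    start = colon + 1
    end = start + int(digits)
    if end > len(buf):
        raise ValueError("string runs past the end of the input")
    return buf[start:end], end


print(encode_string(b"blog"))    # b'4:blog'
print(decode_string(b"4:blog"))  # (b'blog', 6)
```

Because there is no ending delimiter, the decoder relies entirely on the length prefix to know where the string stops.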
Integers
Integers are encoded as follows: i<number>e. Example: i3e represents the integer “3”.
The initial i and trailing e are beginning and ending delimiters. You can have negative numbers such as i-3e. Only the significant digits should be used; one cannot pad the integer with zeroes, so i04e would be invalid. However, i0e is valid.
NOTE: The maximum number of bits of this integer is unspecified, but handling it as at least a signed 64-bit integer is necessary to support “large files”, i.e. torrents for files more than 4 gigabytes in size.
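The integer rules can be sketched in Python as follows. The names encode_int and decode_int are hypothetical helpers for this article. Python integers have arbitrary precision, so the 64-bit concern does not arise here; in a language with fixed-width integers you would pick a 64-bit type. The rejection of i-0e is not mentioned above but comes from the BitTorrent specification.

```python
from typing import Tuple


def encode_int(value: int) -> bytes:
    """Bencode an integer as i<number>e."""
    return b"i" + str(value).encode("ascii") + b"e"


def decode_int(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode the integer that starts at buf[pos].

    Returns the integer and the index just past the trailing 'e'.
    """
    if buf[pos:pos + 1] != b"i":
        raise ValueError("integer must start with 'i'")
    end = buf.index(b"e", pos)
    digits = buf[pos + 1:end]
    body = digits[1:] if digits.startswith(b"-") else digits
    if not body.isdigit():
        raise ValueError("integer has no digits")
    if body != b"0" and body.startswith(b"0"):
        raise ValueError("leading zeroes are not allowed")  # e.g. i04e
    if digits == b"-0":
        raise ValueError("negative zero is not allowed")    # per the specification
    return int(digits), end + 1


print(encode_int(3))         # b'i3e'
print(decode_int(b"i-3e"))   # (-3, 4)
```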
Lists
Lists are encoded as follows: l<bencoded values>e
The initial l and trailing e are beginning and ending delimiters. Lists may contain any bencoded type, including integers, strings, dictionaries, and even lists within other lists.
Example: l4:blog6:solutee represents the list of two strings: [ “blog”, “solute” ]
Dictionaries
Dictionaries are encoded as follows: d<bencoded key><bencoded value>e
The initial d and trailing e are the beginning and ending delimiters. The keys must be bencoded strings and must appear in sorted order (sorted as raw byte strings, not alphanumerically). The strings should be compared using a binary comparison, not a culture-specific “natural” comparison. The values may be any bencoded type, including integers, strings, lists, and other dictionaries.
Example: d4:blog6:solute3:bob7:buildere represents the dictionary {“blog” => “solute”, “bob” => “builder”}. Note that the key “blog” comes before “bob” because keys must be sorted.
Example: d5:fruitl5:mango5:appleee represents the dictionary {“fruit” => [ “mango”, “apple” ] }
Example: d9:publisher3:bob18:publisher-location6:office17:publisher-webpage18:www.blogsolute.come represents {“publisher” => “bob”, “publisher-location” => “office”, “publisher-webpage” => “www.blogsolute.com”}
There are no restrictions on what kind of values may be stored in lists and dictionaries; they may (and usually do) contain other lists and dictionaries. This allows arbitrarily complex data structures to be encoded.
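Putting the four types together, a recursive encoder is quite short. The following is a sketch, not a reference implementation: the function name bencode is my own, and I am assuming that Python str values should be converted to UTF-8 bytes before encoding.

```python
def bencode(value) -> bytes:
    """Bencode an int, str/bytes, list/tuple or dict (recursively)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencode has no boolean type
        raise TypeError("booleans cannot be bencoded")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")   # assumption: text is stored as UTF-8
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((key, item))
        items.sort(key=lambda pair: pair[0])   # raw byte order, not locale order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %s" % type(value).__name__)


print(bencode(["blog", "solute"]))                   # b'l4:blog6:solutee'
print(bencode({"bob": "builder", "blog": "solute"})) # b'd4:blog6:solute3:bob7:buildere'
```

The order in which the Python dictionary was built does not matter: keys are emitted in byte order, so “blog” is written before “bob”.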
Coming back to the torrent file: everything stored within it is bencoded using the above technique. As mentioned earlier, a torrent file is a bencoded dictionary, and it contains most (if not all) of the following keys. All character string values are UTF-8 encoded.
- info: a dictionary that describes the file(s) of the torrent. There are two possible forms: one for a ‘single-file’ torrent with no directory structure, and one for a ‘multi-file’ torrent. In both forms, the info dictionary contains the following sub-keys:
  - piece length: number of bytes in each piece (integer)
  - pieces: string consisting of the concatenation of all 20-byte SHA-1 hash values, one per piece (byte string, i.e. not urlencoded)
  - private: (optional) this field is an integer. If it is set to “1”, the client MUST publish its presence and get other peers ONLY via the trackers explicitly described in the metainfo file. If this field is set to “0” or is not present, the client may obtain peers by other means, e.g. PEX (peer exchange) or DHT (distributed hash table). Here, “private” may be read as “no external peer source”.
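As a rough illustration of the pieces and private keys, the sketch below hashes some data in memory. The names hash_pieces and is_private are made up for this article, and I am assuming the info dictionary has been decoded into a Python dict with bytes keys. A real client would read the files in chunks rather than hold everything in memory.

```python
import hashlib


def hash_pieces(data: bytes, piece_length: int) -> bytes:
    """Return the value of the 'pieces' key: one 20-byte SHA-1 digest per piece."""
    if piece_length <= 0:
        raise ValueError("piece length must be positive")
    digests = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]   # the last piece may be shorter
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


def is_private(info: dict) -> bool:
    """True when peers may only be obtained from the listed trackers."""
    return info.get(b"private", 0) == 1


pieces = hash_pieces(b"a" * 10, 4)   # pieces of 4, 4 and 2 bytes
print(len(pieces))                   # 60, i.e. three 20-byte hashes
```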
Depending on the number of files targeted by the torrent (single or multiple), the info dictionary contains several more sub-keys:
Info in Single File Mode
For the case of the single-file mode, the info dictionary contains the following structure:
- name: the filename. This is purely advisory. (string)
- length: length of the file in bytes (integer)
- md5sum: (optional) a 32-character hexadecimal string corresponding to the MD5 sum of the file. This is not used by BitTorrent at all, but it is included by some programs for greater compatibility.
Info in Multiple File Mode
For the case of the multi-file mode, the info dictionary contains the following structure:
- name: the name of the directory in which to store all the files. This is purely advisory. (string)
- files: a list of dictionaries, one for each file. Each dictionary in this list contains the following keys:
  - length: length of the file in bytes (integer)
  - md5sum: (optional) a 32-character hexadecimal string corresponding to the MD5 sum of the file. This is not used by BitTorrent at all, but it is included by some programs for greater compatibility.
  - path: a list containing one or more string elements that together represent the path and filename. Each element in the list corresponds to either a directory name or (in the case of the final element) the filename. For example, a file “dir1/dir2/file.ext” would consist of three string elements: “dir1”, “dir2”, and “file.ext”. This is encoded as a bencoded list of strings such as l4:dir14:dir28:file.exte
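The two forms of the info dictionary can be handled by one small function. This is a sketch under a few assumptions: list_files is a hypothetical helper, the info dictionary has already been decoded into a Python dict with bytes keys, and the presence of the files key is what distinguishes multi-file mode.

```python
def list_files(info: dict) -> list:
    """Return (path, length) pairs for a single-file or multi-file info dict."""
    name = info[b"name"].decode("utf-8")
    if b"files" not in info:
        # single-file mode: 'name' is the filename itself
        return [(name, info[b"length"])]
    result = []
    for entry in info[b"files"]:
        # multi-file mode: 'name' is the top directory, 'path' a list of parts
        parts = [part.decode("utf-8") for part in entry[b"path"]]
        result.append(("/".join([name] + parts), entry[b"length"]))
    return result


multi = {
    b"name": b"album",
    b"files": [
        {b"length": 100, b"path": [b"dir1", b"dir2", b"file.ext"]},
        {b"length": 50, b"path": [b"readme.txt"]},
    ],
}
print(list_files(multi))
# [('album/dir1/dir2/file.ext', 100), ('album/readme.txt', 50)]
```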
Besides info, the top-level dictionary of the torrent file contains the following keys:
- announce: the announce URL of the tracker (string)
- announce-list: (optional) this is an extension to the official specification, offering backwards-compatibility. (List of lists of strings).
- creation date: (optional) the creation time of the torrent, in standard UNIX epoch format (integer, seconds since 1-Jan-1970 00:00:00 UTC)
- comment: (optional) free-form textual comments of the author (string)
- created by: (optional) name and version of the program used to create the .torrent (string)
- encoding: (optional) the string encoding format used to generate the pieces part of the info dictionary in the .torrent metafile (string)
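To read these keys back out of a .torrent file, we need a decoder. Below is a compact sketch; bdecode and describe are names I made up, and the decoder is deliberately lenient (for example, it does not reject padded integers or unsorted keys), so treat it as an illustration rather than a validating parser.

```python
from datetime import datetime, timezone


def _decode(buf: bytes, pos: int):
    """Decode one value starting at buf[pos]; return (value, next position)."""
    lead = buf[pos:pos + 1]
    if lead == b"i":
        end = buf.index(b"e", pos)
        return int(buf[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while buf[pos:pos + 1] != b"e":
            item, pos = _decode(buf, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while buf[pos:pos + 1] != b"e":
            key, pos = _decode(buf, pos)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be strings")
            result[key], pos = _decode(buf, pos)
        return result, pos + 1
    if lead.isdigit():
        colon = buf.index(b":", pos)
        start = colon + 1
        end = start + int(buf[pos:colon])
        if end > len(buf):
            raise ValueError("string runs past the end of the input")
        return buf[start:end], end
    raise ValueError("unexpected data at offset %d" % pos)


def bdecode(buf: bytes):
    """Decode a complete bencoded value; strings come back as bytes."""
    value, pos = _decode(buf, 0)
    if pos != len(buf):
        raise ValueError("trailing data after the value")
    return value


def describe(torrent: dict) -> dict:
    """Pick out the top-level keys described above (optional ones may be None)."""
    created = torrent.get(b"creation date")
    comment = torrent.get(b"comment")
    created_by = torrent.get(b"created by")
    return {
        "announce": torrent[b"announce"].decode("utf-8"),
        "announce-list": [[url.decode("utf-8") for url in group]
                          for group in torrent.get(b"announce-list", [])],
        # seconds since the UNIX epoch, converted to an aware UTC datetime
        "creation date": (datetime.fromtimestamp(created, tz=timezone.utc)
                          if created is not None else None),
        "comment": comment.decode("utf-8") if comment is not None else None,
        "created by": created_by.decode("utf-8") if created_by is not None else None,
    }
```

With a real file, usage would look like describe(bdecode(open(path, "rb").read())), where path is the location of a .torrent file on your disk.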
Notes
The piece length specifies the nominal piece size, and is usually a power of 2. The piece size is typically chosen based on the total amount of file data in the torrent, and is constrained by the fact that pieces that are too large cause inefficiency, while pieces that are too small result in a large .torrent metadata file. Historically, piece size was chosen to result in a .torrent file no greater than approx. 50 – 75 kB (presumably to ease the load on the server hosting the torrent files). Current best practice is to keep the piece size to 512 kB or less for torrents around 8-10 GB, even if that results in a larger .torrent file. This results in a more efficient swarm for sharing files. The most common sizes are 256 kB, 512 kB, and 1 MB.
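The trade-off is easy to see with a little arithmetic: every piece adds a 20-byte SHA-1 hash to the pieces string. The sketch below uses a hypothetical helper, piece_stats, and assumes the sizes are binary units (1 kB = 1024 bytes), which fits the power-of-2 convention.

```python
def piece_stats(total_bytes: int, piece_length: int):
    """Return (number of pieces, size in bytes of the 'pieces' string)."""
    count = -(-total_bytes // piece_length)   # ceiling division: last piece may be short
    return count, count * 20                  # 20 bytes of SHA-1 per piece


KB = 1024
GB = 1024 ** 3

for size in (256 * KB, 512 * KB, 1024 * KB):
    count, hash_bytes = piece_stats(8 * GB, size)
    print(size // KB, "kB pieces:", count, "pieces,", hash_bytes // KB, "kB of hashes")
# 256 kB pieces: 32768 pieces, 640 kB of hashes
# 512 kB pieces: 16384 pieces, 320 kB of hashes
# 1024 kB pieces: 8192 pieces, 160 kB of hashes
```

So for an 8 GB torrent, halving the piece size doubles the hash data in the .torrent file, which is why such a torrent is well over the historical 50 – 75 kB target at any of the common piece sizes.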
Overview
A torrent file essentially holds data describing the target file(s), the number of pieces it is divided into, the size of each piece, the tracker URL(s) which are supposed to be tracking the torrent and other data that tells us more information about when, where and by whom the torrent was created.
So, that was all about torrents and how they work. In the next part of the BitTorrent Demystified series, we will learn about torrent trackers. Before that, let us know your thoughts on this article by commenting.
This was a Guest Article by Ajitem Sahasrabuddhe, Former Admin of a Torrent Site, Passionate Programmer and Engineering Student. You can follow him on twitter @GreatDharmatma.