BEP: | 30 |
---|---|
Title: | Merkle hash torrent extension |
Version: | 9c5c1dd1b372016e05af84fb34fccac6752ef54a |
Last-Modified: | Thu Jul 21 10:45:38 2016 -0400 |
Author: | Arno Bakker <arno@cs.vu.nl> |
Status: | Draft |
Type: | Standards Track |
Requires: | 10 |
Content-Type: | text/x-rst |
Created: | 11-Mar-2009 |
Post-History: |
BitTorrent requires a torrent file containing a cryptographic digest of every piece of the content to allow the verification of pieces during the download. Large torrent files put a strain on the Web servers distributing them, and cannot be directly included in RSS feeds or gossiped around.
A related problem is the use of large piece sizes. To keep the size of a torrent file small (as to not overload the Web servers) the number of hashes for a content file is being kept small. For large files this implies that the piece size over which digests are calculated must go up (up to 2MB pieces are used). The large piece sizes affect the ability of peers to barter pieces. Only when a piece has been completely received and verified using the digest may it be traded with other peers. This means that it may be some time before a node starts bartering with others.
Our solution to these two problems is to replace the list of digests with a single Merkle hash [1]. A Merkle hash can be used to verify the integrity of the total content file as well as the individual blocks via a hierarchical scheme. It works by constructing a hash tree of the content and using just the root hash as data integrity protection. The simple root hash value also allows for smaller piece sizes to be used. A common form of hash trees is the Merkle hash tree, hence the name.
We propose a minimalistic design that does not affect the existing BitTorrent protocol and clients very much. The design is backwards compatible in the sense that clients supporting the Simple Merkle Hash extension can still be made to process regular torrent files easily.
>From the content we construct a hash tree as follows. Given a piece size, we calculate the hashes of all the pieces in the set of content files. Next, we create a binary tree of sufficient height. Sufficient height means that the lowest level in the tree has enough nodes to hold all piece hashes in the set. We place all piece hashes in the tree, starting at the left-most leaf, see figure. The remaining leaves in the tree are assigned a filler hash value of 0 (see Discussion). Finally, we calculate the hash values of the higher levels in the tree, by concatenating the hash values of the two children (again left to right) and computing the hash of that aggregate. This process ends in a hash value for the root node, which we call the root hash. The hashing algorithm used is SHA1, as in normal torrents.
The root hash along with the total size of the content-file set and the piece size are now the only information in the system that needs to come from a trusted source. A client that has only the root hash of a file set can check any piece as follows (see figure). It first calculates the hash of the piece it received. Along with this piece it should have received the hashes of the piece's sibling and of its uncles, that is the sibling Y of its parent X, and the uncle of that Y until the root is reached (uncles are marked with * in the figure). Using this information the client recalculates the root hash of the tree, and compares it to the root hash it received from the trusted source.
0* = root hash / \ / \ / \ / \ / \ 1* 2 / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ 3 4 5 6* = uncle / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ 7 8 9 10 11 12* 13 14 / \ / \ / \ / \ / \ / \ / \ / \ 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 P0 P1 P2 P3 P4 P5 P6 P7 P8* P9* P10 P11 P12 X X X = piece index = = = filler hash p s i i e b c l e i n g
The original publisher of the content-file set creates a so-called Merkle torrent which is a torrent file that contains a root hash key in its info part instead of a pieces key, see BEP 3 [2].
When a seeder starts it uses the information in the Merkle torrent and the file set to reconstruct the hash tree and registers itself with the tracker using the hash value of the info part of the Merkle torrent, as usual (see Discussion).
A BitTorrent client that supports the Simple Merkle Hash extension must also support the Extension protocol (BEP 10) [3]. In particular, it must add a Tr_hashpiece message name in the m field of the Extension protocol's handshake message. Such a client must not send piece messages but must use Extension protocol messages with type Tr_hashpiece to send pieces.
A Tr_hashpiece message consists of an index, begin, hashlist and piece. The hashlist consists of the piece's own hash, the piece's sibling hash, and the uncles of the piece up until and including the root hash (see above and Discussion). In particular, the hashlist is a list of 2-element lists. The first element denotes the node offset in the tree, the second element is the hash value. The node offset is the number of the node when numbered in a breadth-first fashion (i.e., going left to right starting at the top).
Only the Tr_hashpiece message with begin field equal to 0 must contain a filled hashlist, for all other begin values the hashlist must be empty. In other words, the message containing the first subpiece should have a filled hashlist, subsequent subpieces should not.
Formally, a Tr_hashpiece message has the following payload:
Upon receipt of a Tr_hashpiece message, the receiver recomputes the root hash using the hashlist and compares it to the root hash in the Merkle torrent. If they match, all the hash values are saved in the receiver's own hash tree, such that they can be passed on to others when the piece is downloaded from this receiver. When all subpieces have come in, the piece is checked using the hash from the hash tree.
We chose a binary tree for simplicity. Trees with larger degrees are also possible. However, the number of hashes that need to be sent with each piece is already small at about 2log of the file-set size.
Using the hash of the info part for registering at the tracker means that for a given content-file set, the swarm that use a conventional torrent file and the swarm that uses a Merkle torrent will be disjunct. The infohash value is different, hence the swarms are known under different identifiers at the trackers.
In theory we can create one swarm. In that swarm, new clients could serve pieces to old clients. For the new clients to benefit from the old clients, however, we need to add some way for the new to obtain the hashes required to check a piece. Designing a fool proof solution for this problem is not trivial.
Because we let the initial seeders recalculate the hash tree, this extension is incompatible with the proposed HTTP Seeding extensions in BEP 17 [4] and 19 [5] .
Including the root hash in a Tr_hashpiece message allows a quick sanity check.
This extension paves the way for BitTorrent URLs. The only information required for a client to commence sharing are the root hash, the total size, the piece size, and a source of peer addresses (tracker, DHT).
Development of this extension was supported by funding from:
- BSIK Freeband Communication I-Share project (Dutch Ministry of Economic Affairs)
- The European Community's Seventh Framework Programme in the P2P-Next project under grant agreement no 216217.
Thanks to Olaf van der Spek and Johan Pouwelse for ideas and suggestions.
[1] | MERKLE, R. A Digital Signature Based on a Conventional Encryption Function. In Proceedings CRYPTOâ87 (Santa Barbara, CA, USA, Aug. 1987), C. Pomerance, Ed., no. 293 in Lecture Notes in Computer Science, Springer-Verlag, pp. 369â378. |
[2] | BEP_0003. The BitTorrent Protocol Specification, Cohen (http://www.bittorrent.org/beps/bep_0003.html) |
[3] | BEP_0010. Extension Protocol, Norberg, Strigeus, Hazel (http://www.bittorrent.org/beps/bep_0010.html) |
[4] | BEP_0017. HTTP Seeding, Hoffman (http://www.bittorrent.org/beps/bep_0017.html) |
[5] | BEP_0019. WebSeed - HTTP/FTP Seeding (GetRight style), Burford (http://www.bittorrent.org/beps/bep_0019.html) |
This document has been placed in the public domain.