Last time, we managed to fetch the list of peers from the torrent tracker for our sample torrent file. I left off by simply dumping the response from the torrent tracker onto the screen, and now I would like to pick up on that and actually parse the tracker’s response so that we can get our hands on peers’ IP addresses and ports. That’s going to be our next step towards making connections to peers.

Tracker response structure

Looking at the description of the tracker response, we can see that it’s a bencoded dictionary with a few fields. For us, the most interesting one is the peers field, which contains a list of dictionaries, where each entry includes information about an individual peer:

ip: peer’s IP address (string);
port: peer’s port number (integer).

Indeed, we can see these values in the raw tracker response. The specification also mentions the field peer id, but skimming through the response string, I don’t see any traces of that field inside. I assume that field is optional. In any case, ip and port are the most important for us now.
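To make that shape concrete, here is a small hand-written response in the same dictionary form. This is not real tracker output: the addresses come from the 192.0.2.0/24 documentation range, the interval key is taken from the specification rather than from my actual response, and the constant name is made up for this illustration.

```rust
/// A hand-written tracker response in the dictionary (non-compact) form.
/// Illustrative only: the addresses are from the documentation range.
pub const SAMPLE_TRACKER_RESPONSE: &str = concat!(
    "d",                     // top-level dictionary starts
    "8:interval", "i900e",   // key "interval" -> integer 900
    "5:peers", "l",          // key "peers" -> list starts
    "d", "2:ip", "10:192.0.2.10", "4:port", "i6881e", "e",  // first peer
    "d", "2:ip", "10:192.0.2.20", "4:port", "i51413e", "e", // second peer
    "e",                     // list ends
    "e",                     // top-level dictionary ends
);
```

Every byte string is prefixed with its length, which is why the first address reads as 10:192.0.2.10 on the wire.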

A need for a more powerful decoder

Now, that’s a fairly complex structure: a dictionary that contains a field that’s a list of dictionaries. Recall that in our previous work, we implemented some decoding of bencoded values, but the functionality in this area is still quite basic. In particular, we have no way of parsing nested complex structures yet.

Moreover, I’ve been adding accessor methods to the Dict struct in a somewhat haphazard manner, guided by what data I needed at the moment, which has resulted in Dict having a few bespoke methods, such as get_string() and get_dict_sha1(). If I continue in that manner, I risk polluting the Dict interface with more methods of that kind. That’s not a good design.

It looks to me that we’ve come to the point where we need to put more effort into parsing the bencoded data:

Our Decoder must be able to handle complex nested data structures, such as dictionaries containing lists of dictionaries;
We need a more coherent data model to represent the decoded data, so that we can work with it through a relatively narrow interface.

Decoder implementation

Here I’ll describe my approach to representing bencoded data structures in the code. I think I managed to come up with a more or less robust implementation that’s also relatively easy to use. There are probably more idiomatic ways to implement these concepts in Rust, but I’m not familiar with them at my current level. As I learn more, I might revisit this implementation and reshape things.

BencValue enum

At the center of the implementation, there’s the BencValue enum, which replaces the DictValue that I introduced earlier:

pub enum BencValue {
    Int(i64),
    ByteString(ByteString),
    Dict(Dict),
    List(List),
}

This enumeration encompasses all four possible data types that we can encounter in bencoded structures: byte strings, integers, dictionaries, and lists. To access the underlying values, we could use pattern matching to destructure BencValues, but for convenience, I also implemented a few methods that do this work for us: as_int(), as_byte_string(), as_dict(), and as_list(). Each of these methods returns an Option. The idea here is that these methods will return None if they’re called on a BencValue of the wrong type. The alternative could be to return a Result, but I think Option works just fine for now.
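As a rough sketch, the accessors could look like the code below. The supporting types are simplified stand-ins so that the snippet compiles on its own (for instance, this Dict has no sha1 field), and returning references such as Option<&i64> is an assumption of the sketch.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real types, just enough to compile the sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

#[derive(Debug, Clone, PartialEq)]
pub struct Dict {
    pub entries: HashMap<ByteString, BencValue>,
}

pub type List = Vec<BencValue>;

#[derive(Debug, Clone, PartialEq)]
pub enum BencValue {
    Int(i64),
    ByteString(ByteString),
    Dict(Dict),
    List(List),
}

impl BencValue {
    /// Returns the integer if this value is an `Int`, `None` otherwise.
    pub fn as_int(&self) -> Option<&i64> {
        match self {
            BencValue::Int(value) => Some(value),
            _ => None,
        }
    }

    /// Returns the byte string if this value is a `ByteString`.
    pub fn as_byte_string(&self) -> Option<&ByteString> {
        match self {
            BencValue::ByteString(value) => Some(value),
            _ => None,
        }
    }

    /// Returns the dictionary if this value is a `Dict`.
    pub fn as_dict(&self) -> Option<&Dict> {
        match self {
            BencValue::Dict(value) => Some(value),
            _ => None,
        }
    }

    /// Returns the list if this value is a `List`.
    pub fn as_list(&self) -> Option<&List> {
        match self {
            BencValue::List(value) => Some(value),
            _ => None,
        }
    }
}
```

Because each accessor returns an Option, calls chain naturally with and_then() and map() on the client side.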

We already saw the ByteString struct that I use to represent bencoded strings. As you may remember, we can’t use the built-in String type because bencoded strings can contain binary data incompatible with UTF-8 encoding.
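For illustration, here is one way such a type might be shaped. I’m assuming a newtype over Vec<u8> and a lossy Display implementation, which is what makes to_string() available; the real struct may differ in both respects.

```rust
use std::fmt;

/// A bencoded string: arbitrary bytes, not necessarily valid UTF-8.
/// The newtype layout is an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

impl fmt::Display for ByteString {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Bytes that are not valid UTF-8 are shown as U+FFFD replacement
        // characters, so this is meant for display, not for round-tripping.
        write!(f, "{}", String::from_utf8_lossy(&self.0))
    }
}
```

Textual fields such as announce convert cleanly, while binary fields such as pieces would come out mangled, so the raw bytes should stay the source of truth.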

The Dict struct, which we also saw before, is a representation of bencoded dictionaries. Essentially, it’s a wrapper around HashMap<ByteString, BencValue>, but with one important addition: it also carries the sha1 field, which contains the value of the SHA-1 hash of its encoded content. Recall that we need this value specifically for the info field from the torrent file to pass it as a parameter to the tracker’s announce request.

I’ve cleaned up the interface of Dict, removing those pesky get_string() and get_dict_sha1() methods. Now, there’s a single method to fetch the value by its key:

fn get(&self, key: &str) -> Option<&BencValue>
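A minimal sketch of how such a wrapper might be put together is shown below. The field names, the [u8; 20] hash type, the new() constructor, and the sha1() accessor are assumptions of this sketch, and the lookup allocates a temporary key for simplicity.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

#[derive(Debug, Clone, PartialEq)]
pub enum BencValue {
    Int(i64),
    ByteString(ByteString),
    Dict(Dict),
    List(Vec<BencValue>),
}

/// Wrapper around the decoded entries plus the SHA-1 of the encoded content.
#[derive(Debug, Clone, PartialEq)]
pub struct Dict {
    entries: HashMap<ByteString, BencValue>,
    sha1: [u8; 20], // assumed representation: the raw 20-byte digest
}

impl Dict {
    /// Hypothetical constructor; the decoder would supply the digest.
    pub fn new(entries: HashMap<ByteString, BencValue>, sha1: [u8; 20]) -> Self {
        Dict { entries, sha1 }
    }

    /// Looks up a value by a textual key.
    pub fn get(&self, key: &str) -> Option<&BencValue> {
        // Builds a temporary ByteString key; simple, at the cost of an allocation.
        self.entries.get(&ByteString(key.as_bytes().to_vec()))
    }

    /// Hypothetical accessor for the stored digest.
    pub fn sha1(&self) -> &[u8; 20] {
        &self.sha1
    }
}
```

Implementing Borrow<[u8]> for ByteString would avoid the temporary allocation, but that felt like an optimization beyond what this sketch needs.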

Once the value is fetched, the client code can use the BencValue::as_* methods to access the underlying data. For example, here’s how we can fetch the announce URL from the torrent file content:

let announce_url = torrent_file_contents
    .get("announce")
    .and_then(|v| v.as_byte_string())
    .map(|v| v.to_string())
    .expect("Unable to retrieve announce URL");

It looks a bit wordy on the client side, but I didn’t want to introduce specific methods get_string(), get_dict(), etc. to the Dict interface, especially since similar methods would then need to be introduced to the List type. It’s a trade-off between the simplicity of the interface and ease of use. I’ll keep an eye on it in the future to see if there are any tricks in Rust that I could use to get rid of this excess verbosity.

Finally, List is just a type alias for Vec<BencValue>.

You may notice that BencValue is indirectly a recursive data structure: BencValue::Dict represents a dictionary of BencValues, and BencValue::List does the same for lists. That reflects the recursive structure of the bencoding format.

Decoder struct

The Decoder struct has also been improved: now it can recursively parse the bencoded content, building the nested structure of BencValues. It comes with quite an extensive test suite that covers various success and failure scenarios. I’m quite confident in its capabilities, but time will tell if I did a good job here.

I’ve decided to keep Decoder private to the module and instead expose a single top-level function decode_dict() from the bencoding module. Parsing top-level dictionaries has been the only use case we’ve encountered so far.
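To illustrate the recursive idea, here is a compact, self-contained sketch of a recursive-descent bencode decoder. It is not the actual Decoder: the names SketchDecoder and decode() are made up, errors are plain strings, dictionaries are bare HashMaps without the SHA-1, and it is more lenient than the format allows (it accepts leading zeros and unsorted keys).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

// Simplified stand-in: the dictionary variant holds a bare HashMap (no SHA-1).
#[derive(Debug, Clone, PartialEq)]
pub enum BencValue {
    Int(i64),
    ByteString(ByteString),
    Dict(HashMap<ByteString, BencValue>),
    List(Vec<BencValue>),
}

// Hypothetical decoder: a cursor over the input bytes.
struct SketchDecoder<'a> {
    input: &'a [u8],
    pos: usize,
}

// Parses ASCII text into a number (i64 for integers, usize for lengths).
fn parse_number<T: std::str::FromStr>(bytes: &[u8]) -> Result<T, String> {
    std::str::from_utf8(bytes)
        .ok()
        .and_then(|text| text.parse::<T>().ok())
        .ok_or_else(|| "invalid number".to_string())
}

impl<'a> SketchDecoder<'a> {
    fn peek(&self) -> Result<u8, String> {
        self.input.get(self.pos).copied().ok_or_else(|| "unexpected end of input".to_string())
    }

    // Returns the bytes before `delimiter` and moves the cursor past it.
    fn take_until(&mut self, delimiter: u8) -> Result<&'a [u8], String> {
        let input = self.input;
        let rest = &input[self.pos..];
        let len = rest.iter().position(|&byte| byte == delimiter).ok_or("missing delimiter")?;
        self.pos += len + 1;
        Ok(&rest[..len])
    }

    // Decodes `<length>:<bytes>`.
    fn decode_byte_string(&mut self) -> Result<ByteString, String> {
        let len: usize = parse_number(self.take_until(b':')?)?;
        let end = self.pos.checked_add(len)
            .filter(|&end| end <= self.input.len())
            .ok_or("byte string runs past the end of input")?;
        let bytes = self.input[self.pos..end].to_vec();
        self.pos = end;
        Ok(ByteString(bytes))
    }

    fn decode_value(&mut self) -> Result<BencValue, String> {
        match self.peek()? {
            b'i' => {
                self.pos += 1;
                Ok(BencValue::Int(parse_number(self.take_until(b'e')?)?))
            }
            b'l' => {
                self.pos += 1;
                let mut items = Vec::new();
                while self.peek()? != b'e' {
                    items.push(self.decode_value()?); // recurse into each element
                }
                self.pos += 1; // skip the closing 'e'
                Ok(BencValue::List(items))
            }
            b'd' => {
                self.pos += 1;
                let mut entries = HashMap::new();
                while self.peek()? != b'e' {
                    let key = self.decode_byte_string()?; // keys are always byte strings
                    entries.insert(key, self.decode_value()?); // values may nest
                }
                self.pos += 1; // skip the closing 'e'
                Ok(BencValue::Dict(entries))
            }
            b'0'..=b'9' => Ok(BencValue::ByteString(self.decode_byte_string()?)),
            other => Err(format!("unexpected byte {:#04x} at offset {}", other, self.pos)),
        }
    }
}

/// Decodes exactly one top-level value and rejects trailing bytes.
pub fn decode(input: &[u8]) -> Result<BencValue, String> {
    let mut decoder = SketchDecoder { input, pos: 0 };
    let value = decoder.decode_value()?;
    if decoder.pos != input.len() {
        return Err("trailing bytes after the top-level value".to_string());
    }
    Ok(value)
}
```

The recursion lives entirely in decode_value(): lists and dictionaries call back into it for their elements, which is how a dictionary containing a list of dictionaries gets handled without any special casing.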

Test-drive: parsing the peer list

Now, it’s time to put our empowered decoding machinery into use and parse the tracker’s response into a more convenient data structure:

pub struct Peer {
    pub ip: String,
    pub port: u16,
}

pub fn get_peer_list_from_response(tracker_response: &[u8]) -> Result<Vec<Peer>, Box<dyn Error>> {
    let decoded_response = decode_dict(tracker_response)?;

    // "peers" is expected to be a list of dictionaries, one per peer.
    let peers_list = decoded_response.get("peers").unwrap().as_list().unwrap();
    let peers = peers_list
        .iter()
        .map(|peer| peer.as_dict().unwrap())
        .map(|peer| {
            let ip = peer
                .get("ip")
                .and_then(|v| v.as_byte_string())
                .map(|v| v.to_string())
                .unwrap();
            // Note: `as u16` silently truncates values outside 0..=65535.
            let port = peer
                .get("port")
                .and_then(|v| v.as_int())
                .map(|v| *v as u16)
                .unwrap();
            Peer { ip, port }
        })
        .collect();

    Ok(peers)
}

For simplicity, I’m just using unwrap() here and there; better error handling is yet to come. But the core functionality is pretty sound: we iterate over the peers list and convert it into a list of Peer structs that carry peers’ IP addresses and ports.
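As a sketch of where the error handling could go, the version below propagates failures with ? instead of panicking and checks that the port fits into u16. It is written against simplified stand-in types (a bare HashMap for dictionaries, pattern matching instead of the as_* accessors), and the names field(), parse_peer(), and peers_from_decoded() are made up for this illustration.

```rust
use std::collections::HashMap;
use std::convert::TryFrom; // already in the prelude from the 2021 edition on
use std::error::Error;

// Simplified stand-ins for the decoded data model.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ByteString(pub Vec<u8>);

#[derive(Debug, Clone, PartialEq)]
pub enum BencValue {
    Int(i64),
    ByteString(ByteString),
    Dict(HashMap<ByteString, BencValue>),
    List(Vec<BencValue>),
}

#[derive(Debug, PartialEq)]
pub struct Peer {
    pub ip: String,
    pub port: u16,
}

// Fetches a required field, reporting which key is missing.
fn field<'a>(dict: &'a HashMap<ByteString, BencValue>, key: &str) -> Result<&'a BencValue, String> {
    dict.get(&ByteString(key.as_bytes().to_vec()))
        .ok_or_else(|| format!("missing field '{}'", key))
}

// Converts one entry of the "peers" list into a Peer.
fn parse_peer(value: &BencValue) -> Result<Peer, Box<dyn Error>> {
    let BencValue::Dict(peer) = value else {
        return Err("peer entry is not a dictionary".into());
    };
    let BencValue::ByteString(ip) = field(peer, "ip")? else {
        return Err("'ip' is not a byte string".into());
    };
    let BencValue::Int(port) = field(peer, "port")? else {
        return Err("'port' is not an integer".into());
    };
    Ok(Peer {
        ip: String::from_utf8(ip.0.clone())?, // fails on non-UTF-8 bytes
        port: u16::try_from(*port)?,          // fails outside 0..=65535
    })
}

/// Extracts the peer list from an already decoded tracker response.
pub fn peers_from_decoded(
    response: &HashMap<ByteString, BencValue>,
) -> Result<Vec<Peer>, Box<dyn Error>> {
    let BencValue::List(peers) = field(response, "peers")? else {
        return Err("'peers' is not a list".into());
    };
    // Collecting into a Result stops at the first malformed peer.
    peers.iter().map(parse_peer).collect()
}
```

Failing the whole response on one bad entry is a design choice of this sketch; skipping malformed peers and keeping the rest would be an equally reasonable policy.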

Running the slightly updated main, I now get the following output from the program:

[main] $ cargo run
Your announce url is: http://bttracker.debian.org:6969/announce
Peer list (total 50 peers):
Top 10 peers:
92.101.180.9:58130
185.209.199.91:51820
84.74.128.132:51413
172.116.246.83:65210
81.225.109.185:51413
136.37.73.176:51413
83.114.68.205:58630
86.115.226.162:49152
145.239.206.200:44444
178.140.191.150:50501
[main] $

What’s next?

So, I managed to get the IP addresses and ports of the peers. In theory, I can go on to start peer-to-peer communication. But I’m a bit unsure now about what the best course of action should be. For one, I’m worried that the code I wrote to communicate with the torrent tracker is not in the best shape. It’s mostly ad-hoc experimental snippets of code that lack proper structure, tests, and error handling, which are all things that constitute good software in my opinion.

I don’t want to rush ahead. Instead, I think I need to do an intermediate reflection session and plan what to do next.