/* Introduction */
IPFS works because… magic. Ok, that is not actually true. But how does an IPFS client ask everyone for anything and actually get a reasonable response in a reasonable amount of time?
The answer is: content resolution algorithms built on distributed database architectures. That's not helpful? Ok then how about this: the Distributed Hash Table, referred to from here on out in this post as the DHT.
If you're a developer who has spent most of their coding life using HTTP, then you're one of around twenty million people… almost none of whom have actually built production applications on top of a peer-to-peer architecture. You're going to need to know a few things in order to build, profile, debug… and to know who to curse at. If something goes wrong while you're building with IPFS, this post will ground your debugging approach in some practical knowledge of how the underlying system works.
If you already know roughly what this layer is supposed to do, you can skip ahead to the Torrent section.
Content routing is the art and science of sending data from where you have it to where you want it quickly and efficiently. The first generation of peer-to-peer networks used the approaches of flooding and random walks, which were simple and straightforward but almost impossible to scale. Then DHTs came on the scene: they let any node find content by querying only a handful of other nodes, with no central coordinator to maintain or take down. However, a plain DHT lookup still takes several network round trips across a constantly changing swarm of nodes, which is why accelerated DHT clients exist (more on those at the end).
Let's assume you want to download a file (I'm gonna refer to this example file as Qmfoo here). We need to find a node which hosts Qmfoo, connect to it, and ask it "Please, I would like to download Qmfoo", and the other node will reply with the file.
Let's recap how much we know about that file at this point: we only know the CID. We can decode it with cid.ipfs.io (with a real CID; this fake one would just error). Using Qmdd7368UxUcLM1bdCE5KHRMkeUxMx7h4QwUXsFpRivJ5s for example, we see the data inside the CID looks like this:
- base58btc
- cidv0
- dag-pb
- (sha2-256 : 256 : 3165D7D9E71759F5DB82F26318F03A5BEEA4BAFD9AE1EECC5D366403CD38C16)
Most of what you see is metadata explaining how to read the actual data. What is interesting is between the parentheses, and you might notice that it's not much. We have:
- sha2-256, the hash algorithm used.
- 256, the length of the hash in bits. Yes, it's already implied by the previous field (the -256 already means 256 bits long); other algorithms may have varying output lengths, and the spec just says to always include it.
- 3165D7D9E71759F5DB82F26318F03A5BEEA4BAFD9AE1EECC5D366403CD38C16, the hash itself (I'm not gonna go into the details of why it's not actually the hash of the file itself; you can read more about IPLD if you want to know about that).

The important thing to realise is that nowhere in this data structure will you find an address. That is an issue because whatever transport we are gonna use (Internet Protocol, Tor, Ethernet, …) doesn't support dialing random file hashes like that.
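To make the "metadata + digest" structure concrete, here is a minimal sketch in Go (standard library only, not the real go-cid API; the input data is made up) that builds the binary multihash a CIDv0 wraps: the code for sha2-256, the digest length, then the digest itself.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	data := []byte("hello IPFS") // pretend this is our file's root block

	digest := sha256.Sum256(data)

	// A multihash is: <hash-function code><digest length><digest>.
	// 0x12 is the registered code for sha2-256, 0x20 (32) is the digest length in bytes.
	multihash := append([]byte{0x12, 0x20}, digest[:]...)

	// A CIDv0 is just this multihash encoded in base58btc (encoding omitted here),
	// which is why decoding a CID gives you exactly those fields and nothing else,
	// in particular, no address.
	fmt.Println("sha2-256 / 256 bits /", hex.EncodeToString(digest[:]))
	fmt.Println("multihash bytes:", hex.EncodeToString(multihash))
}
```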
So we need a way to get from the CID of a single file to a list of connection details for nodes who might have it. The connection details we need are the public key of the other node, plus an IP address and port, a Tor address, … or whatever your transport uses. That's right, IPFS is transport agnostic: it can use many different network transports for any given interaction. I'm gonna stick to IP address + port for this explanation.
BTW, we want a list of them because each entry is a single node; if multiple nodes are hosting the file, we want to know all of them in order to download from as many places at once as possible, making the download potentially faster and increasing the probability that it will complete successfully.
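If it helps to picture the goal, here is a rough sketch of the kind of record content routing has to hand back. The names are mine, not the actual libp2p types; the addresses reuse the examples from later in this post.

```go
package main

import "fmt"

// ProviderRecord is a made-up illustration of what resolving a CID should yield:
// the identity of a node plus enough transport details to actually dial it.
type ProviderRecord struct {
	PeerID string   // identity derived from the node's public key
	Addrs  []string // transport-specific addresses: IP + port, a Tor address, …
}

func main() {
	// What we want at the end of content routing for Qmfoo: a list like this.
	providers := []ProviderRecord{
		{PeerID: "QmXyz", Addrs: []string{"10.42.76.7:1337"}},
		{PeerID: "QmAbc", Addrs: []string{"192.168.0.3:5432"}},
	}
	for _, p := range providers {
		fmt.Println(p.PeerID, p.Addrs)
	}
}
```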
HTTP uses the URL (Uniform Resource Locator) to resolve content.
Let's look at one for example:
https://xkcd.com/1319/
Without including the more complex parts of the standard, a URL is typically composed of:
- https, the scheme (which protocol to talk to the server with)
- xkcd.com, the host (where on the network that server is)
- 1319/, the path of the resource on that server
It's trivial to see how you would go about securely requesting that file: resolve xkcd.com to an IP, open a connection to it (TLS, since the scheme is https), and ask that server for 1319/.
The first thing to notice is that the security model is a complex authority system that requires you to delegate trust to multiple third parties. In contrast, IPFS embeds the hash of the file (well, the hash of the root, which in the end contains the hashes of the leaves; security-wise it's the same thing) in the CID. You can verify that what you received is actually what you asked for. Trust moves out-of-band, decoupled from the transport layer.
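Here is a minimal sketch of that check (hypothetical helper names, standard library only, not an IPFS API): hash what arrived and compare it against the digest that was inside the CID you asked for. An HTTP client has nothing equivalent to compare against.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verify re-hashes the received block and compares it to the digest we extracted
// from the CID. If they differ, the peer sent us garbage and we simply discard it.
func verify(received []byte, expectedDigest []byte) bool {
	got := sha256.Sum256(received)
	return bytes.Equal(got[:], expectedDigest)
}

func main() {
	// Pretend this digest came out of the CID we requested.
	want := sha256.Sum256([]byte("the block we actually asked for"))

	fmt.Println(verify([]byte("the block we actually asked for"), want[:])) // true
	fmt.Println(verify([]byte("something a malicious peer sent"), want[:])) // false
}
```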
And URLs just literally include the location. Without even getting into DNS and how centralised that protocol is, this is a big no-no for IPFS, because we don't know and don't even care where things are. With HTTP and URLs, if your URL points at a specific server that is slow or down, you can't download that file even if a nearby fast server has the same file, while IPFS would just connect to another nearby node and download it from there.
In short, HTTP says where things are while IPFS says what they should be. A URL is mostly just a path on a specific computer somewhere. A CID is a Content Identifier, representing a unique piece of content.
Torrent is a much different story than HTTP.
Let's inspect a random torrent on my FS:
$ btcheck -lvni ubuntu-21.04-desktop-amd64.iso.torrent
Announce URL : https://torrent.ubuntu.com/announce
announce : https://torrent.ubuntu.com/announce
announce : https://ipv6.torrent.ubuntu.com/announce
File Name : ubuntu-21.04-desktop-amd64.iso
File Length : 2818738176
Piece Length : 262144
Torrent Hash : 64a980abe6e448226bb930ba061592e44c3781a1
Comment : Ubuntu CD releases.ubuntu.com
We see some similarities with a CID:
- The Torrent Hash 64a980abe6e448226bb930ba061592e44c3781a1 allows us to check the integrity of the downloaded file, so another node / server can't send us an invalid file (for example by trying to inject malware into a program); we would catch this in the hashing step.
- The Piece Length 262144 tells us the file is cut into 256 KiB shards, and the torrent also stores the hash of each individual shard, as sketched just below (I'm not including the ten-thousand-plus piece hashes here for your own sanity).
- The announce entries such as https://torrent.ubuntu.com/announce are URLs, and there are many of them, so you might think that, like HTTP earlier, those point to the file, just with an additional layer of integrity thanks to the hashing. But no!
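As a sketch of what those piece fields buy you (this is not a real torrent parser, just the idea), a client splits the download into Piece Length-sized shards and checks each one against the hash stored for it in the .torrent file; BitTorrent v1 uses SHA-1 for these piece hashes.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

const pieceLength = 262144 // from the .torrent metadata above

// checkPiece returns true if a downloaded shard matches the SHA-1 hash stored
// for it in the .torrent file. A bad shard is thrown away and re-requested
// from another peer.
func checkPiece(piece []byte, expected [20]byte) bool {
	got := sha1.Sum(piece)
	return bytes.Equal(got[:], expected[:])
}

func main() {
	// Toy data standing in for one 256 KiB shard of the ISO.
	piece := bytes.Repeat([]byte{0x42}, pieceLength)
	expected := sha1.Sum(piece) // in reality this comes from the torrent's piece list

	fmt.Println(checkPiece(piece, expected))                         // true
	fmt.Println(checkPiece(bytes.Repeat([]byte{0x00}, 8), expected)) // false: corrupted or forged shard
}
```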
Remember our original problem: In order to talk to a server or other node we need its location on the network (IP), then we can use TCP as usual.
So those announce URLs (usually called trackers) point to servers that keep a list of who hosts the file. When someone hosts a file, they share this information along with connection details (IP, port, protocol version, …) with the tracker. So our host node will contact the tracker (here torrent.ubuntu.com/announce) and say something like this:
I'm the node Xyz and I'm hosting the file Qmfoo; if anyone needs it, my IP is 10.42.76.7:1337.
Then when you want to download the file Qmfoo, you contact torrent.ubuntu.com yourself, asking something along the lines of:
Hey, I'm looking for the file Qmfoo, do you know anyone who has it?
Then the tracker responds with a list of nodes who host this file:
- Xyz 10.42.76.7:1337
- Abc 192.168.0.3:5432
Then you proceed to contact them, actually download shards from them, check those shards, and rebuild the original file.
It's important to remember that at no point do trackers hold the file, and they never exchange shards. They only store who is interested in which file.
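A toy, in-memory version of a tracker makes that point obvious (this is nothing like the real BitTorrent announce protocol, just the data structure it maintains): all it ever stores is a mapping from content to peer addresses.

```go
package main

import "fmt"

// Tracker is a toy stand-in: it maps a content identifier to the peers that
// announced they host it. It never sees the content itself.
type Tracker struct {
	hosts map[string][]string // content ID -> list of "peerID@ip:port"
}

func NewTracker() *Tracker {
	return &Tracker{hosts: make(map[string][]string)}
}

// Announce is what a hosting node does: "I'm Xyz, I host Qmfoo, here is my address."
func (t *Tracker) Announce(contentID, peer string) {
	t.hosts[contentID] = append(t.hosts[contentID], peer)
}

// Lookup is what a downloader does: "who has Qmfoo?"
func (t *Tracker) Lookup(contentID string) []string {
	return t.hosts[contentID]
}

func main() {
	tracker := NewTracker()
	tracker.Announce("Qmfoo", "Xyz@10.42.76.7:1337")
	tracker.Announce("Qmfoo", "Abc@192.168.0.3:5432")

	fmt.Println(tracker.Lookup("Qmfoo")) // [Xyz@10.42.76.7:1337 Abc@192.168.0.3:5432]
}
```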
You can't ask a tracker for the content of the file Qmfoo; it can only give you the list it knows of nodes hosting Qmfoo.
This is a pretty good system. It works; however, there are challenges: trackers make the system less resilient. They are yet another piece of infrastructure to maintain, and they have huge power over the network: they know at all times everyone who is downloading and hosting files. If you want to stop a torrent from circulating, you just need to shut down those two announce servers; nodes will not be able to contact each other anymore and the exchange will stop.
Torrent has since been extended, and some clients (like Transmission) also feature a DHT, so the use of trackers can be omitted.
It's also possible to use magnet links, which are really close to what CIDs are.
The tracker system is generally more complex and also knows how complete each node's copy is (trackers can tell seeders apart from peers still downloading).
The first thing IPFS does is say that every node is now a tracker too.
So you could ask any node if it knows where your file Qmfoo is located.
The issue is that there are ~20k publicly dialable nodes right now (nodes running on public IPs like servers, or nodes with an open NAT thanks to UPnP or port forwarding), and this number keeps going up.
So you could just ask anyone and hope for the best (IPFS does that too), but most of the time that just doesn't work. It simply doesn't scale.
So what we will try instead is to centralise the management of a given file on a few nodes. Instead of asking the whole network, you would just have to ask 20 or so nodes. Note that those 20 or so nodes are for one specific file; another file will likely be handled by 20 different nodes in the network.
So how do we do that?
When we peeked at the CID earlier, we saw that, unlike a torrent, which explicitly specifies the trackers to use, the CID is just the file hash.
This is the part that seems to make no sense at first. Don't look for meaning yet; all the individual steps I'll describe are meaningless without the bigger picture. If you don't understand at first, just keep reading and come back to this part once you know the later steps.
We choose those nodes according to their closeness to the CID of the file.
To do so we introduce a new notion: distance. It is the XOR of the SHA-256 hashes of the two keys we are comparing.
WTF are the keys in the first place?
A key is whatever we are comparing; most of the time it's the CID of a file or the peer ID of a node.
But aren't those already hashes? Yes, but some savage people, like pubsub, input UTF-8 strings as keys.
So we hash every key once more. First of all, this gives us fixed-length things to compare. Let's say I ask you which of cat and cots is closer to dogs: if your mind isn't filled with CS concepts and you don't instantly scream "Levenshtein distance", you will start asking tricky questions like "how do we count adding and deleting characters?". Using a hashing function is a cryptographically friendly way to make everything exactly 256 bits long and remove all ambiguity about comparing variable-length things.
This also creates a good random distribution from related inputs. Some people with no respect for civilisation genuinely input things such as pubsub/v0.5.6/workplace as keys in the DHT; imagine many other keys ending in /workplaces, /working, … Passing them through a hash function spreads these related inputs randomly across the DHT, which helps avoid overloading one single tracker.
(If you were offended by the civilisation part: ♥ pubsub, it's awesome, and since we hash string keys anyway it's not so bad. But I still believe you should avoid such keys, as a matter of principle.)
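You can see that spreading effect with a few lines of Go. The keys are the ones from the example above (the variants are ones I made up following the same pattern); the exact digests don't matter, only that they share nothing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	// Related-looking keys...
	keys := []string{
		"pubsub/v0.5.6/workplace",
		"pubsub/v0.5.6/workplaces",
		"pubsub/v0.5.6/working",
	}
	// ...produce completely unrelated 256-bit hashes, so they land in completely
	// different regions of the DHT instead of piling up on one single tracker.
	for _, k := range keys {
		h := sha256.Sum256([]byte(k))
		fmt.Printf("%-26s -> %s (first 4 bytes of the hash)\n", k, hex.EncodeToString(h[:4]))
	}
}
```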
Then we take those hashes and XOR them; the result is our distance.
XOR is a bitwise operation: it takes two numbers as input, compares their bits, and for each bit outputs a 0 if they match and a 1 if they differ.
So for example:
```
    101 (5)
XOR 100 (4)
  = 001 (1)   match, match, differ
```
In practice this just means that keys whose hashes look similar produce a low distance number, with a really strong bias toward matching the most significant bits.
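Here is a minimal sketch of that distance function in Go (standard library only; real implementations do the same thing with their own types): hash both keys with SHA-256, XOR the results, and treat the outcome as one big number.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// distance returns XOR(sha256(a), sha256(b)) interpreted as a 256-bit integer.
// Smaller means "closer" in the DHT's made-up key space.
func distance(a, b string) *big.Int {
	ha := sha256.Sum256([]byte(a))
	hb := sha256.Sum256([]byte(b))

	xored := make([]byte, len(ha))
	for i := range ha {
		xored[i] = ha[i] ^ hb[i]
	}
	return new(big.Int).SetBytes(xored)
}

func main() {
	cid := "Qmfoo" // stand-in for the real binary CID bytes

	// Compare two peer IDs against the same content key.
	fmt.Println(distance(cid, "Qmbar"))
	fmt.Println(distance(cid, "Qmfaa"))
	// Whichever number is smaller is the "closer" peer. Note that this says
	// nothing about geography or network latency.
}
```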
If you are asking something like "how does this calculation get the distance between two nodes? Nowhere do we input any location data", well, it doesn't. This distance is a completely made-up thing we created; it has no real-world meaning, and two nodes that are close in the DHT can be physically really far away or really close. They are just two different things.
Good. Now that we have a way to compare keys, what we could do is take the list of all nodes, compare their peer IDs with our CID, and take the 20 closest ones.
That would work, and it's highly probable a DHT client similar to that will be made one day. However, we still don't like this, because it requires keeping the list of all nodes in the network.
That takes some space. OK, for ~20k nodes that's likely fine, and a good database splitting disk and memory access could maybe get the job done. The issue is that many of those nodes are temporary: they appear, stay a bit, and leave.
This makes any join or leave in the IPFS network a broadcast operation that requires all nodes to be updated. You could maybe make it work; BGP (the protocol that makes the internet work) does this to an extent. But that doesn't scale well enough for us; we want to talk less.
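For the record, that naive "sort everyone by distance and keep 20" approach looks roughly like this. It's a sketch: the distance function is the same one as above, repeated so the snippet stands alone, 20 is the count mentioned in the text, and the peer IDs are the made-up ones from this post.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
	"sort"
)

const k = 20 // how many closest peers we keep per key

func distance(a, b string) *big.Int {
	ha := sha256.Sum256([]byte(a))
	hb := sha256.Sum256([]byte(b))
	xored := make([]byte, len(ha))
	for i := range ha {
		xored[i] = ha[i] ^ hb[i]
	}
	return new(big.Int).SetBytes(xored)
}

// closest sorts the whole peer list by XOR distance to the key and keeps the k nearest.
// This is the part that doesn't scale: it assumes we know every peer in the network.
func closest(key string, allPeers []string) []string {
	sorted := append([]string(nil), allPeers...)
	sort.Slice(sorted, func(i, j int) bool {
		return distance(key, sorted[i]).Cmp(distance(key, sorted[j])) < 0
	})
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

func main() {
	peers := []string{"Qmbar", "Qmfaa", "Qmfoa", "QmGft", "QmGoo", "QmXyz"}
	fmt.Println(closest("Qmfoo", peers))
}
```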
What we do instead is the same thing (finding the closest nodes), but not over the complete network, just over all the peers we know (also called a peerset) at that time.
Then we ask them about the key. What they will do is compare it against their own local peerset, and if they find anyone closer, they redirect us to them instead.
Then we repeat the operation with that peer, until we find someone who doesn't know anyone closer. Those nodes will be our trackers.
That's all!
OK, that doesn't make much sense to me either when written out like that, so let's go through an example instead.
We want to search for the file Qmfoo. I'm gonna ignore the extra SHA-256 step in the distance calculation because the example will be clearer that way, so I'm comparing raw CIDs and raw peer IDs here.
First, the providing side: we host the file Qmfoo, so we search our peerset for the closest node; it happens to be the node Qmbar.
We ask Qmbar:
Hey, I host Qmfoo, can you remember that pls ?
And they respond:
No, sorry I'm not in charge of that, I know Qmfaa which is closer.
So now we ask Qmfaa:
Hey, I host Qmfoo, can you remember that pls ?
And they respond:
No, sorry I'm not in charge of that, I know Qmfoa which is closer.
So now we ask Qmfoa:
Hey, I host Qmfoo, can you remember that pls ?
And they respond:
Yes ok, I'll note your connection details for the next one.
Now, the downloading side: we want the file Qmfoo, so we search our peerset for the closest node; it happens to be the node QmGft.
We ask QmGft:
Hey, I want Qmfoo do you know about it ?
And they respond:
No I don't, but I know QmGoo which is closer.
So now we ask QmGoo:
Hey, I want Qmfoo do you know about it ?
And they respond:
No I don't, but I know Qmfoa which is closer.
So now we ask Qmfoa:
Hey, I want Qmfoo do you know about it ?
And they respond:
Yes, I know the node QmXyz, reachable over TCP at the IP 10.42.123.7 and port 1337.
Then the downloader contacts the host and they start exchanging shards and rebuilding the file using Bitswap.
Notice that the two paths were different. This is normal: there are multiple ways to reach the nodes tracking a file, and since nodes know many, many peers (the default config keeps around 300~600 connections), you actually skip most of the network while routing.
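To tie the provide and find walkthroughs together, here is a toy simulation of the whole loop: an in-memory "network", the made-up peer IDs from the example above, and a walk function that hops from peer to peer until nobody knows anyone closer. Real IPFS does the same thing over the wire with Kademlia RPCs.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// A toy in-memory "network": peer ID -> node.
type node struct {
	id        string
	peerset   []string            // peers this node knows about
	providers map[string][]string // key -> connection details of announced hosts
}

var network = map[string]*node{}

func distance(a, b string) *big.Int {
	ha, hb := sha256.Sum256([]byte(a)), sha256.Sum256([]byte(b))
	x := make([]byte, len(ha))
	for i := range ha {
		x[i] = ha[i] ^ hb[i]
	}
	return new(big.Int).SetBytes(x)
}

// closerPeer returns a peer from n's peerset that is strictly closer to key than n itself.
func (n *node) closerPeer(key string) (string, bool) {
	best, found := n.id, false
	for _, p := range n.peerset {
		if distance(key, p).Cmp(distance(key, best)) < 0 {
			best, found = p, true
		}
	}
	return best, found
}

// walk hops from peer to peer until nobody knows anyone closer to key.
// The node it stops at is the one acting as tracker for that key.
func walk(start, key string) *node {
	current := network[start]
	for {
		next, ok := current.closerPeer(key)
		if !ok {
			return current
		}
		fmt.Printf("%s redirects us to %s\n", current.id, next)
		current = network[next]
	}
}

func main() {
	// Tiny fully-connected toy topology, so both walks are guaranteed to converge;
	// in the real network each node only knows a slice of it and walks take more hops.
	ids := []string{"Qmbar", "Qmfaa", "Qmfoa", "QmGft", "QmGoo"}
	for _, id := range ids {
		peers := []string{}
		for _, other := range ids {
			if other != id {
				peers = append(peers, other)
			}
		}
		network[id] = &node{id: id, peerset: peers, providers: map[string][]string{}}
	}

	// Provide: the host (starting from Qmbar) walks toward Qmfoo and leaves its record at the end.
	tracker := walk("Qmbar", "Qmfoo")
	tracker.providers["Qmfoo"] = append(tracker.providers["Qmfoo"], "QmXyz over TCP at 10.42.123.7:1337")

	// Find: a downloader starting somewhere else ends up at the same tracker.
	fmt.Println(walk("QmGft", "Qmfoo").providers["Qmfoo"])
}
```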
Everything I've said relies on an assumption: that nodes know their neighbors.
If we just let things flow randomly, we could end up with pockets of nodes that are really close to each other in the DHT but don't know each other. This is bad, because if your search steps over the entrance of such a pocket, you will end up at different tracking nodes than the ones the provider used.
This is solved by searching for all of your neighbors first, and then regularly.
Simply put, nodes search for themselves. Obviously, since they are new, they will not succeed, but the last nodes in that journey (the ones that can't find anyone closer and just give up) are their closest neighbors. So a node searches for itself and remembers / holds a connection with those close neighbors.
Note that this doesn't exactly apply to current IPFS implementations; this is a thing Chord does. Current IPFS uses Kademlia instead, which has a similar mechanism but reasons over a 256-dimensional vector space over GF(2).
If you only know your neighbors, that's great because lookups will work, but they will be slow.
So we also want to know peers from all over the network. Imagine you are the node Qmbar and you search for the file Qmfoo: those keys are really far from each other, so all of your neighbors (who are close to you) are also far from the target, and every hop would only get you a tiny bit closer.
So nodes also search for nodes that are far away. What your node could do is befriend (bucket) a node like Qmfaa. This is good because when you do a search in the Qmf region, you can skip the whole path from Qmb to Qmf, and if you search for something in Qme, it's still likely much closer to go through Qmfaa, because f is closer to e than b is.
So you search for those skips. For optimal routing speed (for the people asking you, not for your own searches), you need a few skips to far-away regions, and more and more skips to closer and closer regions, up to your own region where you need to know everyone.
Ideally the skips follow some kind of logarithmic distribution. For the farthest regions you might know just one node for every three regions, say.
Then for closer ones, you know a node per region.
Then closer still, a node per department.
Then closer still, a node per town.
Up to your own building, where you know everyone.
(Sorry for the geographical analogy, it's just me trying really hard not to bring out the equations.)
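If you prefer code to geography, here is a sketch of how Kademlia-style routing tables decide "how far away" a peer is: count the leading bits its hashed ID shares with yours. Each shared-prefix length is one bucket, and by construction roughly half the network lands in bucket 0, a quarter in bucket 1, and so on, which is exactly the logarithmic distribution described above. (This is the general Kademlia idea, not the exact go-libp2p code.)

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// commonPrefixLen counts how many leading bits of sha256(a) and sha256(b) match.
// In a Kademlia-style routing table this picks the bucket a peer goes into:
// bucket 0 holds peers that differ in the very first bit (about half the network),
// bucket 1 those that share one bit (about a quarter), and so on. Few contacts
// for far-away regions, more and more detail close to yourself.
func commonPrefixLen(a, b string) int {
	ha, hb := sha256.Sum256([]byte(a)), sha256.Sum256([]byte(b))
	cpl := 0
	for i := range ha {
		x := ha[i] ^ hb[i]
		if x == 0 {
			cpl += 8
			continue
		}
		return cpl + bits.LeadingZeros8(x)
	}
	return cpl
}

func main() {
	self := "Qmbar"
	for _, peer := range []string{"Qmfaa", "Qmfoa", "QmGft", "QmGoo", "QmXyz"} {
		fmt.Printf("%s goes into bucket %d\n", peer, commonPrefixLen(self, peer))
	}
}
```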
This brings us back to the accelerated DHT client mentioned at the start. The main difference is in how the peer IDs are stored: the accelerated client just uses one big table and puts all known peers there, which is CPU intensive, while the default one sorts peers into buckets, which allows smarter and less aggressive searches.