We start by identifying which messages are related to a thread.
A thread starts with a "root" message A
, and we look for all messages which have referenced A
as their "root" (check the content.root
field)
Diagram showing the backlinks - references pointing back to a particular message - from thread replies to the root message of the thread.
The naive approach is to order by timestamp. We have two timestamps to work with - the received timestamp (when we first got the message) or the asserted timestamp (when the author said they published it). We can't rely on either because sometimes people's system clocks are wrong, or sometimes people lie and assert a wrong timestamp.
Diagram showing how ordering by timestamp can result in "incorrect" ordering.
This problem doesn't exist in centralised systems because there is (generally) a globally consistent time - the clock of the server. This time can be "wrong", but will still conserve relative ordering.
And improvement on this is if each message list all messages it was aware of previous to it
So we might have something like:
msg | previous messages | notes |
---|---|---|
A |
[] |
the root has no messages before it |
B |
[A] |
|
C |
[A, B] |
|
D |
[A, B] |
see that D did not know about C (did not have msg yet) |
E |
[A, B, C, D] |
Diagram showing backlinks from each message to each previous message
This is starting to reveal an order! We can be confident in this ordering because each of these backlinks is pointing to a message key, which is a hash of the message. These cannot be known ahead of time (because hashes are sensitive to message content, which includes the asserted publishing time, and this cannot be known), therefore we have a guarentee that each of these links points backwards in time
BUT, you can already see this isn't going to be sustainable. As a thread gets long, the previous messages that need to be mentioned keeps increasing. This leads us to our final solution
In the above table, we can see that there's a lot of redundancy. Check out this alternative
msg | immediately previous |
---|---|
A |
[] |
B |
[A] |
C |
[B] |
D |
[B] |
E |
[C, D] |
If we look at any message we can derive all previous messages previous to it by recurrsively following the trail of immediately previous, collecting the messages mentioned as we go.
In code, something like:
So for a message like C
that means:
Similarly with E
we can follow collect previous links recursively (following the immediately previous links [C, D]
) to get [C, D, B, A]
This is great because it shows that by listing ONLY the messages immediately previous to the message being published, we can still figure out ALL the messages which are previous to our message in the thread.
If we render these backlinks in a graph we can see a much less chaotic graph emerging:
Diagram of a tangle of messages showing backlinks only to messages that were the current most recent nodes on the graph at time of publishing
The last solution for ordering is awesome, but it also leaves some ambiguity about how to display the messages. (Personally, I would love to see a client which shows messages in a graph like this, because sometimes it's very relevant that the author of D
did not know about C
)
Most scuttlebutt clients flatten these graphs into a linear history. To do this you need to identify when it's not clear about ordering, and provide a rule for breakaing ties.
In our example above this means sorting [C, D]
.
The most common tie breaking algorithm is something like:
sort by min("asserted timestamp", "received timestamp")
or in code:
We take the min of these two timestamps because we trust our own clock more than a strangers? It limits their ability to lie about being in the future.
Diagram the same as (c) but with a solid line showing the effect of tie-breaking by timestamp.
Following the algorithm described in (c) / (d), if a new message F
was to be published (that had all the messages shown above), it would only need to list the immediately previous message(s) i.e. [E]
, because that implicitly tells us that it follows [A, B, C, D, E]
.