or
or
By clicking below, you agree to our terms of service.
New to HackMD? Sign up
Syntax | Example | Reference | |
---|---|---|---|
# Header | Header | 基本排版 | |
- Unordered List |
|
||
1. Ordered List |
|
||
- [ ] Todo List |
|
||
> Blockquote | Blockquote |
||
**Bold font** | Bold font | ||
*Italics font* | Italics font | ||
~~Strikethrough~~ | |||
19^th^ | 19th | ||
H~2~O | H2O | ||
++Inserted text++ | Inserted text | ||
==Marked text== | Marked text | ||
[link text](https:// "title") | Link | ||
 | Image | ||
`Code` | Code |
在筆記中貼入程式碼 | |
```javascript var i = 0; ``` |
|
||
:smile: | ![]() |
Emoji list | |
{%youtube youtube_id %} | Externals | ||
$L^aT_eX$ | LaTeX | ||
:::info This is a alert area. ::: |
This is a alert area. |
On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?
Please give us some advice and help us improve HackMD.
Do you want to remove this version name and description?
Syncing
xxxxxxxxxx
Collision Attacks on Message Identifiers
Problem
Users create signed messages and store them on eventually consistent servers called hubs. Messages must be referenced with a unique identifier in three contexts - from another message (message id), in the database (storage id) and in the sync trie(sync id).
Our proposal recommended message id's like
fid:timestamp:hash
using a 32-bit timestamp and a very short 32-bit hash. Storage id's stretch the message id to avoid collisions with other data in the db and look likeprefix:fid:postfix:timestamp:hash
. Sync ids shuffle the message intotimestamp:hash:fid
to build a time-ordered merkle trie.Unfortunately, this leaves an attack vector open where a malicious actor could generate a collision. They could produce two different messages A and B that have identical ids. A hub that saw message A would never accept message B and vice versa. A client querying these hubs might get different responses for the same query.
Modelling the Attack Vector
A Farcaster message is a ~ 1KB binary object that is hashed with Blake3 to produce a unique identifier. If two different messages produce the same identifier (hash digest), it creates a problem for the network's sync algorithms.
A birthday attack can be performed by randomly generating messages and hashing them until two distinct values are found that produce the same digest. We want to ensure that this is unlikely in a reasonable timeframe (~100 years), even for an attacker with a large bankroll.
The Blake3 spec shows that a c5.metal instance on AWS was able to hash 6866 MiB/s on a single thread. Our smallest input size is ~ 500 bytes and the Blake3 buffer size is 64 bytes, so the nearest "round" size of a message is 512 bytes. A single threaded machine should be thus capable of 6866 MiB/s divided by 512 bytes or roughly 14 MH/s. Assuming we use a c6 with 196 threads and perfect parallelism we should see 2.76 BH/s.
Using the birthday paradox, we estimate that at least 2^(n/2) hashes are required where n is the digest size in bytes. So the numbers of years of hashing on a perfectly parallelized c5.metal instance is given by the formula
(2^(n/2) / (2,760,000,000 * 3600 * 24 * 365)
A c6.metal instance costs ~ 50k per year, so a reasonably determined attacker could 10,000 machines at a cost of 500M/year. A 160-bit hash would be sufficient in that case since it still would take at least 1,000 years.
A final note – the attack is likely slower than these estimates, since in addition to finding a collision, the message must also be a valid farcaster message and a fraction of the message (around 100 bytes) cannot be changed.
Solution: Increase hash digest to 160-bit
Messages must increase the hash size from 32-bit to 160-bit to eliminate the probability of collision entirely. Other properties are held constant and identifiers are constructed as follows:
The benefits of this approach are guaranteed eventual consistency and simpler application logic. It is also possible to reduce the hash size further at a later time since hubs can always truncate hashes safely, though messages that include longer hash references must be preserved as is. ~The downside is that storage costs increase by roughly 50%.
Rejected Options
Option A: Signature Tiebreaker
Colliding messages are only generated maliciously or accidentally and it is safe to replace one with another, as long as the determination is commutative and associative. We retain the short hash for message id's and storage id's while using a 128-bit truncated signature for the sync ids, under the assumption that signatures are evenly distributed.
The benefit of this approach is that we get collision rsistance with no change in storage requirements. The downside is that we will need to do an extra db lookup when a duplicate message is received which should be rare over p2p and uncommon over rpc. It is also slightly harder to explain conceptually.
Option B: Do Nothing
Attacks can only be performed on a user's own data set (due to the signature requirements) so we can allow violation of eventual consistency for self inflicted purposes. From a sync trie pov, the hubs will appear to converge and they will never thrash. But a user who queries all messages out of one hub and compares them against another will see a difference. The benefit of this approach is that our sync logic is simple and storage requirements are low. The downside of this approach is that it violates eventual consistency which makes the overall system's state much harder to reason about.
Appendix A: Cast Flatbuffer Schema
Appendix B: Hash Benchmark
Appendix C: Collider
References