Dynamo Storage System

# Dynamo Storage System https://en.wikipedia.org/wiki/Dynamo_(storage_system) Dynamo (related to but not the same as DynamoDB) is a set of _techniques_, (largely because techniques are copywritable but not enforcable while architectures aren't) Principles (From Wikipedia): > **Incremental scalability**: Dynamo should be able to scale out one storage host (or “node”) at a time, with minimal impact on both operators of the system and the system itself. **Symmetry**: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or extra set of responsibilities. **Decentralization**: An extension of symmetry, the design should favor decentralized peer-to-peer techniques over centralized control. **Heterogeneity**: The system should be able to exploit heterogeneity in the infrastructure it runs on. For example, the work distribution must be proportional to the capabilities of the individual servers. This is essential in adding new nodes with higher capacity without having to upgrade all hosts at once. Techniques (also from Wikipedia): Problem | Technique | Advantage --- | --- | --- Dataset partitioning | [Consistent Hashing](https://hackmd.io/wqIdu3ErS4GR6yQvkYfGXw) | Incremental, possibly linear scalability in proportion to the number of collaborating nodes. Highly available writes | [Vector Clock](https://hackmd.io/IbIuVox5QmGi5le5rJqtNw) or Dotted-Version-Vector Sets, reconciliation during reads | Version size is decoupled from update rates. Handling temporary failures | [Sloppy Quorum and Hinted Handoff](https://distributed-computing-musings.com/2022/05/sloppy-quorum-and-hinted-handoff-quorum-in-the-times-of-failure/) | Provides high availability and durability guarantee when some of the replicas are not available. Recovering from permanent failures | Anti-entropy using [Merkle tree](https://hackmd.io/zCYrwlRtT-GXrYm7FxL-og) | Can be used to identify differences between replica owners and synchronize divergent replicas pro-actively. Membership and failure detection | [Gossip-based membership protocol and failure detection](https://en.wikipedia.org/wiki/Gossip_protocol#:~:text=A%20gossip%20protocol%20or%20epidemic,all%20members%20of%20a%20group.) | Avoids having a centralized registry for storing membership and node liveness information, preserving symmetry. Only some of these things are implemented in different distributed databases, but these are the major principles identifed by the first people desiging large-scale distributed databases (back in the early Web 2.0 days.) This is where I first saw Merkle Trees being really discussed for use in Production! They were never implemented because... they're really expensive to implement unless you have a good reason to do so. Riak, a company full of ex-amazon employees, is really the major company outside of Amazon who implemented this (and spoke about it's implementation, they sent their devs everywhere!) Otherwise, Amazon would have kept these principles and techniques as internal secrets.