# Ceph Replication Factor

### Overview

The default and most commonly used replication factor for Ceph deployments is 3x. 2x replication is not unheard of when optimizing for IOPS. IOPS-optimized clusters normally use higher-end [TLC](https://www.micron.com/products/nand-flash/tlc-and-qlc-devices) NVMe drives. The justification for this choice is often the higher [MTBF](https://www.backblaze.com/blog/how-reliable-are-ssds/) of flash vs. HDD. The trade-off when choosing 2x over 3x replication is the loss of high-availability and a potentially increased risk of data loss.

### Ceph min_size

**min_size** is the minimum number of replicas an object in a Ceph data pool must have in order to receive IO. When the number of replicas drops below min_size, all IO for the object is rejected. min_size needs to be lower than the replication factor for **high-availability**. For instance, with 3x replicated pools min_size is normally set to 2. This allows up to 1 replica to be down; IO for an object is rejected only when 2 replicas are down.

### 2x Replication min_size Trade-Off

With a replication factor of 2x, min_size needs to be 1 to provide high-availability. With min_size set to 1, Ceph will continue to allow IO to an object when only 1 replica exists. This increases the risk of **data loss**. Additionally, high-availability is lost while only 1 replica exists.

It is possible to use min_size 2 with 2x replication. The trade-off here is the loss of high-availability: a single drive being unavailable (due to failure, node reboot, etc.) will cause **downtime**.

### Additional Factors

While the higher MTBF of flash relative to HDD reduces the risk of failure due to drive hardware, there are other factors to consider. These include (but are not limited to) firmware updates and human error.
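The replication factor and min_size discussed above are per-pool settings. A minimal sketch of configuring them with the Ceph CLI follows; the pool name `mypool` and the PG count are placeholders, and the commands assume a running cluster with admin access:

```shell
# Create a replicated pool (pool name and PG count are examples).
ceph osd pool create mypool 128 replicated

# Set the replication factor (size) and the minimum number of
# replicas required to serve IO (min_size). size=3 / min_size=2
# tolerates one replica being down without blocking IO.
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2

# Verify the settings.
ceph osd pool get mypool size
ceph osd pool get mypool min_size
```

For the 2x scenarios discussed above, the same `ceph osd pool set` commands apply with `size 2` and `min_size` of either 1 or 2, with the trade-offs described.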
### 2x Replication Reference Architectures

[Micron 9300 Max NVMe](https://media-www.micron.com/-/media/client/global/documents/products/other-documents/micron_9300_and_red_hat_ceph_reference_architecture.pdf?la=en&rev=3e2fcdc4b63e49cd81b036fb635c5b71)

[Samsung PM1725a NVMe](https://www.samsung.com/semiconductor/global.semi/file/resource/2020/05/redhat-ceph-whitepaper-0521.pdf)

[Supermicro NVMe](https://www.supermicro.com/white_paper/white_paper_Ceph-Ultra.pdf)

Note that these reference architectures are all for performance optimization in non-hyperconverged environments.

### Community Views on Ceph Replication Factor

[NVMe and 2x Replica](https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/U6UREWXSN4JMZLT6PXVJR5W2WRWAC6B3/)

[Ceph NVMe 2x Replication](https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RWK5V4GDXUY22HU7KK24PPR76IKOYDJ5/)

[2x Replication: A BIG warning](http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html)

### Additional Reading

[The Probability of Data Loss in Large Clusters](https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html)

[Ceph Replication 2x vs. 3x: Micron 9200 MAX Reference Architecture Block Performance](https://www.micron.com/about/blog/2018/april/ceph-replication-2x-vs-3x-micron-9200-max-reference-architecture-block-performance)

### Summary and Opinion

Ceph with a replication factor of 2x can be a viable option when higher-end flash storage is used. The trade-off vs. 3x replication is a loss of high-availability and a potentially increased risk of data loss. This trade-off may be acceptable in non-hyperconverged, performance-oriented environments. In a hyperconverged Kubernetes environment, where frequent reboots and outages are expected to be seamless, this trade-off is difficult to justify.

**2x replication with a min_size of either 1 or 2 means decreased availability vs. 3x. Using a min_size of 1 additionally increases the risk of data loss.**
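The intuition behind the increased data-loss risk can be sketched with a back-of-envelope calculation. The sketch below uses an assumed, purely illustrative per-drive failure probability (not a measured value) and a simplified model that ignores recovery, which in a real cluster re-replicates lost copies elsewhere:

```python
# Illustrative comparison of per-object data-loss probability for
# 2x vs. 3x replication. The failure probability is an assumed
# placeholder, not a measured value.
P_DRIVE_FAILURE = 0.005  # assumed failure probability per drive


def p_object_loss(replicas: int, p: float = P_DRIVE_FAILURE) -> float:
    """Probability that every replica of an object is lost,
    assuming independent drive failures and no recovery
    (a simplification of real Ceph behavior)."""
    return p ** replicas


print(f"2x replication: {p_object_loss(2):.2e}")  # p^2
print(f"3x replication: {p_object_loss(3):.2e}")  # p^3
```

Even under this simplified model, each additional replica multiplies the loss probability by another factor of the drive-failure probability, which is why the 3x default is so much safer than 2x with min_size 1.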