# *N* copies needed for given SLA
## Part of research note on perpetual storage
---
**Context:** discussing perpetual storage with jnthnvctr and Axel. Meeting [notes](https://docs.google.com/document/d/1aWYwpN9lvgEDHtfnDmMWRtFlOTb6iRtf7FjVBJeYRvM/edit)
**Question:** How many copies must be stored to be confident that a minimum number of copies remains persisted?
**Approach:** target some SLA
### Introduction
Every storage system (e.g. [AWS](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/example-implementations-for-availability-goals.html)) has a set of guarantees regarding data resiliency. Where some clients may prefer to pay for higher redundancy factors to increase the 9s of resiliency for their data, others may prefer to save on cost and operate with weaker guarantees. Unique to the design of Filecoin is the flexibility for users and protocol designers to specify replication factors - allowing builders to tune risk and resiliency appropriately for their use case.
In this analysis, we aim to provide a framework for selecting replication factors on Filecoin and empower clients to reason about the trade-offs between cost and resiliency.
**Definitions and Assumptions:**
* Definitions:
* ***Permanent loss***: We define permanent loss to be the scenario where all copies of data are dropped off the network within a given timeframe.
* ***Temporary loss***: We define temporary loss to be the scenario where a single copy of data is lost within a given timeframe, but upon detection can be restored from one of the remaining copies.
* ***Faults***: A fault occurs when a storage provider fails to provide a proof for their sector in time. A fault does not necessarily mean data is lost (repeated faults would be a better indication of this) - but for our analysis, we take the most conservative case and assume that a fault means the data is lost.
* Assumptions:
* Failure is binomially distributed – i.e. selection with replacement. Given we're able to restore datasets, this seems reasonable.
* Failures are independent. While there is some evidence sector failure events occur in clumps, we assume a strategy in which clients intentionally store their data across distinct storage provider actors and across distinct geographies to mitigate this.
* This can be manually tuned today using reputation systems and the [FVM](https://www.fvm.filecoin.io) can enable more on-chain information to enable automated processing.
* While not explicit, data recovery requires storing additional copies of data on the network. Today, this involves manual intervention, but the [FVM](https://www.fvm.filecoin.io) can enable simple bounty systems for maintaining minimum levels of redundancy. We assume this function exists either via manual replication or bounties.
_Additional Notes_
* Filecoin is unique in that:
* (a.) Storage providers have collateral at stake for keeping data online.
* (b.) Zk-proofs are run over all sectors every 24 hours.
* (c.) Proof of replication makes it possible to verifiably store multiple distinct copies of data (making it irrational for an attacker to pretend to store multiple copies).
* Given (a.), we can restrict our analysis to "inadvertent" data loss: because storage providers have "skin in the game", simply dropping off the network comes at a heavy cost - making it irrational to do so even in periods of token volatility.
* Given (b.), discovery of data loss on the Filecoin network happens at a granular level, allowing recovery actions to take place.
### How many copies of data need to be stored to achieve a target SLA?
If there are $n$ copies, the probability of losing $k$ copies on a given day is
\begin{align*}
p\left(k\,\text{fail}\right)=\binom{n}{k}p_{1}^{k}\left(1-p_{1}\right)^{n-k}
\end{align*}
where $p_{1}$ is the probability one fails on a given day.
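As a quick numerical illustration (a minimal sketch; the values of $n$ and $p_1$ below are placeholders, not recommendations), the daily loss distribution can be evaluated directly:
```python
from math import comb

def p_k_fail(k: int, n: int, p1: float) -> float:
    """Binomial probability that exactly k of n copies fault on a given day."""
    return comb(n, k) * p1**k * (1 - p1) ** (n - k)

# Placeholder values for illustration only.
n, p1 = 10, 0.000246
for k in (0, 1, 2, n):
    print(f"P({k} of {n} fail) = {p_k_fail(k, n, p1):.3e}")
```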
The probability all $n$ fail on a given day (catastrophic failure) is
\begin{align*}
p\left(n\,\text{fail}\right)=p_{1}^{n}\,.
\end{align*}
To achieve a five-9s SLA we therefore require $p\left(n\,\text{fail}\right)=p_{1}^{n}\leq1-0.99999$. Taking logs (and noting that $\log p_{1}<0$, so the inequality flips when dividing) gives
\begin{align*}
n=\left\lceil \frac{\log\left(1-0.99999\right)}{\log p_{1}}\right\rceil \,.
\end{align*}
Simply put, we can calculate the required copies of data to store, $n$, based on our target resiliency and the probability of losing a single copy, $p_1$.
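The calculation is easy to script. The following is a minimal sketch (the helper name `copies_needed` and the example value of $p_1$ are ours, not part of the note):
```python
import math

def copies_needed(p1: float, nines: int = 5) -> int:
    """Copies needed so that P(all copies fault on a given day) <= 10**-nines."""
    target = 10 ** (-nines)                # e.g. 1e-5 for a five-9s SLA
    return math.ceil(math.log(target) / math.log(p1))

# Illustrative call with an assumed daily per-copy fault probability of 1%.
print(copies_needed(0.01, nines=5))        # -> 3
```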
### Strategies for selecting storage providers and finding a value for $p_1$
Given the above, we must now pick a value for $p_1$.
_Note - we can conservatively approach this calculation by assuming a fault (a miner failing to submit a proof on time) is equivalent to data loss. In practice, faults can occur for a number of reasons - e.g. during an upgrade._
One method for calculating $p_1$ might be to look at the network in aggregate - assuming the average case for faults can be applied equally to all miners. To do this calculation, we can take the ratio of the *current faults outstanding* (around 61 PiB) to *network capacity* (around 15 EiB) as our $p_1$ - this comes out at 0.000246 (about 99.97% 'uptime'). Therefore the number of copies needed (using our formula above) for a five-9s SLA is $n=2$. Low number!
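To see how sensitive $n$ is to the choice of $p_1$, here is a small self-contained check that uses the aggregate estimate above alongside a few purely hypothetical alternatives:
```python
import math

# 0.000246 is the aggregate estimate above; the other values are hypothetical,
# included only to show how n moves with the assumed daily loss probability.
for p1 in (0.000246, 0.001, 0.01, 0.05):
    n = math.ceil(math.log(1e-5) / math.log(p1))
    print(f"p1 = {p1:<8} -> copies for five 9s: {n}")
```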
As a base framework this seems like a reasonable approach, and likely a conservative one - assuming the client applies some strategy in choosing the SPs with whom they store their data. Based on network stats, we can do even better at selecting for risk:

Since the vast majority of miners have negligible faults (note the log scale for counts), and a few outliers with higher fault rates drive the average up, users can easily adopt a strategy based on the verifiable history of miner performance. For example, the top quintile of miners by fewest faults per unit of storage power had zero faults in the month of January. As the network matures, the reputational history grows, giving clients a stronger signal of the operational excellence of the SPs with whom they store data.
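A provider-selection strategy of this kind is straightforward to express in code. The sketch below is purely illustrative: the miner IDs, fault counts, and field names are hypothetical, and in practice the inputs would come from chain data or a reputation system.
```python
# Hypothetical per-miner records; field names and values are illustrative only.
miners = [
    {"miner_id": "f01001", "faults_30d": 0,  "power_tib": 512},
    {"miner_id": "f01002", "faults_30d": 0,  "power_tib": 2048},
    {"miner_id": "f01003", "faults_30d": 3,  "power_tib": 1024},
    {"miner_id": "f01004", "faults_30d": 0,  "power_tib": 256},
    {"miner_id": "f01005", "faults_30d": 41, "power_tib": 4096},
]

# Rank providers by faults per unit of storage power (lower is better)
# and keep the best fifth as the candidate set for new deals.
ranked = sorted(miners, key=lambda m: m["faults_30d"] / m["power_tib"])
top_quintile = ranked[: max(1, len(ranked) // 5)]
print([m["miner_id"] for m in top_quintile])
```
In practice one would also diversify across distinct operators and geographies, per the independence assumption above.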
<!-- ### What if lost copies aren't replaced?
**My bias is to maybe kill this section. I'm not sure if it's convincing**
One assumption in the above analysis was that lost copies are replaced the next day. However if lost data is not replaced the next day, time is expected to be a much more important factor.
Consider the ***hypothetical*** scenario of long-term storage without replacement of lost data. Let's store for 100 years, and let $n_\text{o}$ be the number of opportunities to lose data. With Filecoin's daily fault detection, this is $365\cdot100$ more chances than before. In the limit of $n_\text{o}$ being very large, which it is, the binomial distribution is a good approximation to the without-replacement problem of counting failures.
<!-- , so $n_\text{o} = 365100\cdot k$. In the large $n_o$ limit, -->
<!-- \begin{align*}
\left(\begin{array}{c}
n_\text{o}\\
k
\end{array}\right)p_{1}^{k}\left(1-p_{1}\right)^{n_\text{o}-k}
\end{align*}
with $p_1$ being daily loss probability for a single copy. -->
<!-- Now to guarantee five 9s SLA for 100 years --- 99.999% assurance data is retained to 2122 --- we need to keep 111 copies of the data. Assuming the previous daily loss probability of $p_1=0.000246$, the service level vs number of copies is:

If the daily loss probability for a single copy can be improved, for example by 10x to $p_{1}=0.0000246$, then only 21 copies of the data are required. Of course, this whole scenario represents the unlikely service agreement of lost copies not being replaced, so is really a worst case scenario.
<!-- If they're never replaced, we have $365\cdot100$ more daily chances to lose, then
\begin{align*}
p\left(n\,\text{fail}\right)=36500\,p_{1}^{n}\,.
\end{align*}
and if again we assume $p_1=0.000246$, then
\begin{align*}
n =\left\lceil \frac{\text{log}\left(1-0.99999\right)-\text{log}(36500)}{\text{log}\,p_{1}}\right\rceil \,
=3
\end{align*}
i.e. we need only three copies! This seems remarkably few. Of course, the probability of losing a single copy on a given day might vary widely. Let's run a basic sensitivity test: check the number of copies vs the number of 9s for different probabilities of data loss on a given day:
<!-- [](https://i.imgur.com/O0x6aEU.png) -->