# Data Retrieval Consortium Analysis
###### tags: `Retrievability Pinning`
Link: https://www.notion.so/pl-strflt/Data-Retrievability-Consortium-252f52f47b3944e8a54e8d0f553c5cd7#2023f5f015ef44719922402706e2ba95
:::info
Up to date as of August 2022
:::
*Authors: Gabriel Lefundes (BlockScience), Danilo Lessa Bernardineli (BlockScience)*
## Conclusions
- The `deposit_margin` ($m_d$) is not yet described by this formalism. **Ideally**, it should be decided by the client when proposing a deal.
- If fixed, the optimal value will depend on the empirical distribution of the `payment` variable. **Long-tailed distributions will require lower values ($m_d\in[2 m_s,3 m_s]$ can be acceptable)**. If however we expect the distribution to be short-tailed, then **$m_d=10 m_s$** is a safer bet.
- **Numerical suggestion: $m_d = 5 m_s$, with an acceptable region being between 2 and 20.**
- The `committee_multiplier` ($m_c$) is upper bounded by the client's belief about the retrieval utility, the number of appeals they expect to make, and the associated probabilities.
- **Numerical suggestion: $m_c=0.2$ with an acceptable region being between 0.1 and 1.0**
- Rationality constraint: $m_c < \frac{\frac{\delta}{\alpha} + 2P(slash)-1}{\langle N_{appeals} \rangle}$, where $\delta$ is the marginal increase in probability of retrieving the deal and $\alpha$ quantifies how much the `payment` captures the retrieval utility.
- For most cases, this means that $m_c$ can be made relatively high if we assume that the leverage is low (usually on the order of 10 to 1000)
- **$m_c > 1$ will feel odd from a client UX perspective**, as appealing will cost more than insuring. This is potentially damaging to protocol adoption. **We therefore recommend setting $m_c$ well below the equilibrium** condition in order to foster demand.
- Setting it **higher than 1.0 risks being misunderstood, while the biggest risk of setting it too low is associated with dishonest referees.**
- The above risk is partially mitigated by having a maximum number of appeals
- The `slashing_multiplier` ($m_s$) is upper bounded by the provider's belief about their chance of being slashed, and there is no trivial lower bound. We suggest setting this parameter as high as possible.
- **Numerical Suggestion: $m_s=100$, with an acceptable region being between 10 and 1000.**
- Rationality constraint: $m_s < \frac{1}{P(slash)}-1$
- For a $P(slash)=0.1\%$, $m_s<999$
- The maximum number of appeals ($N_{a, max}$)
- **We recommend between 3 and 5.**
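
The two rationality constraints above can be checked numerically. The following is a minimal sketch, not part of the protocol: the function names are hypothetical, and the values used for $\delta$, $\alpha$ and $\langle N_{appeals} \rangle$ are illustrative assumptions rather than measured quantities.

```python
def slashing_multiplier_bound(p_slash: float) -> float:
    """Upper bound on m_s from the provider constraint m_s < 1/P(slash) - 1."""
    if not 0.0 < p_slash <= 1.0:
        raise ValueError("p_slash must be in (0, 1]")
    return 1.0 / p_slash - 1.0


def committee_multiplier_bound(delta: float, alpha: float,
                               p_slash: float, expected_appeals: float) -> float:
    """Upper bound on m_c from the client constraint
    m_c < (delta/alpha + 2*P(slash) - 1) / <N_appeals>."""
    if alpha <= 0.0 or expected_appeals <= 0.0:
        raise ValueError("alpha and expected_appeals must be positive")
    return (delta / alpha + 2.0 * p_slash - 1.0) / expected_appeals


# Assumed inputs (illustrative only): P(slash) = 0.1%, delta = 0.05,
# alpha = 0.01 and two expected appeals.
m_s_max = slashing_multiplier_bound(0.001)
m_c_max = committee_multiplier_bound(delta=0.05, alpha=0.01,
                                     p_slash=0.001, expected_appeals=2.0)

# The numerical suggestions from the list above.
m_s_suggested, m_c_suggested = 100.0, 0.2
print(f"m_s bound: {m_s_max:.3f}, suggestion is rational: {m_s_suggested < m_s_max}")
print(f"m_c bound: {m_c_max:.3f}, suggestion is rational: {m_c_suggested < m_c_max}")
```

Under these assumed inputs both suggested values sit well inside their bounds; different beliefs about $P(slash)$ or $\alpha$ would move the bounds accordingly.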
## Next Steps
- Characterizing the empirical distributions around the observable variables
- Blocked by having representative datasets
- Building proxies for the unobservable variables
- Relaxing existing assumptions through additional formalisms
## Formalism & Analysis

*Diagram of the decisions and processes behind the client-facing DRC protocol*
### Key assumptions
- Time value and discounting will not be taken into account.
- Decisions and processes are assumed to follow each other immediately, with no interval between them
- There are no deposit costs for the provider
### Development
#### Decision Structure
The base decision variables (which can assume 0 or 1) are:
- $d_1$: Client decides to send proposal $\zeta$
- $d_2 | \zeta$: Provider decides to accept proposal $\zeta$
- $d_3$: Client decides to appeal
- $d_4$: Referee Committee decides to slash provider
Auxiliary variables
- $\pi_r^C \in \mathbb{R}$: File retrieval utility to the client.
- $r \in \{0, 1\}$: file is retrieved
- $p \in \mathbb{R}_+$: payment
- $m_c$: Multiplier for appealing
- $m_d$: Multiplier for Provider Deposit Margin
- $m_s$: Multiplier for Slashing
The probabilities of the decision variables are obtained by assuming a Best Response form, e.g.:
$P(d_1|\zeta, I) = (\pi_{d_1} > \pi_{\bar{d_1}})$
Given that, we can assume the following payoffs for each base decision:
- $d_1$
- $\pi_{d_1, \zeta}^C = \pi_r^C P(r|d_2) + p\cdot P(d_2|\zeta, d_1, I) \cdot [2 P(d_4|d_2, I) - \langle N_a\rangle m_c - 1]-\pi_{\bar{d_2}, \zeta}^C P(\bar{d_2}|\zeta, d_1, I)$
- Equivalent form: $\pi_{d_1, \zeta}^C = \pi_r^C P(r|d_2) + p\cdot P(d_2|\zeta, d_1, I) \cdot [P(d_4|d_2, I) - \langle N_a\rangle m_c - P(\bar{d_4}|d_2, I)]-\pi_{\bar{d_2}, \zeta}^C P(\bar{d_2}|\zeta, d_1, I)$
- $\pi_{\bar{d_2}, \zeta}^C$ (The opportunity cost of having the proposal rejected) will be assumed to be zero.
- $\langle N_a \rangle = \sum_{n_a} P(\prod_{j < n_a} d_{3, j} | \prod_{i<j} d_{4, i}, d_2, d_1, I)$
- $\pi_{\bar{d_1}, \zeta}^C = \pi_r^C P(r | \bar{d_2})$
- This inspires an assumption: $P(r|\bar{d_2})<P(r|d_2)$, which leads to the following definition (the marginal increase in retrieval probability): $\delta_r = P(r|d_2)-P(r|\bar{d_2})$
- The above says that the client will propose a deal if $\pi_{d_1, \zeta}^C > \pi_{\bar{d_1}, \zeta}^C$
- If $P(d_2)=1$, then the rationality constraint becomes $\pi_r^C \delta_r + 2p P(d_4)> p m_c \langle N_a \rangle + p$
- Alternate form:
- Substituting $\pi_r^C = \frac{p}{\alpha}$ gives $\frac{\delta_r/\alpha + 2 P(d_4) - 1}{\langle N_a \rangle} > m_c$
- $\alpha$ can be understood as an "insurance leverage". Values lower than 1.0 mean that the client is under-insuring relative to the retrieval utility
- This implies the following:
- Increasing the number of expected appeals reduces the acceptable value of $m_c$
- Increasing the a priori slash probability increases the acceptable value of $m_c$
- Increasing the marginal retrieval probability increases the acceptable value of $m_c$
- Under-leveraging increases the acceptable value of $m_c$
- $d_2$
- $\pi_{d_2, \zeta}^P = p [P(\bar{d_4}|d_2, I) - m_s P(d_4|d_2, I)]$
- Note: since $P(\bar{d_4})=1-P(d_4)$, this is equivalent to $\pi_{d_2, \zeta}^P = p\,[1 - (m_s+1) P(d_4)]$
- $\pi_{\bar{d_2}, \zeta}^P = 0$
- The above payoffs imply that **the payment value is unimportant for the provider to accept a deal** and the sole criterion for acceptance is $P(d_4|d_2, I) < \frac{1}{m_s + 1}$
- Equilibrium: $m_s = \frac{1}{P(d_4)}-1$. Interpretation: the maximum value of $m_s$ for the deal to be rational to the provider.
- If $P(d_4)$ for an honest provider is 0.1%, then the equilibrium value of $m_s$ is 999.
- Another expansion is $P(d_4|d_2) = \langle N_a \rangle P(d_4|d_3, d_2)$
- $d_3$
- $\pi_{d_3}^C=\pi_r^C P(r|d_3, I) + p P(d_4 | d_3, I) - p m_c$
- $\pi_{\bar{d_3}}^C=\pi_r^C P(r | \bar{d_3})$
- Note: these are similar to $d_1$, but with more immediate priors.
- $d_4$
- $\pi_{d_4}^R= \delta_s P(\bar{r}|d_3)$
- $\pi_{\bar{d_4}}^R= \delta_{\bar{s}} P(r|d_3)$
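
The client and provider payoffs above can be evaluated directly, together with the Best Response rule. This is a minimal sketch under the stated assumptions (the opportunity cost of rejection $\pi_{\bar{d_2}, \zeta}^C$ is zero); the function names and all numeric inputs are hypothetical and chosen only for illustration.

```python
def provider_payoff(p: float, m_s: float, p_slash: float) -> float:
    """pi^P_{d_2} = p * [P(not d_4) - m_s * P(d_4)]."""
    return p * ((1.0 - p_slash) - m_s * p_slash)


def client_payoff_propose(pi_r: float, p: float, m_c: float, p_r_deal: float,
                          p_slash: float, expected_appeals: float,
                          p_accept: float = 1.0) -> float:
    """pi^C_{d_1}, with the opportunity cost of a rejected proposal set to zero."""
    return pi_r * p_r_deal + p * p_accept * (
        2.0 * p_slash - expected_appeals * m_c - 1.0
    )


def client_payoff_no_proposal(pi_r: float, p_r_no_deal: float) -> float:
    """pi^C_{not d_1} = pi_r * P(r | not d_2)."""
    return pi_r * p_r_no_deal


def best_response(payoff_yes: float, payoff_no: float) -> int:
    """Indicator form of the decision probability: 1 if acting pays more."""
    return int(payoff_yes > payoff_no)


# Provider: with m_s = 100 the acceptance threshold is P(d_4) < 1/101.
# The decision has the same sign for any positive payment p.
for p in (1.0, 1000.0):
    accepts_low_risk = best_response(provider_payoff(p, 100.0, 0.001), 0.0)
    accepts_high_risk = best_response(provider_payoff(p, 100.0, 0.02), 0.0)
    print(p, accepts_low_risk, accepts_high_risk)

# Client: assumed pi_r = 100, p = 1 (alpha = 0.01), delta_r = 0.05.
propose = client_payoff_propose(pi_r=100.0, p=1.0, m_c=0.2, p_r_deal=0.95,
                                p_slash=0.001, expected_appeals=2.0)
abstain = client_payoff_no_proposal(pi_r=100.0, p_r_no_deal=0.90)
print(best_response(propose, abstain))
```

The provider loop illustrates the point made above: scaling the payment changes the magnitude of the payoff but not whether accepting is the best response.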
#### Optimization Objectives
- The system goals are to maximize:
- Adoption of the protocol
- $\pi^g \propto P(d_2|d_1) + P(d_1)$
- Retrievability when taking a deal
- $\pi^g \propto \frac{P(r|d_2)}{P(r|\bar{d_2})}$
- Cost mitigation when data is not retrievable on a deal
- $\pi^g \propto \frac{p P(\bar{r}, d_4|d_2)}{\pi_r^C}$
- Referee fairness
- $\pi^g \propto P(d_4|\bar{r})-P(d_4|r)$
- Taking all together, the global utility is defined as:
- $\pi^g = \beta_0 P(d_2|d_1) + \beta_1 P(d_1) + \beta_2 \frac{P(r|d_2)}{P(r|\bar{d_2})} + \beta_3 \frac{p P(\bar{r}, d_4|d_2)}{\pi_r^C} + \beta_4 P(d_4|\bar{r})- \beta_5 P(d_4|r)$
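
As a sketch of how the global utility could be evaluated once the probabilities are estimated, the weighted sum above translates into a single function. The function and argument names are hypothetical, and the weights $\beta_i$ and inputs in the example are placeholders, not calibrated values.

```python
from typing import Sequence


def global_utility(betas: Sequence[float],
                   p_accept: float,              # P(d_2 | d_1)
                   p_propose: float,             # P(d_1)
                   p_r_deal: float,              # P(r | d_2)
                   p_r_no_deal: float,           # P(r | not d_2)
                   payment: float,               # p
                   p_fail_and_slash: float,      # P(not r, d_4 | d_2)
                   pi_r: float,                  # retrieval utility pi_r^C
                   p_slash_given_fail: float,    # P(d_4 | not r)
                   p_slash_given_retrieved: float  # P(d_4 | r)
                   ) -> float:
    """Weighted sum of the adoption, retrievability, cost mitigation
    and referee fairness terms."""
    if len(betas) != 6:
        raise ValueError("expected six weights beta_0..beta_5")
    b0, b1, b2, b3, b4, b5 = betas
    return (b0 * p_accept
            + b1 * p_propose
            + b2 * p_r_deal / p_r_no_deal
            + b3 * payment * p_fail_and_slash / pi_r
            + b4 * p_slash_given_fail
            - b5 * p_slash_given_retrieved)


# Placeholder inputs with unit weights, chosen so each term is easy to read off.
example = global_utility(betas=[1.0] * 6, p_accept=1.0, p_propose=1.0,
                         p_r_deal=0.5, p_r_no_deal=0.25, payment=2.0,
                         p_fail_and_slash=0.5, pi_r=4.0,
                         p_slash_given_fail=1.0, p_slash_given_retrieved=0.0)
print(example)
```

Note that the retrievability term is a ratio and is undefined when $P(r|\bar{d_2}) = 0$; the sketch does not guard against that case.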
___
# Old Stuff
## Variables
- Inputs
- `payment`
- Unit: Filecoin
- $p \in \mathbb{R_+}$
- Protocol Parameters
- `committee_multiplier`
- Controls how much the client should pay the referee committee
- $m_c \in \mathbb{R_+}$
- `slashing_multiplier`
- Controls how much the provider should be slashed if the client appeals and the referees are not able to consensually retrieve the file
- $m_s \in \mathbb{R_+}$
- `deposit_margin`
- Controls how much the provider should have in order to participate in the consortium. Should be bounded by the slashing multiplier and the payment distribution somehow.
- $m_d \in \mathbb{R_+}$
- `referee_count` (or `n`)
- Controls simultaneously:
- How many eligible leaders there are
- How many referees should validate the file retrieved by the leader
- $c_r \in \mathbb{Z}$
- `appeal_round_count` (or `k`)
- Controls how many times a leader is elected and the process repeated
- $c_a \in \mathbb{Z}$
- `slashing_threshold`
- $c_s \in \mathbb{Z}$
- `round_duration`
- Unit: epochs
- $d_r \in \mathbb{Z}$
- `leader_waiting`
- Unit: epochs
- $d_w \in \mathbb{Z}$
- `referee_waiting`
- Unit: epochs
- $d_r \in \mathbb{Z}$
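
The protocol parameters listed above could be grouped into a single configuration object. The following is a sketch only: the class name `ProtocolParams` is hypothetical, the non-negativity check on the integer parameters is an added assumption (the list only states $\mathbb{Z}$), and every value in the example other than $m_c$, $m_s$ and $m_d = 5 m_s$ is a placeholder.

```python
from dataclasses import dataclass

# Parameters the list above declares as integers.
INTEGER_FIELDS = (
    "referee_count", "appeal_round_count", "slashing_threshold",
    "round_duration", "leader_waiting", "referee_waiting",
)


@dataclass(frozen=True)
class ProtocolParams:
    committee_multiplier: float  # m_c
    slashing_multiplier: float   # m_s
    deposit_margin: float        # m_d
    referee_count: int           # n
    appeal_round_count: int      # k
    slashing_threshold: int
    round_duration: int          # epochs
    leader_waiting: int          # epochs
    referee_waiting: int         # epochs

    def __post_init__(self) -> None:
        for name, value in vars(self).items():
            if name in INTEGER_FIELDS and not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
            # Assumption: negative values are not meaningful for any parameter.
            if value < 0:
                raise ValueError(f"{name} must be non-negative")


# Multipliers follow the numerical suggestions; the rest are placeholders.
params = ProtocolParams(
    committee_multiplier=0.2, slashing_multiplier=100.0, deposit_margin=500.0,
    referee_count=5, appeal_round_count=3, slashing_threshold=3,
    round_duration=10, leader_waiting=2, referee_waiting=2,
)
print(params)
```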
## Utilities
- $U_{c} = aP()$
___
- Decisions:
1. Client to insure or not insure
2. Provider to provide data or not provide
3. Referee cycle
4. Provider to provide data or not provide
5. Referee to slash or not
- Client payoff
- Data is retrieved
- Retrieve data without insurance: $\pi_C = U_c(t)$
- Retrieve data with insurance: $\pi_C = U_c(t)-p_i$
- Retrieve data with appeal: $\pi_C = U_c(t+\tau)-p_i-f$
- Data is not retrieved
1. $\pi_C = 0$
2. $\pi_C=-p_i$
3. $\pi_C=-f$
- Provider Payoff
## Sketch for a simplified game
### Terminology
- $O_c$: Client opportunity cost in regards to getting the data
- $O_c(t) = DataUtility(t) - BaseUtility \in \mathbb{R_+}$
- $BaseUtility$: utility of not having the data
- $p \in \mathbb{R_+}$: payment
- $f \in \mathbb{R_+}$: appeal fees
- $d \in \mathbb{R_+}$: host deposit
- $\alpha \in \mathbb{R_+}$: deposit margin
- $c \in \mathbb{R}$: host opportunity costs for delivering the data
- $c = \text{Cost of Data Retrieval} - \text{Marginal Cost of Holding}$
#### Client Payoff Matrix
| Agent | - | Client | - |
| -------- | -------- | -------- | - |
| - | Action | Don't Insure | Insure |
| Host | Hold Data | $-O_c(t)$ | $-O_c(t)-f$ |
| - | Deliver Data | $0$ | $-p$ |
#### Host Payoff Matrix
| Agent | - | Client | - |
| -------- | -------- | -------- | - |
| - | Action | Don't Insure | Insure |
| Host | Hold Data | $0$ | $-\alpha p$ |
| - | Deliver Data | $-c$ | $p-c$ |
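
The two matrices can be encoded and searched for pure-strategy Nash equilibria by brute force. This is a minimal sketch that treats the game as simultaneous-move; the function names are hypothetical and the numeric values for $O_c$, $p$, $f$, $\alpha$ and $c$ are illustrative assumptions.

```python
from itertools import product

HOST_ACTIONS = ("hold", "deliver")
CLIENT_ACTIONS = ("dont_insure", "insure")


def payoff_tables(o_c: float, p: float, f: float, alpha: float, c: float):
    """Return (client, host) payoffs keyed by (host_action, client_action)."""
    client = {
        ("hold", "dont_insure"): -o_c,
        ("hold", "insure"): -o_c - f,
        ("deliver", "dont_insure"): 0.0,
        ("deliver", "insure"): -p,
    }
    host = {
        ("hold", "dont_insure"): 0.0,
        ("hold", "insure"): -alpha * p,
        ("deliver", "dont_insure"): -c,
        ("deliver", "insure"): p - c,
    }
    return client, host


def pure_nash_equilibria(client: dict, host: dict) -> list:
    """Action pairs where neither player gains by deviating unilaterally."""
    equilibria = []
    for h, a in product(HOST_ACTIONS, CLIENT_ACTIONS):
        client_ok = all(client[(h, a)] >= client[(h, alt)] for alt in CLIENT_ACTIONS)
        host_ok = all(host[(h, a)] >= host[(alt, a)] for alt in HOST_ACTIONS)
        if client_ok and host_ok:
            equilibria.append((h, a))
    return equilibria


# Assumed values: O_c = 10, p = 2, f = 1, alpha = 5, c = 1.
client, host = payoff_tables(o_c=10.0, p=2.0, f=1.0, alpha=5.0, c=1.0)
print(pure_nash_equilibria(client, host))
```

With these assumed values the only pure equilibrium is (Hold Data, Don't Insure): in the client matrix as written, insuring appears only as a cost in both rows, so the client never prefers it for positive $p$ and $f$.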
#### Possible scenario