# DRC Reputation Score Brainstorming

###### tags: `Retrievability Pinning`

:::info
Up to date as of August 2022
:::

*Authors: Irene Giacomelli (Protocol Labs), Danilo Lessa Bernardineli (BlockScience)*

## References

- Simulation on the dynamic case: [Dynamical Simulation for Reputation Score](/B2AUtFqhSTWRO4T0uW7pCg)
- Simulation on the static case: https://github.com/danlessa/filecoin-drc-research/blob/main/notebooks/reputation_score_static.ipynb

## Notes

### 11ago22

#### KPIs for the Stochastic Simulations

-

### 05jul22

#### Non-linear mapping for the Dealer Reputation

- Percentile to 5 star map
    - 4.9 to 5.0: Percentile between 99 and 100
    - 4.5 to 4.9: Percentile between 95 and 99
    - 4.0 to 4.5: Percentile between 80 and 95
    - 3.0 to 4.0: Percentile between 50 and 80
    - 2.0 to 3.0: Percentile between 20 and 50
    - 1.0 to 2.0: Percentile between 0 and 20

#### Dual Reputation Score

- Component 1: High Value at Risk
    - Possible Names: Collateral Score, Skin In The Game Score, Risk Score
    - Goal: Incentivize providers that are willing to put up a high volume of aggregate collateral
    - Why: Locking up collateral represents a definite opportunity cost for providers. As such, we can hypothesize that the more collateral is locked, the lower the likelihood that the provider is engaging in self-dealing.
    - Formula:
        - Deal Payoff: $\pi_{d, 1}= T_d (F_d + k_a \bar{F_d}) (k_c C_d [S_d=F] - k_s[S_d=T])$
            - Suggestion: $k_a=0.1 (?)$
            - Suggestion: $k_c=1 (?)$
            - Suggestion: $k_s=10 (?)$
        - Reputation Payoff: $\pi_{P, 1} = \sum_{\mathcal{D}_P} \pi_{d, 1}^\alpha$
            - Suggestion: $\alpha=1$
        - Dealer Reputation: $R_{P, 1} = Percentile(\pi_{P, 1}; \Pi) \to [0, 100]$
    - Notes
        - M
- Component 2: High Deal Volume
    - Possible Names: Capacity Score
    - Goal: Incentivize providers that are willing to take a large volume of deals
    - Formula:
        - Deal Payoff: $\pi_{d, 2}=T_d (\frac{C_d}{p_d} (k_c [F_d=T] [S_d=F] + \frac{k_e}{N_{D, E}} E^*_d) - k_s[S_d=T])$
            - $E^*_d=[E_d=T]([F_d=T], [S_d=F] \text{ or } [F_d=F])$
            - Suggestion: $k_c = 1 (?)$
            - Suggestion: $k_s = 10 (?)$
            - Suggestion: $k_e = 0.01 (?)$
            - Suggestion: $E_d=T$ if the deal meets all of the following conditions
                - It is one of the first $M_{d, P}$ deals of the provider
                - There are no more than $M_{d, F}$ finished deals on the protocol
                - ~~It is one of the first $M_{d, D}$ deals of the protocol~~
            - Numerical Suggestions:
                - $M_{d, P}=3 (?)$
                - $M_{d, F}=10 (?)$
                - $M_{d, D}=1000 (?)$
        - Reputation Payoff: $\pi_{P, 2} = \sum_{\mathcal{D}_P} \pi_{d, 2}^\alpha$
            - Suggestion: $\alpha=1/2$
        - Dealer Reputation: $R_{P, 2} = Percentile(\pi_{P, 2}; \Pi) \to [0, 100]$
    - Notes
        - Maximizing this component requires creating and finishing as many deals as possible.
        - $\alpha=1/2$ penalizes too high values of a deal payoff.
          This disincentivizes successful long deals but also softens the negative impact associated with slashing one of them.
- Terminology
    - Deal
        - $\pi_d$ ($\mathbb{R}$): Deal Reputation Payoff
        - $E_d$ ($\{F, T\}$): Deal is Early
        - $F_d$ ($\{F, T\}$): Deal is Finished
        - $S_d$ ($\{F, T\}$): Deal is Slashed
        - $T_d$ ($\mathbb{R}_+$): Deal Duration
        - $C_d$ ($\mathbb{R}_+$): Deal Collateral
        - $p_d$ ($\mathbb{R}_+$): Deal Payment
    - Provider
        - $\pi_P$ ($\mathbb{R}$): Provider Reputation Payoff
        - $R_P$ ($\mathbb{R}_+$, in $[0, 100]$): Provider Reputation Score
    - Misc
        - $N_{D, E}$: Number of Deals Contained Inside Early Deal Set.
        - $k_a$ ($\mathbb{R}$): Active Deal Reputation Multiplier
        - $k_c$ ($\mathbb{R}$): Collateral Reputation Multiplier
        - $k_s$ ($\mathbb{R}$): Slashing Reputation Multiplier
        - $k_d$ ($\mathbb{R}$): Deal Reputation Multiplier
        - $k_e$ ($\mathbb{R}$): Early Deal Reputation Multiplier
        - $\alpha$ ($\mathbb{R}$): Reputation Score Curvature

### 21jun22

- Should we include the data storage on the payoffs?
    - Obs: we can't detect
    - We won't be able to detect self-deals.

#### About self-deals and possible countermeasures (@irene, wip)

- What does the reputation represent?
    - Option 1: it represents the market position of the provider; high reputation = many real clients. In this case we need to check the clients' identity and we can use only Solution 1 or 2 below.
    - Option 2: it represents the service quality of the provider; high reputation = retrievability is guaranteed. We believe that Option 2 is more interesting, and we brainstormed 3 possible solutions to prevent self-deals from being used to create a high score for a provider with a bad service.
- Solution 1: verified clients
    - 1.a: verified by "notaries"
        - Pros: no change in the protocol; works for both definitions of reputation score;
        - Cons: how do we implement "notaries"? High extra cost!
    - 1.b: verified by an automatic procedure (like linking the twitter account).
      This alone is not enough; it requires the help of network analytics.
        - Pros: no change in the protocol;
        - Cons: same as before + the ones in solution 2
- Solution 2: network analytics
    - Pros: no change in the protocol;
    - Cons:
        - requires protocol maintainer work, high cost
        - not sure it will work
- Solution 3: random checks
    - Pros:
        - it is automatic, no maintainer work
        - produces valid data
        - not subjective
    - Cons:
        - needs a change to the protocol
        - needs design (e.g., who pays for the random checks? what happens if an appeal fails, do we slash? maybe the client was fine not retrieving the data for a short period...)

At the end, it seems that we have a "trilemma", in the sense that we have 3 vertices of complexity that any solution for self-deals is going to touch:

- maintainer work
- user experience
- protocol complexity

The open question now is: we cannot get all three of them low, so which one should we leave out of our solution?

### 14jun22

- Simulation on the static case: https://github.com/danlessa/filecoin-drc-research/blob/main/notebooks/reputation_score_static.ipynb
- On describing reputation during the first weeks when there's not a lot of data:
    - Option 1: Use a different normalization procedure, like mapping the percentiles to $[50, 80]$ rather than $[0, 100]$.
    - Option 2: Include a deal reputation payoff term for early non-finished deals, which may or may not carry on after the early period.
        - Suggestion: $t_{early}$ expires when there are more than 10 providers AND 50 finished+active deals.
        - Potential issue: the lag between $t_{early}$ and the first finished deals coming in.
            - If the lag is on the 3-7d scale, then it is probably fine. More than that will make the RS static for a long time.
    - Option 3: Drop the $F_d$ term initially for calculation purposes.
        - Maybe dropping it until we have ~10 to ~100 finished deals?
        - It is desirable to maximize the finished-deals threshold, as this will smooth the provider payoff vs provider score curve at the beginning
- Should we include the data storage on the payoffs?
    - We won't be able to detect self-deals.
    - Strategies:
        - A self-dealer could maximize their number of deals. This tends to be the optimal action when $\alpha<1$, as the finish time is low and the value at risk is low.
        - A self-dealer could maximize their finish time
        - A self-dealer could maximize their collateral
        - A self-dealer could minimize their payment
    - Possible line of defense: forcing the self-dealing provider to make obvious actions?
        - For instance, we can induce the self-dealing strategy to bias towards maximizing the number of deals, and perform random retrievability checks without a client appealing.
    - Possible mitigation: put a premium on appeals with non-slashing outcomes?
        - This is the most net-neutral direct check on the provider capacity.

## Formalism

- $P \in \mathcal{P}$: set of providers
- $d \in \mathcal{D}$: set of deals
- $\mathcal{D}_P$: set of deals associated with provider $P$
- $\pi_{P}(t) = \sum_{\mathcal{D}_P} \pi_{d}^\alpha(t)$
    - $\alpha = 1/2$
- $\pi_{d} = T_d F_d ([S_d = F] \frac{C_d}{p_d} + [S_d = T] k_s) + k_{early} [F_d=T, S_d=F | F_d = F]$
    - $p_d$: payment associated with the deal
    - $T_d$: deal duration
    - $C_d = p_d * \bar{m_s}$: collateral associated with the deal
    - $F_d$: 1 if finished, else 0
    - $S_d$: 1 if slashed, 0 otherwise
    - $k_s = -10?$ or $k_s=m_s$ ?
    - $k_{early} = 1: t < t_{early}, 0 : t > t_{early} | \sum_{\mathcal{D}_P} 1 > MaxEarlyDealsPerProvider$
        - MaxEarlyDealsPerProvider = 3?
- $R_{provider} = Percentile(\pi_{provider}; \Pi) \to [0, 100]$
    - $\Pi$: set of all $\pi_{provider}$

## Old stuff

____

- What it is:
    - A function that maps a provider identity into a number or category that measures "reputation".
      Eg, $R(p; \mathcal{D}) \to \mathbb{R} \lor \mathbb{C}$
        - $p$: Provider UUID
        - $\mathbb{C}$: Set of categories
        - $\mathcal{D}$: Data
- Typical users:
    - Storage Clients who want to minimize unretrievability risks.
        - Goal: Reputation is predictive of non-retrieval probability when proposing a deal
        - Desirable: $f(R(p))=P(r |d_2)$ where $f$ is an auxiliary function known to the client.
    - Storage Clients who need to optimize their insurance proposals for risks.
        - Goal:
        - Desirable:
    - Storage Providers that want to introduce a service differentiator.
        - Goal: To optimize for reputation
- Properties (being brainstormed!)
    - The reputation should give heavier weight to more recent data points.
        - Eg. past bad providers should be able to become good providers (typical transition scale: 3mo?)
        - We need to have extra sensitivity to good providers becoming bad providers. (typical transition scale: 1d?)
        - The above points may suggest that building up reputation should be slow, and losing it should be fast.
            - One heuristic based on the above magnitudes is that a negative action should be equivalent to 100 positive actions. A fairer criterion would be to make it quantile dependent.
    - The reputation should give a heavier weight to longer deals.
        - Rationale: The longer the deal, the higher the uncertainty.
        - **Suggestion**: $\Delta R(d) \propto \tau_d^\alpha$
    - The reputation should have a light or non-existent dependence on the deal payment
        - Rationale: lower-paying clients have special properties:
            - 1: For a long-tailed distribution, they will not require marginally increased collateral from the providers
            - 2: They incur smaller penalties than a very high-paying client. Therefore, if the provider needs to prioritize not delivering the data to someone, the lesser payments will be chosen
        - Also, making the reputation agnostic to payment will increase the bargaining power of the clients, which will amplify the incentives of a highly paying deal.
    - The reputation should put a heavier weight on under-collateralized deals
        - Rationale: Under-collateralized deals amplify the incentives for dealers not to deliver the data.
        - **Suggestion**: Amplify negative score differentials
            - $\Delta R(d) \propto min(0, a)^{\frac{S}{R}}$
    - The reputation should disincentivize under-collateralization
        - Rationale: under-collateralization is not a desirable property from the client's perspective and it should be minimized.
        - **Suggestion**: $R(d) \propto (1 - \frac{S}{R})$
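
## Sketch: Component 1 payoff and star map

The Component 1 formulas and the percentile to 5 star map from the 05jul22 notes can be sketched in Python as below. This is a minimal illustration, not the simulation code from the linked notebooks. Several choices are assumptions that the notes do not specify: $\bar{F_d}$ is read as "deal not finished (still active)", the percentile is taken as the share of provider payoffs less than or equal to the given one, and stars are interpolated linearly inside each band. The names `Deal`, `deal_payoff_1`, `provider_payoff_1`, `percentile_score` and `stars` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

# (percentile, stars) breakpoints taken from the 05jul22 star map.
BREAKPOINTS = [(0, 1.0), (20, 2.0), (50, 3.0), (80, 4.0),
               (95, 4.5), (99, 4.9), (100, 5.0)]


@dataclass
class Deal:
    duration: float    # T_d
    collateral: float  # C_d
    finished: bool     # F_d
    slashed: bool      # S_d


def deal_payoff_1(deal, k_a=0.1, k_c=1.0, k_s=10.0):
    """Component 1 deal payoff; unfinished deals are weighted by k_a (assumed reading)."""
    weight = 1.0 if deal.finished else k_a
    outcome = -k_s if deal.slashed else k_c * deal.collateral
    return deal.duration * weight * outcome


def provider_payoff_1(deals):
    """Sum of deal payoffs, i.e. the reputation payoff with alpha = 1."""
    return sum(deal_payoff_1(d) for d in deals)


def percentile_score(payoff, all_payoffs):
    """Assumed definition: share of payoffs <= payoff, scaled to [0, 100]."""
    if not all_payoffs:
        raise ValueError("need at least one provider payoff")
    ranked = sorted(all_payoffs)
    return 100.0 * bisect_right(ranked, payoff) / len(ranked)


def stars(percentile):
    """Map a percentile to stars, interpolating linearly within each band (assumption)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be in [0, 100]")
    for (p0, s0), (p1, s1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if percentile <= p1:
            return s0 + (s1 - s0) * (percentile - p0) / (p1 - p0)


if __name__ == "__main__":
    # Made-up providers, purely for illustration.
    payoffs = {
        "good": provider_payoff_1([Deal(10, 2, True, False)]),
        "slashed": provider_payoff_1([Deal(10, 2, False, True)]),
    }
    for name, payoff in payoffs.items():
        score = percentile_score(payoff, list(payoffs.values()))
        print(name, payoff, score, stars(score))
```

Only $\alpha=1$ is covered here. With $\alpha=1/2$, as suggested for Component 2, a negative deal payoff has no real square root, so the notes would need to state how slashed deals are handled before the same sketch could be extended.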