**Author**: Eray Sabancilar
**Last updated**: 06.04.2020

# Paper Review - Balance: Dynamic Adjustment of Cryptocurrency Deposits by D. Harz et al.

## Motivation

In order to incentivize the agents in a network to carry out the desired actions, a deposit may be locked when they participate and returned when they complete tasks successfully. However, if some of the agents act maliciously, they lose their deposit. We would ideally like to minimize the deposit amount to create good incentives. In fact, we would like the agents to deposit smaller and smaller amounts as they build up their reputation by performing the desired tasks. As soon as they act maliciously, however, they have to start building their reputation from zero and, hence, deposit the highest amount. In Ref. [1], a protocol called Balance is proposed to achieve decreasing deposit amounts for agents whose reputation increases over time.

## An Optimal Decision Analysis: Dynamic Programming Approach

We analyze the optimality of a set of actions for the Markov decision process defined in the Balance protocol using the dynamic programming framework. We first define our notation to set the ground for our analysis.

**State:** We define the state as the agent's level at time $t$, $L_t$, which determines both its reputation score, $S_t = (L_t - L_0)/L_0$, and the deposit amount, $D_t = D_0/(1+S_t)$. The initial state is by default the lowest layer, $L_0$, accompanied by the highest deposit, $D_0$, and the zero reputation score, $S_0 = 0$. There exists a highest level, $L_{N}$, with $S_N = (L_N - L_0)/L_0$ and $D_{N} = D_0/(1+S_N)$.

**Action:** An agent can follow two possible actions, $a_t^{i}$, at layer $L_t$: i) carry out a desirable action, $a_t^{d}$, and move to a higher layer, $L_{t+1}$, with $S_{t+1}$ and $D_{t+1}$; ii) carry out an undesirable action, $a_t^{u}$, and move to the lowest layer, $L_0$, with $S_{0} = 0$ and $D_{0}$.

**New State:** After taking an action, $a_t^{i}$, the new state is $T(L_t, a_t^i) = L_{t+1}^i$:
\begin{align}
T(L_t, a_t^d) &= L_{t+1}, \\
T(L_t, a_t^u) &= L_{0},
\end{align}
where the reputation and the deposit are determined by the state as
\begin{align}
S_{t+1} &= \frac{L_{t+1} - L_0}{L_0}, \\
D_{t+1} &= \frac{D_0}{(1+S_{t+1})}.
\end{align}

**Utility:** The utility from taking an action, $a_t^{i}$, is $U\left[T(L_t, a_t^i)\right]$:
\begin{align}
U\left[T(L_t, a_t^d)\right] &= p - c_A - r D_t, \\
U\left[T(L_t, a_t^u)\right] &= v - c_A - (1+r) D_t.
\end{align}

**Value Function:** We define the value function as a sum over the discounted utilities that an agent gets given its initial state:
\begin{align}
V(L_0, S_0, D_0) = \max_{\{a_t\}_{t=0}^{T}} \sum_{t=0}^T \left( \frac{\delta}{1+r}\right)^t U\left[T(L_t, a_t)\right],
\end{align}
subject to
\begin{align}
a_t &\in \Gamma(L_t) \equiv \left\{a_t^u, a_t^d\right\}, \\
L_{t+1} &= T(L_t,a_t),
\end{align}
where $\delta \in (0,1)$ is the discount factor and $r$ is the deterministic interest rate corresponding to the opportunity cost of capital of the performing agent.
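To make the setup concrete, the following Python sketch encodes the state, transition, utility, and value function defined above and solves them by the backward induction described in the Bellman-equation paragraph that follows. The numerical parameters, the layer thresholds, the reading of $p$ as the payment for the desired action, $v$ as the value gained from the undesired action, and $c_A$ as the cost of the action, as well as the behaviour at the top layer (staying at $L_N$ after a desired action), are illustrative assumptions and not values or choices taken from Ref. [1].

```python
# Model parameters: illustrative values only, not taken from Ref. [1].
N = 3                     # index of the highest layer, L_N
L = [100, 200, 300, 400]  # hypothetical layer thresholds L_0, ..., L_N
D0 = 10.0                 # highest (initial) deposit D_0
p = 2.0                   # payment for a desired action (assumed reading of p)
v = 6.0                   # value gained from an undesired action (assumed reading of v)
c_A = 0.5                 # cost of carrying out an action (assumed reading of c_A)
r = 0.05                  # interest rate (opportunity cost of capital)
delta = 0.95              # discount factor, delta in (0, 1)
T = 10                    # final period of the planning horizon

def reputation(i):
    """Reputation score S_i = (L_i - L_0) / L_0."""
    return (L[i] - L[0]) / L[0]

def deposit(i):
    """Deposit D_i = D_0 / (1 + S_i)."""
    return D0 / (1.0 + reputation(i))

def transition(i, a):
    """T(L_i, a): a desired action moves one layer up (assumed to stay at L_N
    once the top layer is reached); an undesired action resets to layer 0."""
    return min(i + 1, N) if a == "desired" else 0

def utility(i, a):
    """One-period utility U[T(L_i, a)] as defined in the review."""
    if a == "desired":
        return p - c_A - r * deposit(i)
    return v - c_A - (1.0 + r) * deposit(i)

# Backward induction on the Bellman equation:
# V[t][i] = max_a { U(i, a) + (delta / (1 + r)) * V[t+1][transition(i, a)] },
# with V[T+1][i] = 0 as the terminal condition.
V = [[0.0] * (N + 1) for _ in range(T + 2)]
policy = [[None] * (N + 1) for _ in range(T + 1)]

for t in range(T, -1, -1):
    for i in range(N + 1):
        values = {
            a: utility(i, a) + (delta / (1.0 + r)) * V[t + 1][transition(i, a)]
            for a in ("desired", "undesired")
        }
        best = max(values, key=values.get)
        V[t][i], policy[t][i] = values[best], best

print("Value of starting at L_0:", round(V[0][0], 3))
print("Optimal action per layer at t = 0:", policy[0])
```

With these illustrative numbers the desired action should dominate at every layer; increasing $v$ or shrinking the deposits tilts the comparison toward the undesired action, which is precisely the trade-off the deposit schedule is meant to control.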
**Bellman Equation:** We suggest that the optimal solution for this Markov decision process can be obtained iteratively using the Bellman equation for each step $j$, starting from the final state at $T$:
\begin{align}
V(L_j) = \max_{a_j \in \Gamma(L_j)} \left[ U\left[T(L_j,a_j)\right] + \frac{\delta}{1+r} V\left(T(L_j,a_j)\right) \right], \quad \forall~ j=0,1,2,\ldots,T-1,T,
\end{align}
so that the optimal deposit values and layers can be obtained taking into account the possible transitions from any layer to a higher layer by performing a desired action, or back to layer zero by performing an undesired action.

Reference:

[1] D. Harz, L. Gudgeon, A. Gervais, W. J. Knottenbelt, "*Balance: Dynamic Adjustment of Cryptocurrency Deposits*", (2019) [ia.cr/2019/675].