# Blinder – MPC Based Scalable and Robust Anonymous Committed Broadcast - 2020
###### tags: `Tag(HashCloak - DC Net Readings)`
Author(s): Ittai Abraham, Benny Pinkas, and Avishay Yanai
Paper:
https://eprint.iacr.org/2020/248.pdf
### Table of Contents
[toc]
:::info
>Abstract: Anonymous Committed Broadcast is a functionality that extends DC-nets and allows a set of clients to privately commit a message to a set of servers, which can then simultaneously open all committed messages in a random ordering.
>Anonymity holds since no one can learn the ordering or the content of the client’s committed message.
>We present Blinder, the first system that provides a scalable and fully robust solution for anonymous committed broadcast. Blinder maintains both properties of security (anonymity) and robustness (aka. ‘guaranteed output delivery’ or ‘availability’) in the face of a global active (malicious) adversary. Moreover, Blinder is censorship resistant, meaning that an honest client cannot be blocked from participating.
>Blinder obtains its security and scalability by carefully combining classical and state-of-the-art techniques from the fields of anonymous communication and secure multiparty computation (MPC). Relying on MPC for such a system is beneficial since it naturally allows the parties (servers) to enforce some properties on accepted messages prior to their publication.
>In order to demonstrate scalability, we evaluate Blinder with up to 1 million clients, up to 100 servers and a message size of up to 10 kilobytes. In addition, we show that it is a perfect fit to be implemented on a GPU. A GPU based implementation of Blinder with 5 servers, which accepts 1 million clients, incurs a latency of less than 8 minutes; faster by a factor of > 100 than the 3-servers Riposte protocol (SOSP ’15), which is not robust and not censorship resistant; we get an even larger factor when comparing to AsynchroMix and PowerMix (CCS ’19), which are the only constructions that guarantee fairness (or robustness in the online phase).
:::
## 2 Notation and Problem Definition
![](https://i.imgur.com/mOorNex.png)
## 3 Preliminaries
We fix a finite field $\mathbb{F}$ over which subsequent computations will be done. Concretely, think of a prime field $\mathbb{F}_p$ with a prime $p > n$. Every server $\mathcal{S}_q,\ q ∈ [n]$, is identified with the integer $q ∈ \mathbb{F}_p$. By a $d$-polynomial we mean a polynomial $f(X) ∈ \mathbb{F}[X]$ of degree at most $d$.
### 3.1 Shamir’s Secret Sharing
![](https://i.imgur.com/DG7Ulcn.png)
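The Share$_t$/Reconstruct interface used throughout can be sketched in a few lines of Python (our own toy code, not the paper's implementation; the parameters are illustrative):

```python
import random

p = 2**31 - 1  # a prime field, as used later in the paper

def share(secret, t, n):
    """Share_t: hide `secret` in the constant term of a random t-polynomial;
    server q in {1..n} receives the evaluation f(q)."""
    coeffs = [secret] + [random.randrange(p) for _ in range(t)]
    return [sum(c * pow(q, i, p) for i, c in enumerate(coeffs)) % p
            for q in range(1, n + 1)]

def reconstruct(shares, xs):
    """Lagrange interpolation at X = 0 from t+1 points (xs[i], shares[i])."""
    secret = 0
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + shares[i] * num * pow(den, p - 2, p)) % p
    return secret
```

The linearity of the scheme is what Blinder's aggregation step relies on: adding shares point-wise yields a sharing of the sum.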
### 3.2 Secure Computation of Arithmetic Circuits (MPC)
![](https://i.imgur.com/BN749GX.png)
![](https://i.imgur.com/6qu1uJe.png)
## 4 The Basic Blinder
In this section we describe a basic implementation of the ACB functionality, which relies on $\mathcal{F}_{mpc}$.
We argue that the security guarantees of the basic protocol depend on the security guarantees of the underlying implementation of $\mathcal{F}_{mpc}$:
* If $\mathcal{F}_{mpc}$ has full security then the basic protocol has robustness and censorship resistance.
* Otherwise, if $\mathcal{F}_{mpc}$ has security with abort then the basic protocol is neither robust nor censorship resistant.
Let $N$ be the number of messages submitted to the system in a given epoch (i.e. one message from each client).
* The servers distributively maintain a matrix $A$ with $R$ rows and $C$ columns such that $R × C = c_1N$.
* Denote the entry in the $i$-th row and $j$-th column by $A_{(i,j)}$.
* Specifically, for every $i, j$ the servers maintain the sharing [$A_{(i,j)}$], so server $S_q$ holds $A_{(i,j),q}$.
* We denote by $A_q$ the whole matrix of shares held by $S_q$.
* For simplicity, in the following we assume that messages consist of a single field element, i.e. $L$ = 1.
### 4.1 Protocol Template
We first assume that clients are honest. The protocol follows a template of 4 steps:
1. **Submitting a message:** To submit a message $m ∈ \mathbb{F}$, a client $\mathcal{C}$ picks random indices $i^* ∈ [R]$ and $j^* ∈ [C]$ and prepares a matrix $M$ of size $R × C$ such that:
![](https://i.imgur.com/0XEsysq.png)
* Then, $C$ calls Share$_t(M_{(i,j)})$ for every $(i, j)$, by which server $S_q$ obtains $M_{(i,j),q}$.
* During submission time, $N$ clients prepare blind messages, so that client $C^k$ with message $m^k$ prepares a matrix $M^k$ as its blind message and shares it entry-wise toward all servers.
* We denote server $S_q$’s share of the $(i, j)$-entry by $M^k_{(i,j),q}$
* And denote by $M^k_q$ the entire matrix of shares that $S_q$ received from $C^k$.
2. **Format verification:** Once all messages are submitted the servers verify the format of all of them.
* *e.g.* that the matrices are of Hamming weight at most 1.
3. **Processing:** The servers reveal only the aggregation of all blind messages.
* Given $M^1_q, . . . , M^N_q$, i.e. $S_q$’s shares of the blind messages from all $N$ clients, it computes the sum of the shares.
* Namely, for every $(i, j)$
![](https://i.imgur.com/0eK7pFZ.png)
4. **Open:** The servers run $A_{(i,j)} ←$ Reconstruct$(\{A_{(i,j),q}\}_{q∈[n]})$ for every $(i, j)$ and output the matrix $A = \{A_{(i,j)}\}_{i∈[R],j∈[C]}$, which contains all clients’ messages.
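The four steps can be exercised end-to-end in a toy simulation. This is our own sketch: additive sharing stands in for Shamir's scheme (aggregation works share-wise in exactly the same way), and $R$, $C$, $n$ are tiny illustrative parameters:

```python
import random

p = 2**31 - 1
R, C, n = 4, 4, 5  # matrix dimensions and number of servers (illustrative)

def share_matrix(M):
    """Additively share each entry toward n servers (a stand-in for
    Shamir's Share_t; summing shares still sums the secrets)."""
    shares = [[[0] * C for _ in range(R)] for _ in range(n)]
    for i in range(R):
        for j in range(C):
            parts = [random.randrange(p) for _ in range(n - 1)]
            parts.append((M[i][j] - sum(parts)) % p)
            for q in range(n):
                shares[q][i][j] = parts[q]
    return shares

def submit(m):
    """Step 1: a blind message with m at one random entry, zero elsewhere."""
    i, j = random.randrange(R), random.randrange(C)
    M = [[0] * C for _ in range(R)]
    M[i][j] = m % p
    return share_matrix(M)

# Steps 3-4: each server sums its shares entry-wise; opening the sums
# reveals matrix A containing all messages at random positions.
clients = [submit(m) for m in (11, 22, 33)]
A_shares = [[[sum(cl[q][i][j] for cl in clients) % p for j in range(C)]
             for i in range(R)] for q in range(n)]
A = [[sum(A_shares[q][i][j] for q in range(n)) % p for j in range(C)]
     for i in range(R)]
```

Note that two clients may collide on the same entry; the next subsection shows how redundancy recovers from that.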
### 4.2 Reducing Collisions via Redundancy
Instead of only one matrix, each client $\mathcal{C}$ prepares two matrices:
* $M$ as before.
* And a new matrix $\hat{M}$ such that:
* $M_{(i^*,j^*)} = m, \hat{M}_{(i^*,j^*)} = m^2$ and $M_{(i,j)}$ = $\hat{M}_{(i,j)} = 0$
* For all $(i, j) \neq (i^*, j^*)$.
The servers now distributively maintain two matrices:
* The matrix $A$ that aggregates blinded messages $M^k$ and the matrix $\hat{A}$ that aggregates blinded messages $\hat{M}^k$.
* Now, if there are two clients $C^{k_1}, C^{k_2}$ who picked the same entry $(i^*, j^*)$ then we have $A_{(i^*,j^*)} = m^{k_1} + m^{k_2}$ and $\hat{A}_{(i^*,j^*)} = (m^{k_1})^2 + (m^{k_2})^2$.
* Using those two equations we can find $m^{k_1}$ and $m^{k_2}$:
* In a prime field $\mathbb{F}_p$ when $p$ mod $4 = 3$, for a given $b ∈ \mathbb{F}_p$ we can solve $b = x^2$ by computing $x_1 = b^{(p+1)/4}$ and $x_2 = −x_1$.
* This incurs a cost of $\log p − 2$ multiplications by computing $b^{(p+1)/4}$ recursively,
* *i.e.* $b^{2i} = (b^i)^2$.
* This technique works as long as at most two clients wrote to the same entry and fails for three or more.
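The two-collision recovery above is easy to check concretely. A minimal sketch (our own helper names; $p = 2^{31}−1$ satisfies $p \bmod 4 = 3$):

```python
p = 2**31 - 1  # Mersenne prime, p % 4 == 3

def sqrt_mod(b):
    """Square root in F_p for p = 3 (mod 4): x = b^((p+1)/4)."""
    x = pow(b, (p + 1) // 4, p)
    return x if x * x % p == b else None  # None: b is not a square

def recover_pair(a, a_hat):
    """Given a = m1 + m2 and a_hat = m1^2 + m2^2, recover {m1, m2}.
    Fails (or returns garbage) when three or more clients collided."""
    inv2 = pow(2, p - 2, p)
    disc = (2 * a_hat - a * a) % p      # equals (m1 - m2)^2
    d = sqrt_mod(disc)
    if d is None:
        return None
    m1 = (a + d) * inv2 % p
    m2 = (a - d) * inv2 % p
    return {m1, m2}
```

The discriminant $2\hat{a} − a^2 = (m_1 − m_2)^2$ is what makes the quadratic solvable from the two aggregated values.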
### 4.3 Excluding Malformed Messages
A malicious client, or a coalition of malicious clients, might try to disrupt the operation of Blinder in various ways:
* A coalition of clients can pick the same $(i^*, j^*)$ for their messages, which might damage the reconstruction success rate analysis mentioned in 4.2.
* A malicious client might fill two or more entries, instead of one, in its blind message $M$.
* Even a single client who writes to a single entry may damage the matrix and decrease the reconstruction success rate by writing $m$ to $A_{(i^*,j^*)}$ and $m' \neq m^2$ to $\hat{A}_{(i^*,j^*)}$.
* Finally, a client may deal inconsistent shares to the servers such that when trying to reconstruct a message in some entry, even robust reconstruction will fail.
* To this end, before the servers aggregate a blinded message $M^k$ from client $C^k$, they perform a format verification sub-protocol that detects any kind of deviation from the message format dictated in 4.1.
#### 4.3.1 Format Verification Circuit
To ease the presentation we treat $M$ and $\hat{M}$ as vectors of size $\ell = c_1N$ rather than matrices, e.g. we write $[M_1], . . . , [M_\ell]$ to refer to the entries of $M$ and write $i^*$ to refer the index chosen by the client (rather than $(i^*, j^*))$. Upon receiving a sharing of a blind message $(M, \hat{M})$ (see 4.1) the servers ensure the following:
1. **Random index of message:** We want the index of the message $(m, \hat{m})$ hidden in $(M, \hat{M})$ to be uniformly random, in order to fit the success rate analysis (see 4.2).
2. **Single non-zero entry:** The entries of $M$ and $\hat{M}$ must all be zero, except one entry that contains the message $m$ and $m^2$ in $M$ and $\hat{M}$, respectively.
* *i.e.* HW$(M)$ = HW$(\hat{M}) = 1$.
3. **Non-zero in the same entry:** The non-zero entries in $M$ and $\hat{M}$ must be at the same index $i^*$
4. **Squared message:** If $M_{i^*} = m$ then $\hat{M}_{i^*} = m^2$.
* This is necessary for being able to recover from a collision (see 4.2).
![](https://i.imgur.com/IyAXdbL.png)
### 4.4 Compressing Blind Messages
For simplicity, suppose that $c_1N$ is a perfect square and $R = C = \sqrt {c_1N}$.
* **Blind message format:** We can compress the matrix $M$ of size $c_1N$ to only two short vectors: a row vector $r ∈ \mathbb{F}^\ell$ and a column vector $c ∈ \mathbb{F}^\ell$, where $\ell = \sqrt{c_1N}$.
* Denote by $r_i, c_i$ the $i$th coordinate of $r$ and $c$, respectively.
* A client $\mathcal{C}$ with a message $m$ randomly picks $(i^*, j^*)$ and assigns $c_{i^*} = 1$, and $c_i = 0$ for every $i \neq i^*$;
* likewise, $r_{j^*} = m$, and $r_j = 0$ for every $j \neq j^*$.
* Observe that $c × r$ is exactly the matrix $M$ from Eq.(1).
* Similarly, $\mathcal{C}$ prepares vectors $\hat{r}$ and $\hat{c}$ instead of the matrix $\hat{M}$, where $\hat{c}_{i^*} = 1$, and $\hat{c}_i = 0$ for every $i \neq i^*$;
* Likewise $\hat{r}_{j^*} = m^2$, and $\hat{r}_j = 0$ for every $j \neq j^*$.
* Vectors $(r, c)$ and $(\hat{r}, \hat{c})$ constitute the new blind message.
* Notice that $c = \hat{c}$, so the client essentially prepares and sends a blind message with only 3 vectors $(r, \hat{r}, c)$, which are used to compute both $M = c × r$ and $\hat{M} = c × \hat{r}$.
* $\mathcal{C}$ shares those vectors toward all servers.
* By $x^k_{i,q}$ we denote the share that $C^k$ sends to server $S_q$ for the $i$-th coordinate of a vector $x$.
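The compression can be sketched as follows (our own toy code; `ell` stands in for $\sqrt{c_1N}$): the outer products $c × r$ and $c × \hat{r}$ reproduce the matrices $M$ and $\hat{M}$ of the previous subsections.

```python
import random

p = 2**31 - 1
ell = 4  # sqrt(c1 * N) in the paper; tiny here for illustration

def blind_message(m):
    """Compressed blind message (r, r_hat, c): the client encodes m at a
    random position (i*, j*) so that c x r = M and c x r_hat = M_hat."""
    i, j = random.randrange(ell), random.randrange(ell)
    c, r, r_hat = [0] * ell, [0] * ell, [0] * ell
    c[i] = 1
    r[j] = m % p
    r_hat[j] = m * m % p
    return r, r_hat, c

def outer(c, r):
    """De-compression: the outer product of column c and row r."""
    return [[c[i] * r[j] % p for j in range(ell)] for i in range(ell)]
```

The client sends $3\ell = O(\sqrt{N})$ field elements instead of the $c_1N$ entries of the full matrices.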
**Format verification:** The format verification works in the same manner as in the non-compressed version. We observe, however, that we can verify the format before de-compression of the blind message.
![](https://i.imgur.com/DgyvUUO.png)
**Processing:** After the format verification the servers decompress the blind messages.
* Specifically, let $H ⊂ [N]$ be the indices of blinded messages that passed the verification test, namely, $(c^k, r^k, \hat{r}^k )$ for which $\beta^k = \gamma^k = \delta^k = 0$.
* Then, for each $k ∈ H$ the circuit computes $M^k = c^k × r^k$ and $\hat{M}^k = c^k × \hat{r}^k$, which requires a single multiplication layer.
* Given $M^k$ and $\hat{M}^k$, the processing continues by aggregating all blinded messages into the matrices $A$ and $\hat{A}$.
![](https://i.imgur.com/C4s5ZPN.png)
#### 4.4.1 Extending to an Arbitrary Message Length
We describe the required changes in the format of the blind message and its format verification when the message is of length $L > 1$.
**Format:** We have $|c| = |r| = |\hat{r}| = \sqrt{c_1NL}$, so the total number of field elements in the matrices $c × r$ and $c × \hat{r}$ is $c_1NL$, and we treat the row vectors $r$ and $\hat{r}$ as vectors from $(\mathbb{F}^L)^v$ where $v = \frac{\sqrt{c_1NL}}{L}$.
* Let the message be $m = m_1, . . . , m_L$. The client chooses $(i^*, j^*)$ as before, where it sets $c_{i^*} = 1$ and $c_i = 0$ for all other $i \neq i^*$.
* In addition, it sets $r_{j^*+k} = m_k$ and $r_{j+k} = 0$ for $j \neq j^*$ and $k ∈ [L]$.
* Likewise, it sets $\hat{r}_{j^*+k} = (m_k)^2$ and $\hat{r}_{j+k} = 0$ for $j \neq j^*$ and $k ∈ [L]$.
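A sketch of the length-$L$ blind message format (our own code; we index the row vectors by blocks of $L$ rather than by the paper's $j^*+k$ offsets, and the parameters are illustrative):

```python
import random

p = 2**31 - 1
L = 3       # message length in field elements
v = 4       # number of length-L blocks in the row vectors (illustrative)
ell = 4     # |c| (illustrative; the paper sets |c| = |r| = sqrt(c1*N*L))

def blind_message_L(m):
    """Block j* of r carries m_1..m_L; block j* of r_hat carries the
    squares; everything else is zero, and c has a single 1 at i*."""
    i, j = random.randrange(ell), random.randrange(v)
    c = [0] * ell
    r = [0] * (v * L)
    r_hat = [0] * (v * L)
    c[i] = 1
    for k in range(L):
        r[j * L + k] = m[k] % p
        r_hat[j * L + k] = m[k] * m[k] % p
    return r, r_hat, c
```

The outer products $c × r$ and $c × \hat{r}$ then contain the message block at a random row, as before.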
**Format Verification:** We describe the changes required in Circuit 4.2.
* Obviously, the shift procedure for the row vectors $r$ and $\hat{r}$ operates on blocks of $\mathbb{F}^L$ rather than on $\mathbb{F}$.
* Then, in the *Combine* step, after obtaining the vector $w^k_1 , . . . , w^k_\ell$ for $\ell = \sqrt {c_1NL}$, for $i ∈ [L]$ the circuit computes $[m^k_i] = \sum^v_{j=1}[r^k_{j+i}]$ and $[\hat{m}^k_i]$ = $\sum^v_{j=1}[\hat{r}^k_{j+i}]$.
* Then, we redefine $w$ to have only $v = \frac{\sqrt{c_1NL}}{L}$ rather than $\ell = \sqrt{c_1NL}$ entries, by randomly combining each block of $L$ entries into a single one.
* That is, for $j ∈ [v]$ we redefine $w^k_j \leftarrow \sum^L_{i=1} w^k_{(j−1)L+i}$; then, we can compute $[m^{\star k}] = \sum^v_{i=1}[w^k_i]$ as before.
* The square verification now verifies $L$ entries rather than 1, so $[\gamma^k] = \sum^L_{i=1}[m^k_i] · [m^k_i] − [\hat{m}^k_i]$.
### 4.5 Security
Our security argument is twofold, depending on the underlying implementation, $π$, of $\mathcal{F}_{mpc}$. That is:
1. If blind messages are submitted by Shamir’s secret sharing scheme and Blinder invokes a $π$ that provides ‘security with abort’ as the underlying implementation of $\mathcal{F}_{mpc}$, then only correctness is guaranteed.
* Where correctness in our context means anonymity, as the location $(i^*, j^*)$ of a particular client is hidden among the locations of all clients in the anonymity set.
2. If blind messages are submitted by VSS (Verifiable Secret Sharing) and Blinder invokes a $π$ that provides ‘full security’ as the underlying implementation of $\mathcal{F}_{mpc}$, then we have robustness and censorship resistance as well.
* This is due to the fact that when outputs depend on inputs that are all valid sharings, the adversary cannot prevent honest servers from obtaining them.
![](https://i.imgur.com/UQKvCzC.png)
## 5 Robust and Efficient Blinder
In this section we describe a robust and censorship resistant construction of Blinder.
![](https://i.imgur.com/Gku0oIa.png)
![](https://i.imgur.com/OEJcJIQ.png)
### 5.1 Player Elimination
In the field of secure computation, player elimination is a technique that allows the participants of a cryptographic protocol to agree on a set of one or more parties and ignore them for the rest of the protocol.
The main idea of the player elimination technique is that if the agreed-upon set consists of $k$ honest parties and $k'$ corrupted parties then it holds that $k' ≥ k$.
* This ensures that we preserve the invariant that the ratio of corrupted parties to $n$ is less than $1/4$.
Our protocols for the generation of robust random double sharings and for robust input rely on the following observations, separated into two cases, depending on whether a sharing is given by a server or by a client.
**A sharing $[x]$ from a server $S_i:$** By running Reconstruct$([x])$ each server $S_q$ broadcasts its share $x_q$ (using a secure broadcast protocol).
Then each server attempts to find a $t$-polynomial that agrees with at least $3t + 1$ points. There are 3 cases:
1. If the attempt fails then we can safely eliminate party $S_i$ (the dealer).
2. Otherwise, if the attempt succeeds but there is a set of parties $S' ⊂ S$ whose shares disagree with the reconstructed polynomial, then let $S_{min}$ be the server with the minimal index in $S'$.
* Eliminate both $S_i$ (the dealer) and $S_{min}$.
3. If the attempt succeeds and there are no bad points, no server is eliminated; the sharing is perfectly consistent.
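The reconstruction attempt and the three cases can be illustrated with a brute-force sketch (ours; a real implementation would use Reed-Solomon decoding, e.g. Berlekamp-Welch, rather than trying all subsets):

```python
from itertools import combinations

p = 2**31 - 1

def lag_eval(pts, x):
    """Evaluate at x the unique degree-<=t polynomial through t+1 points."""
    y = 0
    for i, (xi, yi) in enumerate(pts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        y = (y + yi * num * pow(den, p - 2, p)) % p
    return y

def check_sharing(shares, t):
    """Reconstruction attempt: look for a t-polynomial agreeing with at
    least 3t+1 of the n broadcast shares.  Returns the set S' of
    disagreeing servers (1-indexed), or None if the attempt fails
    (case 1: eliminate the dealer)."""
    pts = list(enumerate(shares, start=1))
    for cand in combinations(pts, t + 1):
        agree = [q for q, s in pts if lag_eval(list(cand), q) == s]
        if len(agree) >= 3 * t + 1:
            return {q for q, _ in pts} - set(agree)
    return None
```

With $n = 5$ and $t = 1$, a clean sharing yields an empty $S'$, a single bad share is pinpointed, and two bad shares make the attempt fail.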
**A sharing $[x]$ from a client $\mathcal{C}:$** This is a more complicated scenario: in case the attempt succeeds but the set $S'$ (of bad points) is non-empty, we cannot decide to eliminate the dealer together with one member of $S'$ (as done above), because that member might be honest, which would mean that a corrupted client has the power to eliminate an honest server.
Instead, we rely on the fact that there are at most $(1 − ρ)N$ corrupted clients, which means that corrupted clients can ‘blame’ honest servers (by dealing them bad points) at most $(1 − ρ)N$ times.
* Specifically, we initialize a counter ctr$_q$ = 0 for server $S_q$.
* Then, by running Reconstruct$([x])$, each server $S_q$ broadcasts its share $x_q$.
* Then each server attempts to find a $t$-polynomial that agrees with at least $3t + 1$ points. There are 3 cases:
1. If the attempt fails then we can safely eliminate party $\mathcal{C}$ (the dealer). This is the same as the first case in the previous paragraph.
2. Otherwise, if the attempt succeeds but there is a set of parties $S' ⊂ S$ whose shares disagree with the reconstructed polynomial, then increment ctr$_q$ for every $S_q ∈ S'$.
* If there is a server $S_q$ with ctr$_q > (1 − ρ)N$ then eliminate it.
3. If the attempt succeeds and there are no bad points, no one is eliminated; the sharing is perfectly consistent.
**Remarks:**
* We assume a synchronous network; therefore, if a client does not send a share to a server at all then this is treated the same way as if the client had sent a bad share, and the server will not be harmed by that.
### 5.2 Robust Random Double Sharing
Note that when applying the player elimination technique, the number of parties and the number of corrupted parties may change; we denote the updated numbers by $n'$ and $t'$ respectively. Let $VAN_{n'×(n'−t)} ∈ \mathbb{F}^{n'×(n'−t)}$ be a Vandermonde matrix $\{i^j\}_{i=1,\ldots,n';\ j=0,\ldots,n'−t−1}$.
If $\ell$ random double sharings are required then the parties call this procedure with parameter $\frac{\ell n}{n−t}$.
![](https://i.imgur.com/R3SR82r.png)
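The randomness-extraction step at the heart of this procedure can be sketched as follows (our own code; it shows only the Vandermonde extraction applied to contributed values, not the degree-$t$/degree-$2t$ double-sharing bookkeeping, which is applied share-wise):

```python
p = 2**31 - 1

def vandermonde(rows, cols):
    """VAN_{rows x cols} = {i^j} for i = 1..rows, j = 0..cols-1."""
    return [[pow(i, j, p) for j in range(cols)] for i in range(1, rows + 1)]

def extract(contributions, t):
    """From n contributed random values (up to t adversarial), derive
    n - t values that are uniformly random whenever the n - t honest
    contributions are: output_j = sum_i x_i * i^j."""
    n = len(contributions)
    V = vandermonde(n, n - t)
    return [sum(contributions[i] * V[i][j] for i in range(n)) % p
            for j in range(n - t)]
```

Since any $(n−t)×(n−t)$ sub-matrix of a Vandermonde matrix is invertible, the honest parties' randomness alone determines the outputs.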
### 5.3 Robust Input Sharing
In the following we show how to achieve the second precondition.
In Protocol 5.2 the set $\hat{C}$ represents the set of corrupted clients, who are eliminated.
* The set $\tilde{C}$ represents the set of (potentially) honest clients that are blocked by the corrupted servers.
* The functionality allows the adversary to block up to $t(1 − ρ)N − |\tilde{C}| ≤ (t + 1)(1 − ρ)N$ clients.
* For all clients in $H$ (the set of honest clients), we have that their input sharings are perfectly consistent.
Instead of verifying the consistency of the sharings of each client individually, for efficiency, Protocol 5.2 performs the consistency verification in a batched manner, utilizing the fact that addition of perfectly consistent sharings results in a perfectly consistent sharing.
* However, if the addition is not perfectly consistent, we have to find which specific sharing caused that.
* To this end, we perform a binary search for the inconsistent sharings.
* The servers first randomly combine the blind message of each client $\mathcal{C}^k$ into a single value $α^k$; then, they arrange the $α$’s in a binary tree and verify aggregations of them from the root (which aggregates all $α$’s) to the leaves (which store individual $α$’s).
We proceed with a formal description of the protocol.
* Let $\ell$ be the size of the vectors in the blind message, that is, $\ell = \sqrt {c_1N}$.
* In addition, redefine the vector $M^k$ to be the concatenation of the 3 vectors given by client $\mathcal{C}^k$, that is, $M^k = r^k \| \hat{r}^k \| c^k$.
![](https://i.imgur.com/rl21Ap0.png)
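The tree-based search can be sketched abstractly (our own model: each $α^k$ is replaced by a flag that is 0 iff the sharing is consistent, and sums of consistent flags stay 0; in the protocol, random coefficients make accidental cancellation of inconsistencies unlikely):

```python
def find_bad(alphas, is_consistent):
    """Binary search over aggregates: recurse only into halves whose
    aggregate fails the check, so roughly O(k log N) checks locate k
    bad sharings instead of N individual checks."""
    checks = [0]
    def rec(lo, hi):
        checks[0] += 1
        if is_consistent(sum(alphas[lo:hi])):
            return []
        if hi - lo == 1:
            return [lo]
        mid = (lo + hi) // 2
        return rec(lo, mid) + rec(mid, hi)
    return rec(0, len(alphas)), checks[0]
```

With one bad sharing among 8, the search touches only the path to the offending leaf plus its siblings.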
## 6 Instantiation and Efficiency Analysis
The computational complexity of the protocol is dominated by the de-compression of blind messages, which incurs $c_1N^2L$ finite field multiplications. The computation complexity for the rest of the protocol is insignificant and incurs $O(NL)$ overhead.
### 6.1 The Non-Robust Protocol
*Client-Server Communication:* Both protocols begin by accepting $N$ messages from the clients; each message is of size $3\sqrt{c_1NL}$ field elements, so a server receives a total of $N · 3\sqrt{c_1NL}$ field elements from all clients and a client sends a total of $n · 3\sqrt{c_1NL}$.
*Format Verification:* The procedure requires $3 \sqrt {c_1NL} + 2N$ public random values that are obtained by invoking $F_{coin}$.
* Then, to obtain $β^k, γ^k, δ^k$ the servers invoke $F_{products}$ 3 times, and to output them they invoke Reconstruct 3 times.
* An overall of $3N$ calls to $F_{products}$ and $3N$ calls to Reconstruct.
**Aggregation and Output:** The aggregation process invokes $F_{products}$ $2c_1NL$ times; then, to output all entries in matrices $A$ and $\hat{A}$, it takes an additional $2c_1NL$ calls to Reconstruct.
### 6.2 The Robust and Censorship Resistant Protocol
***Client-Server Communication:*** This is exactly the same as in the non-robust protocol: a server receives a total of $N · 3\sqrt{c_1NL}$ field elements from all clients and a client sends a total of $n · 3\sqrt{c_1NL}$.
***Random Double Sharing:*** This is separated to the Share and Reconstruct phases:
* The Share phase is exactly as in the non-robust protocol, that is, to obtain $x$ random double sharings, each party shares $\frac{2xn}{n−t}$ random values.
* Then, the parties reconstruct the sharings $[u_q]$ and $\langle u_q \rangle$ for each server $S_q$. This incurs $2n$ broadcasts per party, which has a communication of $2nB$.
* Thus, an overall communication of $\frac{2xn}{n−t} + 2nB$ per party.
* *Consistency Verification:* In the batched consistency verification and format verification there are $\log N$ rounds, each incurring at most $t(1 − ρ)N$ broadcasts, thus an overall of $t(1 − ρ)N \log N · B$ per party.
* In addition, the protocol uses a random sharing $[\tilde{r}^k]$ for each client $\mathcal{C}^k$.
***Format Verification, Aggregation and Output:*** These parts of the protocol are exactly the same as in the non-robust version, which incurs $3\phi N$ calls to $F^{double}_{rand}$ and $6\phi N$ calls to Reconstruct for the format verification, and $2c_1NL$ calls to $F^{double}_{rand}$ and $4c_1NL$ calls to Reconstruct for the aggregation and output.
* This adds $3\phi N + 2c_1NL$ to the parameter given to $F^{double}_{rand}$ and $n · (6\phi N + 4c_1NL)$ field elements per party for reconstructions.
![](https://i.imgur.com/TRwyjcg.png)
## 7 Utilizing a GPU
In this section, we argue that the computation that Blinder performs can be accelerated by using a GPU. The rationale for using a GPU in Blinder is that the $O(N^2)$ computational bottleneck consists of simple arithmetic operations (addition and multiplication) over a finite field. For such tasks, GPUs demonstrate a much higher throughput, i.e. integer operations per second (IOPS).
The main bottleneck when working with a GPU is the link capacity between the CPU and GPU. We characterize applications that fit a GPU deployment as follows:
1. The input to the algorithm should be relatively small.
2. The algorithm itself has to be highly parallelizable, since a GPU has up to thousands of independent cores, each of which can perform some simple task.
3. The output size should be small as well.
We observe the following:
Let $C = (c_1, . . . , c_N)$ be a matrix with $c_k$ being its $k$th column and $R$ be a matrix with $r_k$ being its $k$th row, s.t. $c_k, r_k$ are part of client $\mathcal{C}^k$’s blind message. Then Blinder essentially computes matrix multiplication! That is
![](https://i.imgur.com/a0xEztc.png)
Where $\hat{R}$ is the matrix with $\hat{r}_k$ being its $k$th row. Note that $A$ and $\hat{A}$ are the matrices from 4.1.
* To efficiently divide that task across many cores, Blinder hands vectors $C' = c_i, . . . , c_j$ and $R' = r_k, . . . , r_\ell$ to one GPU core; that core then computes a small matrix multiplication that produces the final entries of matrix $A$ at positions $(i, k), . . . , (j, \ell)$.
* Thus, that GPU core hands back exactly $(j − i) · (\ell − k)$ entries to the CPU.
* This way, the overall output handed back from the GPU to the CPU is of size only $|c_i| × |r_i| = |A|$ (rather than $N|A|$), which fits well with point (3) above.
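The observation that aggregation is a matrix product can be checked in a few lines (our own toy code over $\mathbb{F}_p$; a GPU would assign independent output tiles of this product to different cores):

```python
p = 2**31 - 1

def matmul(X, Y):
    """Plain matrix product over F_p.  Stacking the clients' column
    vectors c^k into X and row vectors r^k into Y makes X * Y equal
    the aggregated matrix sum_k (c^k x r^k) = A."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) % p
             for j in range(cols)] for i in range(rows)]
```

For two clients writing 5 and 7 to distinct entries, the product directly yields the opened matrix $A$.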
## 8 Implementation & Evaluation
### 8.1 General Implementation Details
Our implementation is divided into three modules, where the performance of each module is measured separately.
1. ***Secure Broadcast Infrastructure:*** This is a basic requirement of Blinder, on which our protocols for generating random double sharings and input sharings rely (Protocols 5.1-5.2). In this module we measure the additional overhead incurred by the adversary, depending on the number of corrupted servers $t$ and the fraction of corrupted clients $(1 − ρ)$. We do this by deploying the concord-bft (github.com/vmware/concord-bft) system, which implements SBFT.
2. ***Client-Server:*** Consists of a client side and a server side software, where the client generates a blind message and sends it to the servers whereas the server is configured to accept $N$ clients.
3. ***Processing:*** Consists of all steps of the protocol subsequent to receiving blind messages from all clients. The measurement of this module deliberately uses an unreliable broadcast channel (i.e. one which may lead to an abort of the protocol), since according to our analysis in Ch 6 the performance loss of achieving robustness and censorship resistance by using a reliable broadcast channel depends on $N$, $t$ and $ρ$.
* Thus, we decided to measure that separately, so one could extrapolate the correct overall performance by adding the numbers of these two modules.
**High-level architecture:** Each Blinder server runs four processes (described top down in Figure 1):
1. The process that is responsible for all of Blinder’s computational tasks.
2. A P2P network client and server: the server waits for the $N$ clients to submit their blind messages and is also used for receiving messages from other servers; the P2P client sends MPC-related messages to other servers.
3. An SBFT replica, which is a node in a consensus algorithm that is used for receiving messages from clients and participating in a protocol by which all honest nodes (replicas) agree on those messages, and
4. An SBFT client, which is used to send messages to the consensus algorithm and receive ones that are agreed upon by honest replicas.
![](https://i.imgur.com/p7ASJYL.png)
**Statistical security parameter and field type:** When the field is too small, we have to repeat these verification checks for $\phi$ times.
* In our implementation the field we chose is the Mersenne field $\mathbb{F}_p$ with prime $p = 2^{31}−1$, yet we do not repeat the verification checks, i.e. $\phi = 1$, which means that we use $λ = 30$.
* We use this field for the efficiency of its multiplication operator, namely, modular multiplication in a Mersenne field does not require performing divisions, even when the product is greater than the prime.
* We pick the field to be 31 bits as it best fits registers of both the CPU and GPU.
> Although it became standard in the literature to have $λ = 40$ we argue that setting $λ = 30$ is sufficient for our application.
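The division-free Mersenne multiplication mentioned above can be sketched as follows (our own code; since $2^{31} ≡ 1 \pmod p$, the high bits of a product can simply be folded back onto the low bits):

```python
p = 2**31 - 1  # Mersenne prime

def mersenne_mul(a, b):
    """Multiply mod 2^31 - 1 without any division: fold the bits above
    position 31 back down, twice, then fix up the single value p."""
    x = a * b                      # up to 62 bits
    x = (x & p) + (x >> 31)        # first fold: at most ~32 bits
    x = (x & p) + (x >> 31)        # second fold: result in [0, p]
    return 0 if x == p else x
```

On a CPU or GPU this replaces a costly integer division with two shift-and-add steps.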
### 8.2 Microbenchmarks
#### 8.2.1 Secure Broadcast Infrastructure
Our assumption is that servers are discouraged from getting detected and eliminated; therefore, their best strategy is to slow down the system. This could be done by maximizing the number of inconsistent reconstructions in Protocol 5.2 (consistency verification) according to parameter $ρ$. In the following we measure the maximal slowdown that an adversary may cause Blinder when corrupting a $(1 − ρ)$ fraction of the clients, in various settings that depend on $n$ and $N$.
The three figures show the latency, network load and payload size per server over LAN according to 3 variables:
1. Number of servers.
2. Number of clients
3. And the ratio $(1 − ρ)$ of corrupted clients.
* We can see that the network load is about ≈ 1000× the payload size.
* We can see that even in the most severe setting we checked, when the ratio $(1 − ρ) = 1/10$, it takes less than 1.5 minutes to complete the batched consistency verification of Protocol 5.2.
As expected, payload size and network load become significant as the ratio between the corrupted and total number of clients increases. In the most severe case we tested ($N = 10^6$ and $(1 − ρ) = 1/10$) this approaches 0.5 GB.
![](https://i.imgur.com/z2Q8XaI.png)
#### 8.2.2 Client-Server Module
We measured the time and network load incurred by collecting the shares of the blind messages from $N$ clients for different $N$ and $L$ (note that this does not depend on $n$, since each client directly interacts with each server in the same manner).
* Each server runs on an EC2 m5.24xlarge machine with a network bandwidth of 25 Gbps.
* To model $N$ clients we used 10 weaker client machines (EC2 c5.xlarge), each spawning $N/10$ clients in multiple threads.
* The results in Table 2 refer to the setting where all machines are instantiated in US N.Virginia (LAN).
![](https://i.imgur.com/S6dqqYh.png)
#### 8.2.3 Processing Module
In this module we measure the latency and network load of Blinder’s processing phase, broken down into the specific sub-tasks. We implemented two versions, CPU and GPU. First, we present the latency when run over 5 Blinder servers with a varying number of clients and message lengths over LAN (US N.Virginia) in **Figure 3**. The latency is broken down into the following sub-tasks:
1. The offline phase that produces the double random sharings.
2. The ShiftLeft procedure that ensures a uniform spread of messages.
3. Evaluation of FormatVerification Circuit 4.2 on all messages to obtain $(β^k, γ^k , δ^k)$ for all $k ∈ [N]$.
4. Decompression and aggregation of all blind messages.
5. Reconstruction of the aggregated matrices $A$ and $\hat{A}$.
6. Solving the quadratic equations $m_1 + m_2 = A_i$ and $m^2_1 + m^2_2 = \hat{A}_i$ for all $i ∈ [c_1N]$.
7. In addition, we present (7) the time incurred by adding the robust generation of random double sharings and robust input consistency verification of Ch 5.
In addition to the bars that represent tasks (1)-(7) above, we plot a separate bar that represents the time it takes to collect the $N$ messages. Since the messages received at the servers in the $(T−1)$-th epoch are used in the $T$-th epoch, it is desirable that the time it takes to collect $N$ messages is less than the time it takes to perform tasks (1)-(7). As shown in the figure, this is indeed the case in all settings, meaning that message submission is not the bottleneck.
![](https://i.imgur.com/lKH792G.png)
**Figure 4** presents the network load for various instantiations of Blinder, when taking the severe mode of $(1 − ρ) = 1/10$, and varying $n, N$ and $L$. The parts labeled with ’sub-tasks (1)-(6)’ refer to the networking load of all tasks from Ch 4 for the cases of $N = 10^5$ and $10^6$ whereas the additional stripes refer to the additional overhead of using secure broadcast for the generation of random double sharings and consistency verification.
Consider a certain message size $L$; note that the network load per server decreases as we increase the number of servers $n$ from 5 to 10 and 20:
![](https://i.imgur.com/xEKbLxZ.png)
We conclude that the additional tasks by which we achieve robustness have marginal effect on both latency and network load.
We turn to the scalability of message length. AsynchroMix and PowerMix were not implemented with messages larger than 32 bytes and are therefore omitted from the following comparison. We evaluate Blinder and Riposte with messages of length up to 10 and 2 kilobytes, respectively. The results are in Fig. 6.
![](https://i.imgur.com/KJtj02Y.png)
### 8.3 Comparison
Let us examine the suitability of Blinder to a system like Zcash.
* The block rate is 2.5 minutes and the rate of shielded transactions is 6 per second, or 150 per block.
* A standard shielded transaction (including a zk-SNARK) is of 1-2 kilobytes and the current number of users is much less than 10 thousand.
* The figure shows that supporting 10 thousand users within a block rate is possible already with the CPU variant, whereas the GPU variant can support even hundreds of thousands of users at that rate.
* Furthermore, considering more complex and heavyweight transactions, the GPU variant can support transactions of size 10 kilobytes at the required rate.
* In contrast, Riposte could support 2-kilobyte transactions at that rate, but only for up to 10 thousand users.
*Multi-core utilization:* To evaluate how well the CPU-based Blinder with 5 servers in N.Virginia, and Riposte with 3 servers utilize hardware, we ran them with different numbers of clients, $N$, and different message lengths, $L$.
*Effect of bandwidth:* In this experiment we evaluate how Blinder scales across the LAN, WAN1 and WAN2 settings. Our hypothesis is that since the bottleneck of the protocol is the task of de-compressing the blind messages, ‘slowing down’ the network would not have much impact.
![](https://i.imgur.com/SEYiI2I.png)
### 8.4 Monetary Cost
**Figure 8** presents the estimated monetary cost in USD over Amazon AWS for both CPU- and GPU-based Blinder with 5 servers, supporting message sizes of 32 B and 160 B.
* Each data item is plotted by two short lines; the upper one reflects the total cost whereas the lower one reflects the cost of the machine’s up-time only (so the difference between the two reflects the cost of data transfer).
* Costs in AWS are associated with both running time and data transfer, and both depend on the data center (geographic location) at which the instance runs.
* This is calculated according to the CPU and GPU machine costs per hour, which are 2.304 USD and 24.48 USD (in N.Virginia), respectively.
In our calculation we picked the value $(1 − ρ) = 1/10$ to get the maximal overhead on both time and network. **Figure 8** presents the cost of providing $N$ clients a channel for anonymously broadcasting 1 MB, from the point of view of a server and a client, respectively. The figure suggests that it is cheaper to divide the 1 MB into 32-byte pieces and use Blinder with $L = 32$ bytes messages (rather than 160); however, this would require using an encoding scheme to be able to combine 32-byte messages across several epochs.
![](https://i.imgur.com/PaVcxfu.png)
## 9 Conclusions and Future Work
This work addresses the question of whether we can design a system for anonymous committed broadcast over a synchronous network, which is resilient to a malicious adversary controlling servers and clients, prevents a malicious server from censoring honest clients, and is scalable in all metrics:
* The number of servers.
* Number of clients.
* And the message size.
We answer this question in the affirmative and present the first system, called Blinder, that has all these properties. Blinder confirms our hypothesis that an information theoretically secure protocol performs better than a protocol that relies on computational assumptions. We come to this conclusion since the ‘amount’ of work in Blinder and in Riposte is very close, whereas the type of work that is done in both systems is different:
* in Blinder the work is dominated by simple finite field arithmetic.
* and in Riposte the work is dominated by AES.
Even though AES is implemented in Riposte via a hardware instruction, Blinder performs much better in both the CPU and GPU versions. Our evaluations show that even though the GPU machines are expensive, the actual cost per client, for anonymity of either $N = 10^5$ or $N = 10^6$, is 20-100 USD for a total usage of 1 Gigabyte.
* Blinder’s good performance over a GPU raises the question of whether we could push further the scalability and increase number of clients to tens or even hundreds of millions and still get a reasonable latency by using a TPU.
* In addition, it is interesting to investigate the possibility of extending Blinder to operate over an asynchronous network.
* This is achieved by AsynchroMix, but at the cost of not being fully robust (i.e. that work achieves fairness).