# Aztec hash retrieval using PIR

The idea is to split the tree by levels and create a PIR database for each level. For a particular node at a given level, the corresponding database returns its sibling. The client first requests the sibling of the leaf node, from which it computes their common parent in the level above. It then requests the sibling of that parent, computes the grandparent, and repeats the process up to the root. In total the client iteratively makes $l$ queries to the PIR databases to build the hash path, where $l$ is the number of levels in the tree (i.e. $\log$ of the number of leaves).

There are a few things to note:

1. For certain levels (the low-numbered ones, close to the root) it may be cheaper for both the server and the client to download the entire level instead of creating a PIR database for it.
2. On the server side there is a trade-off between the number of databases it sets up and the client's total communication cost. Observe that if a database of size $s$ can store all nodes at level $i$ (for example, the leaf level), then another database of the same size $s$ can store all nodes from level 1 to $i-1$, since those levels together contain fewer nodes than level $i$ alone. This reduces the number of databases the server manages, but at the expense of increased client communication cost.
3. If the server instead creates one database per level, then the database that stores level $i$ is twice the size of the database that stores level $i-1$. Since total client communication has complexity roughly $\sqrt{N}$, where $N$ is the size of the database, querying for a node at level $i-1$ costs less communication than querying for a node at level $i$.

```rust
// Pseudocode: `Node`, `hash`, `server_pir_request`, `download`, and
// `build_hash_path` are assumed primitives. L is the leaf level, I the
// lowest level served via PIR; levels 0 through I-1 are downloaded in full.
fn get_hash_path(leaf: Node) -> Vec<Node> {
    let mut hash_path = Vec::new();
    let mut curr_node = leaf;
    // One PIR query per level: the database for `level` returns the sibling of `curr_node`.
    for level in (I..=L).rev() {
        let sibling = server_pir_request(level, &curr_node);
        hash_path.push(sibling.clone());
        // Set curr_node to the parent in the level above.
        curr_node = hash(&sibling, &curr_node);
    }
    // Download the subtree up to, and including, level I-1 ...
    let subtree = download(I - 1);
    // ... and locally traverse it to build the rest of the hash path.
    let subtree_hash_path = build_hash_path(&subtree, &curr_node);
    hash_path.extend(subtree_hash_path);
    hash_path
}
```

**Concrete values**

For concrete values, assume the server can only set up a database of size 8.5 GB and uses the hintless PIR scheme of [1]. Also assume the server lays out nodes in the database such that nodes close to each other within a level fall into the same column with high probability. Let the tree have 28 levels, with each node 32 bytes. The server creates two PIR databases, each of size 8.5 GB: the first stores only the 28th level (i.e. all leaf nodes), and the second stores the rest of the tree in sequence starting from the 27th level.

![image](https://hackmd.io/_uploads/rJ8KItQFa.png)

The bar chart above plots the total client communication cost to retrieve a hash path (iteratively) from the server; the x-axis varies the level up to which the client downloads the entire subtree. For example, level 0 means the client downloads only the root and makes PIR requests for the rest of the levels; level 5 means the client downloads the subtree up to, and including, level 5 and makes PIR requests for the rest. Notice that, in this setup, the client communication cost is lowest when the client downloads the subtree up to level 16 and makes PIR requests for the remaining levels. The values are based on the benchmarks given in [1] for a PIR database of size 8.56 GB.

Note that it is not quite correct to say the client retrieves only a single hash path: the same set of queries yields multiple hash paths, as long as the leaves whose hash paths the client wants are close to each other in the tree. However, in practice this case is unlikely.
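To make the trade-off in the chart concrete, here is a minimal sketch of the underlying cost model (not the benchmark code behind the chart), under assumed parameters: `PIR_QUERY_BYTES` and `PIR_RESPONSE_BYTES` are placeholder per-query upload/download sizes to be replaced with the figures reported in [1] for the chosen database size, nodes are 32 bytes, and the tree has 28 levels. It simply adds the full download of the subtree up to the cut-off level to one PIR query per remaining level.

```rust
// Rough model of total client communication when the client downloads the
// subtree up to (and including) `cutoff` and makes one PIR query per level
// from `cutoff + 1` down to the leaf level.
const LEVELS: u32 = 28;      // leaf level of the tree
const NODE_BYTES: u64 = 32;  // size of a node
// Placeholder per-query costs; substitute the upload/download sizes
// reported in [1] for the chosen database size.
const PIR_QUERY_BYTES: u64 = 500 * 1024;
const PIR_RESPONSE_BYTES: u64 = 500 * 1024;

fn total_communication(cutoff: u32) -> u64 {
    // Full download of levels 0..=cutoff: 2^(cutoff + 1) - 1 nodes.
    let subtree_nodes = (1u64 << (cutoff + 1)) - 1;
    let download = subtree_nodes * NODE_BYTES;
    // One PIR query per remaining level (cutoff + 1 ..= LEVELS).
    let pir_levels = (LEVELS - cutoff) as u64;
    let pir = pir_levels * (PIR_QUERY_BYTES + PIR_RESPONSE_BYTES);
    download + pir
}

fn main() {
    // Sweep the cut-off level, as the bar chart does.
    for cutoff in 0..LEVELS {
        println!("cutoff level {:2}: {:>12} bytes", cutoff, total_communication(cutoff));
    }
    let best = (0..LEVELS)
        .map(|c| (c, total_communication(c)))
        .min_by_key(|&(_, cost)| cost)
        .unwrap();
    println!("cheapest cut-off: level {} ({} bytes)", best.0, best.1);
}
```

In this model the download term grows exponentially with the cut-off level while the PIR term shrinks only linearly, so the total is minimized at some intermediate level; where exactly depends on the per-query sizes plugged in.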
**Server runtime and preprocessing**

Assume that updates are batched and that the server follows the rule of laying out nodes at a given level such that nodes close to each other in the tree fall into the same column with high probability; a single update will then affect multiple rows. Assuming $r$ rows are affected, preprocessing has complexity

$$
N r + \left\lceil \frac{n_r}{N} \right\rceil N n \log{n} + \left\lceil \frac{n_r}{N} \right\rceil n N
$$

where $n_r$ is the number of rows in the database, $N$ is the LWE dimension, and $n$ is the RLWE ring dimension. In particular, for the big database used above, preprocessing takes 2128s and a single query takes 2.3s on a single thread. However, the 2128s preprocessing benchmark may not be the right measure for our purpose, because it processes the entire database at once; a more suitable metric for our case is the preprocessing cost of each incremental update. Additionally, note that both preprocessing and query runtime are easily parallelizable across threads.

**Fast sync may be a better choice for now**

If the client wants to retrieve a single hash path, it pays roughly 27 MB, a communication blow-up of about 32,000×. The blow-up can be amortized across many hash paths as long as they are clustered together in the tree, but in practice this is unlikely.

## References

1. https://eprint.iacr.org/2023/1733