# Private Ethereum State Retrieval
If you are not running your own local node, any information you request about Ethereum’s latest state (such as account balances) is handled by someone else’s node.

This means you rely on a third party for all your queries. Even if you only check a small token balance, the metadata of your requests (IP addresses, location, operating system, etc.) can be gathered. If you keep checking the same address, this data may eventually be used to deanonymize you.
A solution would be an *Ethereum Incognito Mode*, where you can request any on-chain data without revealing what you are asking for, while still receiving accurate answers.
This type of *private query* can be achieved using **Private Information Retrieval (PIR)**.
## Private Information Retrieval
In PIR, the database owner is not trusted. PIR ensures they cannot see which part of the database you are looking at.
It is similar to [Oblivious Transfer](https://link.springer.com/referenceworkentry/10.1007/978-3-030-71522-9_9) (OT), except that in OT you also can't learn anything about the other elements in the database, and the data is usually sent all at once in encrypted form.
Of course, downloading the whole database achieves PIR trivially. The goal is to get the same privacy without doing that, or at least with much lower communication costs.
I highly recommend this [PIR deep dive](https://youtu.be/Nf4IZ2kTPN4) by Prof. David J. Wu.
The following are some general properties we can use to categorize PIR schemes.
#### Single-server vs Multi-server
- Multi-server: These schemes tend to be faster but assume that the servers **do not collude**.
- Single-server: These schemes can be based on Fully Homomorphic Encryption (FHE) and rely on the Learning With Errors (LWE) assumption.
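To make the multi-server setting concrete, here is a minimal sketch of the classic two-server XOR-based PIR. It assumes fixed-size records and two servers that hold the same database and do not collude. The function names are illustrative, and each query is as long as the database, so this is a toy and not a communication-efficient scheme.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Client: build two bit vectors that differ only at `index`.

    Each vector on its own is uniformly random, so a single server learns
    nothing about `index` unless it compares notes with the other server.
    """
    query_a = [secrets.randbelow(2) for _ in range(n)]
    query_b = list(query_a)
    query_b[index] ^= 1
    return query_a, query_b


def answer(db: list[bytes], query: list[int]) -> bytes:
    """Server: XOR together every record whose query bit is set."""
    acc = bytes(len(db[0]))
    for record, bit in zip(db, query):
        if bit:
            acc = xor_bytes(acc, record)
    return acc


def reconstruct(answer_a: bytes, answer_b: bytes) -> bytes:
    """Client: every record cancels out except the one at `index`."""
    return xor_bytes(answer_a, answer_b)


# Toy database of four 32-byte records, held by both servers.
db = [i.to_bytes(32, "big") for i in (10, 20, 30, 40)]
q_a, q_b = make_queries(len(db), index=2)
record = reconstruct(answer(db, q_a), answer(db, q_b))
print(int.from_bytes(record, "big"))  # 30
```

If the two servers share their queries, the single differing position reveals the index, which is why the non-collusion assumption matters.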
#### Stateless vs Stateful
- Stateless (online-only): The client does not store any extra information to run queries.
- Stateful: An **offline phase** prepares a compressed version of the database and sets up keys for the client.
## Ethereum State
The state in Ethereum is account-based. Each account is an element in the DB that holds:
- `nonce`
- `balance`
- `storageRoot`
- `codeHash`
For externally owned accounts (EOAs), as opposed to smart contracts, only the first two fields matter.
This state is kept in a Merkle Patricia Trie, a key-value data structure with hexary branching (16 children per node).

Because the trie is a key-value structure, we have two ways of using this data to build PIR around it:
- Directly: If we use the tree as is, we must rely on a key-value PIR scheme, meaning that our query uses the account address to index the DB.
- Preprocessing: We can flatten the tree and use an index-based PIR scheme (simpler), but we then need a first step that maps an address to its array index.
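The preprocessing option can be sketched as follows. This is an illustration under assumptions, not a chosen design: the sorted key list, the helper names `flatten` and `index_of`, and the toy addresses are all hypothetical. The address-to-index lookup has to run on the client (or be done privately in some other way), otherwise the lookup itself reveals the address.

```python
from bisect import bisect_left
from typing import Optional


def flatten(state: dict[bytes, bytes]) -> tuple[list[bytes], list[bytes]]:
    """Preprocessing: sort by key and lay the values out as a plain array."""
    keys = sorted(state)
    return keys, [state[k] for k in keys]


def index_of(keys: list[bytes], address: bytes) -> Optional[int]:
    """First step of a query: map an address to its array index (binary search)."""
    i = bisect_left(keys, address)
    if i < len(keys) and keys[i] == address:
        return i
    return None  # address is not present in the flattened state


# Hypothetical state with 20-byte addresses and 32-byte records.
state = {
    bytes.fromhex("22" * 20): (5).to_bytes(32, "big"),
    bytes.fromhex("11" * 20): (7).to_bytes(32, "big"),
}
keys, records = flatten(state)
idx = index_of(keys, bytes.fromhex("22" * 20))
print(idx)  # 1
```

The resulting `idx` is what an index-based PIR query would then retrieve from `records`.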
## Possible Solutions
The outcome of this project should be a way for users to query Ethereum state privately. Building a solution around the full state could be an issue: it is around 250 GB, as seen in [[1]](https://www.paradigm.xyz/2024/03/how-to-raise-the-gas-limit-1), which is considerably larger, both in total size and in element size, than the databases handled by state-of-the-art schemes.
We can work around this issue by targeting a smaller subset of the state:
- **Smart contract storage**: Query token balances or the storage of a specific smart contract, as can be seen [here](https://sprl.it/) for ENS records using FHE.
- **Account Balances**: Query the ETH balance of EOAs. The accounts trie currently weighs around [3.1 GB](https://docs.google.com/spreadsheets/d/1NxyLBqPX6JVMqGqb-kVcWCN0bm8eXmxZmvrKqAS6Xw4/edit?gid=0#gid=0&range=A6:H6), and balances are 256-bit elements.
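Since index-based PIR schemes usually work over fixed-size records, a 256-bit balance maps naturally to a 32-byte entry. The sketch below shows one possible encoding. The helper names and the account count in the example are hypothetical and only serve to illustrate the arithmetic.

```python
RECORD_BYTES = 32  # one 256-bit balance per record


def encode_balance(wei: int) -> bytes:
    """Encode a balance in wei as a fixed-width big-endian record."""
    return wei.to_bytes(RECORD_BYTES, "big")


def decode_balance(record: bytes) -> int:
    """Decode a record returned by the PIR query back into wei."""
    if len(record) != RECORD_BYTES:
        raise ValueError("unexpected record size")
    return int.from_bytes(record, "big")


def database_bytes(num_accounts: int) -> int:
    """Raw size of a balances-only database, before any PIR encoding overhead."""
    return num_accounts * RECORD_BYTES


one_ether = 10**18  # 1 ETH expressed in wei
assert decode_balance(encode_balance(one_ether)) == one_ether

# Hypothetical account count, chosen only to show the calculation.
print(database_bytes(1_000_000))  # 32000000 bytes, i.e. 32 MB
```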
There is also the question of what our deliverable consists of:
- **Library/Tooling**: Build a library that allows projects to set up a PIR scheme around the data of their own contracts, and rely on them to make it available to users.
- **API**: Make a REST API available for wallets to use this service.
- **Website**: Create a simple website connected to our backend where users just input their address.
## Challenges
#### Updating the tree and doing the pre-processing
Updating the state trie and re-running the preprocessing (compression) step repeatedly can be costly. We might only do this once per epoch, that is, roughly every 6.4 minutes (32 slots of 12 seconds each).
In stateful protocols, this means that clients will need to regenerate their parameters each epoch as well.
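The epoch figure follows from the slot timing: 32 slots of 12 seconds give 384 seconds, or 6.4 minutes. A small sketch of how a stateful client might decide when its parameters are stale, assuming the server re-runs preprocessing once per epoch (the `needs_refresh` helper is hypothetical):

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
EPOCH_SECONDS = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384 s = 6.4 min


def epoch_of_slot(slot: int) -> int:
    """Epoch that a given slot belongs to."""
    return slot // SLOTS_PER_EPOCH


def needs_refresh(params_epoch: int, current_slot: int) -> bool:
    """True if the client's parameters were generated for an older epoch."""
    return epoch_of_slot(current_slot) > params_epoch


print(EPOCH_SECONDS / 60)  # 6.4
```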
#### Filtering EOAs from state trie
We need to see how clients handle EOA filtering for the world state.
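One possible filter, sketched below, treats an account as an EOA when its `codeHash` equals the Keccak-256 hash of empty code. The `Account` class and helper names are hypothetical, and the sketch assumes the accounts have already been decoded from the trie. Accounts that carry code for other reasons (for example an EIP-7702 delegation) would be excluded by this check.

```python
from dataclasses import dataclass

# Keccak-256 of the empty byte string: the codeHash of an account without code.
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
)


@dataclass(frozen=True)
class Account:
    nonce: int
    balance: int  # in wei
    storage_root: bytes
    code_hash: bytes


def is_eoa(account: Account) -> bool:
    """Assumption: an account with no code is an externally owned account."""
    return account.code_hash == EMPTY_CODE_HASH


def eoa_balances(accounts: dict[bytes, Account]) -> dict[bytes, int]:
    """Keep only EOAs and reduce each one to its balance."""
    return {addr: acct.balance for addr, acct in accounts.items() if is_eoa(acct)}
```

The output of `eoa_balances` is the kind of address-to-balance map that would then be fed into the PIR preprocessing.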
#### Finding a simple key-value PIR scheme
If we take the key-value route, we must find a suitable scheme, ideally one with an existing implementation.
## Alternative Solutions
#### Tor Network
The problem of linking IP addresses to accounts is not unique to Ethereum:
- [Network-Level Privacy - Bitcoin Core vs Wasabi Wallet](https://docs.wasabiwallet.io/why-wasabi/NetworkLevelPrivacy.html)
Accessing any provider through an anonymity network like Tor can solve this problem.
#### Portal Network
The Portal Network is a peer-to-peer protocol that runs parallel to Ethereum.
- [Portal Network Docs](https://ethportal.net/overview)
Ethereum data is distributed across the Portal Network, instead of being copied in every individual node.
## Resources
- [How to Raise the Gas Limit, Part 1: State Growth](https://www.paradigm.xyz/2024/03/how-to-raise-the-gas-limit-1)
- [Private Proof of Solvency](https://arxiv.org/abs/2310.13900)