# Compressed Integers
Using lossy compression on uint256 to improve gas costs, ideally by a factor upto 4x.
## Abstract
This document specifies compression of `uint256` to smaller data structures like `uint64`, `uint96`, `uint128`, for optimising costs due to storage for Ethereum Smart Contracts. The smaller data structures are divided into two parts, in the first one we store `significant` bits and in the other number of left `shift`s needed on the significant bits to decompress. This document also includes two specifications for decompression due to the nature of compression being lossy, i.e. it causes underflow.
## Motivation
- Storage is costly, each storage slot costs almost $0.8 to initialize and $0.2 to update (20 gwei, 2000 ETHUSD).
- Usually we store money amounts in uint256 which takes up one entire slot.
- If it's DAI value, the range we work with mostly is 0.001 DAI to 1T DAI (or 10^12^). If it's ETH value, the range we work with mostly is 0.000001 ETH to 1B ETH. Similarly any token of any scale has a resonable range of 10^15^ amounts that we care / work with.
- However, uint256 type allows us to represent $10^-18^ to $10^58^, and most of it is a waste. In technical terms, we have the probability distribution for values larger than $10^15^ and smaller than $10^-3^ as negligbile (i.e. P[val > 10^15^] ≈ 0 and P[val < 10^-3^] ≈ 0).
- Number of bits required to represent 10^15^ values = log_base_2(10^15^) = 50 bits. So just 50 bits are fairly enough to represent practical range of money, causing negligible difference.
## Specification
In this specification, the structure for representing a compressed value is represented using `cintx`, where x is the number of bits taken by the entire compressed value. (On the implementation level, an `uintx` can be used for storing a `cintx` value).
### Compression
#### uint256 into cint64 (upto cint120)
The right most or least significant 8 bits in `cintx` are reserved for storing the shift and the rest available bits are used to store the significant bits starting from the first on bit in `uintx`.
```solidity
struct cint64 { uint56 significant; uint8 shift; }
// ...
struct cint120 { uint112 significant; uint8 shift; }
```
#### uint256 into cint128 (upto cint248)
The right most or least significant 7 bits in `cintx` are reserved for storing the shift and the rest available bits are used to store the significant bits starting from the first on bit in `uintx`.
> In the following code example, `uint7` is used just for representation purposes only, but it should be noted that uints in Solidity are in multiples of 8.
```solidity
struct cint128 { uint121 significant; uint7 shift; }
// ...
struct cint248 { uint241 significant; uint7 shift; }
```
Exampes:
```
Example:
uint256 value: 2**100, binary repr: 1000000...(hundred zeros)
cint64 { significant: 10000000...(55 zeros), shift: 00101101 (45 in decimal)}
Example:
uint256 value: 2**100-1, binary repr: 111111...(hundred ones)
cint64 { significant: 11111111...(56 ones), shift: 00101100 (44 in decimal) }
```
### Decompression
Two decompression methods are defined: a normal `decompress` and a `decompressRoundingUp`.
```solidity
library CInt64 {
// packs the uint256 amount into a cint64
function compress(uint256) internal returns (cint64) {}
// unpacks cint64, by shifting the significant bits left by shift
function decompress(cint64) internal returns (uint256) {}
// unpacks cint64, by shifting the significant bits left by shift
// and having 1s in the shift bits
function decompressRoundingUp(cint64) internal returns (uint256) {}
}
```
#### Normal Decompression
The `significant` bits in the `cintx` are moved to a `uint256` space and shifted left by `shift`.
> NOTE: In the following example, cint16 is used for visual demonstration purpose. But it should be noted that it is definately not safe for storing money amounts because it's significant bits capacity is 8, while at least 50 bits are required for storing money amounts.
```
Example:
cint16{significant:11010111, shift:00000011}
decompressed uint256: 11010111000 // shifted left by 3
Example:
cint64 { significant: 11111111...(56 ones), shift: 00101100 (44 in decimal) }
decompressed uint256: 1111...(56 ones)0000...(44 zeros)
```
#### Decompression along with rounding up
The `significant` bits in the `cintx` are moved to a `uint256` space and shifted left by `shift` and the least significant `shift` bits are `1`s.
```
Example:
cint16{significant:11011110, shift:00000011}
decompressed rounded up value: 11011110111 // shifted left by 3 and 1s instead of 0s
Example:
cint64 { significant: 11111111...(56 ones), shift: 00101100 (44 in decimal) }
decompressed uint256: 1111...(100 ones)
```
This specification is to be used by a new smart contract for managing it's internal state, so that any state mutating calls to it can be cheaper. These compressed values on a smart contract's state are something that should not be exposed to the external world (other smart contracts or clients). A smart contract should expose a decompressed value if needed.
## Rationale
- The `significant` bits are stored in the most significant part of `cintx` while `shift` bits in the least significant part, to help prevent obvious dev mistakes. For e.g. a number smaller than 2^56^-1 it's compressed `cint64` value would be itself if the arrangement were to be opposite than specified. If a developer forgets to uncompress a value before using it, this case would still pass if compressed value is same as uncompressed value.
- It should be noted that, using `cint64` doesn't render gas savings automatically. The solidity compiler needs to pack more data into the same storage slot.
- Also the packing and unpacking adds some small cost too.
```solidity
// uses 3 slots
struct UserData1 {
uint64 amountCompressed;
bytes32 hash;
address beneficiary;
}
// uses 2 slots
struct UserData2 {
uint64 amountCompressed;
address beneficiary;
bytes32 hash;
}
```
## Backwards Compatibility
There are no known backwards incompatible issues.
## Reference Implementation
Using structs for `cintx` as of Solidity 0.8.6 does not yet serve the purpose of saving gas, (see [solidity#11691](https://github.com/ethereum/solidity/issues/11691), requires some backwards incompatible changes in storage layout in Solidity). Hence on the implementation level `uint64` can be used directly.
```soldity
function compress(uint256 full) public pure returns (uint64 cint) {
uint8 bits = mostSignificantBitPosition(full);
if (bits <= 55) {
cint = uint64(full) << 8;
} else {
bits -= 55;
cint = (uint64(full >> bits) << 8) + bits;
}
}
function decompress(uint64 cint) public pure returns (uint256 full) {
uint8 bits = uint8(cint % (1 << 9));
full = uint256(cint >> 8) << bits;
}
function decompressRoundingUp(uint64 cint) public pure returns (uint256 full) {
uint8 bits = uint8(cint % (1 << 9));
full = uint256(cint >> 8) << bits + ((1 << bits) - 1);
}
```
Gist: https://gist.github.com/zemse/0ea19dd9b4922cd68f096fc2eb4abf93
The above gist has `library CompressedMath64` that contains for demonstrative logic for compression, decompression and arithmetic for `cint64`, however we utilise `uint64` data structure due to a present storage layout limitation on nested structs in Solidity. The gist also has an example contract that uses the library for demonstration purpose.
### Some related work
Some engineers have realised that using `uint256` entirely for storing value/money is not optimal in view of storage costs, and some of the bits in the slot could be better utilised with other data stored in it's place.
- Sushiswap's MasterChefV2: ([code ref](https://github.com/sushiswap/sushiswap/blob/master/contracts/MasterChefV2.sol#L198), they are casting `uint256` values to `uint128`, to make some room for other data. No lossy compression).
- Compound's GovernorAlpha ([code ref](https://github.com/compound-finance/compound-protocol/blob/master/contracts/Governance/GovernorAlpha.sol#L88), they use `uint96`)
- Rollups ([Using about 3 bytes or 24 bits for representing ETH value](https://vitalik.ca/general/2021/01/05/rollup.html#how-does-compression-work))
> TODO Find more projects and add links here
## Security Considerations
The following security considerations are discussed:
1. Effects due to lossy compression
- Error estimation for `cint64`
- Handeling the error
2. Losing precision due to incorrect use of `cintx`
3. Compressing something other than money `uint256`s.
### 1. Effects due to lossy compression
When a value is compressed, it causes underflow, i.e. some less significant bits are sacrificed. This results in a `cintx` value whose decompressed value is less than or equal to the actual `uint256` value.
```solidity
uint a = 2**100 - 1; // 100 # of 1s in binary format
uint c = a.compress().decompress();
a > c; // true
a - (2**(100 - 56) - 1) == c; // true
// Visual example:
// before: 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
// after: 1111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000
```
#### Error estimation for cint64
Let's consider we have a `value` of the order 2^m^ (less than 2^m^ and greater than or equal to 2^m-1^).
For all values such that 2^m^ - 1 - (2^m-56^ - 1) <= `value` <= 2^m^ - 1, the compressed value `cvalue` is 2^m^ - 1 - (2^m-56^ - 1).
The maximum error is 2^m-56^ - 1, approximating it to decimal: 10^n-17^ (log_base_2(56) is 17). Here `n` is number of decimal digits + 1.
For e.g. compressing a value of the order $1,000,000,000,000 (or 1T or 10^12) to `cint64`, the maximum error turns out to be 10^12+1-17^ = $10^-4^ = $0.0001. This means the precision after 4 decimal places is lost, or we can say that the uncompressed value is at maximum $0.0001 smaller. Similarly, if someone is storing $1,000,000 into `cint64`, the uncompressed value would be at maximum $0.0000000001 smaller. In comparision, the storage costs are almost $0.8 to initialize and $0.2 to update (20 gwei, 2000 ETHUSD).
#### Handeling the error
Note that compression makes the value slightly smaller (underflow). But we also have another operation that also does that. In integer math, division is a lossy operation (causing underflow). For instance,
```solidity
10000001 / 2 == 5000000 // true
```
Result of the division operation is not always exact, but it's smaller than the actual value, in some cases as in the above example. Though, most engineers try to reduce this effect by doing all the divisions at the end.
```
1001 / 2 * 301 == 150500 // true
1001 * 301 / 2 == 150650 // true
```
The division operation has been in use in the wild, and plenty of lossy integer divisions have taken place, causing DeFi users to get very very slightly less withdrawal amounts, which they don't even notice. If been careful, then the risk is very negligible. Compression is similar, in the sence that it is also division by 2^shift^. If been careful with this too, the effects are minimised.
In general, one should follow the rule:
1. When a smart contract has to transfer a compressed amount to a user, they should use a rounded down value (by using `amount.decompress()`).
2. When a smart contract has to transferFrom a compressed amount from a user to itself, i.e charging for some bill, they should use a rounded up value (by using `amount.compressUp()`).
The above ensures that smart contract does not loose money due to the compression, it is the user who receives less funds or pays more funds. The extent of rounding is something that is negligible enough for the user. Also just to mention, this rounding up and down pattern is observed in many projects including [UniswapV3](https://github.com/Uniswap/uniswap-v3-core/blob/f03155670ec1667406b83a539e23dcccf32a03bc/contracts/libraries/SqrtPriceMath.sol#L118).
### 2. Losing precision due to incorrect use of `cintx`
This is an example where dev errors while using compression can be a problem.
Usual user amounts mostly have an max entropy of 50, i.e. 10^15^ (or 2^50^) values in use, that is the reason why we find uint56 enough for storing significant bits. However, let's see an example:
```solidity
uint64 sharesC = // reading compressed value from storage;
uint64 price = // CALL;
uint64 amountC = sharesC.cmuldiv(price, PRICE_UNIT);
user.transfer(amountC.uncompress());
```
The above code results in a serious precision loss. `sharesC` has an entropy of 50, as well as `priceC` also has an entropy of 50. When we multiply them, we get a value which contains entropies of both, and hence, an entropy of 100. After multiplication is done, `cmul` compresses the value, which drops the entropy of `amountC` to 56 (as we have uint56 there to store significant bits).
To prevent entropy/precision from dropping, we get out from compression.
```solidity
uint64 sharesC = shares.compress();
uint64 priceC = price.compress();
uint256 amount = sharesC.uncompress() * price / PRICE_UNIT;
user.transfer(amount);
```
Compression is only useful when writing to storarge, while doing arithmetic with them should be done very carefully.
### 3. Compressing something other than money `uint256`s.
Compressed Integers is intended to only compress money amount. Technically there are about 10^77^ values that a `uint256` can store but most of those values have a flat distribution i.e. the probability is 0 or extremely negligible. (What is a probability that a user would be depositing 1000T DAI or 1T ETH to a contract? In normal circumstances it doesn't happen, unless someone takes a huge flash loan). Only the amounts that people work with have a non-zero distribution ($0.001 DAI to $1T or 10^15^ to 10^30^ in uint256). 50 bits are enough to represent this information, just to round it we use 56 bits for precision.
Using the same method for compressing something else which have a completely different probability distribution will likely result in a problem. It's best to just not compress if you're not sure about it. And also, for things you think you are sure about using compression for, it's better to give more thought if compression can result into edge cases (for e.g. in previous multiplication example).
### 4. Compressing Stable vs Volatile money amounts
Since we have a dynamic `uint8 shift` value which can move around. So even if you wanted to represent 1 Million [SHIBA INU](https://etherscan.io/token/0x95ad61b0a150d79219dcf64e1e6cc01f0b64c4ce) tokens or 0.0002 WBTC (both $10 as of this writing), cint64 will pick it's top 56 significant bits which will take care of the value representation.
It can be a problem for volatile tokens, if the coin is extremely volatile wrt user's native currency. Imagine a very unlikely case where a coin goes 2^56^x up (price went up by 10^16^ lol). In such cases `uint56` might not be enough as even it's least significant bit is very valuable. If such insanely volatile tokens are to be stored, you should store more significant bits, i.e. using `cint96` or `cint128`.
`cint64` has 56 bits for storing significants, when only 50 were required. Hence there are 6 extra bits, it means that it is fine if the $ value of the cryptocurrency stored in cint64 increases by 2^6^ or 64x. If the value goes down it's not a problem.