# The Most Important Tasks in Tezos (apart from scaling)
Nicolas Ochem - January 2023
## Proto
### Adaptive inflation
Described and discussed [here](https://forum.tezosagora.org/t/adaptive-inflation/4552).
The actual functional specs for the feature are copied here for reference:
> * Decree that tez at risk (i.e. held as a frozen bond by the baker) carries twice as much block creation right as a delegated stake.
> * Set the global inflation rate to be $(2/(100 x))^2$ where $x$ is the global fraction of tez staked
> * ship it in a protocol upgrade, disabled by default. Then, if 50% of blocks signal, and 80% of those signal in favor for 5 cycles at any point in the future, it kicks in.
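
To get a feel for the numbers, here is a minimal Python sketch of the two quantitative rules quoted above. It assumes that $x$ is expressed as a fraction between 0 and 1 and that the result is also a fraction (so `0.04` means 4%); the function names are illustrative and do not come from the proposal.

```python
def inflation_rate(staked_fraction: float) -> float:
    """Global inflation rate (2 / (100 x))^2 for a staked fraction x in (0, 1]."""
    if not 0.0 < staked_fraction <= 1.0:
        raise ValueError("staked_fraction must be in (0, 1]")
    return (2.0 / (100.0 * staked_fraction)) ** 2


def baking_weight(frozen_bond: float, delegated_stake: float) -> float:
    """Tez at risk (frozen bond) counts twice as much as delegated stake."""
    return 2.0 * frozen_bond + delegated_stake


if __name__ == "__main__":
    # Print the inflation rate for a few staked fractions.
    for x in (0.05, 0.10, 0.20, 0.50):
        print(f"staked {x:.0%} -> inflation {inflation_rate(x):.2%}")
```

Under this reading, the rate falls quickly as more tez is staked: 10% staked gives 4% inflation, and 20% staked gives 1%.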
### Rewards Account Selection for Bakers
This was [discussed on Tezos Agora](https://forum.tezosagora.org/t/reward-account-selection-for-bakers/4828) and received positive feedback from the baker community. It makes operating a public baker easier by eliminating manual steps:
* moving funds from the baker to the payout address,
* setting deposit limits to avoid running out of funds to pay out when full.
### Transaction receipt
This allows very fast finality in most cases by letting a baker give a receipt promising to include a transaction (under penalty of slashing).
Arthur described it on YouTube at the end of [this video](https://www.youtube.com/watch?v=pgcPhmg5IaM&t=594s). There was also a [recent discussion of it](https://tezos-dev.slack.com/archives/GB0UR34N8/p1670882620423149) on the devteam channel.
Choose between a lightweight and heavyweight method:
* **lightweight method**: offer a primitive in michelson to allow a smart contract to check whether a transaction was included in a specific block level.
Bonding, receipt checking and slashing are handled by the smart contract: for example, the [Flashbake registry contract](https://flashbake.xyz/docs/flashbake-registry) can be extended to do it.
An [old, unmerged implementation](https://gitlab.com/tezos/tezos/-/merge_requests/2431) exists, but it is incomplete as it does not check which baker included an operation.
* **heavyweight method**: introduce a new denunciation operation taking the receipt and block as proof that the operation was not included. The slash is deducted from the security deposit of the baker.
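
Both methods come down to the same check: does a given block prove that a promise was broken? The sketch below illustrates that check in Python. The `Receipt` and `Block` shapes are hypothetical (no receipt format has been specified), it assumes a receipt names one specific level, and signature verification is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt: `baker` promises to include `op_hash` at `level`."""
    op_hash: str
    baker: str
    level: int


@dataclass(frozen=True)
class Block:
    """Minimal block view: level, baker and the hashes of included operations."""
    level: int
    baker: str
    op_hashes: FrozenSet[str]


def receipt_violated(receipt: Receipt, block: Block) -> bool:
    """Return True if `block` proves that the promise in `receipt` was broken."""
    # The block only counts as proof if it is the one the receipt refers to,
    # and if it was baked by the baker who issued the receipt.
    if block.level != receipt.level or block.baker != receipt.baker:
        return False
    return receipt.op_hash not in block.op_hashes
```

In the lightweight method this logic would live in a smart contract; in the heavyweight method it would be part of validating the denunciation operation.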
### Hash a human description of the proto proposal as part of the proto proposal
Described [here](https://gitlab.com/tezos/tezos/-/issues/4274).
This increases Tezos decentralization by allowing proto drops to be accompanied by a description or manifesto while remaining anonymous if desired.
Sites such as Tezos Agora or tezocracy.xyz may display this text (with appropriate spam filtering measures in place).
### Remove Nonce Penalties
Now that we have VDF, it does not matter anymore whether bakers reveal their nonces or not.
Do not punish them for failing to reveal their nonce, unless the VDF was not published either.
This is a better alternative to implementing [deterministic nonces in the baker](https://gitlab.com/tezos/tezos/-/merge_requests/5243): switching to a new baking machine and losing the nonce is now inconsequential, so having non-deterministic nonces is no longer an issue in most cases.
### Store delegator snapshot balances in the Cycle object in the context
Discussed on [Agora](https://forum.tezosagora.org/t/baking-with-just-the-context/4861#reward-distribution-using-the-current-context-2).
Most reward distribution uses TzKT as backend. For better decentralization, payouts should be easy with just the node's RPC.
To pay rewards using only a node's RPC, it is necessary to have access to all delegators' balances at snapshot time for the cycle being paid. The snapshot in question is taken 7 cycles before the payout occurs. Payout software queries the context at the snapshot block's level to access this data. Hence payouts with a rolling node are not possible, as garbage collection will have deleted the contexts from 7 cycles ago (unless extra cycles are kept, which is impractical).
Keeping a copy of this data in the Cycle object of the context removes this constraint and allows easy public baking with a rolling node.
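
To show why the snapshot balances are the key input, here is a simplified Python sketch of the pro-rata split that payout software performs once it has them. The function name, the flat fee parameter and the integer rounding are assumptions made for this example; real payout tools apply more rules (the baker's own share, minimum payouts, and so on).

```python
from typing import Dict


def split_rewards(snapshot_balances: Dict[str, int],
                  rewards_mutez: int,
                  fee_percent: int = 0) -> Dict[str, int]:
    """Split rewards pro rata to delegator balances recorded at snapshot time.

    All amounts are integers in mutez; shares are rounded down.
    """
    total = sum(snapshot_balances.values())
    if total == 0:
        return {}
    # Amount left to distribute after the baker's fee (hypothetical flat fee).
    distributable = rewards_mutez * (100 - fee_percent) // 100
    return {
        address: distributable * balance // total
        for address, balance in snapshot_balances.items()
    }
```

Without `snapshot_balances` for the cycle being paid, none of this can be computed, which is why a rolling node cannot currently serve as the only backend.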
## Shell
### Pure RPC Baking
Now that we have [1M](https://gitlab.com/tezos/tezos/-/milestones/23#tab-issues), the baker includes any prechecked operation received from the node. But it still tries to apply the block before sending it out, which requires file-level access to the context. This was kept as a safety measure.
Remove the safety net and blindly send a block with 1M-compliant operations in any order when it's ready.
This is an operational win as baker and node can now be two separate services communicating only by RPC (similar to the Ethereum execution client and validator). It removes a common pitfall in baking setups. It makes operation in container cluster environments simpler.
### RPC
#### Fuse octez-node and octez-proxy
A standalone proxy leaves too much responsibility to the user; this is one of the reasons (I think) why it is not widely used.
The tezos-node user has the expectation that performing a reasonable amount of RPC queries will not compromise core functions of the node (such as syncing with the chain).
Deprecate octez-proxy as a standalone executable. Instead, the node should spawn an RPC child process that takes care of all RPC queries. It operates largely independently of the node itself (although some operations such as injection require a mutex between the two).
#### Pagination
Even with a child RPC process, queries that can completely lock the RPC service should not exist. Today they do:
- recent tests with octez 15.1 on mainnet show **1min15sec** for an endorsing rights query for one baker for one cycle (see [issue](https://gitlab.com/tezos/tezos/-/issues/1614)). Querying block by block works, but is not an acceptable workaround due to the large number of blocks per cycle.
- baking rights query seems greatly improved (less than 2 seconds)
- querying all contracts (`context/contracts`) takes **1min53 on ghostnet**, **45sec on mainnet** (tests done with octez 15.1)
Heavy requests must be paginated: you can only query pages of baking rights or contract addresses, with the max page size set to a value known not to freeze the node for too long (say, more than a second).
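
As a minimal illustration of the idea, the Python sketch below serves a large list in capped pages. The `MAX_PAGE_SIZE` value and the `offset`/`limit` parameter names are assumptions for this example, not an existing Octez interface.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical cap; in practice it would be tuned so that a single page
# never blocks the RPC service for too long.
MAX_PAGE_SIZE = 1000


def paginate(items: Sequence[str],
             offset: int = 0,
             limit: int = MAX_PAGE_SIZE) -> Tuple[List[str], Optional[int]]:
    """Return one page of `items` and the offset of the next page (or None)."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    # The server, not the client, decides the largest page it will serve.
    limit = min(limit, MAX_PAGE_SIZE)
    page = list(items[offset:offset + limit])
    end = offset + len(page)
    next_offset = end if end < len(items) else None
    return page, next_offset
```

A client walks the whole collection by repeating the call with the returned `next_offset` until it is `None`, so no single request is unbounded.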
#### Generic Proxying
An RPC node operator should be able to set up a generic proxy on top to reduce the load on octez nodes.
We need a config flag that, when set, will cause the RPC service to include headers for consumption by an upstream proxy server such as nginx or varnish.
*Context Queries by block number/block hash*
These queries are valid forever. HTTP cache headers should be emitted to indicate that these URLs may be cached for arbitrary periods (`Date` and `Expires` are mandatory and `Cache-Control` is optional).
*HEAD context queries*
A reorg may happen at any time so the node may not claim that these results are valid for any number of seconds.
However the node may still facilitate caching of HEAD (or HEAD~n) block queries by giving compatible answers to HTTP request headers [If-Modified-Since](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since) and [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match). When you use nginx and configure it with `proxy_cache_revalidate` it will automatically send backend requests to check freshness of cached data with either (or both) of these request headers.
The node replies with an `ETag` response header, which is a unique string identifying the resource. The block hash is a natural choice for the `ETag`.
The node can handle such queries much more efficiently than regular queries because if nothing changed, the response can be empty.
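
The two cases can be sketched in Python as follows. This is an illustration under assumptions, not the node's implementation: the one-year lifetime, the `Cache-Control` values and the function names are choices made for this example, and `If-None-Match` handling is reduced to an exact match on a single tag.

```python
from email.utils import formatdate
from typing import Dict, Optional, Tuple

ONE_YEAR = 365 * 24 * 3600  # assumed "arbitrary period", in seconds


def immutable_headers(now: float) -> Dict[str, str]:
    """Headers for queries addressed by block number or block hash."""
    return {
        "Date": formatdate(now, usegmt=True),
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
        "Cache-Control": f"public, max-age={ONE_YEAR}, immutable",
    }


def head_response(block_hash: str,
                  if_none_match: Optional[str],
                  body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a HEAD-relative query, honoring the If-None-Match request header."""
    etag = f'"{block_hash}"'  # entity tags are quoted strings
    # "no-cache" lets the proxy store the response but forces revalidation.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        # Same block as the cached copy: reply 304 Not Modified, empty body.
        return 304, headers, b""
    return 200, headers, body
```

When the head has not changed, the proxy's revalidation request costs the node one hash comparison instead of a full context query.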
#### Reply 410 Gone instead of 404 Not Found for garbage collected blocks
This allows a load-balanced setup with rolling nodes and a smaller number of archive nodes. It is possible to get a 404 with a context query (for example if a bigmap key is unset), but there is currently no way to tell this apart from the block being gone from storage. A 410 response code lets the load balancer forward only the queries on a gone block to an archive node; in nginx this would be done with `proxy_pass`.
Note that the node would also reply 410 to a block that *is not present yet*, but that's OK: the goal is to try a rolling node first, then if the entire block cannot be found, try archive.
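
The routing rule itself is small. Here is a Python sketch of it, with the two backends passed in as plain callables so that the example stays independent of any HTTP library; the names are illustrative.

```python
from typing import Callable, Tuple

# A fetcher takes an RPC path and returns (HTTP status code, response body).
Fetch = Callable[[str], Tuple[int, bytes]]


def fetch_with_fallback(path: str, rolling: Fetch, archive: Fetch) -> Tuple[int, bytes]:
    """Query a rolling node first; retry on an archive node only on 410 Gone."""
    status, body = rolling(path)
    if status == 410:
        # The whole block is missing from the rolling node: ask the archive.
        return archive(path)
    # Anything else, including a genuine 404 (e.g. unset bigmap key), is final.
    return status, body
```

Because a 404 is passed through unchanged, the archive nodes only see the traffic that the rolling nodes cannot answer.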
## Networking
The behavior of Octez networking is unusual, especially when it comes to binding to an interface:
* [issue 1](https://gitlab.com/tezos/tezos/-/issues/4403): to enable external RPC traffic, it is necessary to pass the external IP address to `--listen-addr` (`0.0.0.0` does not work).
* [issue 2](https://gitlab.com/tezos/tezos/-/issues/260): octez wants to bind to an external IP address in order to listen to p2p traffic on it. When using a third-party load balancer (cloud service or physical machine), this cannot work.
These are serious long-standing networking bugs that should be fixed as part of the [p2p maintenance](https://gitlab.com/tezos/tezos/-/milestones/77#tab-issues) milestone.