vbuterin

@vbuterin

Joined on Jun 18, 2017

  • One powerful technique when working with polynomials is taking a set of evaluations of that polynomial and using that directly to compute an evaluation at a different point. For example, if $P(x)$ is a degree-100 polynomial, you can use the evaluations $P(0), P(1) ... P(100)$ to directly compute $P(101)$, or $P(1247130)$, in $O(N)$ time, without ever reconstructing the polynomial. This post describes how this is done. See also this earlier and more detailed paper by Oxford researchers on the topic: https://people.maths.ox.ac.uk/trefethen/barycentric.pdf General technique: Let $P(x)$ be the polynomial, and $x_1 ... x_N$ the set of points for which we have evaluations of $P(x)$. Call these evaluations $y_1 ... y_N$. Think of $P(x)$ as a linear combination $\sum_i y_i L_i(x)$, where $L_i(x)$ is the polynomial that equals 1 at $x_i$ and 0 at all the other x-coordinates in the set. Now, let us explore how these $L_i$ can be computed. Each $L_i(x)$ can be expressed as:
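A minimal Python sketch of the end result, barycentric evaluation from a set of points (not code from the post; it assumes distinct evaluation points and exact rational arithmetic rather than a prime field, and the function names are illustrative):

```python
from fractions import Fraction

def barycentric_weights(xs):
    # w_i = 1 / prod_{j != i} (x_i - x_j).  O(N^2) in general; for equally
    # spaced points such as 0..N the weights reduce to +-1/(i!(N-i)!) and
    # can be computed in O(N).
    weights = []
    for i, xi in enumerate(xs):
        w = Fraction(1)
        for j, xj in enumerate(xs):
            if i != j:
                w /= (xi - xj)
        weights.append(w)
    return weights

def eval_from_evaluations(xs, ys, weights, x):
    # Barycentric formula:
    #   P(x) = sum(y_i * w_i / (x - x_i)) / sum(w_i / (x - x_i))
    # O(N) once the weights are known; the polynomial is never reconstructed.
    if x in xs:
        return ys[xs.index(x)]
    num, den = Fraction(0), Fraction(0)
    for xi, yi, wi in zip(xs, ys, weights):
        term = wi / Fraction(x - xi)
        num += yi * term
        den += term
    return num / den

# Example: P(x) = x^3 - 2x + 5, sampled at 0..3 (4 points suffice for degree 3)
P = lambda x: x**3 - 2*x + 5
xs = list(range(4))
ys = [Fraction(P(x)) for x in xs]
w = barycentric_weights(xs)
assert eval_from_evaluations(xs, ys, w, 10) == P(10)
```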
  • This post describes some reasons why the SELFDESTRUCT opcode brings more harm than good to the Ethereum ecosystem, and so should be neutered or removed in some way. To deal with the existing contracts that use SELFDESTRUCT, I propose some ways to eliminate the harmful aspects of SELFDESTRUCT with minimal disruption. A history: SELFDESTRUCT is not necessary SELFDESTRUCT (originally called SUICIDE) was introduced very early on in Ethereum's history; in fact, it was present even in this pre-announcement "spec" of the Ethereum protocol from December 2013. At the time, there was little rigorous thought being done about long-term state size management. However, there was (in my [Vitalik's] head) a general impression that to prevent state from filling up with vestigial garbage without limit, we need it to be possible for any object that can be created to be destroyed. Externally-owned accounts (EOAs), the thinking goes, would automatically be destroyed when their balance drops to zero, and contracts could have a self-destruct clause in the code to delete themselves when they are no longer needed. A gas refund would encourage them to do this. In January 2014, Andrew Miller pointed out something that in retrospect was face-palmingly obvious: in the Dec 2013 spec, EOAs were vulnerable to replay attacks. If I had 100 coins and I sent you 10, you could simply republish this transaction on-chain ten times to clear my entire balance. This was quickly fixed with nonces. However, the addition of nonces removed all hope of EOAs being deleted: the nonce could not be reset to zero. In 2015, some schemes were proposed to get around this and potentially allow accounts that send away all of their ETH to be safely deleted. However, by then it was clear that almost no contract developers were actually using the self-destruct feature: figuring out when to self-destruct is too hard and the rewards are too little. By 2019-21, it has become clear that we need some other form of state management, whether rent or "expiring" long-untouched parts of the tree (aka "partial statelessness"). But if we have such a scheme, and it works, then we no longer need to care about giving contracts the ability to delete themselves voluntarily.
  • --- eip: TBD title: Optional access lists author: Vitalik Buterin (@vbuterin), Martin Swende (@holiman) discussions-to: TBD status: Draft type: Standards Track category: Core created: 2020-08-29 requires: EIP 2718; Gas cost increases https://notes.ethereum.org/@vbuterin/BkrNbeAfD
  • Two paths to a solution exist, and have existed for a long time: weak statelessness and state expiry: State expiry: remove state that has not been recently accessed from the state (think: accessed in the last year), and require witnesses to revive expired state. This would reduce the state that everyone needs to store to a flat ~20-50 GB. Weak statelessness: only require block proposers to store state, and allow all other nodes to verify blocks statelessly. The good news is that recently, there have been major improvements on both of these paths, that greatly reduce the downsides to both: Some techniques for how a ReGenesis-like epoch-based expiry scheme can be adapted to minimize resurrection conflicts Piper Merriam's work on transaction gossip networks that add witnesses to be stateless-client-friendly, and his work on distributed state storage and on-demand availability Verkle trees, which can reduce worst-case witness sizes from ~4 MB to ~800 kB (this is definitely small enough, because existing worst-case blocks that are full of calldata are already 12.5M / 16 ~= 780 kB and we have to handle those anyway). See slides, doc, code.
  • Out of scope: data availability sampling and other sharding-related changes; light client processing; the merge. Revamp effective balance / current balance accounting: Remove the current concept of effective vs current balance. Instead, keep (i) the balance inside the validator struct, and (ii) a counter of "duties fulfilled". Accounting for (ii) would be very simple: did you get the correct source? Add 1. Correct target? Add 1. Timely? Add 1. Every 256 epochs (can be staggered), we update the validator balance based on the duties fulfilled count, and reset the duties fulfilled count. The inactivity leak is also implemented at this stage. Note that this comes at the cost of some coarseness: if an inactivity leak starts halfway through a period, offlineness before and after the start of the leak is equally penalized (if we want to remove this entirely, we could also have two duties counters, one for leaking periods and one for non-leaking periods).
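As a rough illustration of the proposed accounting (not spec code; the constant and field names here are placeholders), the per-attestation work reduces to incrementing a counter, and the per-period work to a single balance update:

```python
from dataclasses import dataclass

EPOCHS_PER_PERIOD = 256   # illustrative; "every 256 epochs (can be staggered)"
REWARD_PER_DUTY = 1       # illustrative per-duty reward unit

@dataclass
class Validator:
    balance: int
    duties_fulfilled: int = 0   # reset at every period boundary

def process_attestation(v: Validator, correct_source: bool,
                        correct_target: bool, timely: bool) -> None:
    # Per-attestation accounting is just "add 1" per duty fulfilled.
    v.duties_fulfilled += int(correct_source) + int(correct_target) + int(timely)

def process_period_boundary(v: Validator, leaking: bool) -> None:
    # Every EPOCHS_PER_PERIOD epochs: convert the counter into a balance
    # change and reset it.  The inactivity-leak penalty would also be
    # applied at this point (omitted in this sketch).
    if not leaking:
        v.balance += v.duties_fulfilled * REWARD_PER_DUTY
    v.duties_fulfilled = 0
```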
  • Along with proof of stake, the other central feature in the eth2 design is sharding. This proposal introduces a limited form of sharding, called "data sharding", as per the rollup-centric roadmap: the shards would store data, and attest to the availability of ~250 kB sized blobs of data. This availability verification provides a secure and high-throughput data layer for layer-2 protocols such as rollups. To verify the availability of high volumes of data without requiring any single node to personally download all of the data, two techniques are stacked on top of each other: (i) attestation by randomly sampled committees, and (ii) data availability sampling (DAS). ELI5: randomly sampled committees Suppose you have a big amount of data (think: 16 MB, the average amount that the eth2 chain will actually process per slot, at least initially). You represent this data as 64 "blobs" of 256 kB each. You have a proof of stake system, with ~6400 validators. How do you check all of the data without (i) requiring anyone to download the whole thing, or (ii) opening the door for an attacker who controls only a few validators to sneak an invalid block through? We can solve the first problem by splitting up the work: validators 1...100 have to download and check the first blob, validators 101...200 have to download and check the second blob, and so on. The validators in each of these subgroups (or "committees") simply make a signature attesting that they have verified the blob, and the network as a whole only accepts the blob if they have seen signatures from the majority of the corresponding committee. But this leads to a problem: what if the attacker controls some contiguous subset of validators (eg. 1971....2070)? If this were the case, then even though the attacker controls only ~1.5% of the whole validator set, they would dominate a single committee (in this case, they would have ~70% of committee 20, containing validators 2001...2100), and so they would be able to control the committee and push even invalid/unavailable blobs into the chain. Random sampling solves this by using a random shuffling algorithm to select the committees. We use some hash as the seed of a random number generator, which we then use to randomly shuffle the list [1..6400]. The first 100 values in the shuffled list are the first committee, the next 100 are the second committee, etc.
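A toy Python sketch of the committee-selection step described above (the real protocol uses a swap-or-not shuffle seeded from on-chain randomness; here a generic seeded shuffle stands in, and the constants mirror the example numbers):

```python
import random

VALIDATOR_COUNT = 6400   # example validator count from the text
COMMITTEE_SIZE = 100     # validators per blob committee

def get_committees(seed: int):
    # Shuffle the validator indices with a seeded RNG, then slice the
    # shuffled list into fixed-size committees: the first 100 indices check
    # blob 0, the next 100 check blob 1, and so on.  An attacker holding a
    # contiguous range of indices no longer dominates any single committee.
    indices = list(range(VALIDATOR_COUNT))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + COMMITTEE_SIZE]
            for i in range(0, VALIDATOR_COUNT, COMMITTEE_SIZE)]

committees = get_committees(seed=42)
assert len(committees) == 64 and all(len(c) == COMMITTEE_SIZE for c in committees)
```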
  • WIP The purpose of this document is to describe in more detail a proposal for how phase 1 can be structured based on a "data-availability-focused" approach. The main addition to the beacon chain will be a Vector of ShardDataHeader objects, one for each shard. A ShardDataHeader is a small object which represents a large amount of underlying data (roughly 0-512 kB in size). A block is only valid if the underlying data that the ShardDataHeader points to is available - that is, it has been published to the network and anyone can download it. However, to preserve scalability, clients will not try to download the full underlying data of every ShardDataHeader to verify the block. Instead, they will verify that the data is available using an indirect technique called data availability sampling. Parameters Parameter Value Description
  • --- eip: <to be assigned> title: Long-term history accessibility author: Vitalik Buterin (@vbuterin) discussions-to: <URL> status: Draft type: Standards Track category: Core created: 2020-09-03 ---
  • This document lays out some possible options for alternatives to EIP 3298 (removal of refunds) that try to preserve some of the legitimate goals and positive incentives that refunds provide. Why perhaps not total removal? Total removal is the simplest option and it accomplishes the desired effects, but it also has some downsides that have been identified by the community. The two strongest critiques of total removal of refunds that I have seen are: Zero -> nonzero -> zero storage use patterns There are two major examples of this: Mutex locks on contracts, where before contract A calls contract B, it first flips a storage slot to some nonzero value. When contract A itself is called, it checks that that storage slot is empty; if it is not, the call fails. This prevents re-entrancy. When the call finishes, the storage slot is flipped back to zero. Approving an ERC20 token for exactly the desired amount and then immediately spending that amount.
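For illustration only (real contracts would be written in Solidity; the class and method names here are made up), the mutex pattern follows a zero -> nonzero -> zero storage lifecycle within a single transaction, which is exactly the pattern that refunds made cheap:

```python
class ReentrancyGuard:
    def __init__(self):
        self.lock_slot = 0            # storage slot starts (and ends) at zero

    def guarded_call(self, external_call):
        if self.lock_slot != 0:       # re-entrant call detected: fail
            raise RuntimeError("reentrancy detected")
        self.lock_slot = 1            # zero -> nonzero (expensive SSTORE)
        try:
            external_call()           # may attempt to re-enter guarded_call
        finally:
            self.lock_slot = 0        # nonzero -> zero (what refunds rewarded)
```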
  • Suppose you have a polynomial $P$, and the sample proofs $Q_i = \lfloor \frac{P}{X^{16} - \omega^{i * 16}} \rfloor$. Goal: verify all proofs. Note that for all $i$, $Q_i * (X^{16} - \omega^{i * 16}) = P - I_i$, where $I_i$ is the $deg < 16$ interpolant of the i'th subgroup. We can combine all of these equations with a random linear combination: $\sum Q_i * (X^{16} - \omega^{i * 16}) * r_i = \sum (P - I_i) * r_i$
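Expanding the right-hand side (a direct rearrangement of the equation above, not an additional claim) shows why a single random linear combination suffices: $P$ appears only once, weighted by $\sum_i r_i$, alongside the combined interpolants:

$$\sum_i r_i \, Q_i \cdot (X^{16} - \omega^{i*16}) \;=\; \Big(\sum_i r_i\Big) P \;-\; \sum_i r_i \, I_i$$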
  • In blockchain communities, there are more public goods than private goods: open source client code, protocol research, documentation, community building. How do we fund them? Two questions in funding public goods
  • Current Merkle-Patricia tree. Current Merkle-Patricia Tree: Properties: single branch length = 128 * log(N)
  • Idea: proposed alternative long-term design for the beacon chain. Timing: after the merge, after sharding, after stateless clients. Goal: fix problems in status quo hybrid fork choice / finality mechanism by going straight to finality. Goal 1: make single-slot reorgs even harder. Nakamoto Gasper
  • What is a fork choice rule? A function, evaluated by the client. Input = the set of blocks and other messages that have been seen. Output = the "canonical chain". Example: Nakamoto PoW (highest-total-difficulty). There are three major types of fork choice rules
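A minimal Python sketch of the highest-total-difficulty example above: the fork choice really is just a pure function from the blocks a client has seen to a canonical head (the data structures and names here are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Block:
    hash: str
    parent: Optional[str]
    difficulty: int

def fork_choice(blocks: dict[str, Block]) -> str:
    # Total difficulty of a block = sum of difficulties along its ancestry.
    def total_difficulty(h: str) -> int:
        td = 0
        while h is not None:
            block = blocks[h]
            td += block.difficulty
            h = block.parent
        return td
    # Canonical head = the seen block with the highest total difficulty.
    return max(blocks, key=total_difficulty)

genesis = Block("A", None, 1)
b1 = Block("B", "A", 2)
b2 = Block("C", "A", 4)   # heavier sibling branch wins
blocks = {b.hash: b for b in (genesis, b1, b2)}
assert fork_choice(blocks) == "C"
```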
  • Written 2021 Feb 12 One of the longest and still unresolved challenges in the Ethereum protocol is that of how to manage the problem of growing state. Many actions in Ethereum (creating an account, filling a storage slot, sending ETH to a new account...) add objects to the state that all full nodes must store in order to be able to verify or produce new blocks. These actions require a one-time gas cost to be paid by the sender of the transaction, but impose permanent ongoing costs on the network, and the nodes that need to store this new data (and future nodes that need to download it when syncing). This is a major imbalance in the system's design, and risks the Ethereum system becoming more and more difficult to use over time as the state fills with no-longer-needed "junk data". The purpose of this document is to describe in more detail what the problem is, and what are some of the paths toward solving it. If we successfully implement a solution, it could pave the way to safely making significant increases to the block gas limit. This document is describing an area of research that is still a work-in-progress; new and better ideas and better tradeoffs could potentially be discovered at any time. Introduction: What is the problem? State refers to information that a node must hold in order to be able to process new incoming blocks and transactions. It is typically contrasted with history, information about past events which can be held for later rebroadcasting and archiving purposes, but is not strictly needed to continue to process the chain.
  • Special thanks to Micah Zoltu for suggesting some of these Preamble: why the merge is our last good opportunity to remove things, and why we should use it We know far more about smart contract and blockchain protocol design in 2020 than we did in 2013-15. As a result, there are many features in Ethereum that were introduced during those earlier years, that if we were building Ethereum from scratch in 2021 we would not introduce again. However, removing features from a running chain with a running ecosystem is harder than not adding them to a new system. Some of these "vestigial features" are harmless. Others can be safely and slowly removed or improved over time. Still others are too deeply integrated into too many applications to contemplate ever changing (eg. the 256-bit EVM might be an example). And on the other hand, there are features that either have been removed or improved already, or will be soon (improvements to the state tree, and replacement of RLP with SSZ, come to mind). But there are also some cases in the middle: features that cause a medium amount of ongoing harm to the ecosystem due to their complexity, and which can be removed, but there is a small but nonzero amount of risk in doing so. If we remove these features, then a small number of applications may break. But if we do not, they will continue being a drag on the ecosystem until the end of time. As is often the case in situations with "short term pain, long term gain", it is easy to underestimate the magnitude of the long term gains. Particularly in our case, the fact that "the code is already written" to handle complexity makes it feel like there is no cost in keeping it. But in reality, there are two significant costs:
  • Special thanks to Dankrad Feist, Karl Floersch and Hsiao-wei Wang for feedback and review. Perhaps the most powerful cryptographic technology to come out of the last decade is general-purpose succinct zero knowledge proofs, usually called zk-SNARKs ("zero knowledge succinct arguments of knowledge"). A zk-SNARK allows you to generate a proof that some computation has some particular output, in such a way that the proof can be verified extremely quickly even if the underlying computation takes a very long time to run. The "ZK" ("zero knowledge") part adds an additional feature: the proof can keep some of the inputs to the computation hidden. For example, you can make a proof for the statement "I know a secret number such that if you take the word 'cow', add the number to the end, and SHA256 hash it 100 million times, the output starts with 0x57d00485aa". The verifier can verify the proof far more quickly than it would take for them to run 100 million hashes themselves, and the proof would also not reveal what the secret number is. In the context of blockchains, this has two very powerful applications: Scalability: if a block takes a long time to verify, one person can verify it and generate a proof, and everyone else can just quickly verify the proof instead Privacy: you can prove that you have the right to transfer some asset (you received it, and you didn't already transfer it) without revealing the link to which asset you received. This ensures security without unduly leaking information about who is transacting with whom to the public.
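For concreteness, the computation being proven in that example looks like the following Python sketch (scaled down to 1,000 rounds; this is only the underlying computation, not the SNARK machinery, and the function names are made up):

```python
import hashlib

def iterated_hash(secret_number: int, rounds: int = 1000) -> bytes:
    # Take the word 'cow', append the secret number, then SHA256-hash repeatedly.
    data = b"cow" + str(secret_number).encode()
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data

def statement_holds(secret_number: int, prefix: bytes) -> bool:
    # The prover claims knowledge of a secret_number such that the final hash
    # starts with the given prefix; a zk-SNARK lets a verifier check this claim
    # without re-running the hashes and without learning secret_number.
    return iterated_hash(secret_number).startswith(prefix)
```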
  • The purpose of this document is to crystallize and explain a series of proposals for incremental changes to the beacon chain specification that simplify the protocol, increase efficiency and improve the protocol's economic properties. Basic reward accounting reform Key goals: simplify attestation processing, reduce costs and shift costs from end-of-epoch to mid-epoch, set the stage for future reforms Key ideas: remove PendingAttestations and instead store flags per validator in shuffled order Target date: HF1 (aka the light client support fork) Currently, when we receive an attestation, we defer almost all processing (except basic verification) to the end of an epoch. We store a PendingAttestation object that contains the attestation data (not including the signature), and at the end of an epoch we take all of the PendingAttestation objects and run the computations for who participated, which epochs get justified and/or finalized, and what each participant's reward or penalty should be.
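A simplified sketch of the flag-based alternative (the flag names and layout here are illustrative, not the exact spec constants): processing an attestation just ORs a few bits into a per-validator array in shuffled order, and epoch processing reads those bits directly instead of replaying stored PendingAttestations:

```python
# Illustrative per-validator flag bits replacing PendingAttestation objects.
TIMELY_SOURCE_FLAG = 1 << 0
TIMELY_TARGET_FLAG = 1 << 1
TIMELY_HEAD_FLAG = 1 << 2

def process_attestation_flags(participation: list[int],
                              attesting_indices: list[int],
                              flags_to_set: int) -> None:
    # participation[i] holds the flag bits earned by validator i this epoch.
    for index in attesting_indices:
        participation[index] |= flags_to_set

def epoch_reward_points(participation: list[int]) -> list[int]:
    # At the epoch boundary, rewards can be computed directly from the flags;
    # here each set bit simply counts as one reward point.
    return [bin(flags).count("1") for flags in participation]
```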
  • --- eip: <to be assigned> title: SET_INDESTRUCTIBLE opcode author: Vitalik Buterin (@vbuterin) discussions-to: <URL> status: Draft type: Standards Track category: Core created: 2020-08-30 ---
  • --- eip: <to be assigned> title: Account Abstraction author: Vitalik Buterin (@vbuterin), Ansgar Dietrichs (@adietrichs), Matt Garnett (@lightclient), Will Villanueva (@villanuevawill), Sam Wilson (@SamWilsn) discussions-to: <URL> status: Draft type: Standards Track category: Core created: 2020-08-15 requires: 2718, 1344