# Prysm V2 Repository Structure Proposals

## Objective

To define a sensible repository structure for Prysm version 2.0.0 that aligns with best practices, developer ergonomics, and simplicity, as befits a next-generation Ethereum client. We will aim to make Prysm an example of an awesome Go repository.

## Background

Currently, Prysm's repository structure is a mix of conventions we came up with ourselves while being guided by Bazel and a partial transition toward a more "standard" Go project structure.

After [The Merge](https://ethereum.org/en/eth2/merge/), Prysm will become far more critical software than it is today. It is possible a huge chunk of the _Ethereum_ network will be running Prysm for consensus. Today, go-ethereum is under the highest scrutiny as it controls such a large portion of Ethereum, and it must be as robust as possible to avoid DoS attacks. Moreover, many developers build on go-ethereum as a great reference for writing complex yet safe Go code. Prysm will also be at that level soon, and we must be prepared for it.

## The Problem

Two major problems solved by having a great structure for an open source code repository are:

1. Maintainability: the simpler the repository is structured, the easier it is to refactor, to create new features, and for new developers to become productive and find things in the repo
2. Developer Experience: many contributors and/or projects will choose to build on Prysm, import packages from Prysm as third-party dependencies, extend Prysm, and more. Making sane decisions about how to structure things pays off in the long run, and makes our repository a role model for other Go projects

We believe these are worthwhile goals, and now is the right time to talk about them: we are on the verge of launching Prysm v2.0.0, and the merge of Ethereum with the Beacon Chain has not yet happened.

## An Exploration Into Go "Standard" Project Layouts

Go repository structure has been a topic of much debate recently, as the language does not really impose requirements on how packages should be structured, save for a few exceptions. A very popular repository in the Go community is [golang-standards/project-layout](https://github.com/golang-standards/project-layout), which has gathered a lot of controversy. The suggested layout looks something like this:

```
configs/
cmd/
docs/
internal/
pkg/
scripts/
tools/
test/
vendor/
web/
```

In general, it advocates for putting most things that are **not** internal into a `pkg/` folder, which is similar to the `shared/` folder in Prysm. `pkg` folders in this structure can end up holding dozens of unrelated packages.

This practice has recently come under fire. Russ Cox, a principal engineer at Google on the Go team, even [commented](https://github.com/golang-standards/project-layout/issues/117) that the repository is _far from being a standard_. Russ mentions:

> For example, the vast majority of packages in the Go ecosystem do not put the importable packages in a pkg subdirectory. More generally what is described here is just very complex, and Go repos tend to be much simpler.

Russ is not alone in this opinion. Several prominent individuals in the Go community have posted similar takes:

- [Simple Go project layout with modules](https://eli.thegreenplace.net/2019/simple-go-project-layout-with-modules/) by Eli Bendersky from Google
- [Avoid package names like util](https://dave.cheney.net/2019/01/08/avoid-package-names-like-base-util-or-common) by Dave Cheney

The idea behind a `pkg/` or `shared/` folder is that we can throw any miscellaneous, small Go package in there that doesn't really have a great home otherwise. For example, `bytesutil` is used by both the validator and beacon node code, which is why we put it in the `shared/` folder in our case.

Objectively, `pkg/` is not an official standard, and does not do anything to the project aside from a cosmetic improvement at the top-level directory. A comment in the [golang-standards/project-layout](https://github.com/golang-standards/project-layout/issues/10#issuecomment-504158563) repository had a list of great pros and cons of using a `pkg/` or `shared/` folder in Go projects:

```
Pros:
- Fewer files in a top level directory, less friction to understanding a foreign codebase. Standards help newcomers navigate.
- Fewer homes for source files is generally good, if only proven empirically. We see this in many other languages. Personally, I have a strong distaste for source code being mixed in with .sh, .json, .md files, etc.
- Accepts the reality that projects/repos house more than just source code, things such as: default configuration, install scripts, etc.
- Consistency of code organization across a broader community than yourself or organization.

Cons:
- Not official guidance, so there's always a chance for this to be up for debate. Detracts from time otherwise spent productively.
- Keeping package names 100% meaningful, adding "pkg/" to all of them is repetitive and noisy. Although this would go away, if Go officially supported this. If I was writing an open-source library, the pkg layout would probably be a non-starter for me.
- Moves away from the elegance of packages in Go and their naming seamlessly merging into your folder/repo hierarchy.
- Objectively unnecessary: the only benefit is cosmetic.
```

One of the important points is that without a `pkg` directory, especially in a large monorepo, you end up with a ton of folders at the top-level, making it hard for newcomers to navigate the codebase and see what's going on.

Another benefit of using a `pkg` or `shared` folder is that it implies the packages in there have **no dependency** on others in the repo. For example, it is understood that `pkg/bytesutil` is standalone.
In a project structure where these utility packages are more semantically grouped with other packages, this distinction becomes less clear.

On the other hand, there are many great arguments for avoiding a `pkg/` or `utils/` pattern. Instead of having a lot of small utility packages, we should consider the following from Dave Cheney:

> Packages with names like base or common are often found when functionality common to two or more related facilities, for example common types between a client and server or a server and its mock, has been refactored into a separate package. Instead the solution is to reduce the number of packages by combining client, server, and common code into a single package named after the facility the package provides.

> For example, the net/http package does not have client and server packages, instead it has client.go and server.go files, each holding their respective types. transport.go holds for the common message transport code used by both HTTP clients and servers.

> Name your packages after what they provide, not what they contain.

There are a lot of great points on either side and, as always, the final decision depends on the application we are building. In Prysm's case, we'll explore what an alternative to the `shared/` folder could look like in the next sections.

## First Proposal: Throw Most Things Into `Internal`

Estimate: less than 1 day, mostly moving stuff around with no logic changes and no ripping apart packages' internals

### Current Structure

The current state of our repository is as follows:

```
beacon-chain/
cmd/
contracts/
endtoend/
fuzz/
proto/
scripts/
shared/
slasher/
spectest/
third_party/
tools/
validator/
```

We'll go over each directory:

**beacon-chain**

Contains all our beacon chain code, including core/ functionality, services, database, and all the logic required to run a node in eth2

**cmd**

Go standard for placing CLI entrypoints.
This folder contains entrypoints for running the beacon-chain, validator, and client-stats binaries.

**contracts**

Contains our own copy of the Validator Deposit Contract in Solidity. We use it in our endtoend test suite and in some unit tests in the Prysm repository that deploy the contract to a simulated backend and check for deposit logs.

**endtoend**

Contains our end-to-end test suite for Prysm, with evaluators and policies for checking important properties of our application.

**fuzz**

Contains definitions and inputs we want to fuzz test using [beacon fuzz](https://github.com/sigp/beacon-fuzz) as a differential fuzzer for eth2.

**proto**

Contains all our protobuf definitions for types and endpoints used in Prysm.

**scripts**

Contains bash scripts for deployments, regenerating mocks, generated code, testing, etc.

**shared**

Contains a ton of small utility packages used throughout Prysm, such as bytesutils, testutils, mathutils, and more.

**slasher**

Slasher implementation for eth2. It is _deprecated_ and will instead be part of the beacon-node binary behind a flag in Prysm v2.

**spectest**

All specification test runners for eth2. Test fixtures and expected results are retrieved from [eth2.0-spec-tests](https://github.com/ethereum/eth2.0-spec-tests/tree/master/tests).

**third_party**

Bazel BUILD files for our third-party cryptography libraries, fuzz testing, cross-compilation, and git patches applied at build time to certain dependencies.

**tools**

A large set of small binaries that accomplish specific tasks, such as inspecting a Prysm beacon node database, generating a genesis state from some private keys, deploying the Eth2 Validator Deposit Contract, running a p2p bootnode, and more.

**validator**

An implementation of a validator client to be used with a Prysm beacon node.

### Proposed Structure

```
cmd/
internal/
  beacon-chain/
  validator/
  shared/
testing/
  fuzz/
  spectest/
  endtoend/
proto/
scripts/
third_party/
```

The goal of this structure is to maintain only the bare minimum users need to run Prysm itself. This means a lot of peripheral code can go in a different repository and not be exposed to users when compiling or running a beacon node or validator. We have **limited time** to ship v2, and a complete overhaul is likely not going to happen this month. In the short term, we want to simplify our structure a lot. Having a testing/ package helps with grouping by domain, and also allows us to move smaller subpackages such as the existing testutils into it.

Additionally, we remove the `contracts` package in favor of moving it to prysmaticlabs/periphery on GitHub. We also move all our `tools/` into the periphery repository as they are not needed to run Prysm.

### Major Problem: Breaking Change With No Upgrade Path

The **biggest** problem with using `internal` is that it introduces breaking changes with no upgrade path for third-party callers of Prysm. For example, if we move all of shared/ into internal/, people using Prysm will no longer be able to import those packages once they update their go.mod to Prysm v2.0.0. This is a huge problem, as it blocks anyone depending on our project from updating from then on. As such, we determined the first proposal is likely **infeasible**.

## Second Proposal: Grouping of Packages at Top-Level

For our `shared/` package, we could restructure import paths according to grouped functionality. For example, instead of `bytesutil`, we could have `encoding/bytes`, which is the convention the Go project recommends. We could also have an `encoding/ssz`. For cryptography, we could have `crypto/rand` and `crypto/bls`. This is more akin to what we would expect Go projects to expose.
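
As a rough illustration of what this regrouping means for import paths, here is a minimal Go sketch. The `rewriteImport` helper and the `regrouped` table are hypothetical and not part of Prysm; only the `bytesutil` to `encoding/bytes` pair comes from the text above, while the old locations assumed for `bls` and `rand` are guesses made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// regrouped maps a package directory under shared/ to its grouped location.
// Only bytesutil -> encoding/bytes is stated in this proposal; the bls and
// rand source paths are assumptions made for illustration.
var regrouped = map[string]string{
	"shared/bytesutil": "encoding/bytes",
	"shared/bls":       "crypto/bls",  // assumed old path
	"shared/rand":      "crypto/rand", // assumed old path
}

// rewriteImport is a hypothetical helper: it returns the regrouped import
// path for a package in the given module, preserving any sub-package suffix.
// Paths outside the module or without a mapping are returned unchanged.
func rewriteImport(module, path string) string {
	prefix := module + "/"
	if !strings.HasPrefix(path, prefix) {
		return path
	}
	rel := strings.TrimPrefix(path, prefix)
	for old, grouped := range regrouped {
		if rel == old || strings.HasPrefix(rel, old+"/") {
			return prefix + grouped + strings.TrimPrefix(rel, old)
		}
	}
	return path
}

func main() {
	const module = "github.com/prysmaticlabs/prysm"
	for _, p := range []string{
		module + "/shared/bytesutil",
		module + "/shared/bls/blst", // hypothetical sub-package
		module + "/beacon-chain/core",
	} {
		fmt.Printf("%s -> %s\n", p, rewriteImport(module, p))
	}
}
```

Because the change is a pure path rewrite, callers only need to update import statements. Grouping packages this way moves them out of `shared/` and up to the repository root.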
This would mean we have a lot more folders at top-level, which is what we normally see in idiomatic Go projects such as the [Go](https://github.com/golang/go) language repository itself or [go-ethereum](https://github.com/ethereum/go-ethereum/).

### Proposed Structure

This proposal is only a renaming and grouping of packages and does not deal with refactoring package internals. As such, we estimate it can be completed in a short amount of time. The scope is mostly about grouping the packages in `shared/` under more idiomatic folder paths.

Things not covered:

1. shared/cmd has flags that should be put in the top-level cmd/ package instead
2. A few util packages are flattened into a more idiomatic package name, such as moving code from timeutil and slotutil into a `time` package
3. No refactoring is being considered within beacon-chain/ or validator/, but this can be the next step if we have time
4. The shared/interop package should probably be in a better place, maybe named better and grouped with the interop-cold-start package from the beacon-chain

```
api/ gateway/ pagination/ grpc/ arc/ abool/ multilock/ async/ event/ every.go every_test.go debounce.go scatter.go # from mputils beacon-chain/ ... cmd/ config/ features/ cache/ lru/ container/ attestation/ aggregation/ slashings/ block/ slice/ queue/ trie/ crypto/ bls/ hash/ keystore/ rand/ encoding/ bytes/ ssz/ # rename of htrutils and sszutils io/ file/ stdin/ prompt.go math/ monitoring/ backup/ clientstats/ prometheus/ tracing/ journald/ logs/ progress/ net/ http/ ip/ proto/ runtime/ debug/ maxprocs/ prereqs/ version/ tos/ hack/ allscripts.sh allyaml.yaml tools/ third_party/ testing/ bench/ fuzz/ spectest/ endtoend/ mocks/ time/ slots/ mclock/ validator/ ...
```
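
To make the difference between the two proposals concrete for third-party callers, the sketch below models the Go toolchain's `internal` rule: a path containing an `internal` element can only be imported by code rooted at the parent of that `internal` directory. `canImport` is a hypothetical, simplified helper written for this document (it ignores the standard library's own root-level `internal/` tree), and `example.com/tool` stands in for any outside module.

```go
package main

import (
	"fmt"
	"strings"
)

// canImport reports whether the package at import path importer may import
// target under Go's internal-directory rule. This is a simplified model for
// illustration, not the toolchain's implementation.
func canImport(importer, target string) bool {
	elems := strings.Split(target, "/")
	// The toolchain keys off the last "internal" element in the path.
	for i := len(elems) - 1; i >= 0; i-- {
		if elems[i] != "internal" {
			continue
		}
		// Only code rooted at the parent of internal/ may import it.
		parent := strings.Join(elems[:i], "/")
		return importer == parent || strings.HasPrefix(importer, parent+"/")
	}
	// No internal element: importable from anywhere.
	return true
}

func main() {
	const prysm = "github.com/prysmaticlabs/prysm"
	outside := "example.com/tool" // hypothetical third-party module

	// First proposal: shared/ moves under internal/.
	fmt.Println(canImport(outside, prysm+"/internal/shared/bytesutil")) // false
	// Prysm's own binaries can still reach it.
	fmt.Println(canImport(prysm+"/cmd/validator", prysm+"/internal/shared/bytesutil")) // true
	// Second proposal: grouped top-level package stays importable.
	fmt.Println(canImport(outside, prysm+"/encoding/bytes")) // true
}
```

Under this model, the first proposal cuts off every outside importer of `shared/`, while the second keeps those packages reachable at a new path.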