> If you knew what you know now, and you were starting a new CL from scratch, what structures would you use? - Potuz

Answers by @rauljordan

## Language of Choice

Would I still use Go if I were starting a client by myself? No – I would 100% rewrite everything in Rust for a variety of reasons:

- **No GC**: in Prysm, latency is critical, memory is critical, and management of the lifecycle of a complex application is currently out of our control, as the GC can only be tuned so much in Go
- **Memory-safety first**: in a paradigm that forces us to think about _ownership_ while giving us full control
- **Extreme performance**: Rust is fast, and can be much faster than Go. The runtime is not in the way, and GC does not exist. Further, we can write even more extremely performant Rust code if we want to abandon some of its guardrails. That is, we can still use _unsafe_ Rust in key paths, but unlike unsafe Go, that flavor of Rust can still be _sound_ (read more about this [here](https://doc.rust-lang.org/nomicon/working-with-unsafe.html))
- Rust is, unfortunately, superior at many things that are critical for client development. However, the Lighthouse code is not simpler than Prysm's, and after studying their codebase I would argue they have even more technical debt than we do. Rust, by itself, does not solve technical debt

Would I choose Go if we were to reset Prysm with the existing team? Yes – I would choose Go 100% and here's why:

- We are nowhere close to maximizing the benefits we can get out of Go as a language due to certain technical debt burdens
- Generics have improved significantly, allowing us to reduce code debt
- We have accumulated incredible domain knowledge in Go. We maintain one of the most important open source Go projects in the world, with real production usage and real user feedback in a _public environment_.
There is a lot we can apply from our skillset if we were to pull off a "great reset"
- Yes, GC is a burden, but there are many creative ways to keep it from being a problem if we really think about it! Potuz already had some suggestions of data structures that would play a lot better with the GC
- We understand where we went wrong in designing Prysm in Go the first time. The second time would be way better
- Go fits the bill for small-to-medium sized teams building networked software in a way that is _consistent_. It really matches what Google cared about when building the language, and with great software practices, onboarding new devs becomes a breeze if we write _good_ Go code

## Endgame

There are two schools of thought that dominate client development today:

1. Be as correct as possible, be rock-solid, be like geth
2. Be experimental and squeeze out as much performance as possible

I believe our team culture fits more into camp (1). We have significant usage, and we care about making staking rock-solid for the average user. We learn more from having constant feedback from real people than from no one using our code. Being performant at the cost of diverging from the spec is problematic because it becomes _unmaintainable_. No one but the person who wrote the feature knows how to debug it, because it reads nowhere close to what the spec says it is supposed to do. It would be much easier for knowledge silos to form.

Here's what we competed on with other teams before, when we and others were less experienced:

- Being *first* at things
- Having *good docs*
- Having X feature that Y client *does not have*
- Being *faster at X feature* than Y client is

Here's what I think the right things to compete on are:

1. Maintainability
2. Being "rock-solid" for the average staker
3. Being extensible and instrumentable

Why? (1) Maintainability leads to **happy users**, **happy developers**, and a **happy team**. It makes the code a joy to work with.
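As one concrete instance of the generics point above: since Go 1.18, a single generic helper can replace the per-type copies that used to pile up as code debt. This is a minimal sketch, not Prysm code; the `Filter` helper and the example values are hypothetical.

```go
package main

import "fmt"

// Filter returns the elements of s for which keep returns true.
// Before Go 1.18, a helper like this had to be duplicated for every
// element type; one generic version now covers them all.
func Filter[T any](s []T, keep func(T) bool) []T {
	out := make([]T, 0, len(s))
	for _, v := range s {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Same helper works for numeric slot values...
	slots := []uint64{1, 2, 3, 4, 5}
	even := Filter(slots, func(s uint64) bool { return s%2 == 0 })
	fmt.Println(even) // [2 4]

	// ...and for strings, with no duplicated code.
	names := Filter([]string{"prysm", "geth"}, func(n string) bool { return len(n) > 4 })
	fmt.Println(names) // [prysm]
}
```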
As a personal finance analogy: instead of paying off our debts, we can finally start saving, and potentially reinvesting those gains.

(2) If the average user has an excellent experience, Ethereum staking grows in popularity, and therefore Ethereum becomes more secure. We learn a lot more from having people use our software, report bugs, and develop positive feedback cycles than we do from no one downloading our code. We learn from responsibility and wield it with care.

(3) There's no secret to optimizing code other than the feedback cycle of gathering data directly, seeing a large flame in the flame graph, and making it smaller. If our code is instrumentable where it matters, we can attribute bugs or regressions to specific features. We can figure out which _latent_ variable is the real problem. Moreover, by making our code extensible, such as by making it easy to add new APIs, we encourage a lot more people to poke around and find bugs in our code!

## My Guiding Principles if We Could Reset

**Some duplication is better than the wrong abstraction**

- Credit to Kasey on this one. When I read Kasey's code, I can feel how each design decision was made from common sense, and sometimes the code feels painfully simple, and that's a great thing

**Don't build things you won't need**

- We don't need a one-size-fits-all serialization tool when there are < 20 key data structures. Write it by hand.
- We don't need a complex net of interweaving locks, mutexes, and crazy channels: just encapsulate a small struct with a lock

**Be as abstract as possible in the internals of the codebase, be concrete at the entrypoints of the codebase**

- Being abstract in the internals is fantastic because it means the foundations of the codebase don't move around as much.
If the internals of our code work with _any_ kind of beacon block, that's much better than if we have to be fork-aware
- We should save concrete details for the entrypoints, such as where the CLI `main()` is, and in node initialization. The deeper we go, the more we should think in the abstract

**Prevent footguns by using SOLID software engineering**

- "I am my own worst enemy" is a line that has always stuck with me. I now think hard about _hidden coupling_ in the code I write that will prevent me from refactoring it easily in the future, or that can cause me to make dangerous mistakes. Not using good software practices tends to create an environment that pushes problems under the rug and eventually leads to functions that no one even dares to read
- Use the type system more to our advantage

**Do it right the first time**

- If something takes 2 weeks to do right instead of 3 days, take those 2 weeks

**Make developers _want_ to use the codebase as a dependency**

- I regret that people aren't importing Prysm as a dependency
- I regret our choice of LICENSE
- I regret that we didn't make it easier for ecosystem developers to `go get` our code, call our endpoints with Go, interact with our database with Go, add APIs, gather data with Prysm, and build networked applications with Prysm's networking stack

**Make illegal states unrepresentable**

- Use the type system a lot more to our advantage. Prevent certain states from ever happening. For example, if we want a function that only operates on gossip-verified blocks, make a gossip-verified block type

**Make code more futureproof**

As a counter: how can we anticipate all scenarios?

- Avoid tight coupling of dependencies
- Make code more pure
- Make test setups a breeze to simplify refactors

## Specifics

1. **Would you use a custom allocator for the beacon-state/validators/attestations structures?** If I were writing this in Rust, no need.
If it were in Go, I would leverage the `sync` package more, making better use of `sync.Pool` and other primitives. I would have a better picture of how GC-efficient our data is, how long it must live, and of memory locality when designing these structures

2. **Would you use a struct-like object for them or would you use a functional-style object?** I would use functional code where possible. Prevent race conditions by just making things immutable, allowing for scratch pads or things that can be thrown away upon failure. I would focus on correctness first, however. One idea is to have structs be represented internally as their tree structure (serialized bytes), with "views" into their fields, so that HashTreeRoot is trivial. Potuz also recommended a few approaches using red-black trees. Journal structures, in which we focus on diffs between changes rather than the full thing itself, are also powerful. I would explore more approaches that use "sharding" instead of one mutex being contended among all concurrent callers/writers

3. **What sort of thread management would you use?** The Go runtime. If in Rust, I would use a modified version of Tokio with a few abstractions over its green threads

4. **What sort of fork management would you use?** I wouldn't be afraid to use `reflect` a little more. I would prefer we use "super-structs" more, with struct tags that can tell us how to do certain things depending on the fork kind. I would encapsulate a lot more logic as methods on our block type, to avoid needing conditionals and switch statements in the middle of important business logic

5. **Would you trade security and correctness for performance?** No. I would focus on more encapsulation, simplicity, and making code that is easier to reason about before worrying about performance. I think we have a lot more to gain from Prysm and Go regarding security/correctness before performance

6. **How would you deal with the engine?
Would you change the interaction with the engine if you knew that ALL your users are using MEV-Boost? Would you optimize the builder code to the detriment of the local code?** I would focus on the average staker first and on making the software rock-solid for them. If the builder code and docs end up way better than what our local users get, and those users think our support for them sucks, we might need to course-correct
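The "make illegal states unrepresentable" principle above, with its gossip-verified block example, can be sketched in Go roughly like this. This is a hedged illustration, not Prysm's actual API; every type and function name here (`Block`, `GossipVerifiedBlock`, `VerifyGossip`, `importBlock`) is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a stand-in for a beacon block received from the network.
type Block struct {
	Slot uint64
}

// GossipVerifiedBlock wraps a Block behind an unexported field. In a
// real package, code outside it could not construct this type directly;
// the only way to obtain one would be through VerifyGossip, so holding
// the type is proof that verification happened.
type GossipVerifiedBlock struct {
	block Block
}

// VerifyGossip is the sole constructor. The slot check here is a toy
// stand-in for real gossip validation rules.
func VerifyGossip(b Block) (GossipVerifiedBlock, error) {
	if b.Slot == 0 {
		return GossipVerifiedBlock{}, errors.New("block failed gossip verification")
	}
	return GossipVerifiedBlock{block: b}, nil
}

// importBlock only accepts verified blocks, so the unverified case is
// unrepresentable at the type level: no runtime re-check is needed.
func importBlock(b GossipVerifiedBlock) string {
	return fmt.Sprintf("imported block at slot %d", b.block.Slot)
}

func main() {
	vb, err := VerifyGossip(Block{Slot: 7})
	if err != nil {
		panic(err)
	}
	fmt.Println(importBlock(vb)) // imported block at slot 7
}
```

The design choice is the same one the principle describes: instead of scattering "is this verified?" conditionals through business logic, the type system carries the guarantee.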