If I Had a Reset Button...

If you knew what you know now, and you were starting a new CL from scratch, what structures would you use? - Potuz

Answers by @rauljordan

Language of Choice

Would I still use Go if I were starting a client by myself? No – I would 100% rewrite everything in Rust for a variety of reasons:

No GC: in Prysm, latency is critical, memory is critical, and managing the lifecycle of a complex application is currently out of our control, because GC can only be tuned so much in Go
Memory safety first: a paradigm that forces us to think about ownership while still giving us full control
Extreme performance: Rust is fast and can be much faster than Go. The runtime is not in the way, and there is no GC. Further, we can write even more performant Rust code if we are willing to abandon some of its guardrails. That is, we can still use unsafe Rust in key paths, and unlike unsafe Go, that flavor of Rust can still be sound (read more about this here)
Rust is, unfortunately, superior at many things that are critical for client development. However, the Lighthouse code is not simpler than Prysm's, and after studying their codebase I would argue it carries even more technical debt than ours. Rust, by itself, does not solve technical debt

Would I choose Go if we were to reset Prysm with the existing team? Yes – I would choose Go 100% and here's why:

We are nowhere close to maximizing the benefits we can get out of Go as a language due to certain technical debt burdens
Generics have improved significantly, allowing us to reduce code debt
We have accumulated incredible domain knowledge in Go. We maintain one of the most important open-source Go projects in the world, with real production usage and real user feedback in a public environment. There is a lot from our skillset we can apply if we were to pull off a "great reset"
Yes, GC is a burden, but there are many creative ways to keep it from being a problem if we really think about it! Potuz has already suggested some data structures that would play a lot better with GC
We understand where we went wrong in designing Prysm in Go the first time. The second time would be way better
Go fits the bill for small-to-medium-sized teams building networked software consistently. It matches what Google cared about when building the language, and with good software practices, onboarding new devs becomes a breeze if we write good Go code
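To make the generics point concrete, here is a minimal sketch of one generic helper replacing what would otherwise be a hand-written copy per type. The `Slot`/`Epoch` types and the `Filter`/`Max` helpers are hypothetical illustrations, not Prysm's actual API, and the 32-slots-per-epoch figure is only used as sample data.

```go
package main

import "fmt"

// Slot and Epoch are hypothetical stand-ins for distinct integer types
// that would otherwise each need their own copy of the same helpers.
type Slot uint64
type Epoch uint64

// Filter returns the elements of in for which keep returns true.
// One generic function replaces a hand-written copy per element type.
func Filter[T any](in []T, keep func(T) bool) []T {
	out := make([]T, 0, len(in))
	for _, v := range in {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

// Max works for any integer-like type, including named types such as Slot.
func Max[T ~uint64 | ~int](a, b T) T {
	if a > b {
		return a
	}
	return b
}

func main() {
	slots := []Slot{1, 32, 33, 64}
	// Keep only slots that start an epoch (sample data assumes 32 slots per epoch).
	starts := Filter(slots, func(s Slot) bool { return s%32 == 0 })
	fmt.Println(starts)                  // [32 64]
	fmt.Println(Max(Epoch(3), Epoch(7))) // 7
}
```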

Endgame

There are two schools of thought that dominate client development today:

1. Be as correct as possible, be rock-solid, be like geth
2. Be experimental and squeeze out as much performance as possible

However, I believe our team culture fits more into camp (1). We have significant usage and care about making staking rock-solid for the average user. We learn more from constant feedback from real people than from no one using our code. Being performant at the cost of diverging from the spec is problematic because the code becomes unmaintainable.

No one but the person who wrote such a feature knows how to debug it, because it reads nothing like what the spec says it is supposed to do. Knowledge silos form much more easily.

Here's what we competed on with other teams before, when we and others were less experienced:

Being first at things
Having good docs
Having X feature that Y client does not have
Being faster at X feature than Y client is

Here's what I think the right things to compete on are:

1. Maintainability
2. Being "rock-solid" for the average staker
3. Being extensible and instrumentable

Why? (1) Maintainability leads to happy users, happy developers, and a happy team. It makes the code a joy to work with. As a personal finance analogy: instead of paying off our debts, we can finally start saving, and potentially reinvesting those gains.

(2) If the average user has an excellent experience, Ethereum staking grows in popularity, and therefore Ethereum becomes more secure. We learn a lot more from having people use our software, report bugs, and develop positive feedback cycles than we do from no one downloading our code. We learn from responsibility and wield it with care.

(3) There's no secret to optimizing code other than the feedback cycle of gathering data directly, seeing a large flame in the flame graph, and making it smaller. If our code is instrumentable where it matters, we can attribute bugs or regressions to specific features. We can figure out which latent variable is the real problem. Moreover, by making our code extensible, such as by making it easy to add new APIs, we encourage a lot more people to poke around and find bugs in our code!

My Guiding Principles if We Could Reset

Some duplication is better than the wrong abstraction

Credit to Kasey for this one. When I read Kasey's code, I can feel how each design decision was thought through with common sense. Sometimes the code feels painfully simple, and that's a great thing

Don't build things you won't need

We don't need a one-size-fits-all serialization tool when there are < 20 key data structures. Write it by hand.
We don't need a complex net of interweaving locks, mutexes, and crazy channels: just encapsulate a small struct with a lock
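A minimal sketch of "encapsulate a small struct with a lock": the mutex lives next to the data it guards and is never exposed, so callers cannot forget to take it. The `headTracker` type and its methods are hypothetical, chosen only to illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// headTracker is a hypothetical example: the lock sits beside the data it
// guards and is unexported, so every access goes through a locked method.
type headTracker struct {
	mu   sync.RWMutex
	slot uint64
	root [32]byte
}

// Update replaces the head only if the new slot is higher.
func (h *headTracker) Update(slot uint64, root [32]byte) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if slot <= h.slot {
		return false
	}
	h.slot, h.root = slot, root
	return true
}

// Head returns a copy of the current head; arrays are copied by value.
func (h *headTracker) Head() (uint64, [32]byte) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.slot, h.root
}

func main() {
	var h headTracker
	var wg sync.WaitGroup
	for i := uint64(1); i <= 100; i++ {
		wg.Add(1)
		go func(s uint64) {
			defer wg.Done()
			h.Update(s, [32]byte{byte(s)})
		}(i)
	}
	wg.Wait()
	slot, root := h.Head()
	fmt.Println(slot, root[0]) // 100 100
}
```

No channels or lock hierarchies are involved: the highest slot always wins regardless of goroutine ordering, because the comparison and the write happen under the same lock.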

Be as abstract as possible in the internals of the codebase, be concrete at the entrypoints of the codebase

Being abstract in the internals is fantastic because it means the foundations of the codebase don't move around as much. If the internals of our code work with any kind of beacon block, that is much better than having to be fork-aware everywhere
We should save concrete details for the entrypoints, such as the CLI main() and node initialization. The deeper we go, the more abstractly we should think
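One way to read this principle in Go is sketched below, assuming a hypothetical `BeaconBlock` interface and hypothetical per-fork structs (none of these are Prysm's real types). Only the entrypoint (`decodeBlock`) switches on the fork; internal logic such as `isLate` depends solely on the interface.

```go
package main

import "fmt"

// BeaconBlock is a hypothetical fork-agnostic interface; internal code only
// depends on this, never on a specific fork's struct.
type BeaconBlock interface {
	Slot() uint64
	ProposerIndex() uint64
}

// Hypothetical concrete per-fork types; only the entrypoint knows about them.
type phase0Block struct{ slot, proposer uint64 }
type capellaBlock struct {
	slot, proposer uint64
	withdrawals    []uint64
}

func (b phase0Block) Slot() uint64           { return b.slot }
func (b phase0Block) ProposerIndex() uint64  { return b.proposer }
func (b capellaBlock) Slot() uint64          { return b.slot }
func (b capellaBlock) ProposerIndex() uint64 { return b.proposer }

// isLate is "internal" logic: it works for any fork without a switch.
func isLate(b BeaconBlock, currentSlot uint64) bool {
	return b.Slot() < currentSlot
}

// decodeBlock plays the role of the concrete entrypoint that picks a fork.
func decodeBlock(fork string, slot, proposer uint64) (BeaconBlock, error) {
	switch fork {
	case "phase0":
		return phase0Block{slot: slot, proposer: proposer}, nil
	case "capella":
		return capellaBlock{slot: slot, proposer: proposer}, nil
	default:
		return nil, fmt.Errorf("unknown fork %q", fork)
	}
}

func main() {
	for _, fork := range []string{"phase0", "capella"} {
		b, err := decodeBlock(fork, 10, 7)
		if err != nil {
			panic(err)
		}
		fmt.Println(fork, isLate(b, 12)) // true for both forks
	}
}
```

Adding a fork in this sketch means adding a type and one `case` at the entrypoint; `isLate` and anything like it stay untouched.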

Prevent footguns by using SOLID software engineering

"I am my own worst enemy" is a line that has always stuck with me. I now think hard about hidden coupling in the code I write that would keep me from easily refactoring it in the future, or that could lead me to make dangerous mistakes. Not using good software practices tends to create an environment that sweeps problems under the rug, until we eventually end up with functions that no one even dares to read
Use the type system more to our advantage

Do it right the first time

If doing something right takes 2 weeks instead of 3 days, take those 2 weeks

Make developers want to use the codebase as a dependency

I regret that people aren't importing Prysm as a dependency
I regret our choice of LICENSE
I regret that we didn't make it easier for ecosystem developers to go get our code, call our endpoints with Go, interact with our database with Go, make it easy to add APIs, gather data with Prysm, build networked applications with Prysm's networking stack

Make illegal states unrepresentable

Use the type system a lot more to our advantage. Prevent certain states from ever happening. For example, if we want a function that only operates on gossip-verified blocks, make a gossip-verified block type
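The gossip-verified block idea might look roughly like this. `SignedBlock`, `GossipVerifiedBlock`, `VerifyGossip`, and `importBlock` are hypothetical names, and the two checks inside `VerifyGossip` are placeholders, not the real gossip validation rules. Because `GossipVerifiedBlock`'s only field is unexported, code outside the package can obtain a populated one only through `VerifyGossip`.

```go
package main

import (
	"errors"
	"fmt"
)

// SignedBlock is a hypothetical raw block as received from gossip.
type SignedBlock struct {
	Slot      uint64
	Signature []byte
}

// GossipVerifiedBlock wraps a block that passed VerifyGossip. Its field is
// unexported, so other packages cannot fill it in themselves.
type GossipVerifiedBlock struct {
	block SignedBlock
}

func (g GossipVerifiedBlock) Slot() uint64 { return g.block.Slot }

// VerifyGossip stands in for the real gossip checks; these two placeholder
// rules are assumptions made only for this sketch.
func VerifyGossip(b SignedBlock, currentSlot uint64) (GossipVerifiedBlock, error) {
	if len(b.Signature) == 0 {
		return GossipVerifiedBlock{}, errors.New("missing signature")
	}
	if b.Slot > currentSlot {
		return GossipVerifiedBlock{}, errors.New("block from the future")
	}
	return GossipVerifiedBlock{block: b}, nil
}

// importBlock accepts only verified blocks: passing a SignedBlock is a
// compile-time error, not a runtime check someone might forget.
func importBlock(b GossipVerifiedBlock) string {
	return fmt.Sprintf("imported block at slot %d", b.Slot())
}

func main() {
	raw := SignedBlock{Slot: 5, Signature: []byte{0x01}}
	vb, err := VerifyGossip(raw, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(importBlock(vb)) // imported block at slot 5
	// importBlock(raw) would not compile: SignedBlock is not a GossipVerifiedBlock.
}
```

One caveat with this pattern in Go: the zero value `GossipVerifiedBlock{}` can still be written anywhere, so a stricter design might hand out a pointer or add a validity flag set only by the verifier.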

Make code more futureproof

As a counter: how can we anticipate all scenarios?

Avoid tight-coupling of dependencies
Make code more pure
Make test setups a breeze to simplify refactors
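As one possible illustration of these three points together, the sketch below depends on a narrow interface instead of a concrete service, keeps the core computation pure, and makes the test setup a single line. `Clock`, `fixedClock`, `DutyScheduler`, and `epochOf` are all hypothetical names.

```go
package main

import "fmt"

// Clock is a hypothetical narrow dependency. Code depends on this interface
// rather than on time.Now or a concrete node service, so tests can fake it.
type Clock interface {
	CurrentSlot() uint64
}

// fixedClock is a trivial test double.
type fixedClock uint64

func (c fixedClock) CurrentSlot() uint64 { return uint64(c) }

// epochOf is pure: same input, same output, no hidden state.
func epochOf(slot, slotsPerEpoch uint64) uint64 { return slot / slotsPerEpoch }

// DutyScheduler only knows about the interface it was given.
type DutyScheduler struct {
	clock         Clock
	slotsPerEpoch uint64
}

func NewDutyScheduler(c Clock, slotsPerEpoch uint64) *DutyScheduler {
	return &DutyScheduler{clock: c, slotsPerEpoch: slotsPerEpoch}
}

func (d *DutyScheduler) CurrentEpoch() uint64 {
	return epochOf(d.clock.CurrentSlot(), d.slotsPerEpoch)
}

func main() {
	// Test setup is one line: no database, no network, no real clock.
	d := NewDutyScheduler(fixedClock(65), 32)
	fmt.Println(d.CurrentEpoch()) // 2
}
```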

Specifics

Would you use a custom allocator for the beacon-state/validators/attestations structures?

If I were writing this in Rust, there would be no need. In Go, I would lean on the sync package more and make better use of sync.Pool and other primitives. When designing these structures, I would also build a clearer picture of how GC-friendly our data is, how long it must live, and how good its memory locality is
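A small sketch of the sync.Pool idea: reuse scratch buffers on a hot path instead of allocating a fresh one per call. `encodeRoots` is a hypothetical function. Note that sync.Pool is a cache rather than an allocator; the runtime may drop pooled objects at any GC, so it only reduces allocation pressure.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses scratch buffers so hot paths do not allocate a fresh one
// per call; fewer short-lived allocations means less work for the GC.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// encodeRoots is a hypothetical hot path that builds a temporary buffer.
// It returns a copy, because the pooled buffer is reused after Put.
func encodeRoots(roots [][32]byte) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	for _, r := range roots {
		buf.Write(r[:])
	}
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

func main() {
	roots := [][32]byte{{1}, {2}}
	fmt.Println(len(encodeRoots(roots))) // 64
}
```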

Would you use a struct like object for them or would you use a functional style object?

I would use functional code where possible: prevent race conditions by simply making things immutable, allowing for scratch pads that can be thrown away upon failure. I would focus on correctness first, however.
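A minimal copy-on-write sketch of the "scratch pad" idea, assuming a heavily simplified, hypothetical `State` type: the transition works on a deep copy and returns it only on success, so a failure leaves the caller's state untouched and no lock is needed to read the old value.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical, heavily simplified beacon state. It is treated as
// immutable: functions never modify a State they receive.
type State struct {
	Slot     uint64
	Balances []uint64
}

// copyState makes a deep copy to use as a scratch pad.
func copyState(s State) State {
	return State{Slot: s.Slot, Balances: append([]uint64(nil), s.Balances...)}
}

// applyPenalty returns a new State; on error the scratch copy is simply
// dropped and the caller's State is untouched.
func applyPenalty(s State, idx int, amount uint64) (State, error) {
	next := copyState(s)
	if idx < 0 || idx >= len(next.Balances) {
		return s, errors.New("validator index out of range")
	}
	if next.Balances[idx] < amount {
		return s, errors.New("insufficient balance")
	}
	next.Balances[idx] -= amount
	return next, nil
}

func main() {
	pre := State{Slot: 1, Balances: []uint64{32, 16}}
	post, err := applyPenalty(pre, 1, 4)
	fmt.Println(pre.Balances, post.Balances, err) // [32 16] [32 12] <nil>
	_, err = applyPenalty(pre, 1, 100)
	fmt.Println(pre.Balances, err) // [32 16] insufficient balance
}
```

The obvious cost is a full copy per transition, which is why structures that share or journal unchanged data become attractive at beacon-state scale.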

One idea is to represent structs internally as their tree structure (serialized bytes), with "views" into their fields, so that HashTreeRoot becomes trivial. Potuz also recommended a few approaches using red-black trees. Journal structures, which focus on the diffs between changes rather than on the full object, are also powerful.
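A journal can be sketched in a few lines: record only the previous value of each field that changes, and undo those entries to roll back. `JournaledBalances` and its methods are hypothetical and cover a single balances slice, far simpler than a real beacon state.

```go
package main

import "fmt"

// change records the previous value of one balance so it can be undone.
type change struct {
	idx  int
	prev uint64
}

// JournaledBalances is a hypothetical sketch: instead of copying the whole
// slice for every tentative update, it records only the diffs.
type JournaledBalances struct {
	balances []uint64
	journal  []change
}

// Snapshot returns a marker that Revert can roll back to.
func (j *JournaledBalances) Snapshot() int { return len(j.journal) }

// Set records the old value before overwriting it.
func (j *JournaledBalances) Set(idx int, v uint64) {
	j.journal = append(j.journal, change{idx: idx, prev: j.balances[idx]})
	j.balances[idx] = v
}

// Revert undoes changes newer than the snapshot, newest first.
func (j *JournaledBalances) Revert(snap int) {
	for i := len(j.journal) - 1; i >= snap; i-- {
		c := j.journal[i]
		j.balances[c.idx] = c.prev
	}
	j.journal = j.journal[:snap]
}

// Commit discards the journal once the changes are final.
func (j *JournaledBalances) Commit() { j.journal = j.journal[:0] }

func main() {
	j := &JournaledBalances{balances: []uint64{32, 32, 32}}
	snap := j.Snapshot()
	j.Set(0, 31)
	j.Set(0, 30)
	j.Set(2, 33)
	fmt.Println(j.balances) // [30 32 33]
	j.Revert(snap)
	fmt.Println(j.balances) // [32 32 32]
}
```

Undoing newest-first matters: when the same index changes twice, only reverse order restores the original value.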

I would explore more approaches that use "sharding" instead of one mutex being contended among all concurrent callers/writers.
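"Sharding" a lock is often called lock striping: split the data into N independently locked shards so writers to different shards never contend. The sketch below assumes a hypothetical `ShardedCache` keyed by a `uint64` (for example a validator index) and an arbitrary shard count of 16.

```go
package main

import (
	"fmt"
	"sync"
)

// numShards is an arbitrary choice for this sketch.
const numShards = 16

// shard is one independently locked slice of the key space.
type shard struct {
	mu sync.RWMutex
	m  map[uint64][]byte
}

// ShardedCache is a hypothetical sketch of lock striping: writers to
// different shards do not contend on one global mutex.
type ShardedCache struct {
	shards [numShards]shard
}

func NewShardedCache() *ShardedCache {
	c := &ShardedCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[uint64][]byte)
	}
	return c
}

// shardFor picks a shard by key; sequential keys such as validator indices
// spread evenly under modulo.
func (c *ShardedCache) shardFor(key uint64) *shard {
	return &c.shards[key%numShards]
}

func (c *ShardedCache) Put(key uint64, v []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (c *ShardedCache) Get(key uint64) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func main() {
	c := NewShardedCache()
	var wg sync.WaitGroup
	for i := uint64(0); i < 1000; i++ {
		wg.Add(1)
		go func(k uint64) {
			defer wg.Done()
			c.Put(k, []byte{byte(k)})
		}(i)
	}
	wg.Wait()
	v, ok := c.Get(513)
	fmt.Println(ok, v[0]) // true 1
}
```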

What sort of thread management would you use?

Go runtime. If in Rust, I would use a modified version of Tokio with a few abstractions over its green threads

What sort of fork management would you use?

I wouldn't be afraid to use "reflect" a little more. I would prefer we use "super-structs" more with struct tags that can tell us how to do certain things depending on the fork-kind. I would encapsulate a lot more logic as methods on our block type, to avoid needing to have conditionals and switch statements in the middle of important business logic
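To sketch what a tagged "super-struct" could look like: one struct holds every fork's fields, a hypothetical `fork:"..."` struct tag says from which fork each field exists, and a small `reflect` helper reads those tags. `BlockBody`, `forkOrder`, and `activeFields` are illustrative assumptions, not Prysm code, and the field list is heavily trimmed.

```go
package main

import (
	"fmt"
	"reflect"
)

// forkOrder is a hypothetical ordering of fork names.
var forkOrder = map[string]int{"phase0": 0, "altair": 1, "bellatrix": 2, "capella": 3}

// BlockBody is a hypothetical "super-struct": it holds the fields of every
// fork, and a `fork` tag says from which fork a field is present.
type BlockBody struct {
	Graffiti         [32]byte
	SyncAggregate    []byte   `fork:"altair"`
	ExecutionPayload []byte   `fork:"bellatrix"`
	Withdrawals      []uint64 `fork:"capella"`
}

// activeFields uses reflect to list the fields that exist at a given fork.
// Untagged fields are assumed to exist since phase0.
func activeFields(v any, fork string) ([]string, error) {
	target, ok := forkOrder[fork]
	if !ok {
		return nil, fmt.Errorf("unknown fork %q", fork)
	}
	t := reflect.TypeOf(v)
	if t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a struct, got %s", t.Kind())
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		since := f.Tag.Get("fork")
		if since == "" {
			since = "phase0"
		}
		start, ok := forkOrder[since]
		if !ok {
			return nil, fmt.Errorf("field %s has unknown fork tag %q", f.Name, since)
		}
		if start <= target {
			names = append(names, f.Name)
		}
	}
	return names, nil
}

func main() {
	names, err := activeFields(BlockBody{}, "bellatrix")
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [Graffiti SyncAggregate ExecutionPayload]
}
```

Helpers like this can then drive serialization or hashing generically, keeping fork conditionals out of the business logic at the price of some reflection overhead.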

Would you trade security and correctness for performance?

No. I would focus on more encapsulation, simplicity, and code that is easier to reason about before worrying about performance. I think we have a lot more to gain from Prysm and Go in security and correctness before we get to performance

How would you deal with the engine? Would you change the interaction with the engine if you knew that ALL your users are using MEV-Boost? Would you optimize the builder code to the detriment of the local code?

I would focus on the average staker first and making the software rock-solid for them. If the builder code and docs are way better than our users' docs and they think our support for them sucks, we might need to course-correct
