Download as PDF (for docs.racklet.io)
Authors (in alphabetical order):
Status (as defined here): Implementable
Creation Date: 2020-12-10
Last Updated: 2020-06-08
Version Number: v1.1.1
Racklet is a fully-integrated, miniature server rack. It is a scale model of a hyperscaler server rack loosely based on the Open Compute Project (OCP) rack designs. It consists of several pluggable "compute units", a Rack Management Controller (RMC) and shared power delivery (the so-called busbar). In addition, there is some functionality borrowed from OCP Edge Cloud implementations in place, such as a common interconnect for the compute units (SMBus).
Physically a Racklet rack is a bit larger than a one liter milk carton and hosts single board computers conforming to the Raspberry Pi 3/4 form factor. The defining features of Racklet compared to other "Raspberry Pi clouds" are the fully integrated and secure but still pluggable open source firmware/software/hardware solutions, and the scalability enabled by e.g. hotplug support as well as the inexpensive and available manufacturing techniques applied.
Racklet aims to inspire their users to explore how modern, advanced server architectures work in practice, in a tangible and educational way. With the new-found knowledge and inspiration, the user may apply their modernization skills on traditional server infrastructure, which improves the status quo and pushes the industry forward. The aim of the project is also to write modular pieces of software and firmware that can be re-used across a diverse set of systems, not only on Racklet itself.
Racklet is defined by its values and principles. Below you can read about the 9 values that shape this project, and what they mean in practice. One value to highlight here is accessibility. Racklet is 100% open source and should be accessible to a group as diverse as possible from all over the world. This means all parts of the system should be reproducible through open PCB designs, 3D-printed casing, and commodity, off-the-shelf hardware. We want to lower the barrier of entry for this domain.
In this RFC, we will outline what Racklet is, why create it (from the user's perspective), and the values and design constraints for the system.
In short, we'd like to say that
Racklet aims to be for Cloud Computing what Raspberry Pi is for Programming, and Arduino for Electronics
Have fun tinkering with it!
Problem statement (choose the one that appeals to you):
Distributed systems of various kinds are steadily becoming the foundation for all important technological environments; and their backends require ever-increasing capacity. The world of cloud computing software is rapidly evolving towards dynamic, scalable and self-correcting systems. The amount of tools and services required to run high-performing cloud systems in a diverse range of environments are vast, and the integration between them complex[1]. Due to the complicated nature of this quickly-evolving cloud infrastructure, how can newcomers to the field of cloud computing get an idea of the landscape and workings of the systems in an effective way?
and/or
Empirically, it seems that many mainstream server and infrastructure provisioning guides (or even fully-integrated solutions) often don't put enough effort into securing the firmware stack of the server, but focus more on the layers above. This, in combination with the firmware being proprietary, often leads to situations of unknown/random bugs and security flaws (caused by the user due to insufficient knowledge, or flaws in the firmware without patched versions). Different pieces and layers of firmware doing the same things (and often too much) in subtly different ways, but without it being possible to only activate what you need or want to extend. Firmware written in C suffers from many common memory errors and even security flaws. Network booting of servers often stick to legacy and unreliable protocols like TFTP or similar, and skip any verification of the payload's integrity or authenticity.
Proposed solution:
As detailed in the Summary section above, "Racklet is a fully-integrated, miniature server rack". Building on top of good ideas and practices from OCP, the open source firmware community (e.g. LinuxBoot, u-boot, TF-A, etc.), the Raspberry Pi educational model, and the advancements in writing secure system software and firmware with Rust, we think we can push the state of the art here, and educate newcomers to the field of secure and open source cloud computing at the same time.
As pointed out in the first problem statement, it can be challenging to "get into" to the server infrastructure world, especially if you want to run your own servers, due to a multitude of reasons, including complexity, lack of standardization at several layers of the stack, and cost. Through Racklet, we want to explore these venues in a tangible and low-cost way with the help of Raspberry Pi's (or alternate single board computers). It has been shown earlier that Raspberry Pis can be helpful for teaching cloud computing[2], hence we believe this could be a good fit.
The goal of the project is to be comprehensive and "real-world" enough to feature, at least conceptually, most of what you would find in a modern hyperscaler server infrastructure environment, but still clear, well-documented, and user-friendly enough to attract and welcome new, future talents to the server and firmware worlds.
This proposal consists of a detailed breakdown of the Values of the project, who we think will be using this project (what kind of user persona we are optimizing for, in User Perspectives), how we envision the users will use it (User Goals), and finally, a high-level overview of the hardware/software Layers.
Any further technical details are out of scope for this RFC. Those will be covered by upcoming, more detailed RFCs.
The following values apply to the whole system, and are sorted roughly in priority order, as a guideline when making decisions. Future RFCs should outline in their "Motivation" chapter what values have (or have not) been adhered in the RFC and how.
Disclaimer: All of these values are aspirational, they are not literal guarantees that can be used for liability claims. We expect to iteratively improve towards and get closer to these goals as the project matures and new versions are released.
Security
Interoperability
Accessibility / Reproducibility
Modularity / Compatibility
Transparency
Maintainability / Upgradability
Affordability
Main User Persona: "Racklet is for a student, hobbyist, teacher or industry professional who wants to learn and understand modern, increasingly important distributed and cloud computing skills to foster education, research and work opportunities."
A University distributed computing class could use one or multiple Racklets as a prototyping platform to enable fast learning augmented by practical training.
Companies demoing their software and hardware at conferences could use this as an innovative way to showcase their solutions.
As this aims to be a scale model of a real cloud environment, it forms a good target for capture-the-flag hacking contests.
In the same spirit as the Tangible Cloud Teaching use-case above, we anticipate it would also be a good fit for commercial trainings, when time is limited and you quickly need to demonstrate how some specific piece of technology works in detail. The instructor can easily engage their audience in a practical way.
Hobbyists could use one or multiple Racklets to establish home infrastructure while learning more about and further developing the platform.
Racklet can be used for Research and Development purposes in Computer Labs where some specific application's architecture is being validated on a real-world but inexpensive system. This is an alternative to data center simulation programs. Racklet captures the real-world aspects that a simulation might not take into account or can not realistically represent. Realistic outages and partial failure modes (e.g. power outages or network partitions) can easily be applied to the system in order to research how the tested application reacts.
Achieve user goals through containers and Kubernetes: The user wants to use containers and Kubernetes as their preferred way of running applications, and hence some base functionality of that should be provided in a "batteries included, but swappable"-sense. At the end of the day, we're building this project so that the user can build something nice on top of it through these standard interfaces.
Fast reconfiguration / turn-around time: The user wants to configure their hardware once, and after that be able to set up and/or recreate the whole software stack from the ground up multiple times over with minimal hassle. For example, an educator may want to rebuild the rack configuration, trusted certificates, etc. or do a "factory reset" for every class/workshop they run.
Secure firmware[3] and software updates: When the user gets notified that a new release is available, the user doesn't want to do it the "classical" way of downloading some hex binary and flashing it manually for each server. Instead, they want the upgrades to be atomic (e.g. A/B partitioning), secure (payload is signed), automated and defend against common upgrading attacks (e.g. rollback attack). Optionally, automatic deployment of upgrades can be enabled.
Network boot in a zero-trust environment: The user should feel ready to plug Racklet in to (almost) any existing network, without the system interfering with existing devices on the network or vice versa. This goes strongly in hand with #3 as ensuring security, especially in the boot and upgrade process, is of paramount importance[4].
End-to-end encryption and authentication: The user wants to feel comfortable running software on top of their Racklet without worrying about e.g. MITM attacks in the surrounding untrusted network it is connected to. Hence, all TCP/IP traffic should be end-to-end encrypted, authenticated or preferably, both.
Hot swappability: The user wants to be able to upgrade their racks often for newer hardware as they enter the market. The user also wants to be able to maintain and service the rack while it is running, and dynamically expand capacity at runtime.
Keep track of power usage and efficiency: For educational purposes and intelligent power control, it is important to know how compute utilization translates to power consumption across the system. The user wants transparently reported metrics so that they can analyze the data and utilize higher-level power control routines to optimize power draw.
Physical portability: Racklet should be lightweight enough to be carried by hand and should not require specialized equipment or disassembly for transportation.
Commodity power and I/O: The user wants to be able to utilize commodity resources they already have at hand, instead of needing to buy specialized equipment only for Racklet. Examples of these commodities include: Laptop chargers instead of a custom power cable/transformer, USB instead of some proprietary high-speed interconnect, and Ethernet switches & cables instead of e.g. expensive SFP+.
The only design detail in-scope for this document is defining the layers of the system at a high level.
The following section will go through the various layers of the system and the requirements/contract of each of the items.
This layer of the "stack" consists mainly of the 3D-printed casing and trays of the rack. The Ethernet switch optionally attached on the side of the rack can also be considered a structural item.
This layer consists of compute capacity (e.g. a Raspberry Pi with an attached SSD), our reproducible Baseboard Management Controller PCB attached to it in some way (as also can be found in mainstream servers), and our reproducible backplane PCB/wiring which feeds the common busbar power rails, and the SMBus interconnect between compute units.
The firmware layer is defined as the code that is running in "bare metal" environments, i.e. on the compute before the primary OS has been loaded, or on the BMC microcontroller. Examples of code that is capable of (and specialized at) running before the primary OS includes the (proprietary) Raspberry Pi firmware, LinuxBoot, u-boot, and TF-A. We strive to use open source Embedded Rust due to the language's suitability for memory safe firmware and good support for most popular microcontrollers.
The system software includes everything the system needs to run in order to fulfil the user goals. All applications at this layer are built for and depend on Linux. As per above, we expect the user to utilize Kubernetes, and hence there is a default (but replaceable) installation of that. We will also pre-install a (configurable) operating system (OS), e.g. Bottlerocket, so that the user can get going without too much preliminary work. In addition, we will implement the Rack Management Controller features at this level.
At this layer are the user-deployable workloads running in containers, we consider them as "user-space applications". This layer is not part of the Racklet project, it is entirely user-defined. This is also a good place for users to extend Racklet and add extra functionality of their liking.
Unit tests will be created for individual software components of the system. Integration tests will be created for cross-component communications. Automated end-to-end tests will be conducted by a physical Racklet instance that is continuously "upgraded" to the latest development version and reports feedback. This way we will assure the stability and resilience of the software/firmware stack.
Furthermore, we will rely on the developer community to test out many different hardware, software and firmware combinations other than the reference implementation.
For this project to be considered successful and graduated, we mandate the following:
2020-12-10
: First version of this RFC has been accepted.2021-06-07
: Values have been refined, misc. clarifications and readability improvements.2021-06-08
: All values, subvalues and user goals have been given IDs for referring to them.See “8 ways the cloud is more complex than you think | CIO.” and “Cloud Computing, Once Loved For Its Simplicity, Is Now A Complex Beast.” (accessed Dec. 05, 2019). ↩︎
"KubeCloud: A Small-Scale Tangible Cloud Computing Environment". Master's thesis in Computer Engineering at Aarhus University by Kasper Nissen and Martin Jensen. Published June 6th, 2016. Download PDF here ↩︎
For example the 1st stage bootloader of the Raspberry Pi 4 is currently closed source software which we cannot audit or modify, and hence cannot use as a "complete end to end" hardware root of trust. However, such non-idealities don't stop us from getting as close as possible to full hardware root of trust, and more importantly, conceptually being consistent in the way we work with these SBCs and "normal" servers. ↩︎ ↩︎
At least initially this does not mean that the system is 100% secure, there are both some practical limits[3:1] and software/hardware features that need to be explored for improved security (e.g. ARM TrustedFirmware). ↩︎