Authors (in alphabetical order):
Status (as defined here): Provisional
Creation Date: 2021-04-15
Last Updated: 2021-07-02
RFC Handle: 0002-layer-architecture
Initial Pull Request: racklet/racklet#20
Tracking Issue: racklet/racklet#21
This RFC describes the overall Racklet architecture, its defining layers, and requirements for each such layer, derived from RFC-0001. For each layer the defining components are described at a high level (avoiding implementation details). The components are associated with their role and five highlighted key requirements from the values and user goals of RFC-0001.
With this RFC we aim to clearly define the layers Racklet consists of to provide a clear overview of the system for all contributors and maintainers. Additionally this document concisely presents the techniques and technologies used in the various layers to achieve the goals stated in RFC-0001.
Racklet is divided into 5 distinct layers, from highest-level to lowest-level:
5. User Software
4. System Software
3. Firmware
2. Electrical
1. Structural
There is some overlap between these defined layers, mostly due to individual components contributing to multiple layers, but we aim to keep a clear distinction in this definition. If, for example, a microcontroller is part of both the electrical and firmware layers, the electrical layer only considers its electrical properties and the firmware layer only its firmware.
The architecture is designed with the layers and their interaction as the primary focus. The requirements of a layer drive the design of the layer below it, which aims to satisfy the dependencies according to the values and user goals of the project. The layers are described here in reverse order (layer 5 first), since the highest layer starts the dependency chain by directly fulfilling the user goals.
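The layer numbering and the downward dependency chain can be made concrete with a small model. The following is a minimal Python sketch, not part of the Racklet specification; the `Layer` enum and the `layer_below` and `description_order` helpers are hypothetical names introduced purely for illustration.

```python
from enum import IntEnum
from typing import List, Optional


class Layer(IntEnum):
    """The five Racklet layers, numbered as in this RFC (1 = lowest level)."""
    STRUCTURAL = 1
    ELECTRICAL = 2
    FIRMWARE = 3
    SYSTEM_SOFTWARE = 4
    USER_SOFTWARE = 5


def layer_below(layer: Layer) -> Optional[Layer]:
    """Return the layer whose design is driven by `layer`, or None for layer 1."""
    return Layer(layer - 1) if layer > Layer.STRUCTURAL else None


def description_order() -> List[Layer]:
    """Layers in the order this RFC describes them: highest level first."""
    return sorted(Layer, reverse=True)
```

For instance, `layer_below(Layer.USER_SOFTWARE)` yields `Layer.SYSTEM_SOFTWARE`, mirroring how the requirements of the user software layer drive the design of the system software layer.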
Summary: The user software layer should allow the user to schedule workloads of choice using either containers or VMs. There should be an accessible and observable graphical user interface in place for the user to monitor and manage the Racklet system and workloads.
Goals:
Layer components:
Component | Role | Key Requirements |
---|---|---|
Micro Virtual Machine orchestration | Define and run VMs declaratively | Improve status quo, Openness, Declarative management, Documentation, Fast reconfiguration |
Kubernetes deployment automation | Consume/use a Kubernetes cluster | De-facto standards, Declarative management, Loose coupling, Upgradability, Utilize Kubernetes |
Racklet dashboard | Monitor rack and cluster state, deploy workloads | Security by design, Declarative management, Open source, Portability, Observability |
Summary: The system software layer is responsible for enabling the container/VM solutions of the user software layer. There should be a hypervisor in place for the virtual machines and a container orchestration solution (Kubernetes) for container workloads. Kubernetes is also leveraged for orchestrating the Racklet rack and performing management operations in a declarative fashion.
Goals:
Layer components:
Component | Role | Key Requirements |
---|---|---|
System Kubernetes installation | Run container workloads, perform management | Declarative management, Consistency, Modular design, Portability, Loose coupling |
Hypervisor operating system | Run VM workloads, enable kernel-level security | Defense in depth, De-facto standards, Declarative management, Raspberry Pi compatibility, Portability |
CNI compliant networking | Network the Racklet cluster compute units | Security by design, No old/insecure protocols, Openness, Observability, End-to-end encryption |
GitOps tooling | Declarative management of the Racklet stack | Improve status quo, De-facto standards, Declarative management, Observability, Auto-upgradability |
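Declarative management, as practiced by GitOps tooling in general, means comparing a desired state (typically stored in Git) with the observed state and computing the actions needed to converge. The following is a minimal sketch of that idea, assuming state can be represented as a flat mapping from resource name to version. The `plan` function and the resource names are hypothetical and do not correspond to any Racklet or Kubernetes API.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]  # resource name -> version (assumed representation)


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Compute the actions that move `actual` towards `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical example state
desired = {"cni": "v2", "dashboard": "v1"}
actual = {"cni": "v1", "legacy-agent": "v1"}
print(plan(desired, actual))
# → [('update', 'cni'), ('create', 'dashboard'), ('delete', 'legacy-agent')]
```

When desired and actual state already match, the plan is empty, which is the property that makes repeated reconciliation safe.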
Summary: The firmware helps to securely boot and configure Racklet compute: it is declaratively managed and cryptographically verifies payloads before booting them. The firmware should also help with collecting hardware observability data and telemetry for monitoring and debugging.
Goals:
Layer components:
Component | Role | Key Requirements |
---|---|---|
u-root based bootstrap environment | Secure Git access, firmware updates and payload booting | Security by design, Improve status quo, Open source, Secure updates, Zero-trust network boot |
BMC (Baseboard Management Controller) firmware | Compute booting and debugging, key and signature storage for software layers | Security by design, No old/insecure protocols, Declarative management, Debuggability, One-time hardware setup |
RMC (Rack Management Controller) firmware | Rack hardware control and observability, e.g. fans | Openness, Declarative management, Loose coupling, Observability, Secure updates |
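The firmware layer summary mentions cryptographic verification of payloads before boot. This RFC does not specify the mechanism, so the following is only a sketch of one simple building block: comparing a payload's SHA-256 digest against a trusted, pinned value. A real implementation would presumably verify a signature with asymmetric keys instead. The `verify_payload` function and the sample bytes are hypothetical.

```python
import hashlib
import hmac


def verify_payload(payload: bytes, trusted_sha256_hex: str) -> bool:
    """Return True only if the payload matches the pinned SHA-256 digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(actual, trusted_sha256_hex.lower())


payload = b"example boot payload"  # placeholder bytes, not a real image
pinned = hashlib.sha256(payload).hexdigest()  # assumed to come from trusted storage

print(verify_payload(payload, pinned))                # matching payload
print(verify_payload(payload + b"tampered", pinned))  # modified payload
```

The boot flow would refuse to hand over control to any payload for which such a check fails.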
Summary: The electrical layer backs the computational, power delivery and physical networking requirements of the compute. It also provides a means to run the firmware on the BMC and RMC (microcontrollers).
Goals:
Layer components:
Component | Role | Key Requirements |
---|---|---|
Compute unit | Run the bootstrap and hypervisor operating systems and compute workloads | Common off-the-shelf parts, Raspberry Pi compatibility, Hot swappability, One-time hardware setup, Physical portability |
BMC PCB | Host the BMC microcontroller and deliver power to the compute unit | Open Source, Reproducible PCBs, Modular design, Raspberry Pi compatibility, Energy monitoring |
Backplane PCB | Rack level power distribution and inter-BMC connectivity | Common off-the-shelf parts, Reproducible PCBs, Physical portability, Hot swappability, Upgradability |
Network switch | Provides networking for the rack (and cluster) | De-facto standards, Common off-the-shelf parts, Sensible rack cost, Physical portability, Commodity power and I/O |
Summary: The structural layer consists of physical components that form the structure of the Racklet rack. The structural layer enables Racklet to be compact, modular and easily transportable. The rack consists of a casing that hosts the backplane, network switch and slots for slide-in trays. The compute unit with its storage is attached to a modular compute tray, which has rails matching the slide-in slots in the rack.
Goals:
Layer Components:
Component | Role | Key Requirements |
---|---|---|
Compute tray | Enable mounting of a compute unit in a hot-swappable and modular way | Open source, 3D printed parts, Modular design, Raspberry Pi compatibility, Hot swappability |
Rack case | Contain the network switch, a power backplane and multiple compute trays | Open source, 3D printed parts, Modular design, Sensible rack cost, Physical portability |
The layer architecture described in the proposals introduces some new named concepts and components. By layer, they can be explained as follows:
5. User Software
4. System Software
3. Firmware
2. Electrical
1. Structural
Note: These RFCs target a "reference" implementation of Racklet, as envisioned by its authors. The components and their key requirements are described from the perspective of this reference implementation, and thus "community" implementations of Racklet (e.g. in a different physical form factor) don't need to strictly adhere to the requirements laid out here. A "Racklet compliant" system is ultimately only required to follow the values laid out in RFC-0001 and the loosely coupled hardware/software interfaces of the project. That said, it is still advised that variations of Racklet follow the layers, high-level components and key requirements in this document.
The Racklet team aims to accommodate community requirements and adaptations to keep the Racklet ecosystem cohesive. The project has three strategies to mitigate the risk of the ecosystem fragmenting into incompatible hardware/software implementations of Racklet:
As stated in Risks and Mitigations, Racklet is (one of) the first of its kind with regard to its specification-first architecture. The initial layer separation presented here is the result of an iterative thought process by the core Racklet authors. The five layers are chosen to clearly separate the roles and responsibilities of components, without going into too much detail (too many layers) or causing excessive overlap (too few layers). Firmware and system software are separated to achieve loose coupling and clear, secure communication between them. User software is separated from system software to define a border between software mostly provided by the Racklet project and external software that the user introduces (workloads).
Loose coupling plays a very important role in the architecture presented here. Racklet could have been designed as a fully integrated system with implementations that are strictly defined by the project. While this could potentially make the system more compact and simple, it also has many drawbacks that make it incompatible with the values and goals of the project. For example, Racklet relies heavily on various projects in the Open Source Firmware and Cloud Native ecosystems, many of which evolve quickly and provide alternative implementations complying with standard APIs. We want Racklet to be accessible, transparent and modular, which means supporting a wide variety of hardware and enabling user customization to a great extent. If loose coupling is implemented properly, we believe that the standardized architecture presented here will be relatively simple to maintain and extend, and that community-built Racklet solutions will be able to use the modules and different software implementations with little effort. In summary, to fulfill the values defined in RFC-0001 and to avoid ecosystem fragmentation, the Racklet project aims to provide interfaces, not implementations.
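The principle of "interfaces, not implementations" can be illustrated with a structural interface that callers depend on, while concrete implementations remain interchangeable. The following is a minimal sketch using Python's `typing.Protocol`; the `FanController` interface, the `LoggingFan` class and the `cool_down` function are hypothetical and are not defined by any Racklet RFC.

```python
from typing import List, Protocol


class FanController(Protocol):
    """Hypothetical interface: anything that can set a fan speed."""

    def set_speed(self, percent: int) -> None: ...


class LoggingFan:
    """One interchangeable implementation; records requests instead of driving hardware."""

    def __init__(self) -> None:
        self.history: List[int] = []

    def set_speed(self, percent: int) -> None:
        self.history.append(percent)


def cool_down(fan: FanController) -> None:
    """Depends only on the interface, not on a concrete implementation."""
    fan.set_speed(100)


fan = LoggingFan()
cool_down(fan)
print(fan.history)  # → [100]
```

Because `cool_down` only relies on the `set_speed` method, a community implementation with different hardware could supply its own class without any change to the calling code.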
At the time of Racklet's creation, the history of cluster computers based on the Raspberry Pi (and other single-board computers) is already very rich. Various private persons, educational institutes and companies have come up with a wide variety of designs (e.g. KubeCloud[1]) for different use cases for at least the past 8 years. What sets Racklet apart from these mostly one-off implementations is its specification. Instead of deriving a specification from some implementation, Racklet as a system is primarily defined as a set of RFC documents. This specification is intended to define a standardized way to build a miniature compute cluster, from the lowest-level hardware details up to a state-of-the-art software stack. Since the specification is defined from the ground up, we prioritize basing it on the most secure and modern technologies available today, essentially merging the core concepts of prior SBC cluster computer implementations with the state-of-the-art security and fleet management models of large-scale cloud providers.
The architecture described in this document is likely to change as the detailed RFCs describing individual components/layers are established. It is also unclear whether this particular layered architecture with the chosen high-level components is optimal, and thus the reference implementation will likely influence the structure here once it is better known what works and what doesn't.
Racklet is also a complex system, and this document in its current state likely cannot provide the full picture of the architecture to an unfamiliar reader. To combat this, additional graphical elements such as architecture diagrams could be embedded into this document in a future revision (TODO).
The concept of "Racklet conformance" briefly discussed in Risks and Mitigations is not expanded upon here, but might warrant its own RFC specifically for community implementations.
The layer definitions presented here are expected to evolve with the project. This document serves as a starting point for discussion, and records the current consensus. In the future the scope of this document might also include a thorough introduction to the architecture for newcomers to the project, as well as improved reasoning for particular high-level architectural decisions and how they are derived.
2021-07-20
: This RFC has been accepted.

[1] "KubeCloud: A Small-Scale Tangible Cloud Computing Environment". Master's thesis in Computer Engineering at Aarhus University by Kasper Nissen and Martin Jensen. Published June 6th, 2016.