I built a small-scale rack to run server-grade CPUs in a home environment. It is entirely water-cooled and managed by a custom controller that handles thermals, fan speeds, and provides authenticated remote access via IPMIv2/Redfish.

## WTF, why
I am a software dev building hypervisors for a living and I love running local hardware for server-grade hypervisor testing, but I can't stand the noise. Server equipment usually sounds like a jet engine, so water-cooling was a requirement. Plus, hardware is just super fun to mess with (albeit expensive).
Also, I think 1Us are really really cool, so I'm going to solve the noise problem with a solution **that is 10x more complex and expensive than just buying a bigger case**.

## The hardware
The rack currently houses my main devbox and three test nodes: an EPYC Milan, a Xeon EMR, and an Ampere Altra. These are the systems I need to cover all the CPU types my work has to deal with.
The rest of it is a mix of used motherboards and IPMI firmware versions. I managed to source most of this before the DRAM-pocalypse hit.
All units are externally liquid-cooled and managed by a rack controller. The fan grill on the photo below was laser-cut by a local workshop.

## Rack controller
The rack, cooling and units are managed by an ESP32 controller sitting on a physically isolated IPMI VLAN. It acts as a single, secure entry point for the management network. I wrote a custom IPMI management stack specifically for this hardware.
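The power-management side of that stack boils down to the same requests you could fire off by hand with stock `ipmitool` over the lanplus (IPMIv2) interface. A rough sketch of the equivalent invocation; the hostname and credentials are placeholders:

```python
def ipmi_power_cmd(host, user, password, action="status"):
    """Build the ipmitool command line for a chassis power query/action.

    This is just the stock-tooling equivalent of what the controller does;
    host, user, and password here are made-up placeholders.
    """
    assert action in ("status", "on", "off", "cycle", "soft")
    return [
        "ipmitool", "-I", "lanplus",   # lanplus = IPMI v2.0 / RMCP+
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", action,
    ]

print(" ".join(ipmi_power_cmd("milan.ipmi.lan", "admin", "hunter2")))
```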
The controller is responsible for:
* Remote authentication and IPMI access.
* Monitoring thermal sensors to generate PWM signals for the pump and fans.
* Pushing system fan curve configurations to the individual hosts.
With it I can control everything from my devbox, power-manage my test hosts, and SSH into them through the controller tunnel.
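The fan-curve logic itself is nothing exotic: map a sensor temperature onto a PWM duty cycle by interpolating between a few curve points. A minimal sketch in Python rather than the actual ESP32 firmware, with made-up curve points:

```python
def pwm_from_temp(temp_c, curve):
    """Linearly interpolate a PWM duty cycle (0-100 %) from a fan curve.

    curve: list of (temperature_C, duty_percent) points, sorted by temperature.
    Below the first point the fan idles; above the last it runs flat out.
    """
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)

# Hypothetical curve: near-silent below 30 C, full blast at 70 C.
CURVE = [(30, 20), (45, 40), (60, 70), (70, 100)]
```

The same function drives both the pump and the fans; only the curve points differ per device.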
Also, there is something inherently "low-life + high-tech = cyberpunk" about using a $4 microcontroller to boss around thousands of euro in high-end server gear.
```
> th-thermals
devbox 46 degrees C
milan 32 degrees C
emr 31 degrees C
altra offline
```

## Cooling strategy
The 1U and 2U boxes are liquid-cooled externally. Each unit uses a cold-plate and quick-disconnect fittings (QDCs) that lead to a shared external radiator.

The loop is designed in parallel so I can remove individual units without shutting down the entire system. However, this makes per-branch flow harder to predict. To prevent the manifold from becoming a bottleneck, the collector tubes must be significantly wider than the branch lines. Here's my amazingly clear schematic of the whole thing:

To maintain consistent pressure and flow across $n$ parallel loops, the cross-sectional area of the manifold should ideally be greater than or equal to the sum of the areas of the individual branch lines. The required manifold diameter $D$ for $n$ branches of diameter $d$ is:
$$D \geq d \sqrt{n}$$
With standard 7mm inner-diameter tubing for the branches and 12.8mm for the manifold, this hits its physical limit at roughly 3-4 loop nodes:
$$n \leq \frac{D^2}{d^2} = \frac{12.8^2}{7^2} \approx 3.34$$
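The sizing rule is trivial to sanity-check in code. A quick sketch (units in mm, just the two rearrangements of the formula above):

```python
import math

def min_manifold_diameter(branch_d, n):
    """Minimum manifold inner diameter whose cross-section covers n branches."""
    return branch_d * math.sqrt(n)

def max_branches(branch_d, manifold_d):
    """How many branch lines a given manifold can feed without choking."""
    return (manifold_d / branch_d) ** 2

print(min_manifold_diameter(7, 4))   # 14.0 mm needed for 4 nodes
print(max_branches(7, 12.8))         # ~3.34 nodes for the 12.8 mm manifold
```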

Beyond four parallel loop nodes, the 12.8mm manifold tube causes a massive increase in back-pressure, which stresses the pump and reduces cooling efficiency for the furthest nodes in the loop. So I need to either scale up to another manifold for every 4 nodes, or install a second pump to brute-force the issue.
Using different less-common diameter tubes is also an option but that severely limits part selection for fittings and disconnects.

## Things I got wrong
* Radiator dust is a real problem; I should have built a filter cage.
* Pump redundancy is highly desirable; right now a single pump failure takes down the whole loop.
* I still need to install proper leak detection sensors and hook them up to the controller.
* Server hardware is fragile, much more so than consumer parts. A slipped screwdriver while the power is connected will end tragically for the board. My electronics repair skills definitely leveled up a couple of times from this.
* The IPMI spec is bloated. I have a feeling there's much to be said about the potential firmware attack surface it exposes.
* While the system works well and stays silent under load, I made enough design mistakes to justify a v2 in the future. I might just be looking for an excuse to do this all over again though, lol.