# Post-mortem for broken RHEL 9 mockbuild in osbuild CI
## What happened
Today, the RHEL 9 mockbuild broke in osbuild CI. Basically, while `mock` was being installed, a lot of packages got updated, and during these updates the SSH connection dropped and couldn't be re-established.
## Moar context
Earlier this week, a mass RHEL 9 rebuild happened. As RHEL 9 is still in devel, this can **break backward compatibility and updates**.
Also earlier this week, we redeployed composer.osbuild.org. There was a kinda big change for the official RHEL 9 guest images: we now build them using v2 manifests (so basically the whole definition was rewritten from scratch).
Our CI runs guest images in AWS (ec2 images are not yet available in RHEL 9). Yes, we build them on composer.osbuild.org. Yes, these are the images from the previous paragraph.
## My first (I think correct) assumption
My first assumption was that the mass rebuild indeed broke updates. As it doesn't make much sense to care about older RHEL 9 content, the best fix is to just update the images that Schutzbot uses.
So we indeed tried that with Jakub. Unfortunately, we couldn't ssh into EC2 instances booted from these images. Shortly after that, I discovered that these images are not booting.
## My second (and fatally wrong) assumption
"OMFG, I think we broke something with the v1->v2 manifest migration." Yeah, I panicked a little (j/k, I panicked **a lot**), declared our new image definitions for RHEL 9 broken, spammed several people, and left my computer because I sadly had some errands.
Things got even weirder soon after that because Christian tried booting the latest RHEL 9 images and they worked fine for him...
## The realization
After I read the message from Christian, I started to think about what we were doing differently in EC2. Then I realized the whole truth: We uploaded qcow2 images into EC2! But it doesn't work that way! EC2 cannot consume qcow2! We always (conditions apply) have to use raw images! So I tried converting the qcow2 into raw and uploading it, and voila - the image happily booted!
**So the images are still bootable and we are fine.** \o/ I just forgot one step. Sorry.
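For the record, the step I forgot can be sketched roughly like this. It is only a minimal sketch, not what our CI actually runs: the helper names (`is_qcow`, `prepare_for_ec2`) are made up for illustration, and the only real things in it are the qcow magic bytes (`QFI\xfb`) and the `qemu-img convert` invocation. The function only builds the command, it doesn't run it.

```python
from pathlib import Path

# qcow and qcow2 files start with these four bytes.
QCOW_MAGIC = b"QFI\xfb"


def is_qcow(path):
    """Return True if the file starts with the qcow magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == QCOW_MAGIC


def raw_convert_command(src, dst):
    """Build the qemu-img command that converts a qcow2 image to raw."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", str(src), str(dst)]


def prepare_for_ec2(src):
    """Hypothetical helper: decide what to upload.

    Returns (path_to_upload, command_to_run_first). The command is None
    when the image does not look like qcow and can be uploaded as-is.
    """
    src = Path(src)
    if not is_qcow(src):
        return src, None
    dst = src.with_suffix(".raw")
    return dst, raw_convert_command(src, dst)
```

Feeding it a qcow2 guest image gives back the `.raw` path plus the command to run before uploading (for example via `subprocess.run(cmd, check=True)`).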
*Narrator: This wasn't the end though.*
## The changed image definition strikes back
Unfortunately, even when the image booted, it failed to install `s3cmd`. `s3cmd` isn't available in RHEL 9 so we have to install it from our private [copr]. The important thing is that `s3cmd` requires `python-magic`. Because `python-magic` is also not available in RHEL 9, it also lives in our internal COPR.
However, after the update, we weren't able to install `s3cmd` because `python-file-magic` was already installed and it conflicts with `python-magic`.
## My third (and also wrong) assumption
Ah! This must be something with the mass-rebuild, right? They surely changed something in the Python packaging and dependencies are now broken. So I started looking at all brew builds of `s3cmd`, `python-magic`, and `python-file-magic`. But nothing changed during the mass rebuild...
## The second realization
After some time I realized what changed: I was the one who added `python-file-magic` into the guest image! Actually, I didn't add `python-file-magic` specifically, but I added `insights-client`, which requires `python-file-magic`. I added it for parity with the 8.5 images (it also makes sense to include it, and the reason why it wasn't there before is that it didn't exist when Jozef originally created the RHEL 9 definition). So that's how `python-file-magic` got in there and ended up conflicting with `python-magic` required by `s3cmd`.
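To make the chain of events a bit easier to follow, here's a toy model of it. The package names come from this post, but the metadata tables are a simplified assumption (real RPM metadata has versioned Provides, Requires, and Conflicts), and `closure` and `find_conflicts` are hypothetical helpers, not anything dnf exposes.

```python
# Toy metadata: who requires what. Simplified assumption, not real RPM data.
REQUIRES = {
    "s3cmd": ["python-magic"],
    "insights-client": ["python-file-magic"],
    "python-magic": [],
    "python-file-magic": [],
}

# Pairs of packages that cannot be installed together (stored sorted).
CONFLICTS = {("python-file-magic", "python-magic")}


def closure(packages):
    """Return the packages plus everything they transitively require."""
    seen = set()
    stack = list(packages)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(REQUIRES.get(pkg, []))
    return seen


def find_conflicts(installed, wanted):
    """List conflicting pairs that would end up on the system together."""
    full = closure(installed) | closure(wanted)
    return sorted(pair for pair in CONFLICTS if set(pair) <= full)


# The new image ships insights-client; CI then asks for s3cmd.
print(find_conflicts(["insights-client"], ["s3cmd"]))
# The old image had no insights-client, so nothing clashed.
print(find_conflicts([], ["s3cmd"]))
```

The first call reports the `python-file-magic` / `python-magic` pair, the second one reports nothing, which matches what we saw before and after the image definition changed.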
## The fix
After some digging, I found out that `python-file-magic` and `python-magic` provide basically the same API (that's why they conflict with each other). I also found out that `s3cmd` in Fedora turns off the dependency generator and just pulls in `python-file-magic` directly. But it does that only for Fedora. So I extended the condition to cover RHEL 9 as well, and voila! `s3cmd` now requires `python-file-magic` instead of `python-magic`, so there's no conflict anymore and everybody's happy.
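The logic of the fix, modeled in Python rather than in spec-file syntax, looks roughly like this. This is a sketch of the condition only, not the actual spec change: `magic_dependency` and its `include_rhel9` switch are hypothetical, and treating "RHEL 9" as "major version 9 or newer" is my assumption.

```python
def magic_dependency(distro, major, include_rhel9=True):
    """Pick which magic binding s3cmd should require on a given distro.

    include_rhel9=False models the condition before the fix (Fedora only);
    include_rhel9=True models the extended condition.
    """
    uses_file_magic = distro == "fedora" or (
        include_rhel9 and distro == "rhel" and major >= 9  # ">= 9" is an assumption
    )
    return "python-file-magic" if uses_file_magic else "python-magic"


# Before the fix, RHEL 9 fell through to python-magic.
print(magic_dependency("rhel", 9, include_rhel9=False))
# After the fix, RHEL 9 follows Fedora.
print(magic_dependency("rhel", 9))
```

With the extended condition, `s3cmd` and `insights-client` both end up wanting the same package on RHEL 9, so there is nothing left to conflict.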
## Summary
So yeah, in the end, both the mass rebuild and the change of the image definition indeed broke us, just in a slightly different way than I originally thought. Images are fun. :)
[copr]: https://copr.devel.redhat.com/coprs/osbuild-team/epel-el9/builds/