Today, the RHEL 9 mockbuild broke in osbuild CI. Basically, while mock was being installed, a lot of packages got updated, and during these updates, the ssh connection dropped and couldn't be established again.
Earlier this week, a mass RHEL 9 rebuild happened. As RHEL 9 is still in devel, this can break backward compatibility and updates.
Also earlier this week, we redeployed composer.osbuild.org. There was a kinda big change for the official RHEL 9 guest images: we now build them using v2 manifests (so basically the whole definition was rewritten from scratch).
Our CI runs guest images in AWS (ec2 images are not yet available in RHEL 9). Yes, we build them on composer.osbuild.org. Yes, these are the images from the previous paragraph.
My first assumption was that the mass rebuild indeed broke updates. As it doesn't make much sense to care about older RHEL 9 content, the best fix is to just update our images that Schutzbot uses.
So we indeed tried that with Jakub. Unfortunately, we couldn't ssh into EC2 instances booted from these images. Shortly after that, I discovered that these images are not booting.
"OMFG, I think we broke something with the v1->v2 manifest migration." Yeah, I panicked a little (j/k, I panicked a lot), declared our new image definitions for RHEL 9 broken, spammed several people, and left my computer because I sadly had some errands.
Things got even weirder soon after that because Christian tried booting the latest RHEL 9 images and they worked fine for him…
After I read the message from Christian, I started to think about what we were doing differently in EC2. Then I realized the whole truth: We uploaded qcow2 images into EC2! But it doesn't work that way! EC2 cannot consume qcow2! We always (conditions apply) have to use raw images! So I tried converting the qcow2 into raw and uploading it, and voila - the image happily booted!
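For the record, here is a minimal sketch of the step I forgot, written as a check that could run before an upload. It looks at the first bytes of the file to see whether it is a qcow2 image and, if so, builds the `qemu-img convert` command that turns it into raw. The function names (`is_qcow2`, `raw_conversion_command`, `prepare_for_ec2`) and the `.raw` output naming are made up for illustration; this is not what our CI actually runs.

```python
import struct
from pathlib import Path

# qcow and qcow2 files start with these four bytes, followed by a
# big-endian 32-bit version number.
QCOW_MAGIC = b"QFI\xfb"


def is_qcow2(path):
    """Return True if the file has the qcow magic and version >= 2."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return False
    (version,) = struct.unpack(">I", header[4:8])
    return version >= 2


def raw_conversion_command(src, dst):
    """Build (but do not run) the qemu-img call that converts qcow2 to raw."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", str(src), str(dst)]


def prepare_for_ec2(path):
    """Hypothetical helper: return the conversion command for a qcow2 image,
    or None when the file does not look like qcow2."""
    src = Path(path)
    if not is_qcow2(src):
        return None
    return raw_conversion_command(src, src.with_suffix(".raw"))
```

The command is returned as a list so it can be passed straight to `subprocess.run(cmd, check=True)` on a machine that has `qemu-img` installed.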
So the images are still bootable and we are fine. \o/ I just forgot one step. Sorry.
Narrator: This wasn't the end though.
Unfortunately, even though the image booted, it failed to install s3cmd. s3cmd isn't available in RHEL 9 so we have to install it from our private COPR. The important thing is that s3cmd requires python-magic. Because python-magic is also not available in RHEL 9, it also lives in our internal COPR.
However, after the update, we weren't able to install s3cmd because python-file-magic was already installed and it conflicts with python-magic.
Ah! This must be something with the mass-rebuild, right? They surely changed something in the Python packaging and dependencies are now broken. So I started looking at all brew builds of s3cmd, python-magic, and python-file-magic. But nothing changed during the mass rebuild…
After some time I realized what changed: I was the one who added python-file-magic into the guest image! Actually, I didn't add python-file-magic specifically, but I added insights-client, which requires python-file-magic. I added it for parity with the 8.5 images (it also makes sense to include it, and the reason why it wasn't there before is that it didn't exist when Jozef originally created the RHEL 9 definition). So that's how python-file-magic got in there and was conflicting with python-magic required by s3cmd.
After some digging, I found out that python-file-magic and python-magic provide basically the same API (that's why they conflict with each other). I also found out that s3cmd in Fedora turns off the dependency generator and just pulls python-file-magic directly. But it does that only for Fedora. So I extended the condition to RHEL 9 as well and voila! s3cmd now requires python-file-magic instead of python-magic, so there's now no conflict and everybody's happy.
So yeah, in the end, both the mass rebuild and the change of the image definition indeed broke us, but in a slightly different way than I originally thought. Images are fun. :)