---
title: updates for week of 2025-08-04
---

# Resources

- [x] - outage ticket: https://pagure.io/fedora-infrastructure/issue/12679
- [x] - status update: https://github.com/fedora-infra/statusfpo/pull/67
- [x] - announcement sent to devel-announce/infrastructure lists

# SOP-ish (put this in docs)

- log into batcave01
- Put your name next to the machine here (with "in progress").
- run: `sudo rbac-playbook vhost_update_reboot.yml`
    - If it's bare metal (you'll find out because libvirt will give an error and the vhost_* playbook will fail), Kevin needs to do it.
    - Authenticate (should be 2factor)
    - Put the full hostname into the prompt to start the update
    - Monitor the output, and in another window you might want to monitor the machine (see below)
    - After the update has finished, you'll be prompted a second time for a host to reboot. Put **the same hostname** into this prompt
- To monitor a machine you are updating/rebooting in another window (also from batcave01) you can run: `mtr --displaymode 1 -i 4 <host>`
    - The main thing to look for is if it goes away during the upgrade part (very bad), or if it takes too long to reboot (all hosts are different, but 5m is fine and over 10m is usually bad).
- If anything weird happens, speak to Kevin
- When a host fails to come back after the reboot, it's usually because it needs its LUKS passwds entered
    - Which requires console access via **noc01** (drop the -stg if not in staging): `ipmitool -U admin -H <host>-stg.mgmt.rdu3.fedoraproject.org -I lanplus shell` ... you'll need the passwds if you are doing this bit.
(and run `sol activate`)
    - If `ipmitool` doesn't work you can use: `sshuttle 172.23.1.0/24 -r noc-cc01.rdu-cc.fedoraproject.org -v`

# updates for week of 2025-08-04

## openqa - sometime - schedule with adamw

* openqa-a64-worker01.rdu3.fedoraproject.org -
* openqa-a64-worker02.rdu3.fedoraproject.org -
* openqa-x86-worker01.rdu3.fedoraproject.org -
* openqa-x86-worker02.rdu3.fedoraproject.org -
* openqa-x86-worker03.rdu3.fedoraproject.org -
* openqa-x86-worker04.rdu3.fedoraproject.org -
* openqa-x86-worker05.rdu3.fedoraproject.org -
* qvmhost-x86-01.rdu3.fedoraproject.org -

## copr_hypervisor - sometime - schedule / check with #buildsys:fedoraproject.org

* vmhost-p08-copr01.rdu-cc.fedoraproject.org -
* vmhost-p08-copr02.rdu-cc.fedoraproject.org -
* vmhost-p09-copr01.rdu-cc.fedoraproject.org -
* vmhost-x86-copr01.rdu-cc.fedoraproject.org -
* vmhost-x86-copr02.rdu-cc.fedoraproject.org -
* vmhost-x86-copr03.rdu-cc.fedoraproject.org -
* vmhost-x86-copr04.rdu-cc.fedoraproject.org -
* db.stg.aws.fedoraproject -?

# 2025-08-04 Monday - staging

* bvmhost-s390x-01.stg.s390.fedoraproject.org - james - done
* bvmhost-a64-01.stg.rdu3.fedoraproject.org - james - done
* bvmhost-x86-01.stg.rdu3.fedoraproject.org - james - done
* bvmhost-x86-02.stg.rdu3.fedoraproject.org - james - done
* bvmhost-x86-03.stg.rdu3.fedoraproject.org - james - done
* vmhost-x86-01.stg.rdu3.fedoraproject.org - james - done
* vmhost-x86-02.stg.rdu3.fedoraproject.org - james - done
* vmhost-x86-03.stg.rdu3.fedoraproject.org - james - done
* vmhost-x86-04.stg.rdu3.fedoraproject.org - james - done
* vmhost-x86-05.stg.rdu3.fedoraproject.org - james - done

# 2025-08-06 Wednesday - non outage causing production

* autosign01.rdu3.fedoraproject.org - ⚠ needs robosig pass - kevin - done
* backup01.rdu3.fedoraproject.org - ⚠ needs backup pass -
* dedicatedsolutions01.fedoraproject.org - kevin - done
* ibiblio02.fedoraproject.org - james - done
* ibiblio05.fedoraproject.org - james - done
* internetx02.fedoraproject.org - kevin - done
* osuosl02.fedoraproject.org - kevin - done
* proxy05.fedoraproject.org - ⚠ bare metal - kevin - done
* retrace03.rdu-cc.fedoraproject.org - ⚠ bare metal - kevin - done
* sign-vault01.rdu3.fedoraproject.org - kevin - done
* sign-vault02.rdu3.fedoraproject.org - ⚠ needs vault pass - kevin - done
* storinator01.rdu-cc.fedoraproject.org - ⚠ bare metal - kevin - in progress
* bvmhost-x86-riscv01.rdu3.fedoraproject.org - (schedule with riscv matrix channel)
* vmhost-x86-cc02.rdu-cc.fedoraproject.org - kevin - done
* vmhost-x86-cc05.rdu-cc.fedoraproject.org - kevin - done
* vmhost-x86-cc06.rdu-cc.fedoraproject.org - kevin - done
* maintainer_test group - kevin - done

---

# 2025-08-05 Tuesday - main event 21:00 UTC

- **Don't** start before:

```
================================= Day: Tuesday =================================
2025-08-05 14:00 PDT       US/Pacific
2025-08-05 17:00 EDT   --> US/Eastern <--
2025-08-05 21:00 UTC       UTC
2025-08-05 22:00 BST       Europe/London
2025-08-05 23:00 CEST      Europe/Berlin
2025-08-05 23:00 CEST      Europe/Paris
------------------------------ New Day: Wednesday ------------------------------
2025-08-06 02:30 IST       Asia/Calcutta
2025-08-06 05:00 HKT       Asia/Hong_Kong
2025-08-06 05:00 +08       Asia/Singapore
2025-08-06 06:00 JST       Asia/Tokyo
2025-08-06 07:00 AEST      Australia/Brisbane
```

## Checklist

- [x] Apply updates to everything (pre outage)
- [x] Update status
- [x] Silence nagios
- [x] Disable backups on backup01
- [x] Disable updates pushes on bodhi-backend01
- [x] Stop koschei scheduler

### Extra

- [-] Do batcave01 / vmhost-x86-05.rdu3 first?

### bkernel

* buildhw-x86-01.rdu3.fedoraproject.org - needs pesignd passphrase - kevin - done

### builders

* buildhw-a64-01.rdu3.fedoraproject.org - james - done
* buildhw-a64-02.rdu3.fedoraproject.org - james - done
* buildhw-x86-01.rdu3.fedoraproject.org - james - done
* buildhw-x86-02.rdu3.fedoraproject.org - james - done
* buildhw-x86-03.rdu3.fedoraproject.org - james - kevin fixed
* buildhw-x86-04.rdu3.fedoraproject.org - ⚠ james - broken, libvirt
* bvmhost-s390x-01.s390.fedoraproject.org - ⚠ statistically speaking likely will not come back up nicely - kevin - done
* bvmhost-a64-01.rdu3.fedoraproject.org - james - done
* bvmhost-a64-02.rdu3.fedoraproject.org - james - done
* bvmhost-a64-03.rdu3.fedoraproject.org - james - done
* bvmhost-a64-04.rdu3.fedoraproject.org - james - done
* bvmhost-a64-05.rdu3.fedoraproject.org - james - done (kevin fixed boot order)
* bvmhost-p09-05.rdu3.fedoraproject.org - james - done
* bvmhost-p10-01.rdu3.fedoraproject.org - james - done - no ipmi, ⚠ iscsi ordering with libvirt.
* bvmhost-x86-05.rdu3.fedoraproject.org - james - done
* bvmhost-x86-06.rdu3.fedoraproject.org - james - done

### non builders

* bvmhost-x86-01.rdu3.fedoraproject.org - james - done
* bvmhost-x86-02.rdu3.fedoraproject.org - james - done - pkgs01 unhappy rkhunter
* bvmhost-x86-03.rdu3.fedoraproject.org - james - done
* bvmhost-x86-04.rdu3.fedoraproject.org - james - done
* vmhost-x86-01.rdu3.fedoraproject.org (bastion01/ipa01/noc01) - ⚠ nagios shush won't work - kevin - done
* vmhost-x86-02.rdu3.fedoraproject.org (bastion02/ipa02) - james - done, also -e nodns=true
* vmhost-x86-03.rdu3.fedoraproject.org (log01) - james - rebooting.
* vmhost-x86-04.rdu3.fedoraproject.org (db01) - james - done
* vmhost-x86-05.rdu3.fedoraproject.org (batcave01) - kevin - in progress
* vmhost-x86-cc01.rdu-cc.fedoraproject.org - james - done, after -e nodns=true
* vmhost-x86-cc03.rdu-cc.fedoraproject.org - james - done
* proxy3*-40 ( aws ) - kevin - done

# Post checklist

- [x] start koschei scheduler
- [x] enable backups on backup01
- [x] enable updates pushes on bodhi-backend01
- [x] update status
- [x] unsilence nagios

---

# Weird Errors