---
title: updates for week of 2025-03-24
---
- [x] - outage ticket: https://pagure.io/fedora-infrastructure/issue/12455
- [x] - status update: https://github.com/fedora-infra/statusfpo/pull/56
- [x] - announcement sent to devel-announce/infrastructure lists
# SOP-ish (put this somewhere else)
- log into batcave01
- Put your name next to the machine here (with "in progress").
- run: `sudo rbac-playbook vhost_update_reboot.yml`
- - If it's bare metal (you'll find out because libvirt will give an error and the vhost_* playbook will fail), Kevin needs to do it.
- - Authenticate (should be 2factor)
- - Put the full hostname into the prompt, to start the update
- - Monitor the output, and in another window you might want to monitor the machine (see below)
- - After the update has finished, you'll be prompted a second time for a host to reboot. Put **the same hostname** into this prompt
- To monitor a machine you are updating/rebooting, in another window (also from batcave01) you can run: `mtr --displaymode 1 -i 4 <host>`
- - The main thing to look for is whether it goes away during the upgrade part (very bad), or takes too long to reboot (all hosts are different, but 5m is fine and over 10m is usually bad).
- If anything weird happens, speak to Kevin
- When a host fails to come back after the reboot, it's usually because it is waiting for LUKS passwords
- - That requires console access from **noc01** (drop the `-stg` if the host is not in staging): `ipmitool -U admin -H <host>-stg.mgmt.iad2.fedoraproject.org -I lanplus shell`, then run `sol activate`. You'll need the passwords if you are doing this bit.
- - If `ipmitool` doesn't work you can use: `sshuttle 172.23.1.0/24 -r noc-cc01.rdu-cc.fedoraproject.org -v`
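
The two rules of thumb above (the `-stg` management hostname and the 5m/10m reboot window) can be sketched as small helpers. This is only an illustration: the function names are made up, it assumes `<host>` means the short hostname without any domain, and the "keep watching" label for the 5 to 10 minute gap is an assumption, since the SOP only says 5m is fine and over 10m is usually bad.

```python
# Hypothetical helpers; none of these names exist in the Fedora infra tooling.
MGMT_DOMAIN = "mgmt.iad2.fedoraproject.org"


def mgmt_hostname(short_host: str, staging: bool) -> str:
    """Build the management hostname, adding -stg only for staging hosts.

    Assumes short_host is the bare hostname (no domain part).
    """
    suffix = "-stg" if staging else ""
    return f"{short_host}{suffix}.{MGMT_DOMAIN}"


def ipmitool_shell_argv(short_host: str, staging: bool, user: str = "admin") -> list[str]:
    """Return the ipmitool command from the SOP as an argument list."""
    return [
        "ipmitool",
        "-U", user,
        "-H", mgmt_hostname(short_host, staging),
        "-I", "lanplus",
        "shell",
    ]


def classify_reboot_wait(minutes: float) -> str:
    """Apply the SOP's rule of thumb for how long a reboot has taken."""
    if minutes <= 5:
        return "fine"
    if minutes <= 10:
        # The SOP does not say anything about this range; label is an assumption.
        return "keep watching"
    return "usually bad"


if __name__ == "__main__":
    print(" ".join(ipmitool_shell_argv("bvmhost-x86-01", staging=True)))
    print(classify_reboot_wait(7))
```

The command is only printed here, not executed, so it can be copied into a shell on noc01 once the hostname looks right.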
updates for week of 2025-03-24
= openqa - sometime - schedule with adamw
* openqa-a64-worker01.iad2.fedoraproject.org - adamw - done
* openqa-a64-worker02.iad2.fedoraproject.org - adamw - done
* openqa-a64-worker03.iad2.fedoraproject.org - adamw - done
* openqa-a64-worker04.iad2.fedoraproject.org - adamw - done
* openqa-p09-worker01.iad2.fedoraproject.org - adamw - done
* openqa-p09-worker02.iad2.fedoraproject.org - adamw - done
* openqa-x86-worker01.iad2.fedoraproject.org - adamw - done
* openqa-x86-worker02.iad2.fedoraproject.org - adamw - done
* openqa-x86-worker03.iad2.fedoraproject.org - adamw - done
* openqa-x86-worker04.iad2.fedoraproject.org - adamw - done
* openqa-x86-worker05.iad2.fedoraproject.org - adamw - done
* openqa-x86-worker06.iad2.fedoraproject.org - adamw - done
* qvmhost-x86-01.iad2.fedoraproject.org - adamw - done
* qvmhost-x86-02.iad2.fedoraproject.org - adamw - done
= copr - sometime - schedule / check with #buildsys:fedoraproject.org
* vmhost-p08-copr01.rdu-cc.fedoraproject.org
* vmhost-p08-copr02.rdu-cc.fedoraproject.org
* vmhost-p09-copr01.rdu-cc.fedoraproject.org
* vmhost-x86-copr01.rdu-cc.fedoraproject.org
* vmhost-x86-copr02.rdu-cc.fedoraproject.org
* vmhost-x86-copr03.rdu-cc.fedoraproject.org
* vmhost-x86-copr04.rdu-cc.fedoraproject.org
= 2025-03-24 Monday - staging
* bvmhost-a64-01.stg.iad2.fedoraproject.org - james - done - luks
* bvmhost-p09-01.stg.iad2.fedoraproject.org - james - done
* bvmhost-s390x-01.stg.s390.fedoraproject.org - kevin - done
* bvmhost-x86-01.stg.iad2.fedoraproject.org - james - done
* bvmhost-x86-02.stg.iad2.fedoraproject.org - james - done
* bvmhost-x86-03.stg.iad2.fedoraproject.org - james - done
* bvmhost-x86-05.stg.iad2.fedoraproject.org - james - done
* vmhost-x86-01.stg.iad2.fedoraproject.org - james - done
* vmhost-x86-02.stg.iad2.fedoraproject.org - james - done
* vmhost-x86-05.stg.iad2.fedoraproject.org - kevin - done
* vmhost-x86-06.stg.iad2.fedoraproject.org - kevin - done
* vmhost-x86-07.stg.iad2.fedoraproject.org - kevin - done
* vmhost-x86-08.stg.iad2.fedoraproject.org - kevin - done
* vmhost-x86-09.stg.iad2.fedoraproject.org - kevin - done
* vmhost-x86-11.stg.iad2.fedoraproject.org - kevin - done
* vmhost-x86-12.stg.iad2.fedoraproject.org - kevin - done
= 2025-03-25 Tuesday - non outage causing
* autosign02.iad2.fedoraproject.org - needs robosig pass - kevin - done
* backup01.iad2.fedoraproject.org - needs backup pass - kevin - done
* dedicatedsolutions01.fedoraproject.org - greg - done
* ibiblio02.fedoraproject.org - james - done
* ibiblio05.fedoraproject.org - pmoura - done
* internetx02.fedoraproject.org - pmoura - in progress - done, reverse dns is busted. ;(
* osuosl02.fedoraproject.org - james - done
* proxy05.fedoraproject.org - error: bare metal - needs kevin - done
* retrace03.rdu-cc.fedoraproject.org - error: bare metal - needs kevin - done
* sign-vault02.iad2.fedoraproject.org - needs vault pass - kevin - done
* storinator01.rdu-cc.fedoraproject.org - error: bare metal - needs kevin - done
* bvmhost-x86-riscv01.iad2.fedoraproject.org - james - done
* vmhost-x86-cc02.rdu-cc.fedoraproject.org - james - done
* vmhost-x86-cc05.rdu-cc.fedoraproject.org - james - done
* vmhost-x86-cc06.rdu-cc.fedoraproject.org - james - done
* maintainer_test group - kevin - done
= 2025-03-26 Wed
21UTC - don't start before that
[ ] Apply updates to everything
[x] update status
[ ] silence nagios
[x] disable backups on backup01
[x] disable updates pushes on bodhi-backend01
[x] stop koschei scheduler
builders:
* bkernel01.iad2.fedoraproject.org - needs pesignd passphrase - kevin - done
* bkernel02.iad2.fedoraproject.org - needs pesignd passphrase - kevin - done
* buildhw-a64-03.iad2.fedoraproject.org - pmoura - done
* buildhw-a64-04.iad2.fedoraproject.org - james - done
* buildhw-a64-05.iad2.fedoraproject.org - james - done
* buildhw-a64-06.iad2.fedoraproject.org - james - done
* buildhw-x86-01.iad2.fedoraproject.org - james - done
* buildhw-x86-02.iad2.fedoraproject.org - james - done
* buildhw-x86-03.iad2.fedoraproject.org - james - done
* buildhw-x86-04.iad2.fedoraproject.org - james - done
* buildhw-x86-05.iad2.fedoraproject.org - james - done
* buildhw-x86-06.iad2.fedoraproject.org - james - done
* buildhw-x86-07.iad2.fedoraproject.org - james - done
* buildhw-x86-08.iad2.fedoraproject.org - james - done
* buildhw-x86-09.iad2.fedoraproject.org - james - done
* buildhw-x86-10.iad2.fedoraproject.org - james - done
* buildhw-x86-11.iad2.fedoraproject.org - james - done
* buildhw-x86-12.iad2.fedoraproject.org - james - done
* buildhw-x86-13.iad2.fedoraproject.org - james - done
* buildhw-x86-14.iad2.fedoraproject.org - pmoura - done
* buildhw-x86-15.iad2.fedoraproject.org - james - done
* buildhw-x86-16.iad2.fedoraproject.org - james - done
* bvmhost-a64-01.iad2.fedoraproject.org - greg - facts error, see below
* bvmhost-a64-02.iad2.fedoraproject.org - greg - done (after bad PXE boot)
* bvmhost-a64-03.iad2.fedoraproject.org - greg - done (after luks)
* bvmhost-a64-04.iad2.fedoraproject.org - kevin - playbook running to hopefully fix luks - done and fixed!
* bvmhost-x86-06.iad2.fedoraproject.org - greg - done
* bvmhost-x86-07.iad2.fedoraproject.org - greg - done
* bvmhost-p09-01.iad2.fedoraproject.org - network issue with 6.13.x - kevin - in progress
* bvmhost-p09-02.iad2.fedoraproject.org - network issue with 6.13.x - james - done
* bvmhost-p09-03.iad2.fedoraproject.org - network issue with 6.13.x - james - in progress
* bvmhost-p09-04.iad2.fedoraproject.org - network issue with 6.13.x
* bvmhost-p09-05.iad2.fedoraproject.org - james - done
* bvmhost-s390x-01.s390.fedoraproject.org - statistically speaking likely will not come back up nicely - kevin - done
non builders:
* bvmhost-x86-01.iad2.fedoraproject.org - greg - done
* bvmhost-x86-02.iad2.fedoraproject.org - greg - done
* bvmhost-x86-03.iad2.fedoraproject.org - greg - done
* bvmhost-x86-04.iad2.fedoraproject.org - greg - done
* bvmhost-x86-05.iad2.fedoraproject.org - greg - done
* vmhost-x86-01.iad2.fedoraproject.org (bastion01/batcave01) - kevin - done
* vmhost-x86-02.iad2.fedoraproject.org - james - done
* vmhost-x86-03.iad2.fedoraproject.org - james - done
* vmhost-x86-04.iad2.fedoraproject.org - IPA! - james - done -- Also contains noc01, and thus nagios shush doesn't work anymore.
* vmhost-x86-05.iad2.fedoraproject.org - james - done
* vmhost-x86-06.iad2.fedoraproject.org - IPA! - done
* vmhost-x86-08.iad2.fedoraproject.org - pmoura - done -- [vmhost-x86-08.iad2.fedoraproject.org -> noc01.iad2.fedoraproject.org]: UNREACHABLE!
* vmhost-x86-cc01.rdu-cc.fedoraproject.org - pmoura - done
* vmhost-x86-cc03.rdu-cc.fedoraproject.org - pmoura - done
* proxy3*-40 ( aws ) - kevin - done
possible 40/41 upgrades:
proxy* - kevin - in progress
[x] start koschei scheduler
[x] enable backups on backup01
[x] enable updates pushes on bodhi-backend01
[x] update status
[ ] unsilence nagios
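
To see at a glance which hosts in the list above still need attention, something like the following could be run over a copy of the list. It is a rough sketch that assumes host lines look like `* host - notes - owner - status` and treats a host as finished when any field after the hostname starts with "done"; both function names are hypothetical.

```python
def parse_host_line(line: str):
    """Return (host, is_done) for a '* host - ...' line, or None for other lines."""
    text = line.strip()
    if not text.startswith("* "):
        return None  # headings, checklist items, blank lines
    fields = [field.strip() for field in text[2:].split(" - ")]
    if not fields[0]:
        return None
    # The hostname is the first token; extra notes like "(bastion01/batcave01)" are dropped.
    host = fields[0].split()[0]
    is_done = any(field.lower().startswith("done") for field in fields[1:])
    return host, is_done


def pending_hosts(lines):
    """List hosts that have no field marking them as done."""
    pending = []
    for line in lines:
        parsed = parse_host_line(line)
        if parsed is not None and not parsed[1]:
            pending.append(parsed[0])
    return pending


if __name__ == "__main__":
    sample = [
        "* bvmhost-p09-01.iad2.fedoraproject.org - network issue with 6.13.x - kevin - in progress",
        "* bvmhost-p09-02.iad2.fedoraproject.org - network issue with 6.13.x - james - done",
        "* bvmhost-p09-04.iad2.fedoraproject.org - network issue with 6.13.x",
    ]
    for host in pending_hosts(sample):
        print(host)
```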
---
bvmhost-a64-01.iad2.fedoraproject.org error :
```
TASK [Gathering Facts] **********************************************************************************************************************
Wednesday 26 March 2025 22:16:27 +0000 (0:00:00.992) 0:00:16.774 *******
Wednesday 26 March 2025 22:16:27 +0000 (0:00:00.992) 0:00:16.774 *******
[WARNING]: Ignoring subset(None) for python3_fact
fatal: [buildvm-a64-01.iad2.fedoraproject.org]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "exception": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 92, in _ansiballz_main\n File \"/usr/lib64/python3.13/tempfile.py\", line 373, in mkdtemp\n prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)\n ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 126, in _sanitize_params\n dir = gettempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 315, in gettempdir\n return _os.fsdecode(_gettempdir())\n ~~~~~~~~~~~^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 308, in _gettempdir\n tempdir = _get_default_tempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 223, in _get_default_tempdir\n raise FileNotFoundError(_errno.ENOENT,\n \"No usable temporary directory found in %s\" %\n dirlist)\nFileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/root']\n", "failed": true, "module_stderr": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 92, in _ansiballz_main\n File \"/usr/lib64/python3.13/tempfile.py\", line 373, in mkdtemp\n prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)\n ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 126, in _sanitize_params\n dir = gettempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 315, in gettempdir\n return _os.fsdecode(_gettempdir())\n ~~~~~~~~~~~^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 308, in _gettempdir\n tempdir = _get_default_tempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 223, in _get_default_tempdir\n raise FileNotFoundError(_errno.ENOENT,\n \"No usable temporary directory found in %s\" %\n 
dirlist)\nFileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/root']\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}, "python3_fact": {"exception": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 92, in _ansiballz_main\n File \"/usr/lib64/python3.13/tempfile.py\", line 373, in mkdtemp\n prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)\n ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 126, in _sanitize_params\n dir = gettempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 315, in gettempdir\n return _os.fsdecode(_gettempdir())\n ~~~~~~~~~~~^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 308, in _gettempdir\n tempdir = _get_default_tempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 223, in _get_default_tempdir\n raise FileNotFoundError(_errno.ENOENT,\n \"No usable temporary directory found in %s\" %\n dirlist)\nFileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/root']\n", "failed": true, "module_stderr": "Traceback (most recent call last):\n File \"<stdin>\", line 107, in <module>\n File \"<stdin>\", line 92, in _ansiballz_main\n File \"/usr/lib64/python3.13/tempfile.py\", line 373, in mkdtemp\n prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)\n ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 126, in _sanitize_params\n dir = gettempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 315, in gettempdir\n return _os.fsdecode(_gettempdir())\n ~~~~~~~~~~~^^\n File \"/usr/lib64/python3.13/tempfile.py\", line 308, in _gettempdir\n tempdir = _get_default_tempdir()\n File \"/usr/lib64/python3.13/tempfile.py\", line 223, in _get_default_tempdir\n raise FileNotFoundError(_errno.ENOENT,\n \"No usable temporary 
directory found in %s\" %\n dirlist)\nFileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/root']\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}}, "msg": "The following modules failed to execute: python3_fact, ansible.legacy.setup\n"}
```
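
The `FileNotFoundError` in that traceback comes from Python's `tempfile` module, which gives up when it cannot create and write a file in any of its candidate directories. The note does not record what the cause was on this host; a full or read-only filesystem is a common one, but that is a guess. A minimal sketch for checking the same directories by hand on the affected machine, with hypothetical function names:

```python
import os
import tempfile

# The directories listed in the error message above.
CANDIDATES = ["/tmp", "/var/tmp", "/usr/tmp", "/root"]


def is_usable_temp_dir(directory: str) -> bool:
    """Try to create and write a small file in directory, then clean it up."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError:
        # Missing directory, no permission, read-only filesystem, no space, ...
        return False
    try:
        os.write(fd, b"blat")
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)


def usable_temp_dirs(candidates):
    """Return the subset of candidates that can actually hold a temp file."""
    return [d for d in candidates if is_usable_temp_dir(d)]


if __name__ == "__main__":
    for d in CANDIDATES:
        print(d, "ok" if is_usable_temp_dir(d) else "NOT usable")
```

If every directory reports as not usable, checking free space and mount options on the host would be a reasonable next step.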