# General notes on Fedora Hardware Installation
## Order hardware and deal with PNT/IT tickets
Basically there needs to be an SOP on how to order hardware for
CPE. In the old days, you would ask our hardware resellers for quotes
and then work with Aoife to get them into the Finance purchase systems
depending on whether they were OPEX (small cash) or CAPEX (large cash)
purchases. That changes often and so will be out of date by the time you
are reading this.
As of the last purchase (2021-05), the process went something like this:
1. Person getting quotes needs to have done various training and be
approved to make purchases. [Aoife is currently that person.]
2. Quotes need to be done via a third-party system Finance has put
in. The training goes over that system. Since Aoife (or whoever will be
approved to do this) does not have first-hand hardware knowledge,
please do the following:
1. Go to hardware manufacturer website.
2. Spec out the system you want (CPU type/#cores, memory, disks etc).
3. Get a PDF of that quote and work with Aoife to get it into the
quote system.
3. Hardware is then ordered through the ServiceNow system with a lot of
tickets.
1. A ticket from RH Engineering finance confirming that you have a
purchase this quarter.
2. A ticket you will file under that one to say you will make a
purchase with this PO.
3. A rack-and-stack ticket you will need with Red Hat IT (NOT
Engineering IT). They will need to know:
1. Which colocation
2. When it is to arrive (the colos have ticket systems for getting
hardware in, and if unexpected hardware arrives it is shipped
back at our expense)
3. Generally where you want it in the racks.
4. Any special needs for this hardware (it needs a serial line run
to a server etc.)
4. Work with the reseller to get shipping information and hopefully the MAC addresses of the hardware.
4. Deal with the tickets and add info to them regularly with a 'please
close now' at the end.
In the past, Red Hat has paid for hardware at OSUOSL and Ibiblio as
secondary locations. However, getting hardware into them has become
harder, as Red Hat and these data centers require more controls on the
items housed there. If renewing this hardware, check to see whether Red
Hat management still wants it there. If not, start moving the virtual
systems elsewhere.
## Get hardware racked
When new hardware comes into the IAD2 or RDU2 datacenter, there is
going to be a slew of tickets to get it racked and stacked. After it is
put into the racks, the tickets usually contain data about:
* power ports on PDU
* rack location
* switch port for management ports
* switch port for regular network lines
* maybe a serial port (but only if requested in the original ticket)
Or you may be told that this has been entered into a DCIM which is internal to Red Hat. If so, get the information into a central data sheet.
## Get hardware networked
Once you have the switch and ports identified for this hardware, work with Red Hat IT (via ServiceNow tickets) to get the ports activated. Each datacenter has different VLANs for different subnets. The primary one to get done first is the management VLAN, as you will need the MAC addresses of various ports to help IT locate them on switches later.
Once the management ports are activated, you will need to log into the DHCP server for that datacenter. In IAD2 it is currently noc01, and in RDU-CC it should be cloud-noc-os01.rdu-cc.fedoraproject.org. Also look in the Ansible repo to find the current 'freely assigned' range. For example, at the time of writing the range in IAD2 is `range 10.3.160.200 10.3.160.249;`
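For orientation, that line lives inside a subnet declaration in `dhcpd.conf`; a minimal sketch (the router address is a made-up placeholder):
```
subnet 10.3.160.0 netmask 255.255.255.0 {
    option routers 10.3.160.254;    # hypothetical gateway for this subnet
    # temporary pool handed to unknown hosts while they are identified
    range 10.3.160.200 10.3.160.249;
}
```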
Look in the logs for a new host getting a temporary IP address, and nmap that host to see what ports are open. If the host has 80, 443, and some others open, it is probably the mgmt interface of the host. If it does not, then it may be a different network port on the device trying to PXE boot. In either case, record the MAC address you see.
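A rough sketch of that loop, assuming dhcpd logs to the journal and using a made-up address from the range above:
```
# watch for a new lease landing in the temporary range
journalctl -u dhcpd | grep DHCPACK | tail
# probe the leased address; a web UI on 80/443 usually means a mgmt (BMC) interface
nmap -Pn 10.3.160.201
```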
If we are dealing with a mgmt interface, log into it via the Red Hat VPN or via sshuttle through a host. Usually the hardware has a default login/password, like `root / calvin` for Dell. However, newer hardware ships with more random passwords, which you will need to get from the rack and stack person. Once you have gotten into the management interface, there is usually a set of configs to change.
{Needs to be filled in by Nirik for Fedora}
The CentOS infra has all of this done via IPMI so that it can be driven from Ansible. {Look to see if this can be done in Fedora.}
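For flavour, a couple of `ipmitool` calls of the sort such automation runs; the address and credentials here are placeholders:
```
# read the BMC network settings on channel 1
ipmitool -I lanplus -H 10.3.160.201 -U root -P calvin lan print 1
# change the password for user id 2 (commonly root on Dell BMCs)
ipmitool -I lanplus -H 10.3.160.201 -U root -P calvin user set password 2 '<new password>'
```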
From the mgmt interface, get the MAC addresses for the other interfaces. You will need to open a ticket with IT to get them on the correct networks (staging/production/build/etc).
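On Dell hardware, for example, `racadm` can pull the system inventory, which lists the embedded NIC MACs; the address and credentials are placeholders again:
```
# print system inventory, including embedded NIC MAC addresses
racadm -r 10.3.160.201 -u root -p calvin getsysinfo
```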
## Installation of Hosts
1. Assign forward and reverse DNS entries for the host. **NOTE** Always look for duplicates of that IP address. Sometimes people forget to remove old entries, or forget to put in both forward and reverse.
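A quick duplicate check before committing the DNS change, assuming a local checkout of the zone repo (the checkout path is hypothetical):
```
# does the name or the address already resolve somewhere?
dig +short vmhost-x86-03.iad2.fedoraproject.org
dig +short -x 10.3.163.13
# look for the address anywhere in the zone files
grep -rn '10.3.163.13' ~/dns-checkout/
```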
2. Edit the network's dhcpd.conf to have the primary MAC address of the host and its assigned IP address. Entries usually look something like:
```
host vmhost-x86-03 {
    hardware ethernet E4:43:4B:B1:28:CC;
    fixed-address 10.3.163.13;
    filename "uefi/grubx64.efi";
    next-server 10.3.163.10;
    option routers 10.3.163.254;
    option subnet-mask 255.255.255.0;
}
```
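Before restarting the DHCP service, it is worth syntax-checking the edit; a minimal check, assuming the stock config path:
```
# validate the config, then restart only if it parses cleanly
dhcpd -t -cf /etc/dhcp/dhcpd.conf && systemctl restart dhcpd
```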
3. Get a remote console off of the mgmt interface, and check to see if the box gets a PXE address. If it does, you may be able to start installing it, but if it doesn't, then you will need to work with IT on where that MAC address thinks it is.
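If the mgmt interface speaks IPMI, one way to force a PXE attempt and watch it is serial-over-LAN; the address and credentials are placeholders as before:
```
# boot from PXE on the next power cycle, then watch the console
ipmitool -I lanplus -H 10.3.160.201 -U root -P calvin chassis bootdev pxe
ipmitool -I lanplus -H 10.3.160.201 -U root -P calvin power cycle
ipmitool -I lanplus -H 10.3.160.201 -U root -P calvin sol activate
```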
## Writing/Editing a kickstart
A lot of kickstarts have been prewritten in the Fedora kickstarts repository on batcave. They are classified by type of install, OS, hardware architecture, and number of disks. So if you have a hardware RHEL-8 x86_64 box with 8 disks, you use: `hardware-rhel-8-x86_64-08disk-iad2`
If there isn't one, then you usually do something like:
```
git pull
cp <kickstart which is closest> <new kickstartname>
git add <new kickstartname>
git commit -a -m 'initial add of <new kickstartname>'
# edit the kickstart to meet what you need
git commit -a
# <add a long-form message going over what the kickstart does and why it is there.>
```
## Installation
Generally at this point, it is a matter of artisanally crafting the hardware. Boot the system, let PXE/kickstart build the system, log into the new system, and see if it actually worked (maybe the drives in the kickstart said `/dev/sda` but this new hardware actually calls it `/dev/dumptya`). If the build worked, then add it to the Ansible inventory for the class of systems it will be working on. Run the playbooks and make sure it 'works'. If it doesn't, you may need to go all the way back to a PXE kickstart because / needed N more disks or something. (This is very, very rare these days.)
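A rough sketch of that verification pass; the hostname, inventory path, and playbook name are all hypothetical:
```
# did the disks land where the kickstart expected?
ssh root@newhost.iad2.fedoraproject.org lsblk
# add the host to the inventory for its class, then run the group playbook against it
ansible-playbook -i inventory playbooks/groups/<class>.yml -l newhost.iad2.fedoraproject.org
```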
## Things I would like to have made better
1. Automate the installs more. There is too much hand tuning.
2. Make general kickstart templates in Jinja that can be built out per host. The playbook run then makes the appropriate kickstarts in the noc01 PXE directories so that a host can reboot and find the correct template if needed. (A possible shape is sketched after this list.)
3. Make `ipmitool` the tool which kicks off the mgmt configs via Ansible. Ansible should maintain what the MAC addresses are and what the IP addresses should be, and an ad-hoc playbook does all that magical work.
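Purely as a sketch of item 2 above, one possible shape for a per-host kickstart template; every variable name here is made up:
```
# kickstart.ks.j2 -- rendered per host by a playbook into the PXE directory
network --bootproto=static --ip={{ host_ip }} --netmask={{ netmask }} --gateway={{ gateway }} --hostname={{ inventory_hostname }}
clearpart --all --initlabel --drives={{ boot_drive | default('sda') }}
autopart --type=lvm
```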