# Install and configure the Xilinx Alveo U55C card
###### tags: `LDRD`
## Server access information
The server is named `isc01.fnal.gov` and is located in the FCC2 computing room. If physical access is needed, contact David Fagan or someone else in the SSI group.
To power cycle the node or get a console on it, do:
```
ssh root@ssiconsole4
cons isc01
# brings up a console window
pnode isc01 <on/off/status/reboot/cycle>
# controls the node's power via IPMI
# If the connection is slow, or if BIOS access is needed,
# open the console in a web browser instead:
ssh -D <port_number> root@ssiconsole4
# Then, locally, set the browser to use the SOCKS proxy at localhost:<port_number>
# and open 192.168.56.45.
```
## Hardware installation
* The card is plugged into slot 10 on the riser card in `isc01.fnal.gov`. The corresponding BIOS bifurcation setting port is `JS4A`.
* The bifurcation setting must be changed from `Auto` to `x16` for the card to show up.
* Here is a one-to-one mapping of the physical port to the BIOS `JS` port.

* The card can take an optional power cable for a larger power budget (75 W with no add-on power, 100 W with small add-on power, 250 W with full add-on power?).
* Jerry from KOI provided a GPU power cable that was too short to reach on its own. It can be used with the power extension cable that came with the server, plugged into the blue sockets underneath the edge of the riser card.
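A quick way to confirm that the card shows up after the bifurcation change is to look for Xilinx PCI functions. The sketch below assumes `lspci` (from `pciutils`) is available and relies on the Xilinx PCI vendor ID `10ee`; the helper name `xilinx_functions` is made up for illustration.

```shell
# Reads `lspci -D -n` output on stdin and prints the bus address of every
# function whose vendor ID is 10ee (Xilinx).
# In `lspci -n` output the third field is "vendor:device".
xilinx_functions() {
    awk '$3 ~ /^10ee:/ { print $1 }'
}

# Usage on the real host (not run here):
#   lspci -D -n | xilinx_functions
```

If the slot is configured correctly, the addresses printed should include the ones used elsewhere in these notes (`0000:28:00.0` and `0000:28:00.1`).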
## Software installation
### Installing system dependencies
* The server has both Scientific Linux 7 (SL7) and CentOS Stream 8 (CS8) installed.
* By default the server boots into CS8 with kernel minor version `338`, which is supported by the Xilinx software (the latest kernel version, beyond `394`, is not supported yet).
* Installed `kernel-devel-4.18.0-338.el8.x86_64` to match the kernel version in use:
```bash=
yum --showduplicates list kernel-devel
# find the same kernel version as 338, and install
yum install -y kernel-devel-4.18.0-338.el8
```
* Followed [this guide](https://www.golinuxcloud.com/change-default-kernel-version-rhel-centos-8/) to set the default grub entry to kernel `338`.
* Installed `kernel-headers-4.18.0-338.el8.x86_64`
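Since the XRT driver build depends on the running kernel, it can help to check that the node actually booted the pinned kernel before installing anything. The following is a minimal sketch; the helper names `kernel_build` and `is_pinned_kernel` are hypothetical and not part of any Xilinx tooling.

```shell
# Extract the build number (e.g. "338") from a kernel release string such as
# "4.18.0-338.el8.x86_64".
kernel_build() {
    printf '%s\n' "$1" | sed -E 's/^[0-9]+\.[0-9]+\.[0-9]+-([0-9]+).*$/\1/'
}

# Succeed only if the given release string carries the expected build number.
is_pinned_kernel() {
    [ "$(kernel_build "$1")" = "$2" ]
}

# Usage on the real host (not run here):
#   is_pinned_kernel "$(uname -r)" 338 || echo "booted the wrong kernel: $(uname -r)"
```

On CS8, `grubby --default-kernel` should also report which kernel the next boot will use.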
### Installing `xrt`
==Jovan==
* Tried `yum install ./xrt_202120.2.12.427_8.1.1911-x86_64-xrt.rpm`, but the DKMS build of the kernel module still fails:
```
Loading new xrt-2.12.427 DKMS files...
Building for 4.18.0-338.el8.x86_64
Building initial module for 4.18.0-338.el8.x86_64
Error! Build of xocl.ko failed for: 4.18.0-338.el8.x86_64 (x86_64)
Make sure the name of the generated module is correct and at the root of the
build directory, or consult make.log in the build directory
/var/lib/dkms/xrt/2.12.427/build for more information.
```
The underlying compile error is:
```
CC [M] /var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_drv.o
CC [M] /var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_errors.o
CC [M] /var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_bo.o
/var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_bo.c: In function ‘xocl_gem_prime_import_sg_table’:
/var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_bo.c:1175:8: error: implicit declaration of function ‘drm_prime_sg_to_page_addr_arrays’; did you mean ‘drm_prime_sg_to_dma_addr_array’? [-Werror=implicit-function-declaration]
ret = drm_prime_sg_to_page_addr_arrays(sgt, importing_xobj->pages,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drm_prime_sg_to_dma_addr_array
/var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_bo.c: In function ‘xocl_gem_prime_mmap’:
/var/lib/dkms/xrt/2.12.427/build/driver/xocl/userpf/xocl_bo.c:1260:39: error: ‘const struct drm_driver’ has no member named ‘gem_vm_ops’
vma->vm_ops = xobj->base.dev->driver->gem_vm_ops;
^~
cc1: some warnings being treated as errors
```
Since the instructions say to reboot before installing `xrt`, I tried rebooting just in case, but the server again came up with the newer `408` kernel.
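The DKMS message above points at `make.log` in `/var/lib/dkms/xrt/2.12.427/build`. A small sketch for pulling only the compiler errors out of such a log is shown here; the function name `dkms_errors` is made up for illustration, and the pattern assumes GCC-style `file:line:col: error:` messages like the ones quoted above.

```shell
# Print only the compiler error lines from a build log given as $1.
# Matches GCC diagnostics of the form "...: error: ..." and "...: fatal error: ...".
dkms_errors() {
    grep -E ': (fatal )?error: ' "$1"
}

# Usage on the real host (not run here):
#   dkms_errors /var/lib/dkms/xrt/2.12.427/build/make.log
```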
==Pengfei==
* According to the [manual](https://docs.xilinx.com/r/en-US/ug1120-alveo-platforms/U55C), the XRT version supporting U55C is `2022.1`.
* [XRT 2022.1 release notes](https://www.xilinx.com/applications/data-center/high-performance-computing/u55c.html#clustering)
* [Download](https://www.xilinx.com/products/boards-and-kits/alveo/u55c.html#gettingStarted) page for `xrt`.

#### Installing `xrt`:
1. Removed the previous version: `yum remove xrt-2.12.427-1.x86_64`.
2. Installed `tcsh`, since `xrt` requires `csh`.
3. Installed the new version with all dependencies: `yum localinstall xrt_202210.2.13.466_8.1.1911-x86_64-xrt.rpm` (this finished successfully).
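Because the earlier attempt failed while building the kernel module, one sanity check after a successful install is to confirm that the XRT kernel modules (`xocl` for the user function, `xclmgmt` for the management function) are loaded. This is a sketch only; `missing_xrt_modules` is a hypothetical helper that parses `lsmod` output.

```shell
# Reads `lsmod` output on stdin and prints the name of each XRT kernel module
# that is NOT loaded. Prints nothing if both are present.
missing_xrt_modules() {
    awk '
        $1 == "xocl"    { have_xocl = 1 }
        $1 == "xclmgmt" { have_mgmt = 1 }
        END {
            if (!have_xocl) print "xocl"
            if (!have_mgmt) print "xclmgmt"
        }'
}

# Usage on the real host (not run here):
#   lsmod | missing_xrt_modules
```

An empty result suggests the DKMS build and module load both worked; any name printed is a module to investigate.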
#### Installing `Vitis`:

The manual can be found [here](https://docs.xilinx.com/r/en-US/ug1393-vitis-application-acceleration/OpenCL-Installable-Client-Driver-Loader).
==Jovan==
* Installed the Deployment Target Platform and Development Target Platform corresponding to release 2022.1 for the U55C.
* With Pengfei's help we flashed the card: `/opt/xilinx/xrt/bin/xbmgmt program --base --device 0000:28:00.0 --image xilinx_u55c_gen3x16_xdma_base_3`
* Did a cold reboot, and things seem to function correctly.
* Running validation:
```
[jmitrevs@isc01 ~]$ /opt/xilinx/xrt/bin/xbutil validate --device 0000:28:00.1
Starting validation for 1 devices
Validate Device : [0000:28:00.1]
Platform : xilinx_u55c_gen3x16_xdma_base_3
SC Version : 7.1.17
Platform ID : 97088961-FEAE-DA91-52A2-1D9DFD63CCEF
-------------------------------------------------------------------------------
Test 1 [0000:28:00.1] : pcie-link
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:28:00.1] : sc-version
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:28:00.1] : verify
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [0000:28:00.1] : dma
Details : Buffer size - '16 MB'
Host -> PCIe -> FPGA write bandwidth = 12002.2 MB/s
Host <- PCIe <- FPGA read bandwidth = 12243.8 MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [0000:28:00.1] : iops
Details : IOPS: 366950 (verify)
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [0000:28:00.1] : mem-bw
Details : Throughput (Type: HBM) (Bank count: 1) : 12099.4MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 7 [0000:28:00.1] : vcu
Validation completed. Please run the command '--verbose' option for more details
```
Aside: The SmartSSD is currently not supported in 2022.1.
* In Scientific Linux 7 we installed Vitis release 2021.2. These are the validation results there:
```
[jmitrevs@isc01 ~]$ xbutil validate --device 0000:28:00.1
Starting validation for 1 devices
Validate Device : [0000:28:00.1]
Platform : xilinx_u55c_gen3x16_xdma_base_2
SC Version : 7.1.14
Platform ID : FCC68C40-94CC-3A1E-9A6C-A76BCA494718
-------------------------------------------------------------------------------
Test 1 [0000:28:00.1] : PCIE link
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:28:00.1] : SC version
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:28:00.1] : Verify kernel
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [0000:28:00.1] : DMA
Details : Host -> PCIe -> FPGA write bandwidth = 9766.1 MB/s
Host <- PCIe <- FPGA read bandwidth = 3450.2 MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [0000:28:00.1] : iops
Details : IOPS: 440531 (verify)
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [0000:28:00.1] : Bandwidth kernel
Details : Throughput (Type: HBM) (Bank count: 1) : 12097.4MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 7 [0000:28:00.1] : vcu
Validation completed. Please run the command '--verbose' option for more details
```
Compared with the CS8 results above, DMA bandwidth is significantly lower here (write 9766.1 MB/s vs. 12002.2 MB/s, read 3450.2 MB/s vs. 12243.8 MB/s), though IOPS is higher (440531 vs. 366950).
* Because `gcc` is quite old in Scientific Linux 7, a newer version is useful for development. To make newer software easy to enable, I installed `cvmfs`, from which newer versions of `gcc`, `git`, and `BOOST` can be enabled.
* Found that 2021.2 still has the Y2K22 bug, so I installed the patch, but only on the SL7 side. (We may need to install it on the CentOS 8 side, too.)
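To compare the DMA numbers between the two installations without reading the full report, the bandwidth lines can be filtered out of the `xbutil validate` output. This is a rough sketch that assumes the output format shown in the two logs above; `dma_bandwidth` is a made-up helper name.

```shell
# Reads `xbutil validate` output on stdin and prints the DMA write and read
# bandwidth in MB/s. In lines such as
#   "Host -> PCIe -> FPGA write bandwidth = 12002.2 MB/s"
# the value is the second-to-last whitespace-separated field.
dma_bandwidth() {
    awk '
        /write bandwidth/ { print "write", $(NF-1) }
        /read bandwidth/  { print "read",  $(NF-1) }'
}

# Usage on the real host (not run here):
#   xbutil validate --device 0000:28:00.1 | dma_bandwidth
```

Running this under both SL7 and CS8 gives two short listings that can be compared directly.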