# Installing Xilinx smartSSD
###### tags: `LDRD`
## System layout
1. Host server (current): [Dell Precision Tower 7810](https://www.dell.com/support/manuals/en-us/precision-t7810-workstation/precision_t7810_om_pub/technical-specifications?guid=guid-1a5124e2-8da0-4083-915d-c96dfb9f8d90&lang=en-us)
1. Host server (incompatible at the moment; still need to try PCIe bifurcation set to 1x16): [Lenovo P710 tower](https://pcsupport.lenovo.com/pa/en/products/workstations/thinkstation-p-series-workstations/thinkstation-p710/30b6/30b6s14r00/mj05cayg)
1. [Xilinx smartSSD](https://www.xilinx.com/applications/data-center/computational-storage/smartssd.html) and its [manual](https://docs.xilinx.com/v/u/en-US/ug1382-smartssd-csd)
1. [starTech PCIex4 to u.2 adapter card](https://www.amazon.com/StarTech-com-U-2-PCIe-Adapter-PEX4SFF8639/dp/B072JK2XLC/ref=sr_1_3?dchild=1&keywords=U.2+TO+PCIE&qid=1631216068&sr=8-3)
1. [Honeywell Air Circulator fan](https://www.target.com/p/honeywell-turbo-force-table-air-circulator-fan-black/-/A-11153539?ref=tgt_adv_XS000000&AFID=google_pla_df&fndsrc=tgtao&DFA=71700000012764136&CPNG=PLA_Home%2BImprovement%2BShopping_Local%7CHome%2BImprovement_Ecomm_Home&adgroup=SC_Home%2BImprovement&LID=700000001170770pgs&LNM=PRODUCT_GROUP&network=g&device=c&location=9021681&targetid=pla-1410899086559&ds_rl=1246978&ds_rl=1247068&ds_rl=1248099&gclid=CjwKCAjw-ZCKBhBkEiwAM4qfF5PGWA5IDYeHdC-y9iibzrdB24BCH8yyqAUONZB_3aZ6pNi3_5n41RoCw9oQAvD_BwE&gclsrc=aw.ds)
1. [APC 7901 switched power strip](https://download.schneider-electric.com/files?p_File_Name=ASTE-6Z6KAM_R0_EN.pdf&p_Doc_Ref=SPD_ASTE-6Z6KAM_EN&p_enDocType=User+guide) (not yet connected), with NEMA locking plug changed to house plug, to be connected to Lenovo server via serial.
## Installation steps
### On Lenovo P710 Tower (`mwts`)
1. Insert the smartSSD into the StarTech adapter card, and install the card in a PCIe slot (`Gen 3` or above);
1. Update BIOS on P710 server: Burn this [.iso](https://download.lenovo.com/pccbbs/thinkcentre_bios/s01j971usa.iso) file to CD, and reboot from CD to finish updating;
1. In BIOS settings, go to "Advanced -> PCI Subsystem Setting -> Above 4G Decoding", and enable it;
1. In BIOS settings, go to "Advanced -> IIO Configuration -> select the corresponding slot to set the bifurcation"
1. Add kernel parameters to the OS: `grubby --args="pci=assign-busses,hpbussize=4" --update-kernel ALL` and `grubby --args="pci=realloc=on,hpmemsize=16G" --update-kernel ALL` (`realloc` and `hpmemsize` are sub-options of `pci=`);
1. Reboot, and check the output of `lspci` to see if the Xilinx device is listed. -- ***It did not.***
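The kernel parameters above only take effect after a reboot; a quick way to confirm they made it onto the running kernel is to check `/proc/cmdline`. A minimal sketch (the parameter strings are examples and should match whatever was actually passed to `grubby`):

```python
# Sketch: confirm the grubby-added PCI parameters are on the running kernel's
# command line. Each --args string appears as one whitespace-separated token
# in /proc/cmdline after the reboot.

def missing_params(cmdline: str, required: list[str]) -> list[str]:
    """Return the required kernel parameters absent from the command line."""
    tokens = cmdline.split()
    return [p for p in required if p not in tokens]

# Usage on the target host (adjust strings to what was passed to grubby):
#   cmdline = open("/proc/cmdline").read()
#   missing_params(cmdline, ["pci=assign-busses,hpbussize=4",
#                            "realloc=on,hpmemsize=16G"])
```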
### On Dell Precision Tower 7810 (`docker-bd`)
1. Put the adapter card with smartSSD into a PCIex16 slot;
2. Boot the system... and voilà! `lspci` listed the following:
```
dingpf@docker-bd $ lspci|grep -i xilinx
04:00.0 PCI bridge: Xilinx Corporation Device 9134
05:00.0 PCI bridge: Xilinx Corporation Device 9234
05:01.0 PCI bridge: Xilinx Corporation Device 9434
07:00.0 Processing accelerators: Xilinx Corporation Device 6987
07:00.1 Processing accelerators: Xilinx Corporation Device 6988
```
3. `lsblk`
```
root@docker-bd:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931.5G 0 disk
└─sda1 8:1 0 931.5G 0 part /home/dingpf
sdb 8:16 0 465.8G 0 disk
├─sdb1 8:17 0 433.9G 0 part /
├─sdb2 8:18 0 1K 0 part
└─sdb5 8:21 0 31.9G 0 part [SWAP]
sr0 11:0 1 1024M 0 rom
nvme0n1 259:1 0 3.5T 0 disk
```
4. Install the software (Steps 1-7 on pages 13-14 of the manual);
5. Flash the firmware: `sudo /opt/xilinx/xrt/bin/xbmgmt flash --update --shell xilinx_u2_gen3x4_xdma_gc_base_2`;
6. Cold reboot the server;
## Card bring-up and validation
### `lspci -vd 10ee:`
```
root@docker-bd:~# lspci -vd 10ee:
04:00.0 PCI bridge: Xilinx Corporation Device 9134 (prog-if 00 [Normal decode])
Physical Slot: 4
Flags: bus master, fast devsel, latency 0, NUMA node 0
Bus: primary=04, secondary=05, subordinate=07, sec-latency=0
I/O behind bridge: 00000000-00000fff [size=4K]
Memory behind bridge: fb100000-fb1fffff [size=1M]
Prefetchable memory behind bridge: 0000033c00000000-0000033e040fffff [size=8257M]
Capabilities: [40] Power Management version 3
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Upstream Port, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [1c0] Secondary PCI Express
Kernel driver in use: pcieport
05:00.0 PCI bridge: Xilinx Corporation Device 9234 (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 33, NUMA node 0
Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
I/O behind bridge: 00000000-00000fff [size=4K]
Memory behind bridge: fb100000-fb1fffff [size=1M]
Prefetchable memory behind bridge: [disabled]
Capabilities: [40] Power Management version 3
Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Downstream Port (Slot+), MSI 00
Capabilities: [100] Access Control Services
Capabilities: [1c0] Secondary PCI Express
Kernel driver in use: pcieport
05:01.0 PCI bridge: Xilinx Corporation Device 9434 (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, NUMA node 0
Bus: primary=05, secondary=07, subordinate=07, sec-latency=0
I/O behind bridge: 00000000-00000fff [size=4K]
Memory behind bridge: [disabled]
Prefetchable memory behind bridge: 0000033c00000000-0000033e040fffff [size=8257M]
Capabilities: [40] Power Management version 3
Capabilities: [70] Express Downstream Port (Slot-), MSI 00
Capabilities: [100] Access Control Services
Capabilities: [140] Secondary PCI Express
07:00.0 Processing accelerators: Xilinx Corporation Device 6987
Subsystem: Xilinx Corporation Device 1351
Flags: bus master, fast devsel, latency 0, NUMA node 0
Memory at 33e02000000 (64-bit, prefetchable) [size=32M]
Memory at 33e04010000 (64-bit, prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [60] MSI-X: Enable- Count=33 Masked-
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Secondary PCI Express
Capabilities: [180] Vendor Specific Information: ID=0040 Rev=0 Len=018 <?>
Capabilities: [400] Access Control Services
Capabilities: [480] Vendor Specific Information: ID=0020 Rev=0 Len=010 <?>
Kernel driver in use: xclmgmt
Kernel modules: xclmgmt
07:00.1 Processing accelerators: Xilinx Corporation Device 6988
Subsystem: Xilinx Corporation Device 1351
Flags: bus master, fast devsel, latency 0, NUMA node 0
Memory at 33e00000000 (64-bit, prefetchable) [size=32M]
Memory at 33e04000000 (64-bit, prefetchable) [size=64K]
Memory at 33c00000000 (64-bit, prefetchable) [size=8G]
Capabilities: [40] Power Management version 3
Capabilities: [60] MSI-X: Enable+ Count=32 Masked-
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [400] Access Control Services
Capabilities: [480] Vendor Specific Information: ID=0020 Rev=0 Len=010 <?>
Kernel driver in use: xocl
Kernel modules: xocl
```
### `lspci -vs 06:00.0`
```
root@docker-bd:~# lspci -vs 06:00.0
06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a825 (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device a815
Flags: bus master, fast devsel, latency 0, IRQ 29, NUMA node 0
Memory at fb110000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at fb100000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Device Serial Number 1f-07-50-11-91-38-25-00
Capabilities: [178] Secondary PCI Express
Kernel driver in use: nvme
Kernel modules: nvme
```
### `nvme smart-log`
```
root@docker-bd:~# nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0x2
temperature : 85 C
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 0%
data_units_read : 20
data_units_written : 0
host_read_commands : 1,010
host_write_commands : 0
controller_busy_time : 0
power_cycles : 20
power_on_hours : 19
unsafe_shutdowns : 13
media_errors : 0
num_err_log_entries : 4
Warning Temperature Time : 9
Critical Composite Temperature Time : 549
Temperature Sensor 1 : 82 C
Temperature Sensor 2 : 84 C
Temperature Sensor 3 : 85 C
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
```
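Since the drive temperature turns out to matter later, it is handy to poll it programmatically. A small helper (a sketch, with field names assumed from the `nvme smart-log` text output shown above):

```python
import re

def parse_temperatures(smart_log: str) -> dict[str, int]:
    """Extract the composite and per-sensor temperatures (deg C) from
    the text output of `nvme smart-log`, as shown above."""
    pattern = re.compile(r"^\s*(temperature|Temperature Sensor \d+)\s*:\s*(\d+)\s*C\s*$")
    temps = {}
    for line in smart_log.splitlines():
        m = pattern.match(line)
        if m:
            temps[m.group(1)] = int(m.group(2))
    return temps

# Usage on the host (assumes the nvme-cli tool is installed):
#   import subprocess
#   out = subprocess.run(["nvme", "smart-log", "/dev/nvme0n1"],
#                        capture_output=True, text=True).stdout
#   print(parse_temperatures(out))
```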
### `xbutil scan`
```
root@docker-bd:~# xbutil scan
---------------------------------------------------------------------
Deprecation Warning:
The given legacy sub-command and/or option has been deprecated
to be obsoleted in the next release.
Further information regarding the legacy deprecated sub-commands
and options along with their mappings to the next generation
sub-commands and options can be found on the Xilinx Runtime (XRT)
documentation page:
https://xilinx.github.io/XRT/master/html/xbtools_map.html
Please update your scripts and tools to use the next generation
sub-commands and options.
---------------------------------------------------------------------
INFO: Found total 1 card(s), 1 are usable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
System Configuration
OS name: Linux
Release: 5.4.0-84-generic
Version: #94-Ubuntu SMP Thu Aug 26 20:27:37 UTC 2021
Machine: x86_64
Model: Precision Tower 5810
CPU cores: 8
Memory: 32041 MB
Glibc: 2.31
Distribution: Ubuntu 20.04.1 LTS
Now: Thu Sep 16 20:43:16 2021 GMT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
XRT Information
Version: 2.11.634
Git Hash: 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
Git Branch: 2021.1
Build Date: 2021-06-08 22:08:45
XOCL: 2.11.634,5ad5998d67080f00bca5bf15b3838cf35e0a7b26
XCLMGMT: 2.11.634,5ad5998d67080f00bca5bf15b3838cf35e0a7b26
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[0] 0000:07:00.1 xilinx_u2_gen3x4_xdma_gc_base_2 user(inst=129)
```
### `xbmgmt flash --scan`
```
root@docker-bd:~# /opt/xilinx/xrt/bin/xbmgmt flash --scan
---------------------------------------------------------------------
Deprecation Warning:
The given legacy sub-command and/or option has been deprecated
to be obsoleted in the next release.
Further information regarding the legacy deprecated sub-commands
and options along with their mappings to the next generation
sub-commands and options can be found on the Xilinx Runtime (XRT)
documentation page:
https://xilinx.github.io/XRT/master/html/xbtools_map.html
Please update your scripts and tools to use the next generation
sub-commands and options.
---------------------------------------------------------------------
Card [0000:07:00.0]
Card type: u2
Flash type: SPI
Flashable partition running on FPGA:
xilinx_u2_gen3x4_xdma_gc_base_2,[ID=0x8c8dfd8818ab79b2],[SC=INACTIVE]
Flashable partitions installed in system:
xilinx_u2_gen3x4_xdma_gc_base_2,[ID=0x8c8dfd8818ab79b2]
```
### `xbutil validate -d 0000:07:00.1`
```
root@docker-bd:~# xbutil validate -d 0000:07:00.1
Starting validation for 1 devices
Validate Device : [0000:07:00.1]
Platform : xilinx_u2_gen3x4_xdma_gc_base_2
SC Version : 0.0.0
Platform ID : 0x0
-------------------------------------------------------------------------------
Test 1 [0000:07:00.1] : PCIE link
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 2 [0000:07:00.1] : SC version
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:07:00.1] : Verify kernel
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [0000:07:00.1] : DMA
Details : Host -> PCIe -> FPGA write bandwidth = 71.359753 MB/s
Host <- PCIe <- FPGA read bandwidth = 71.354767 MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [0000:07:00.1] : iops
Details : IOPS: 5050 (hello)
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [0000:07:00.1] : Bandwidth kernel
Error(s) : Host buffer alignment 4096 bytes
Compiled kernel =
/opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/bandwidth.xclbin
Shell = b'xilinx_u2_gen3x4_xdma_gc_base_2'
Index = 0
PCIe = GEN3 x 4
OCL Frequency = (1, 0) MHz
DDR Bank = 0
Device Temp = 89 C
MIG Calibration = True
Finished downloading bitstream
/opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/bandwidth.xclbin
CU[0] b'bandwidth1:bandwidth1_1' @0x1810000
CU[1] b'bandwidth2:bandwidth2_1' @0x1820000
[0] b'bank0' @0x4000000000
LOOP PIPELINE 16 beats
Test 0, Throughput: 27 MB/s
LOOP PIPELINE 64 beats
Test 1, Throughput: 83 MB/s
LOOP PIPELINE 256 beats
Test 2, Throughput: 142 MB/s
LOOP PIPELINE 1024 beats
Test 3, Throughput: 142 MB/s
TTTT: 27
Maximum throughput: 142 MB/s
ERROR: Throughput is less than expected value of 10 GB/sec
FAILED TEST
Details : Maximum throughput: 142 MB/s
Test Status : [FAILED]
-------------------------------------------------------------------------------
Validation failed. Please run the command '--verbose' option for more details
Validation Summary
------------------
1 device(s) evaluated
0 device(s) validated successfully
1 device(s) had exceptions during validation
Validated successfully [0 device(s)]
Validation Exceptions [1 device(s)]
- [0000:07:00.1] : xilinx_u2_gen3x4_xdma_gc_base_2 : First failure: 'Bandwidth kernel'
Warnings produced during test [0 device(s)] (Note: The given test successfully validated)
```
This run failed because the throughput was lower than expected. The smartSSD was likely not cooled sufficiently during validation; an earlier run the same day had passed. The `critical_warning : 0x2` in the SMART log above also indicates that the composite temperature exceeded its threshold.
After the card overheated, it was no longer listed by `lsblk`:
```
root@docker-bd:/home/dingpf# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931.5G 0 disk
└─sda1 8:1 0 931.5G 0 part /home/dingpf
sdb 8:16 0 465.8G 0 disk
├─sdb1 8:17 0 433.9G 0 part /
├─sdb2 8:18 0 1K 0 part
└─sdb5 8:21 0 31.9G 0 part [SWAP]
sr0 11:0 1 1024M 0 rom
```
And `lspci -vs 06:00.0` gives me:
```
root@docker-bd:/home/dingpf# lspci -vs 06:00.0
06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a825 (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel modules: nvme
```
Did a cold reboot.
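For future reference: before resorting to a cold reboot, a soft PCI remove/rescan is sometimes enough to bring back an endpoint that has dropped off the bus (it may not help if the drive is in thermal shutdown). A sketch, with the BDF taken from the `lspci` output above and the destructive writes left commented out:

```shell
# Soft PCI recovery sketch (run as root). The BDF below is the NVMe
# endpoint from the lspci output above; adjust for your topology.
BDF="0000:06:00.0"
REMOVE_PATH="/sys/bus/pci/devices/${BDF}/remove"
# echo 1 > "${REMOVE_PATH}"        # detach the unresponsive device
# sleep 2
# echo 1 > /sys/bus/pci/rescan     # re-enumerate the bus
echo "device node to remove: ${REMOVE_PATH}"
```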
### Rerunning the validation, creating a file system, and running a `fio` test
```
root@docker-bd:~# source /opt/xilinx/xrt/setup.sh
XILINX_XRT : /opt/xilinx/xrt
PATH : /opt/xilinx/xrt/bin:/home/dingpf/.local/bin:/home/dingpf/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH : /opt/xilinx/xrt/lib:
PYTHONPATH : /opt/xilinx/xrt/python:
root@docker-bd:~# xbutil validate -d 0000:07:00.1 --verbose
Verbose: Enabling Verbosity
Starting validation for 1 devices
Validate Device : [0000:07:00.1]
Platform : xilinx_u2_gen3x4_xdma_gc_base_2
SC Version : 0.0.0
Platform ID : 0x0
-------------------------------------------------------------------------------
Test 1 [0000:07:00.1] : Aux connection
Description : Check if auxiliary power is connected
Details : Aux power connector is not available on this board
Test Status : [SKIPPED]
-------------------------------------------------------------------------------
Test 2 [0000:07:00.1] : PCIE link
Description : Check if PCIE link is active
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [0000:07:00.1] : SC version
Description : Check if SC firmware is up-to-date
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [0000:07:00.1] : Verify kernel
Description : Run 'Hello World' kernel test
Xclbin : /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/verify.xclbin
Testcase : /opt/xilinx/xrt/test/22_verify.py
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [0000:07:00.1] : DMA
Description : Run dma test
Details : Host -> PCIe -> FPGA write bandwidth = 3308.324361 MB/s
Host <- PCIe <- FPGA read bandwidth = 3303.036681 MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [0000:07:00.1] : iops
Description : Run scheduler performance measure test
Xclbin : /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/verify.xclbin
Testcase : /opt/xilinx/xrt/test/xcl_iops_test.exe
Details : IOPS: 161941 (hello)
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 7 [0000:07:00.1] : Bandwidth kernel
Description : Run 'bandwidth kernel' and check the throughput
Xclbin : /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/bandwidth.xclbin
Testcase : /opt/xilinx/xrt/test/23_bandwidth.py
Details : Maximum throughput: 15392 MB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 8 [0000:07:00.1] : Peer to peer bar
Description : Run P2P test
Details : bank0 validated
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 9 [0000:07:00.1] : Memory to memory DMA
Description : Run M2M test
Details : M2M is not available
Test Status : [SKIPPED]
-------------------------------------------------------------------------------
Test 10 [0000:07:00.1] : Host memory bandwidth test
Description : Run 'bandwidth kernel' when host memory is enabled
Details : Address translator IP is not available
Test Status : [SKIPPED]
-------------------------------------------------------------------------------
Test 11 [0000:07:00.1] : vcu
Description : Run decoder test
Details : Verify xclbin not available or shell partition is not
programmed. Skipping validation.
Test Status : [SKIPPED]
-------------------------------------------------------------------------------
Validation completed. Please run the command '--verbose' option for more details
Validation Summary
------------------
1 device(s) evaluated
1 device(s) validated successfully
0 device(s) had exceptions during validation
Validated successfully [1 device(s)]
- [0000:07:00.1] : xilinx_u2_gen3x4_xdma_gc_base_2
Validation Exceptions [0 device(s)]
Warnings produced during test [0 device(s)] (Note: The given test successfully validated)
Unsupported tests [1 device(s)]
- [0000:07:00.1] : xilinx_u2_gen3x4_xdma_gc_base_2 : Test(s): 'Aux connection', 'Memory to memory DMA',
'Host memory bandwidth test', vcu
root@docker-bd:~# mkfs.ext4 /dev/nvme0n1
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 937684566 4k blocks and 234422272 inodes
Filesystem UUID: eeedca44-6f0a-49b9-861a-30ef1858fa83
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
root@docker-bd:~#
root@docker-bd:~#
root@docker-bd:~# fio --name=rand-write --ioengine=libaio --iodepth=256 --rw=randwrite --bs=4k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1
rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.16
Starting 12 processes
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=1023840710656, buflen=4096
fio: pid=16240, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=603464712192, buflen=4096
fio: pid=16244, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2122233233408, buflen=4096
fio: pid=16245, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=565072101376, buflen=4096
fio: pid=16235, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=484806328320, buflen=4096
fio: pid=16237, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
Jobs: 12 (f=12): [f(1),w(1),f(1),w(1),f(1),w(3),f(2),w(2)][66.7%][w=42.2MiB/s][w=10fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2002030358528, buflen=4096
fio: pid=16236, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=247143903232, buflen=4096
fio: pid=16243, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2218376523776, buflen=4096
fio: pid=16241, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2508332797952, buflen=4096
fio: pid=16246, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=1621292158976, buflen=4096
fio: pid=16242, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=613084512256, buflen=4096
fio: pid=16247, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=1119989624832, buflen=4096
fio: pid=16239, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error
Jobs: 10 (f=10): [f(7),X(1),f(2),X(1),f(1)][100.0%][eta 00m:00s]
rand-write: (groupid=0, jobs=12): err= 5 (file:io_u.c:1787, func=io_u error, error=Input/output error): pid=16235: Thu Sep 16 16:28:04 2021
write: IOPS=248k, BW=970MiB/s (1017MB/s)(37.8GiB/39917msec); 0 zone resets
slat (nsec): min=1547, max=15166k, avg=6264.92, stdev=9270.51
clat (usec): min=718, max=1091.7k, avg=12273.86, stdev=19672.83
lat (usec): min=722, max=1091.7k, avg=12280.34, stdev=19672.89
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 8],
| 30.00th=[ 9], 40.00th=[ 10], 50.00th=[ 11], 60.00th=[ 12],
| 70.00th=[ 14], 80.00th=[ 17], 90.00th=[ 21], 95.00th=[ 24],
| 99.00th=[ 31], 99.50th=[ 35], 99.90th=[ 104], 99.95th=[ 127],
| 99.99th=[ 1070]
bw ( KiB/s): min= 3030, max=1560517, per=100.00%, avg=1003779.43, stdev=17961.45, samples=947
iops : min= 756, max=390129, avg=250944.50, stdev=4490.37, samples=947
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.40%, 4=2.89%, 10=43.06%, 20=43.69%, 50=9.66%
lat (msec) : 100=0.17%, 250=0.07%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2000=0.03%
cpu : usr=9.43%, sys=12.62%, ctx=849954, majf=0, minf=237
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,9916743,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=970MiB/s (1017MB/s), 970MiB/s-970MiB/s (1017MB/s-1017MB/s), io=37.8GiB (40.6GB), run=39917-39917msec
Disk stats (read/write):
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
```
**Validation passed successfully, as did creating the file system. However, about halfway into the `fio` random write test, it failed with I/O errors. This is very likely due to insufficient cooling of the device.**
### Temperature readings after the cold boot:
```
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp
temperature : 69 C
root@docker-bd:/home/dingpf# #20% into Bandwidth kernel test
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 |grep temp
root@docker-bd:/home/dingpf# # after validation
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp
temperature : 73 C
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp # idling for another 3 minutes
temperature : 78 C
```

Temperature readings were taken 1-2 minutes apart.

### **Decided to take the smartSSD off the motherboard until we have a solution for the cooling issue.**
### This is how the card is installed currently.

### With a fan directed at the device.

## Raw SSD Read/Write test
* Random write test:
**`WRITE: bw=2362MiB/s (2477MB/s), 2362MiB/s-2362MiB/s (2477MB/s-2477MB/s), io=138GiB (149GB), run=60007-60007msec`**
* Random read test:
**`READ: bw=2867MiB/s (3007MB/s), 2867MiB/s-2867MiB/s (3007MB/s-3007MB/s), io=168GiB (180GB), run=60015-60015msec`**
* Sequential write test:
**`WRITE: bw=2391MiB/s (2507MB/s), 2391MiB/s-2391MiB/s (2507MB/s-2507MB/s), io=141GiB (151GB), run=60284-60284msec`**
* Sequential read test:
**`READ: bw=2961MiB/s (3105MB/s), 2961MiB/s-2961MiB/s (3105MB/s-3105MB/s), io=175GiB (187GB), run=60356-60356msec`**
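As a sanity check on the figures above: `fio` reports each bandwidth in both binary (MiB/s) and decimal (MB/s) units, and the pairs should agree to within rounding. A throwaway check:

```python
def mib_to_mb(mib_per_s: float) -> float:
    """Convert a binary MiB/s figure (as fio reports) to decimal MB/s."""
    return mib_per_s * (1024 ** 2) / 1e6

# Pairs from the fio summaries above (fio's own rounding can differ by ~1 MB/s):
assert round(mib_to_mb(2362)) == 2477  # random write
assert round(mib_to_mb(2391)) == 2507  # sequential write
assert round(mib_to_mb(2961)) == 3105  # sequential read
```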
```=
root@docker-bd:~# fio --name=rand-write --ioengine=libaio --iodepth=256 --rw=randwrite --bs=4k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1
rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.16
Starting 12 processes
Jobs: 12 (f=12): [w(12)][100.0%][w=2431MiB/s][w=622k IOPS][eta 00m:00s]
rand-write: (groupid=0, jobs=12): err= 0: pid=15750: Fri Sep 17 11:54:04 2021
write: IOPS=605k, BW=2362MiB/s (2477MB/s)(138GiB/60007msec); 0 zone resets
slat (nsec): min=1524, max=10231k, avg=4488.34, stdev=8496.11
clat (usec): min=17, max=162066, avg=5072.02, stdev=3473.33
lat (usec): min=24, max=162074, avg=5076.65, stdev=3473.43
clat percentiles (usec):
| 1.00th=[ 1614], 5.00th=[ 2376], 10.00th=[ 2769], 20.00th=[ 3195],
| 30.00th=[ 3556], 40.00th=[ 3982], 50.00th=[ 4490], 60.00th=[ 5080],
| 70.00th=[ 5800], 80.00th=[ 6587], 90.00th=[ 7898], 95.00th=[ 9110],
| 99.00th=[ 11994], 99.50th=[ 13698], 99.90th=[ 50594], 99.95th=[ 74974],
| 99.99th=[116917]
bw ( MiB/s): min= 340, max= 2803, per=99.96%, avg=2361.14, stdev=26.85, samples=1433
iops : min=87142, max=717598, avg=604452.42, stdev=6873.47, samples=1433
lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.02%, 1000=0.06%
lat (msec) : 2=2.36%, 4=38.00%, 10=56.54%, 20=2.77%, 50=0.13%
lat (msec) : 100=0.08%, 250=0.02%
cpu : usr=16.19%, sys=20.36%, ctx=1648028, majf=0, minf=154
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=0,36286011,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=2362MiB/s (2477MB/s), 2362MiB/s-2362MiB/s (2477MB/s-2477MB/s), io=138GiB (149GB), run=60007-60007msec
Disk stats (read/write):
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
```
```=
root@docker-bd:~# fio --name=rand-read --ioengine=libaio --iodepth=256 --rw=randread --bs=4k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1
rand-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.16
Starting 12 processes
Jobs: 12 (f=12): [r(12)][100.0%][r=1893MiB/s][r=485k IOPS][eta 00m:00s]
rand-read: (groupid=0, jobs=12): err= 0: pid=24614: Fri Sep 17 13:11:43 2021
read: IOPS=734k, BW=2867MiB/s (3007MB/s)(168GiB/60015msec)
slat (nsec): min=1483, max=16053k, avg=4234.76, stdev=6314.14
clat (usec): min=25, max=44358, avg=4177.38, stdev=2575.76
lat (usec): min=29, max=44362, avg=4181.78, stdev=2575.83
clat percentiles (usec):
| 1.00th=[ 619], 5.00th=[ 1106], 10.00th=[ 1500], 20.00th=[ 2057],
| 30.00th=[ 2573], 40.00th=[ 3097], 50.00th=[ 3621], 60.00th=[ 4228],
| 70.00th=[ 5014], 80.00th=[ 5932], 90.00th=[ 7504], 95.00th=[ 9110],
| 99.00th=[13304], 99.50th=[14746], 99.90th=[16450], 99.95th=[16581],
| 99.99th=[16909]
bw ( MiB/s): min= 1740, max= 3588, per=100.00%, avg=2869.41, stdev=48.36, samples=1435
iops : min=445623, max=918698, avg=734568.90, stdev=12379.40, samples=1435
lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.43%, 750=1.38%
lat (usec) : 1000=2.09%
lat (msec) : 2=14.89%, 4=37.50%, 10=40.41%, 20=3.28%, 50=0.01%
cpu : usr=18.96%, sys=28.65%, ctx=11591764, majf=0, minf=3200
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=44053216,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=2867MiB/s (3007MB/s), 2867MiB/s-2867MiB/s (3007MB/s-3007MB/s), io=168GiB (180GB), run=60015-60015msec
Disk stats (read/write):
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
```
```=
root@docker-bd:~# fio --name=seq-write --ioengine=libaio --iodepth=64 --rw=write --bs=1024k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1
seq-write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.16
Starting 12 processes
Jobs: 12 (f=12): [W(12)][100.0%][w=2402MiB/s][w=2402 IOPS][eta 00m:00s]
seq-write: (groupid=0, jobs=12): err= 0: pid=28762: Fri Sep 17 13:15:03 2021
write: IOPS=2390, BW=2391MiB/s (2507MB/s)(141GiB/60284msec); 0 zone resets
slat (usec): min=38, max=121102, avg=1548.95, stdev=3981.32
clat (msec): min=8, max=939, avg=319.50, stdev=120.71
lat (msec): min=8, max=948, avg=321.05, stdev=122.10
clat percentiles (msec):
| 1.00th=[ 48], 5.00th=[ 99], 10.00th=[ 148], 20.00th=[ 213],
| 30.00th=[ 259], 40.00th=[ 300], 50.00th=[ 338], 60.00th=[ 372],
| 70.00th=[ 397], 80.00th=[ 418], 90.00th=[ 443], 95.00th=[ 481],
| 99.00th=[ 609], 99.50th=[ 684], 99.90th=[ 827], 99.95th=[ 852],
| 99.99th=[ 877]
bw ( MiB/s): min= 1106, max= 4361, per=99.93%, avg=2388.90, stdev=48.92, samples=1440
iops : min= 1106, max= 4361, avg=2388.46, stdev=48.93, samples=1440
lat (msec) : 10=0.02%, 20=0.02%, 50=1.10%, 100=4.10%, 250=22.66%
lat (msec) : 500=68.29%, 750=3.56%, 1000=0.26%
cpu : usr=1.67%, sys=1.62%, ctx=126200, majf=0, minf=148
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,144118,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=2391MiB/s (2507MB/s), 2391MiB/s-2391MiB/s (2507MB/s-2507MB/s), io=141GiB (151GB), run=60284-60284msec
Disk stats (read/write):
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
```
```=
root@docker-bd:~# fio --name=seq-read --ioengine=libaio --iodepth=64 --rw=read --bs=1024k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1
seq-read: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.16
Starting 12 processes
Jobs: 12 (f=12): [R(12)][100.0%][r=1887MiB/s][r=1886 IOPS][eta 00m:00s]
seq-read: (groupid=0, jobs=12): err= 0: pid=32251: Fri Sep 17 13:17:46 2021
read: IOPS=2960, BW=2961MiB/s (3105MB/s)(175GiB/60356msec)
slat (usec): min=22, max=41923, avg=1719.48, stdev=3517.71
clat (msec): min=24, max=973, avg=257.35, stdev=129.26
lat (msec): min=25, max=994, avg=259.07, stdev=130.95
clat percentiles (msec):
| 1.00th=[ 39], 5.00th=[ 69], 10.00th=[ 99], 20.00th=[ 146],
| 30.00th=[ 184], 40.00th=[ 222], 50.00th=[ 257], 60.00th=[ 284],
| 70.00th=[ 300], 80.00th=[ 334], 90.00th=[ 426], 95.00th=[ 502],
| 99.00th=[ 659], 99.50th=[ 735], 99.90th=[ 844], 99.95th=[ 877],
| 99.99th=[ 944]
bw ( MiB/s): min= 894, max= 6316, per=100.00%, avg=2965.37, stdev=87.73, samples=1440
iops : min= 894, max= 6316, avg=2965.06, stdev=87.74, samples=1440
lat (msec) : 50=2.32%, 100=8.08%, 250=37.52%, 500=46.95%, 750=4.72%
lat (msec) : 1000=0.40%
cpu : usr=0.22%, sys=2.31%, ctx=158027, majf=0, minf=196739
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=178708,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=2961MiB/s (3105MB/s), 2961MiB/s-2961MiB/s (3105MB/s-3105MB/s), io=175GiB (187GB), run=60356-60356msec
Disk stats (read/write):
nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
```
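As a sanity check on fio's unit reporting, the headline numbers can be reproduced by hand; fio prints binary MiB/s with the decimal MB/s equivalent in parentheses (1 MiB = 1.048576 MB):

```shell
# Convert the reported MiB/s figures (2391 write, 2961 read) to decimal MB/s.
awk 'BEGIN { printf "write: %.0f MB/s, read: %.0f MB/s\n", 2391*1.048576, 2961*1.048576 }'
# -> write: 2507 MB/s, read: 3105 MB/s
```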
## Device temperature readings
The device stays around 56C when idle and reaches ~60C during intensive reads/writes (CPU<->SSD).
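The readings below match the format of `nvme smart-log`. A minimal sketch of how they could be collected and parsed (assuming nvme-cli is installed; the 10 s interval is arbitrary):

```shell
# Hypothetical polling loop (requires root); uncomment to use:
#   while true; do nvme smart-log /dev/nvme0n1 | grep '^temperature'; sleep 10; done
# Parsing one captured reading into a bare Celsius number:
line='temperature                         : 56 C'
degc=$(echo "$line" | awk -F: '{print $2}' | tr -dc '0-9')
echo "$degc C"
# -> 56 C
```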
```=
# Cold boot from room temperature, kept idling for 5 minutes
temperature : 49 C
# before running xbmgmt validation
temperature : 49 C
# started validation; readings taken ~10-20s apart
temperature : 49 C
temperature : 50 C
temperature : 50 C
temperature : 51 C
temperature : 53 C
temperature : 55 C
temperature : 56 C
temperature : 55 C
temperature : 55 C
# finished validation
# idling for 3 minutes
temperature : 55 C
root@docker-bd:~# # started fio random write test
temperature : 55 C
temperature : 55 C
temperature : 55 C
temperature : 56 C
temperature : 56 C
temperature : 58 C
root@docker-bd:~# # end of random write test
temperature : 57 C
# idling for 2 minutes
temperature : 57 C
temperature : 57 C
temperature : 57 C
temperature : 56 C
# started random read test
temperature : 56 C
# 10s
temperature : 57 C
temperature : 57 C
# finished random read test
temperature : 57 C
# begin seq write test
temperature : 57 C
temperature : 57 C
# 10s
temperature : 57 C
temperature : 58 C
# 20s
temperature : 59 C
temperature : 59 C
temperature : 59 C
# finished seq write
temperature : 60 C
# seq read test
temperature : 59 C
temperature : 59 C
# 10s
temperature : 59 C
temperature : 59 C
# 20s
temperature : 59 C
temperature : 59 C
# 40s
temperature : 59 C
temperature : 59 C
# finished seq read test
temperature : 59 C
# 5 minutes idling
temperature : 56 C
```
## Remote power-cycle
1. Connect to `mwts.fnal.gov` via ssh;
2. Connect to its serial port with `screen /dev/ttyS0 9600,cs8`, and keep pressing "Enter" until the "User Name" prompt shows up;
3. Log in to the power strip's console with the default `apc` username and password;
4. Plugs 1-3 are labeled: 1 for the Dell Tower where the smartSSD is installed, 2 for the Honeywell fan, 3 for the monitor;
5. APC's control interface is self-explanatory; follow the prompted menu items to power-cycle plug 1 (refer to the following GIF to see it in action);
6. When finished, use `ctrl+a` followed by `\` to close the `screen` session;
7. You can also leave the `screen` session open and use `screen -r` to resume; if multiple sessions are open, use `screen -ls` to list them and `screen -r <session_id>` to pick one.

The Dell Tower has been configured to start booting when power is restored, so powering plug 1 off and on via the APC console is enough for a "cold" boot.

## P2P read/write test
### A brief summary of P2P test
[DUNE FD DAQ requirements](https://docs.dunescience.org/cgi-bin/sso/ShowDocument?docid=11314) state that
> “10 Gb/s average storage throughput; 100 Gb/s peak temporary storage throughput per single phase detector module”;
and
> “Average throughput estimated from physics and calibration requirements; peak throughput allowing for fast storage of SNB data ($\sim 10^4$ seconds to store 120 TB of data).”
The P2P test results show, for example:
```
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 72729us 100.00% 1759.96MB/s
p2p 56981us 78.35% 2246.36MB/s
kernel 77416us 106.44% 1653.40MB/s
XDMA 66177us 90.99% 1934.21MB/s
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 106873us 100.00% 1197.68MB/s
p2p 59548us 55.72% 2149.53MB/s
kernel 56630us 52.99% 2260.29MB/s
XDMA 42131us 39.42% 3038.14MB/s
```
:bulb: If our firmware does not introduce too much slowdown, 10 smartSSDs might be able to handle the peak throughput. However, that does not satisfy the total storage requirement of 120 TB, with each SSD holding 3.5 TB. Meeting it would require ~40 smartSSDs per module, which I think is still very reasonable. With 40 smartSSDs, each would handle ~315 MB/s of throughput, i.e. 20%-30% of the read/write speed shown in the P2P tests. That might leave us a comfortable margin to play with compression and pattern-recognition ML algorithms.
* P2P Read test: data flows SSD -> FPGA DDR -> Byte Copy Read (from FPGA DDR) -> Byte Copy Write (into FPGA DDR) -> Host DDR. The leg from the SSD to FPGA DDR is called P2P Read.
* P2P Write test: data flows Host DDR -> FPGA DDR -> Byte Copy Read (from FPGA DDR) -> Byte Copy Write (into FPGA DDR) -> SSD. The leg from FPGA DDR to the SSD is called P2P Write.
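The drive-count estimate above can be checked with quick arithmetic, using the 120 TB SNB dump and 100 Gb/s peak from the DUNE quote and the SmartSSD's ~3.5 TB capacity:

```shell
awk 'BEGIN {
  # drives needed for capacity alone: 120 TB / 3.5 TB per drive (padded to ~40)
  printf "capacity: %.1f drives\n", 120/3.5
  # per-drive rate if 40 drives share the 100 Gb/s peak (Gb -> MB: /8 * 1000)
  printf "throughput: %.1f MB/s per drive\n", 100/8*1000/40
}'
# -> capacity: 34.3 drives
# -> throughput: 312.5 MB/s per drive
```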
### P2P test tools
The testing tools (with prebuilt firmware) were obtained from Xilinx and placed under `/root/ug1382-smartssd-csd/scripts/`.
```
root@docker-bd:~# tree /root/ug1382-smartssd-csd/
/root/ug1382-smartssd-csd/
└── scripts
├── readme.txt
├── shell_version_dependent
│ ├── bandwidth.xclbin
│ ├── bytecopy_async.exe
│ ├── bytecopy.exe
│ ├── bytecopy.xclbin
│ ├── kernel_bw.exe
│ ├── README.txt
│ ├── run_aync_bytecopy.sh
│ ├── run_bytecopy.sh
│ ├── validate.exe
│ ├── verify.xclbin
│ └── xrt.ini
└── shell_version_independent
├── Flat_shell_WBSTAR_Python_Script
│ └── WbstarFlow.py
├── PCIe_RW_scripts
│ ├── axi_i2c_read.sh
│ ├── clk_scaling_IP_reg.sh
│ ├── clk_thrt_en.sh
│ ├── clk_thrt_set_limit.sh
│ ├── data_log
│ │ ├── board_power_data
│ │ ├── fpga_temp_data
│ │ └── krnl_freq_data
│ ├── ddr4_access_check.sh
│ ├── fpga_dna_data_rd.sh
│ ├── plot_graph_clk_scaling.py
│ ├── README.txt
│ ├── run_all.sh
│ ├── rwmem
│ ├── shell_board_revision_check.sh
│ ├── shell_feature_rom_access_check.sh
│ ├── shell_firmware_load_check.sh
│ └── shell_version_check.sh
└── xilinx-hotplug-tool
├── pci
│ ├── __init__.py
│ └── pci_devices.py
├── README
├── utils
│ ├── common.py
│ ├── __init__.py
│ ├── parsing.py
│ └── util_cmds.py
└── xilinx-hotplug.py
9 directories, 38 files
```
### How to run the P2P test
**To run the tests, first source the `xrt` setup script:** `source /opt/xilinx/xrt/setup.sh`. (Note: :bulb: Some of the scripts have `\r` line endings, which may cause bash errors when executed. Use `sed -i 's/\r$//' run_all.sh` to strip them.)
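To fix every script at once rather than one file at a time, a loop like the following should work (a sketch; `dos2unix` would do the same job if installed). It is demonstrated here on a throwaway file:

```shell
# Create a sample script with DOS (CRLF) line endings, then strip the CRs in place.
printf 'echo hi\r\n' > /tmp/crlf_demo.sh
for f in /tmp/crlf_demo.sh; do sed -i 's/\r$//' "$f"; done   # use *.sh in the scripts dir
grep -q "$(printf '\r')" /tmp/crlf_demo.sh && echo "still dirty" || echo "clean"
# -> clean
```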
Under `/root/ug1382-smartssd-csd/scripts/shell_version_dependent`:
1. To run the Hello World kernel: `./validate.exe verify.xclbin`
2. To run the Bandwidth kernel: `./kernel_bw.exe bandwidth.xclbin`
3. To run the regular/sync bytecopy kernel: `./run_bytecopy.sh`
4. To run the async bytecopy kernel: `./run_aync_bytecopy.sh` (the `aync` typo is in the script's filename)
:bulb: **The tests took less than 10 minutes. The device stayed around 56C throughout the tests.**
### Test results
#### Hello World Kernel
```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./validate.exe verify.xclbin
CL_PLATFORM_VENDOR Xilinx
CL_PLATFORM_NAME Xilinx
Get 1 devices
Using 1th device
loading verify.xclbin
RESULT:
Hello World
```
#### Bandwidth Kernel
```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./kernel_bw.exe bandwidth.xclbin
Found 1 compute devices!:
loading bandwidth.xclbin
LOOP PIPELINE 16 beats
Test : 0, Throughput: 5007.841797 MB/s
LOOP PIPELINE 64 beats
Test : 1, Throughput: 11004.667969 MB/s
LOOP PIPELINE 256 beats
Test : 2, Throughput: 15761.664062 MB/s
LOOP PIPELINE 1024 beats
Test : 3, Throughput: 16025.180664 MB/s
TTTT : 5007.841797
Maximum throughput: 16025.180664 MB/s
PASSED
```
#### Regular/Sync Bytecopy Kernel
```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./run_bytecopy.sh
iteration 0
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 73778us 100.00% 1734.93MB/s
p2p 58146us 78.81% 2201.36MB/s
kernel 75924us 102.91% 1685.90MB/s
XDMA 69504us 94.21% 1841.62MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 1
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 105184us 100.00% 1216.92MB/s
p2p 54818us 52.12% 2335.00MB/s
kernel 56037us 53.28% 2284.21MB/s
XDMA 41789us 39.73% 3063.01MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 2
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 72691us 100.00% 1760.88MB/s
p2p 56693us 77.99% 2257.77MB/s
kernel 82243us 113.14% 1556.36MB/s
XDMA 72417us 99.62% 1767.54MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 3
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 106601us 100.00% 1200.74MB/s
p2p 58838us 55.19% 2175.46MB/s
kernel 56561us 53.06% 2263.04MB/s
XDMA 42195us 39.58% 3033.53MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 4
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 72729us 100.00% 1759.96MB/s
p2p 56981us 78.35% 2246.36MB/s
kernel 77416us 106.44% 1653.40MB/s
XDMA 66177us 90.99% 1934.21MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 5
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 106873us 100.00% 1197.68MB/s
p2p 59548us 55.72% 2149.53MB/s
kernel 56630us 52.99% 2260.29MB/s
XDMA 42131us 39.42% 3038.14MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 6
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
########
##
## Pressed Ctrl+c to exit...
##
########
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent#
```
#### Async_bytecopy Kernel
```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./run_aync_bytecopy.sh
iteration 0
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 69095us 100.00% 1852.52MB/s
p2p 39814us 57.62% 3214.95MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 1
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 106198us 100.00% 1205.30MB/s
p2p 60463us 56.93% 2117.00MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 2
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 68189us 100.00% 1877.14MB/s
p2p 40048us 58.73% 3196.16MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 3
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 106168us 100.00% 1205.64MB/s
p2p 60204us 56.71% 2126.10MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 4
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall 67784us 100.00% 1888.35MB/s
p2p 39976us 58.98% 3201.92MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 5
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall 107446us 100.00% 1191.30MB/s
p2p 61535us 57.27% 2080.12MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 6
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
########
##
## Pressed Ctrl+c to exit...
##
########
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent#
```
### Install Vitis



