# Installing Xilinx smartSSD

###### tags: `LDRD`

## System layout

1. Host server (current): [Dell Precision Tower 7810](https://www.dell.com/support/manuals/en-us/precision-t7810-workstation/precision_t7810_om_pub/technical-specifications?guid=guid-1a5124e2-8da0-4083-915d-c96dfb9f8d90&lang=en-us)
1. Host server (incompatible at the moment; still need to try PCIe bifurcation with 1x16): [Lenovo P710 tower](https://pcsupport.lenovo.com/pa/en/products/workstations/thinkstation-p-series-workstations/thinkstation-p710/30b6/30b6s14r00/mj05cayg)
1. [Xilinx smartSSD](https://www.xilinx.com/applications/data-center/computational-storage/smartssd.html) and its [manual](https://docs.xilinx.com/v/u/en-US/ug1382-smartssd-csd)
1. [StarTech PCIe x4 to U.2 adapter card](https://www.amazon.com/StarTech-com-U-2-PCIe-Adapter-PEX4SFF8639/dp/B072JK2XLC/ref=sr_1_3?dchild=1&keywords=U.2+TO+PCIE&qid=1631216068&sr=8-3)
1. [Honeywell Air Circulator fan](https://www.target.com/p/honeywell-turbo-force-table-air-circulator-fan-black/-/A-11153539?ref=tgt_adv_XS000000&AFID=google_pla_df&fndsrc=tgtao&DFA=71700000012764136&CPNG=PLA_Home%2BImprovement%2BShopping_Local%7CHome%2BImprovement_Ecomm_Home&adgroup=SC_Home%2BImprovement&LID=700000001170770pgs&LNM=PRODUCT_GROUP&network=g&device=c&location=9021681&targetid=pla-1410899086559&ds_rl=1246978&ds_rl=1247068&ds_rl=1248099&gclid=CjwKCAjw-ZCKBhBkEiwAM4qfF5PGWA5IDYeHdC-y9iibzrdB24BCH8yyqAUONZB_3aZ6pNi3_5n41RoCw9oQAvD_BwE&gclsrc=aw.ds)
1. [APC 7901 switched power strip](https://download.schneider-electric.com/files?p_File_Name=ASTE-6Z6KAM_R0_EN.pdf&p_Doc_Ref=SPD_ASTE-6Z6KAM_EN&p_enDocType=User+guide) (not yet connected), with the NEMA locking plug swapped for a standard house plug; to be connected to the Lenovo server via serial.

## Installation steps

### On Lenovo P710 Tower (`mwts`)

1. Put the smartSSD into the StarTech adapter card, and insert the card into a PCIe slot (`Gen 3` or above);
1. 
Update the BIOS on the P710 server: burn this [.iso](https://download.lenovo.com/pccbbs/thinkcentre_bios/s01j971usa.iso) file to a CD, and reboot from the CD to finish the update;
1. In BIOS settings, go to "Advanced -> PCI Subsystem Setting -> Above 4G Decoding", and enable it;
1. In BIOS settings, go to "Advanced -> IIO Configuration", and select the corresponding slot to set the bifurcation;
1. Add kernel parameters to the OS: `grubby --args="pci=assign-busses,hpbussize=4" --update-kernel ALL` and `grubby --args="realloc=on,hpmemsize=16G" --update-kernel ALL`
5. Reboot, and check the output of `lspci` to see if the Xilinx device is listed. -- ***It did not.***

### On Dell Precision Tower 7810 (`docker-bd`)

1. Put the adapter card with the smartSSD into a PCIe x16 slot;
2. Boot the system... and voilà! `lspci` listed the following:
```
dingpf@docker-bd $ lspci|grep -i xilinx
04:00.0 PCI bridge: Xilinx Corporation Device 9134
05:00.0 PCI bridge: Xilinx Corporation Device 9234
05:01.0 PCI bridge: Xilinx Corporation Device 9434
07:00.0 Processing accelerators: Xilinx Corporation Device 6987
07:00.1 Processing accelerators: Xilinx Corporation Device 6988
```
3. `lsblk`
```
root@docker-bd:~# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 931.5G  0 disk
└─sda1      8:1    0 931.5G  0 part /home/dingpf
sdb         8:16   0 465.8G  0 disk
├─sdb1      8:17   0 433.9G  0 part /
├─sdb2      8:18   0     1K  0 part
└─sdb5      8:21   0  31.9G  0 part [SWAP]
sr0        11:0    1  1024M  0 rom
nvme0n1   259:1    0   3.5T  0 disk
```
4. Install the software (steps 1-7 on pages 13-14 of the manual);
5. Flash the firmware: `sudo /opt/xilinx/xrt/bin/xbmgmt flash --update --shell xilinx_u2_gen3x4_xdma_gc_base_2`
6. 
Cold reboot the server;

## Card bring-up and validation

### `lspci -vd 10ee:`

```
root@docker-bd:~# lspci -vd 10ee:
04:00.0 PCI bridge: Xilinx Corporation Device 9134 (prog-if 00 [Normal decode])
    Physical Slot: 4
    Flags: bus master, fast devsel, latency 0, NUMA node 0
    Bus: primary=04, secondary=05, subordinate=07, sec-latency=0
    I/O behind bridge: 00000000-00000fff [size=4K]
    Memory behind bridge: fb100000-fb1fffff [size=1M]
    Prefetchable memory behind bridge: 0000033c00000000-0000033e040fffff [size=8257M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [70] Express Upstream Port, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [1c0] Secondary PCI Express
    Kernel driver in use: pcieport

05:00.0 PCI bridge: Xilinx Corporation Device 9234 (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 33, NUMA node 0
    Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
    I/O behind bridge: 00000000-00000fff [size=4K]
    Memory behind bridge: fb100000-fb1fffff [size=1M]
    Prefetchable memory behind bridge: [disabled]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [70] Express Downstream Port (Slot+), MSI 00
    Capabilities: [100] Access Control Services
    Capabilities: [1c0] Secondary PCI Express
    Kernel driver in use: pcieport

05:01.0 PCI bridge: Xilinx Corporation Device 9434 (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, NUMA node 0
    Bus: primary=05, secondary=07, subordinate=07, sec-latency=0
    I/O behind bridge: 00000000-00000fff [size=4K]
    Memory behind bridge: [disabled]
    Prefetchable memory behind bridge: 0000033c00000000-0000033e040fffff [size=8257M]
    Capabilities: [40] Power Management version 3
    Capabilities: [70] Express Downstream Port (Slot-), MSI 00
    Capabilities: [100] Access Control Services
    Capabilities: [140] Secondary PCI Express

07:00.0 Processing accelerators: Xilinx
Corporation Device 6987 Subsystem: Xilinx Corporation Device 1351 Flags: bus master, fast devsel, latency 0, NUMA node 0 Memory at 33e02000000 (64-bit, prefetchable) [size=32M] Memory at 33e04010000 (64-bit, prefetchable) [size=64K] Capabilities: [40] Power Management version 3 Capabilities: [60] MSI-X: Enable- Count=33 Masked- Capabilities: [70] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Secondary PCI Express Capabilities: [180] Vendor Specific Information: ID=0040 Rev=0 Len=018 <?> Capabilities: [400] Access Control Services Capabilities: [480] Vendor Specific Information: ID=0020 Rev=0 Len=010 <?> Kernel driver in use: xclmgmt Kernel modules: xclmgmt 07:00.1 Processing accelerators: Xilinx Corporation Device 6988 Subsystem: Xilinx Corporation Device 1351 Flags: bus master, fast devsel, latency 0, NUMA node 0 Memory at 33e00000000 (64-bit, prefetchable) [size=32M] Memory at 33e04000000 (64-bit, prefetchable) [size=64K] Memory at 33c00000000 (64-bit, prefetchable) [size=8G] Capabilities: [40] Power Management version 3 Capabilities: [60] MSI-X: Enable+ Count=32 Masked- Capabilities: [70] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [400] Access Control Services Capabilities: [480] Vendor Specific Information: ID=0020 Rev=0 Len=010 <?> Kernel driver in use: xocl Kernel modules: xocl ``` ### `lspci -vs 06:00.0` ``` root@docker-bd:~# lspci -vs 06:00.0 06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a825 (prog-if 02 [NVM Express]) Subsystem: Samsung Electronics Co Ltd Device a815 Flags: bus master, fast devsel, latency 0, IRQ 29, NUMA node 0 Memory at fb110000 (64-bit, non-prefetchable) [size=32K] Expansion ROM at fb100000 [disabled] [size=64K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+ Capabilities: [70] Express Endpoint, MSI 00 Capabilities: [b0] MSI-X: Enable+ Count=64 Masked- Capabilities: 
[100] Advanced Error Reporting Capabilities: [148] Device Serial Number 1f-07-50-11-91-38-25-00 Capabilities: [178] Secondary PCI Express Kernel driver in use: nvme Kernel modules: nvme ``` 3. `lsblk` ``` root@docker-bd:~# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 931.5G 0 disk └─sda1 8:1 0 931.5G 0 part /home/dingpf sdb 8:16 0 465.8G 0 disk ├─sdb1 8:17 0 433.9G 0 part / ├─sdb2 8:18 0 1K 0 part └─sdb5 8:21 0 31.9G 0 part [SWAP] sr0 11:0 1 1024M 0 rom nvme0n1 259:1 0 3.5T 0 disk ``` ### nvme log ``` root@docker-bd:~# nvme smart-log /dev/nvme0n1 Smart Log for NVME device:nvme0n1 namespace-id:ffffffff critical_warning : 0x2 temperature : 85 C available_spare : 100% available_spare_threshold : 10% percentage_used : 0% data_units_read : 20 data_units_written : 0 host_read_commands : 1,010 host_write_commands : 0 controller_busy_time : 0 power_cycles : 20 power_on_hours : 19 unsafe_shutdowns : 13 media_errors : 0 num_err_log_entries : 4 Warning Temperature Time : 9 Critical Composite Temperature Time : 549 Temperature Sensor 1 : 82 C Temperature Sensor 2 : 84 C Temperature Sensor 3 : 85 C Thermal Management T1 Trans Count : 0 Thermal Management T2 Trans Count : 0 Thermal Management T1 Total Time : 0 Thermal Management T2 Total Time : 0 ``` ### `xbutil scan` ``` root@docker-bd:~# xbutil scan --------------------------------------------------------------------- Deprecation Warning: The given legacy sub-command and/or option has been deprecated to be obsoleted in the next release. Further information regarding the legacy deprecated sub-commands and options along with their mappings to the next generation sub-commands and options can be found on the Xilinx Runtime (XRT) documentation page: https://xilinx.github.io/XRT/master/html/xbtools_map.html Please update your scripts and tools to use the next generation sub-commands and options. 
--------------------------------------------------------------------- INFO: Found total 1 card(s), 1 are usable ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ System Configuration OS name: Linux Release: 5.4.0-84-generic Version: #94-Ubuntu SMP Thu Aug 26 20:27:37 UTC 2021 Machine: x86_64 Model: Precision Tower 5810 CPU cores: 8 Memory: 32041 MB Glibc: 2.31 Distribution: Ubuntu 20.04.1 LTS Now: Thu Sep 16 20:43:16 2021 GMT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ XRT Information Version: 2.11.634 Git Hash: 5ad5998d67080f00bca5bf15b3838cf35e0a7b26 Git Branch: 2021.1 Build Date: 2021-06-08 22:08:45 XOCL: 2.11.634,5ad5998d67080f00bca5bf15b3838cf35e0a7b26 XCLMGMT: 2.11.634,5ad5998d67080f00bca5bf15b3838cf35e0a7b26 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [0] 0000:07:00.1 xilinx_u2_gen3x4_xdma_gc_base_2 user(inst=129) ``` ### `xbmgmt flash --scan` ``` root@docker-bd:~# /opt/xilinx/xrt/bin/xbmgmt flash --scan --------------------------------------------------------------------- Deprecation Warning: The given legacy sub-command and/or option has been deprecated to be obsoleted in the next release. Further information regarding the legacy deprecated sub-commands and options along with their mappings to the next generation sub-commands and options can be found on the Xilinx Runtime (XRT) documentation page: https://xilinx.github.io/XRT/master/html/xbtools_map.html Please update your scripts and tools to use the next generation sub-commands and options. 
--------------------------------------------------------------------- Card [0000:07:00.0] Card type: u2 Flash type: SPI Flashable partition running on FPGA: xilinx_u2_gen3x4_xdma_gc_base_2,[ID=0x8c8dfd8818ab79b2],[SC=INACTIVE] Flashable partitions installed in system: xilinx_u2_gen3x4_xdma_gc_base_2,[ID=0x8c8dfd8818ab79b2] ``` ### `xbutil validate -d 0000:07:00.1` ``` root@docker-bd:~# xbutil validate -d 0000:07:00.1 Starting validation for 1 devices Validate Device : [0000:07:00.1] Platform : xilinx_u2_gen3x4_xdma_gc_base_2 SC Version : 0.0.0 Platform ID : 0x0 ------------------------------------------------------------------------------- Test 1 [0000:07:00.1] : PCIE link Test Status : [PASSED] ------------------------------------------------------------------------------- Test 2 [0000:07:00.1] : SC version Test Status : [PASSED] ------------------------------------------------------------------------------- Test 3 [0000:07:00.1] : Verify kernel Test Status : [PASSED] ------------------------------------------------------------------------------- Test 4 [0000:07:00.1] : DMA Details : Host -> PCIe -> FPGA write bandwidth = 71.359753 MB/s Host <- PCIe <- FPGA read bandwidth = 71.354767 MB/s Test Status : [PASSED] ------------------------------------------------------------------------------- Test 5 [0000:07:00.1] : iops Details : IOPS: 5050 (hello) Test Status : [PASSED] ------------------------------------------------------------------------------- Test 6 [0000:07:00.1] : Bandwidth kernel Error(s) : Host buffer alignment 4096 bytes Compiled kernel = /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/bandwidth.xclbin Shell = b'xilinx_u2_gen3x4_xdma_gc_base_2' Index = 0 PCIe = GEN3 x 4 OCL Frequency = (1, 0) MHz DDR Bank = 0 Device Temp = 89 C MIG Calibration = True Finished downloading bitstream /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/bandwidth.xclbin CU[0] b'bandwidth1:bandwidth1_1' @0x1810000 CU[1] b'bandwidth2:bandwidth2_1' @0x1820000 [0] b'bank0' 
@0x4000000000
LOOP PIPELINE 16 beats
Test 0, Throughput: 27 MB/s
LOOP PIPELINE 64 beats
Test 1, Throughput: 83 MB/s
LOOP PIPELINE 256 beats
Test 2, Throughput: 142 MB/s
LOOP PIPELINE 1024 beats
Test 3, Throughput: 142 MB/s
TTTT: 27
Maximum throughput: 142 MB/s
ERROR: Throughput is less than expected value of 10 GB/sec
FAILED TEST
    Details : Maximum throughput: 142 MB/s
    Test Status : [FAILED]
-------------------------------------------------------------------------------
Validation failed. Please run the command '--verbose' option for more details

Validation Summary
------------------
1 device(s) evaluated
0 device(s) validated successfully
1 device(s) had exceptions during validation

Validated successfully [0 device(s)]

Validation Exceptions [1 device(s)]
  - [0000:07:00.1] : xilinx_u2_gen3x4_xdma_gc_base_2 : First failure: 'Bandwidth kernel'

Warnings produced during test [0 device(s)]
(Note: The given test successfully validated)
```

This test failed because the throughput was far lower than expected. The smartSSD may not have been sufficiently cooled during the validation; I did have a successful validation earlier today.

![](https://i.imgur.com/NyT7dPu.png)

After the card heated up, it is no longer listed by `lsblk`:
```
root@docker-bd:/home/dingpf# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0 931.5G  0 disk
└─sda1    8:1    0 931.5G  0 part /home/dingpf
sdb       8:16   0 465.8G  0 disk
├─sdb1    8:17   0 433.9G  0 part /
├─sdb2    8:18   0     1K  0 part
└─sdb5    8:21   0  31.9G  0 part [SWAP]
sr0      11:0    1  1024M  0 rom
```
And `lspci -vs 06:00.0` gives me:
```
root@docker-bd:/home/dingpf# lspci -vs 06:00.0
06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a825 (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel modules: nvme
```
Did a cold reboot.

### Rerunning the validation, creating the file system, and doing the `fio` test
``` root@docker-bd:~# source /opt/xilinx/xrt/setup.sh XILINX_XRT : /opt/xilinx/xrt PATH : /opt/xilinx/xrt/bin:/home/dingpf/.local/bin:/home/dingpf/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH : /opt/xilinx/xrt/lib: PYTHONPATH : /opt/xilinx/xrt/python: root@docker-bd:~# xbutil validate -d 0000:07:00.1 --verbose Verbose: Enabling Verbosity Starting validation for 1 devices Validate Device : [0000:07:00.1] Platform : xilinx_u2_gen3x4_xdma_gc_base_2 SC Version : 0.0.0 Platform ID : 0x0 ------------------------------------------------------------------------------- Test 1 [0000:07:00.1] : Aux connection Description : Check if auxiliary power is connected Details : Aux power connector is not available on this board Test Status : [SKIPPED] ------------------------------------------------------------------------------- Test 2 [0000:07:00.1] : PCIE link Description : Check if PCIE link is active Test Status : [PASSED] ------------------------------------------------------------------------------- Test 3 [0000:07:00.1] : SC version Description : Check if SC firmware is up-to-date Test Status : [PASSED] ------------------------------------------------------------------------------- Test 4 [0000:07:00.1] : Verify kernel Description : Run 'Hello World' kernel test Xclbin : /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/verify.xclbin Testcase : /opt/xilinx/xrt/test/22_verify.py Test Status : [PASSED] ------------------------------------------------------------------------------- Test 5 [0000:07:00.1] : DMA Description : Run dma test Details : Host -> PCIe -> FPGA write bandwidth = 3308.324361 MB/s Host <- PCIe <- FPGA read bandwidth = 3303.036681 MB/s Test Status : [PASSED] ------------------------------------------------------------------------------- Test 6 [0000:07:00.1] : iops Description : Run scheduler performance measure test Xclbin : 
/opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/verify.xclbin Testcase : /opt/xilinx/xrt/test/xcl_iops_test.exe Details : IOPS: 161941 (hello) Test Status : [PASSED] ------------------------------------------------------------------------------- Test 7 [0000:07:00.1] : Bandwidth kernel Description : Run 'bandwidth kernel' and check the throughput Xclbin : /opt/xilinx/firmware/u2/gen3x4-xdma-gc/base/test/bandwidth.xclbin Testcase : /opt/xilinx/xrt/test/23_bandwidth.py Details : Maximum throughput: 15392 MB/s Test Status : [PASSED] ------------------------------------------------------------------------------- Test 8 [0000:07:00.1] : Peer to peer bar Description : Run P2P test Details : bank0 validated Test Status : [PASSED] ------------------------------------------------------------------------------- Test 9 [0000:07:00.1] : Memory to memory DMA Description : Run M2M test Details : M2M is not available Test Status : [SKIPPED] ------------------------------------------------------------------------------- Test 10 [0000:07:00.1] : Host memory bandwidth test Description : Run 'bandwidth kernel' when host memory is enabled Details : Address translator IP is not available Test Status : [SKIPPED] ------------------------------------------------------------------------------- Test 11 [0000:07:00.1] : vcu Description : Run decoder test Details : Verify xclbin not available or shell partition is not programmed. Skipping validation. Test Status : [SKIPPED] ------------------------------------------------------------------------------- Validation completed. 
Please run the command '--verbose' option for more details Validation Summary ------------------ 1 device(s) evaluated 1 device(s) validated successfully 0 device(s) had exceptions during validation Validated successfully [1 device(s)] - [0000:07:00.1] : xilinx_u2_gen3x4_xdma_gc_base_2 Validation Exceptions [0 device(s)] Warnings produced during test [0 device(s)] (Note: The given test successfully validated) Unsupported tests [1 device(s)] - [0000:07:00.1] : xilinx_u2_gen3x4_xdma_gc_base_2 : Test(s): 'Aux connection', 'Memory to memory DMA', 'Host memory bandwidth test', vcu root@docker-bd:~# mkfs.ext4 /dev/nvme0n1 mke2fs 1.45.5 (07-Jan-2020) Discarding device blocks: done Creating filesystem with 937684566 4k blocks and 234422272 inodes Filesystem UUID: eeedca44-6f0a-49b9-861a-30ef1858fa83 Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000, 214990848, 512000000, 550731776, 644972544 Allocating group tables: done Writing inode tables: done Creating journal (262144 blocks): done Writing superblocks and filesystem accounting information: done root@docker-bd:~# root@docker-bd:~# root@docker-bd:~# fio --name=rand-write --ioengine=libaio --iodepth=256 --rw=randwrite --bs=4k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n 1 --group_reporting=1 rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256 ... 
fio-3.16 Starting 12 processes fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=1023840710656, buflen=4096 fio: pid=16240, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=603464712192, buflen=4096 fio: pid=16244, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2122233233408, buflen=4096 fio: pid=16245, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=565072101376, buflen=4096 fio: pid=16235, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=484806328320, buflen=4096 fio: pid=16237, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error Jobs: 12 (f=12): [f(1),w(1),f(1),w(1),f(1),w(3),f(2),w(2)][66.7%][w=42.2MiB/s][w=10fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2002030358528, buflen=4096 fio: pid=16236, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=247143903232, buflen=4096 fio: pid=16243, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2218376523776, buflen=4096 fio: pid=16241, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=2508332797952, buflen=4096 fio: pid=16246, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=1621292158976, buflen=4096 fio: pid=16242, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=613084512256, 
buflen=4096 fio: pid=16247, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error fio: io_u error on file /dev/nvme0n1: Input/output error: write offset=1119989624832, buflen=4096 fio: pid=16239, err=5/file:io_u.c:1787, func=io_u error, error=Input/output error Jobs: 10 (f=10): [f(7),X(1),f(2),X(1),f(1)][100.0%][eta 00m:00s] rand-write: (groupid=0, jobs=12): err= 5 (file:io_u.c:1787, func=io_u error, error=Input/output error): pid=16235: Thu Sep 16 16:28:04 2021 write: IOPS=248k, BW=970MiB/s (1017MB/s)(37.8GiB/39917msec); 0 zone resets slat (nsec): min=1547, max=15166k, avg=6264.92, stdev=9270.51 clat (usec): min=718, max=1091.7k, avg=12273.86, stdev=19672.83 lat (usec): min=722, max=1091.7k, avg=12280.34, stdev=19672.89 clat percentiles (msec): | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 8], | 30.00th=[ 9], 40.00th=[ 10], 50.00th=[ 11], 60.00th=[ 12], | 70.00th=[ 14], 80.00th=[ 17], 90.00th=[ 21], 95.00th=[ 24], | 99.00th=[ 31], 99.50th=[ 35], 99.90th=[ 104], 99.95th=[ 127], | 99.99th=[ 1070] bw ( KiB/s): min= 3030, max=1560517, per=100.00%, avg=1003779.43, stdev=17961.45, samples=947 iops : min= 756, max=390129, avg=250944.50, stdev=4490.37, samples=947 lat (usec) : 750=0.01%, 1000=0.01% lat (msec) : 2=0.40%, 4=2.89%, 10=43.06%, 20=43.69%, 50=9.66% lat (msec) : 100=0.17%, 250=0.07%, 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2000=0.03% cpu : usr=9.43%, sys=12.62%, ctx=849954, majf=0, minf=237 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,9916743,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=256 Run status group 0 (all jobs): WRITE: bw=970MiB/s (1017MB/s), 970MiB/s-970MiB/s (1017MB/s-1017MB/s), io=37.8GiB (40.6GB), run=39917-39917msec Disk stats (read/write): nvme0n1: ios=0/0, merge=0/0, ticks=0/0, 
in_queue=0, util=0.00%
```

**Validation passed successfully, as did creating the file system. However, about 50% into the fio random write test, the test failed with an I/O error. This is very likely due to insufficient cooling of the device.**

### Temperature readings after the cold boot:
```
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp
temperature : 69 C
root@docker-bd:/home/dingpf# # 20% into Bandwidth kernel test
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp
root@docker-bd:/home/dingpf# # after validation
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp
temperature : 73 C
root@docker-bd:/home/dingpf# nvme smart-log /dev/nvme0n1 | grep temp # idling for another 3 minutes
temperature : 78 C
```
![](https://i.imgur.com/lmWCblC.png)

Temperature readings taken 1-2 minutes apart.

![](https://i.imgur.com/JE2SL1X.png)

### **Decided to take the smartSSD off the motherboard until we have a solution for the cooling issue.**

### This is how the card is currently installed.

![](https://i.imgur.com/pxKYyvs.jpg)

### With a fan directed at the device.
![](https://i.imgur.com/Z5cMxtc.jpg)

## Raw SSD Read/Write test

* Random write test: **`WRITE: bw=2362MiB/s (2477MB/s), 2362MiB/s-2362MiB/s (2477MB/s-2477MB/s), io=138GiB (149GB), run=60007-60007msec`**
* Random read test: **`READ: bw=2867MiB/s (3007MB/s), 2867MiB/s-2867MiB/s (3007MB/s-3007MB/s), io=168GiB (180GB), run=60015-60015msec`**
* Sequential write test: **`WRITE: bw=2391MiB/s (2507MB/s), 2391MiB/s-2391MiB/s (2507MB/s-2507MB/s), io=141GiB (151GB), run=60284-60284msec`**
* Sequential read test: **`READ: bw=2961MiB/s (3105MB/s), 2961MiB/s-2961MiB/s (3105MB/s-3105MB/s), io=175GiB (187GB), run=60356-60356msec`**

```=
root@docker-bd:~# fio --name=rand-write --ioengine=libaio --iodepth=256 --rw=randwrite --bs=4k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1
rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
...
fio-3.16
Starting 12 processes
Jobs: 12 (f=12): [w(12)][100.0%][w=2431MiB/s][w=622k IOPS][eta 00m:00s]
rand-write: (groupid=0, jobs=12): err= 0: pid=15750: Fri Sep 17 11:54:04 2021
  write: IOPS=605k, BW=2362MiB/s (2477MB/s)(138GiB/60007msec); 0 zone resets
    slat (nsec): min=1524, max=10231k, avg=4488.34, stdev=8496.11
    clat (usec): min=17, max=162066, avg=5072.02, stdev=3473.33
     lat (usec): min=24, max=162074, avg=5076.65, stdev=3473.43
    clat percentiles (usec):
     |  1.00th=[ 1614],  5.00th=[ 2376], 10.00th=[ 2769], 20.00th=[ 3195],
     | 30.00th=[ 3556], 40.00th=[ 3982], 50.00th=[ 4490], 60.00th=[ 5080],
     | 70.00th=[ 5800], 80.00th=[ 6587], 90.00th=[ 7898], 95.00th=[ 9110],
     | 99.00th=[ 11994], 99.50th=[ 13698], 99.90th=[ 50594], 99.95th=[ 74974],
     | 99.99th=[116917]
   bw (  MiB/s): min= 340, max= 2803, per=99.96%, avg=2361.14, stdev=26.85, samples=1433
   iops        : min=87142, max=717598, avg=604452.42, stdev=6873.47, samples=1433
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.02%, 1000=0.06%
  lat (msec)   : 2=2.36%, 4=38.00%,
10=56.54%, 20=2.77%, 50=0.13% lat (msec) : 100=0.08%, 250=0.02% cpu : usr=16.19%, sys=20.36%, ctx=1648028, majf=0, minf=154 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued rwts: total=0,36286011,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=256 Run status group 0 (all jobs): WRITE: bw=2362MiB/s (2477MB/s), 2362MiB/s-2362MiB/s (2477MB/s-2477MB/s), io=138GiB (149GB), run=60007-60007msec Disk stats (read/write): nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% ``` ```= root@docker-bd:~# fio --name=rand-read --ioengine=libaio --iodepth=256 --rw=randread --bs=4k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1 rand-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256 ... 
fio-3.16 Starting 12 processes Jobs: 12 (f=12): [r(12)][100.0%][r=1893MiB/s][r=485k IOPS][eta 00m:00s] rand-read: (groupid=0, jobs=12): err= 0: pid=24614: Fri Sep 17 13:11:43 2021 read: IOPS=734k, BW=2867MiB/s (3007MB/s)(168GiB/60015msec) slat (nsec): min=1483, max=16053k, avg=4234.76, stdev=6314.14 clat (usec): min=25, max=44358, avg=4177.38, stdev=2575.76 lat (usec): min=29, max=44362, avg=4181.78, stdev=2575.83 clat percentiles (usec): | 1.00th=[ 619], 5.00th=[ 1106], 10.00th=[ 1500], 20.00th=[ 2057], | 30.00th=[ 2573], 40.00th=[ 3097], 50.00th=[ 3621], 60.00th=[ 4228], | 70.00th=[ 5014], 80.00th=[ 5932], 90.00th=[ 7504], 95.00th=[ 9110], | 99.00th=[13304], 99.50th=[14746], 99.90th=[16450], 99.95th=[16581], | 99.99th=[16909] bw ( MiB/s): min= 1740, max= 3588, per=100.00%, avg=2869.41, stdev=48.36, samples=1435 iops : min=445623, max=918698, avg=734568.90, stdev=12379.40, samples=1435 lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.43%, 750=1.38% lat (usec) : 1000=2.09% lat (msec) : 2=14.89%, 4=37.50%, 10=40.41%, 20=3.28%, 50=0.01% cpu : usr=18.96%, sys=28.65%, ctx=11591764, majf=0, minf=3200 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued rwts: total=44053216,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=256 Run status group 0 (all jobs): READ: bw=2867MiB/s (3007MB/s), 2867MiB/s-2867MiB/s (3007MB/s-3007MB/s), io=168GiB (180GB), run=60015-60015msec Disk stats (read/write): nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% ``` ```= root@docker-bd:~# fio --name=seq-write --ioengine=libaio --iodepth=64 --rw=write --bs=1024k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1 seq-write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, 
iodepth=64 ... fio-3.16 Starting 12 processes Jobs: 12 (f=12): [W(12)][100.0%][w=2402MiB/s][w=2402 IOPS][eta 00m:00s] seq-write: (groupid=0, jobs=12): err= 0: pid=28762: Fri Sep 17 13:15:03 2021 write: IOPS=2390, BW=2391MiB/s (2507MB/s)(141GiB/60284msec); 0 zone resets slat (usec): min=38, max=121102, avg=1548.95, stdev=3981.32 clat (msec): min=8, max=939, avg=319.50, stdev=120.71 lat (msec): min=8, max=948, avg=321.05, stdev=122.10 clat percentiles (msec): | 1.00th=[ 48], 5.00th=[ 99], 10.00th=[ 148], 20.00th=[ 213], | 30.00th=[ 259], 40.00th=[ 300], 50.00th=[ 338], 60.00th=[ 372], | 70.00th=[ 397], 80.00th=[ 418], 90.00th=[ 443], 95.00th=[ 481], | 99.00th=[ 609], 99.50th=[ 684], 99.90th=[ 827], 99.95th=[ 852], | 99.99th=[ 877] bw ( MiB/s): min= 1106, max= 4361, per=99.93%, avg=2388.90, stdev=48.92, samples=1440 iops : min= 1106, max= 4361, avg=2388.46, stdev=48.93, samples=1440 lat (msec) : 10=0.02%, 20=0.02%, 50=1.10%, 100=4.10%, 250=22.66% lat (msec) : 500=68.29%, 750=3.56%, 1000=0.26% cpu : usr=1.67%, sys=1.62%, ctx=126200, majf=0, minf=148 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.5% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% issued rwts: total=0,144118,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): WRITE: bw=2391MiB/s (2507MB/s), 2391MiB/s-2391MiB/s (2507MB/s-2507MB/s), io=141GiB (151GB), run=60284-60284msec Disk stats (read/write): nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% ``` ```= root@docker-bd:~# fio --name=seq-read --ioengine=libaio --iodepth=64 --rw=read --bs=1024k --direct=1 --size=100% --numjobs=12 --runtime=60 --filename=/dev/nvme0n1 --group_reporting=1 seq-read: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64 ... 
fio-3.16 Starting 12 processes Jobs: 12 (f=12): [R(12)][100.0%][r=1887MiB/s][r=1886 IOPS][eta 00m:00s] seq-read: (groupid=0, jobs=12): err= 0: pid=32251: Fri Sep 17 13:17:46 2021 read: IOPS=2960, BW=2961MiB/s (3105MB/s)(175GiB/60356msec) slat (usec): min=22, max=41923, avg=1719.48, stdev=3517.71 clat (msec): min=24, max=973, avg=257.35, stdev=129.26 lat (msec): min=25, max=994, avg=259.07, stdev=130.95 clat percentiles (msec): | 1.00th=[ 39], 5.00th=[ 69], 10.00th=[ 99], 20.00th=[ 146], | 30.00th=[ 184], 40.00th=[ 222], 50.00th=[ 257], 60.00th=[ 284], | 70.00th=[ 300], 80.00th=[ 334], 90.00th=[ 426], 95.00th=[ 502], | 99.00th=[ 659], 99.50th=[ 735], 99.90th=[ 844], 99.95th=[ 877], | 99.99th=[ 944] bw ( MiB/s): min= 894, max= 6316, per=100.00%, avg=2965.37, stdev=87.73, samples=1440 iops : min= 894, max= 6316, avg=2965.06, stdev=87.74, samples=1440 lat (msec) : 50=2.32%, 100=8.08%, 250=37.52%, 500=46.95%, 750=4.72% lat (msec) : 1000=0.40% cpu : usr=0.22%, sys=2.31%, ctx=158027, majf=0, minf=196739 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% issued rwts: total=178708,0,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: bw=2961MiB/s (3105MB/s), 2961MiB/s-2961MiB/s (3105MB/s-3105MB/s), io=175GiB (187GB), run=60356-60356msec Disk stats (read/write): nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% ``` ## Device temperature readings The device stays at around 56C when idling and reaches ~60C while doing intensive read/writes (CPU<->SSD). 
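The temperature log below was collected by hand with repeated `nvme smart-log` calls. A small polling loop would do the same thing automatically; this is only a sketch, and the 15 s interval, the `parse_temp`/`poll_temp` helper names, and the log file name are my own choices, not part of the original setup:

```shell
#!/bin/sh
# Sketch: periodically sample the SmartSSD composite temperature.
# Assumes the drive enumerates as /dev/nvme0n1, as shown in the logs above.

# Extract the numeric Celsius value from one `nvme smart-log` line,
# e.g. "temperature : 69 C" -> "69".
parse_temp() {
    printf '%s\n' "$1" | awk -F: '{ gsub(/[^0-9]/, "", $2); print $2 }'
}

# Print a timestamped reading every 15 seconds (interval is arbitrary).
poll_temp() {
    while :; do
        line=$(nvme smart-log /dev/nvme0n1 | grep '^temperature')
        printf '%s %s C\n' "$(date +%H:%M:%S)" "$(parse_temp "$line")"
        sleep 15
    done
}

# Usage (commented out so sourcing this file is side-effect free):
# poll_temp >> smartssd-temp.log &
```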
```=
# cold boot from room temperature, kept idling for 5 minutes
temperature : 49 C
# before running xbmgmt validation
temperature : 49 C
# started validation; readings taken ~10-20s apart
temperature : 49 C
temperature : 50 C
temperature : 50 C
temperature : 51 C
temperature : 53 C
temperature : 55 C
temperature : 56 C
temperature : 55 C
temperature : 55 C
# finished validation
# idling for 3 minutes
temperature : 55 C
root@docker-bd:~# # started fio random write test
temperature : 55 C
temperature : 55 C
temperature : 55 C
temperature : 56 C
temperature : 56 C
temperature : 58 C
root@docker-bd:~# # end of random write test
temperature : 57 C
# idling for 2 minutes
temperature : 57 C
temperature : 57 C
temperature : 57 C
temperature : 56 C
# started random read test
temperature : 56 C   # 10s
temperature : 57 C
temperature : 57 C
# finished random read test
temperature : 57 C
# begin seq write test
temperature : 57 C
temperature : 57 C   # 10s
temperature : 57 C
temperature : 58 C   # 20s
temperature : 59 C
temperature : 59 C
temperature : 59 C
# finished seq write
temperature : 60 C
# seq read test
temperature : 59 C
temperature : 59 C   # 10s
temperature : 59 C
temperature : 59 C   # 20s
temperature : 59 C
temperature : 59 C   # 40s
temperature : 59 C
temperature : 59 C
# finished seq read test
temperature : 59 C
# 5 minutes idling
temperature : 56 C
```

## Remote power-cycle

1. Connect to `mwts.fnal.gov` via ssh;
2. Connect to its serial port with `screen /dev/ttyS0 9600,cs8`, and keep pressing "Enter" until the "User Name" prompt shows up;
3. Use the default `apc` username and password to log in to the power strip's console;
4. Plugs 1-3 are labeled: 1 for the Dell Tower where the smartSSD is installed, 2 for the Honeywell fan, 3 for the monitor;
5. APC's control interface is self-explanatory; simply follow the prompted menu items to power-cycle plug 1 (refer to the following GIF to see it in action);
6. When finished, use `Ctrl+a` followed by `\` to close the `screen` session;
7. You can also leave the `screen` session open and use `screen -r` to resume, or `screen -ls` to list all sessions and `screen -r <session_id>` if multiple sessions are open.

The Dell Tower has been configured to start booting when power is restored, so powering the plug off and on via the APC console should be enough for a "cold" boot.

![](https://i.imgur.com/se2hFLi.gif)

## P2P read/write test

### A brief summary of P2P test

[DUNE FD DAQ requirements](https://docs.dunescience.org/cgi-bin/sso/ShowDocument?docid=11314) state that

> “10 Gb/s average storage throughput; 100 Gb/s peak temporary storage throughput per single phase detector module”; and

> “Average throughput estimated from physics and calibration requirements; peak throughput allowing for fast storage of SNB data ($\sim 10^4$ seconds to store 120 TB of data).”

The P2P test results show, for example:

```
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    72729us  100.00%  1759.96MB/s
p2p        56981us   78.35%  2246.36MB/s
kernel     77416us  106.44%  1653.40MB/s
XDMA       66177us   90.99%  1934.21MB/s
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   106873us  100.00%  1197.68MB/s
p2p        59548us   55.72%  2149.53MB/s
kernel     56630us   52.99%  2260.29MB/s
XDMA       42131us   39.42%  3038.14MB/s
```

:bulb: If our firmware does not introduce too much slowdown, 10 smartSSDs might be able to handle the throughput. However, that does not satisfy the total storage requirement of 120 TB, with each SSD being 3.5 TB. Meeting it would require ~40 smartSSDs per module, which I think is still very reasonable. With 40 smartSSDs, each would handle ~315 MB/s of throughput, i.e. 20%-30% of the read/write speeds shown in the p2p tests. That might give us a comfortable margin for compression/pattern-recognition ML algorithms.

* P2P Read test: data flows from the SSD -> FPGA DDR -> Byte Copy Read (from FPGA DDR) -> Byte Copy Write (into FPGA DDR) -> Host DDR.
  The flow of data from SSD to FPGA DDR is called P2P Read.
* P2P Write test: data flows from Host DDR -> FPGA DDR -> Byte Copy Read (from FPGA DDR) -> Byte Copy Write (into FPGA DDR) -> SSD. The flow of data from FPGA DDR to SSD is called P2P Write.

### P2P test tools

The testing tools (with prebuilt firmware) were obtained from Xilinx. They are placed under `/root/ug1382-smartssd-csd/scripts/shell_version_independent/xilinx-hotplug-tool`.

```
root@docker-bd:~# tree /root/ug1382-smartssd-csd/
/root/ug1382-smartssd-csd/
└── scripts
    ├── readme.txt
    ├── shell_version_dependent
    │   ├── bandwidth.xclbin
    │   ├── bytecopy_async.exe
    │   ├── bytecopy.exe
    │   ├── bytecopy.xclbin
    │   ├── kernel_bw.exe
    │   ├── README.txt
    │   ├── run_aync_bytecopy.sh
    │   ├── run_bytecopy.sh
    │   ├── validate.exe
    │   ├── verify.xclbin
    │   └── xrt.ini
    └── shell_version_independent
        ├── Flat_shell_WBSTAR_Python_Script
        │   └── WbstarFlow.py
        ├── PCIe_RW_scripts
        │   ├── axi_i2c_read.sh
        │   ├── clk_scaling_IP_reg.sh
        │   ├── clk_thrt_en.sh
        │   ├── clk_thrt_set_limit.sh
        │   ├── data_log
        │   │   ├── board_power_data
        │   │   ├── fpga_temp_data
        │   │   └── krnl_freq_data
        │   ├── ddr4_access_check.sh
        │   ├── fpga_dna_data_rd.sh
        │   ├── plot_graph_clk_scaling.py
        │   ├── README.txt
        │   ├── run_all.sh
        │   ├── rwmem
        │   ├── shell_board_revision_check.sh
        │   ├── shell_feature_rom_access_check.sh
        │   ├── shell_firmware_load_check.sh
        │   └── shell_version_check.sh
        └── xilinx-hotplug-tool
            ├── pci
            │   ├── __init__.py
            │   └── pci_devices.py
            ├── README
            ├── utils
            │   ├── common.py
            │   ├── __init__.py
            │   ├── parsing.py
            │   └── util_cmds.py
            └── xilinx-hotplug.py

9 directories, 38 files
```

### How to run the P2P test

**To run the tests, first source the `xrt` setup script:** `source /opt/xilinx/xrt/setup.sh`.

(:bulb: Note: some of the scripts have `\r` line endings, which may cause bash errors when they are executed. Use `sed -i 's/\r$//' run_all.sh` to strip them.)
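The same `\r` cleanup can be applied to every script in one pass; a sketch using the scripts directory from the tree listing above:

```shell
# Strip trailing carriage returns (\r) from every .sh file under the
# scripts directory, so bash does not choke on CRLF line endings.
find /root/ug1382-smartssd-csd/scripts -name '*.sh' -print0 \
    | xargs -0 sed -i 's/\r$//'
```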
Under `/root/ug1382-smartssd-csd/scripts/shell_version_dependent`:

1. To run the Hello World kernel: `./validate.exe verify.xclbin`
2. To run the Bandwidth kernel: `./kernel_bw.exe bandwidth.xclbin`
3. To run the regular/sync bytecopy kernel: `./run_bytecopy.sh`
4. To run the async bytecopy kernel: `./run_aync_bytecopy.sh`

:bulb: **The tests took less than 10 minutes. The device stayed around 56C throughout the tests.**

### Test results

#### Hello World Kernel

```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./validate.exe verify.xclbin
CL_PLATFORM_VENDOR Xilinx
CL_PLATFORM_NAME Xilinx
Get 1 devices
Using 1th device
loading verify.xclbin
RESULT:
Hello World
```

#### Bandwidth Kernel

```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./kernel_bw.exe bandwidth.xclbin
Found 1 compute devices!:
loading bandwidth.xclbin
LOOP PIPELINE 16 beats
Test : 0, Throughput: 5007.841797 MB/s
LOOP PIPELINE 64 beats
Test : 1, Throughput: 11004.667969 MB/s
LOOP PIPELINE 256 beats
Test : 2, Throughput: 15761.664062 MB/s
LOOP PIPELINE 1024 beats
Test : 3, Throughput: 16025.180664 MB/s
TTTT : 5007.841797
Maximum throughput: 16025.180664 MB/s
PASSED
```

#### Regular/Sync Bytecopy Kernel

```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./run_bytecopy.sh
iteration 0
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    73778us  100.00%  1734.93MB/s
p2p        58146us   78.81%  2201.36MB/s
kernel     75924us  102.91%  1685.90MB/s
XDMA       69504us   94.21%  1841.62MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 1
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   105184us  100.00%  1216.92MB/s
p2p        54818us   52.12%  2335.00MB/s
kernel     56037us   53.28%  2284.21MB/s
XDMA       41789us   39.73%  3063.01MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 2
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    72691us  100.00%  1760.88MB/s
p2p        56693us   77.99%  2257.77MB/s
kernel     82243us  113.14%  1556.36MB/s
XDMA       72417us   99.62%  1767.54MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 3
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   106601us  100.00%  1200.74MB/s
p2p        58838us   55.19%  2175.46MB/s
kernel     56561us   53.06%  2263.04MB/s
XDMA       42195us   39.58%  3033.53MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 4
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    72729us  100.00%  1759.96MB/s
p2p        56981us   78.35%  2246.36MB/s
kernel     77416us  106.44%  1653.40MB/s
XDMA       66177us   90.99%  1934.21MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 5
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   106873us  100.00%  1197.68MB/s
p2p        59548us   55.72%  2149.53MB/s
kernel     56630us   52.99%  2260.29MB/s
XDMA       42131us   39.42%  3038.14MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 6
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 8 pipelines
########
##
## Pressed Ctrl+c to exit...
##
########
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent#
```

#### Async_bytecopy Kernel

```shell=
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent# ./run_aync_bytecopy.sh
iteration 0
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    69095us  100.00%  1852.52MB/s
p2p        39814us   57.62%  3214.95MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 1
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   106198us  100.00%  1205.30MB/s
p2p        60463us   56.93%  2117.00MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 2
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    68189us  100.00%  1877.14MB/s
p2p        40048us   58.73%  3196.16MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 3
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   106168us  100.00%  1205.64MB/s
p2p        60204us   56.71%  2126.10MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 4
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
SSD -> FPGA(p2p BO) -> FPGA(host BO) -> HOST
overall    67784us  100.00%  1888.35MB/s
p2p        39976us   58.98%  3201.92MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 5
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
INFO: Kick off test
HOST -> FPGA(host BO) -> FPGA(p2p BO) -> SSD
overall   107446us  100.00%  1191.30MB/s
p2p        61535us   57.27%  2080.12MB/s
INFO: Evaluating test result
INFO: Test passed
iteration 6
INFO: Successfully opened NVME SSD /dev/nvme0n1
Detected 1 devices, using the 0th one
INFO: Importing ./bytecopy.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: Preparing 131072KB test data in 32 pipelines
########
##
## Pressed Ctrl+c to exit...
##
########
root@docker-bd:~/ug1382-smartssd-csd/scripts/shell_version_dependent#
```

### Install Vitis

![](https://i.imgur.com/kj0EeCp.png)
![](https://i.imgur.com/RFWhMAU.png)
![](https://i.imgur.com/qTlCkBV.png)
![](https://i.imgur.com/Bv10KEU.jpg)
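The screenshots above walk through the GUI installer. For a headless box, the Xilinx unified installer also supports a batch mode; the sketch below is a hedged example, where the `install_config.txt` keys/values are illustrative assumptions (a real config is generated with `./xsetup -b ConfigGen`), not what was used on this machine:

```shell
# Illustrative unattended-install config; the Edition/Destination values
# here are assumptions, not taken from this setup.
cat > install_config.txt <<'EOF'
Edition=Vitis
Destination=/tools/Xilinx
EOF

# Batch-mode install with the Xilinx unified installer, run from the
# extracted installer directory:
#   ./xsetup --agree XilinxEULA,3rdPartyEULA,WebTalkTerms \
#            --batch Install --config install_config.txt
grep '^Edition=' install_config.txt
```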