# NP04 raid performance
## Conclusion
RAID10 on np04-srv-004 did not slow things down when tested in real-life scenarios, and we should move ahead with reconfiguring the other three datadisk servers to RAID10.
## `fio` tests
Details of the RAID configuration and `fio` test results on `np04-srv-004` can be found [here](https://hackmd.io/ySnMJ5hgQbir3EW2nHVKTg).
## Real-life read/write test from Giovanna and Steve
### Giovanna Lehmann Miotto
2022 Jul 18th at 1:16 AM
> During the weekend I did some DAQ tests using the 4 storage volumes of srv-002/3/4. I could sustain a write throughput of 4.2 GB/s, with brief periods up to 4.9 GB/s, corresponding to 350-400 MB/s per volume. Of course this traffic is write-only, but it looks more in line with what we had when pushing the DAQ during ProtoDUNE.
### Steven Timm
>350-400 MB/s per volume is consistent with the numbers I saw during most of the data challenge. Note that the RAID10 on np04-srv-004 worked faster at first but eventually bogged down to be about as slow as the other machines. I was only hitting one volume at once on any given machine.
### Ron Rechenmacher and Pengfei Ding
Ron Rechenmacher
2022 Feb 24th at 1:13 PM
Do you have any ideas why I'm seeing poor RAID 10 performance on np04-srv-004 and decreased performance with the different disks on np04-srv-003 (are there fewer disks involved)?
Pengfei Ding
1:27 PM
I was seeing similar issues when benchmarking on 004 with `fio` using a 4k block size (similar to the random-workload case), but with a bigger block size of 1 MB (for sequential workloads) I was able to get better performance.
1:30
I saw you were using a 1M block size in your script. I'll give it a try with `dd` too, and maybe an even bigger size.
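A minimal sketch of the kind of `dd` sequential-write check described above, assuming a 1 MB block size and direct I/O; the output path and size are placeholders, not the exact command that was run:

```bash
# Sequential-write check with a 1 MB block size, bypassing the page cache.
# Output path and size are placeholders for illustration only.
dd if=/dev/zero of=/data2/test/dd_bench.trash bs=1M count=40960 oflag=direct status=progress
```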
Pengfei Ding
1:49 PM
I was comparing `fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=1024k --numjobs=4 --size=40G --runtime=60 --filename=/data2/test/pding/bench_mark.trash --group_reporting=1` on 004 and 002,
**004 showed 717 MB/s vs 290 MB/s on 002.**
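For contrast, a hypothetical 4 KiB random-write invocation along the lines of the "random workloads case" Pengfei mentions above; the filename and size are placeholders:

```bash
# 4 KiB random-write counterpart to the 1 MiB sequential command above
# (placeholder filename and size).
fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k \
    --numjobs=4 --size=4G --runtime=60 \
    --filename=/data2/test/bench_mark_rand.trash --group_reporting=1
```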
### Steven Timm, test with `iperf`
Jul 13th at 12:05 PM
Is it possible to get `iperf` installed on np04-srv-002 and np04-srv-004?
Ron Rechenmacher
3 months ago
@timm
Back on Feb 24th (in a direct message with Pengfei) I asked about issues (poor performance) with np04-srv-003 and np04-srv-004
Steven Timm
3 months ago
It has taken more than an hour to copy 250 GB from np04-srv-002 to np04-srv-004. Of course, np04-srv-002 is under pretty heavy load due to the data challenge.
Pengfei Ding
3 months ago
It is installed on both of the servers now.
Steven Timm
3 months ago
thanks
Pengfei Ding
3 months ago
003 and 004 have an older "RAID" controller for the JBODs. Looking at Steve's numbers, it looks like the disk usage pattern in DC4 *did not* expose the poor performance @ron was seeing before. The numbers on 004 are in line with the expected rate increase from changing from RAID 5 to RAID 10.
Ron Rechenmacher
3 months ago
@timm
Approx 70 MB/s. Is this np04-srv-002 to np04-srv-004 copy a direct copy? Or is there a local copy on np04-srv-002 first?
Steven Timm
3 months ago
This was a direct copy.
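For clarity, the distinction being asked about, sketched with hypothetical paths and tools (the actual copy command was not recorded):

```bash
# Direct copy: data streams straight from np04-srv-002 to np04-srv-004.
rsync -a --progress /data2/run/ np04-srv-004:/data2/run/

# Staged copy: write a local copy first, then push it; this adds an extra
# full read/write pass on np04-srv-002's disks.
cp -a /data2/run /data0/staging/ && \
    rsync -a /data0/staging/run/ np04-srv-004:/data2/run/
```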
Steven Timm
3 months ago
**iperf is showing about 3 Gbit/s between the 2 machines**
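The measurement can be reproduced with a standard `iperf` server/client pair; the exact options used are not recorded, so the commands below are an assumption:

```bash
# On np04-srv-004 (server side):
iperf -s

# On np04-srv-002 (client side), a 30-second run with 5-second interval reports:
iperf -c np04-srv-004 -t 30 -i 5
```

For reference, 3 Gbit/s corresponds to roughly 375 MB/s on the wire.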
Ron Rechenmacher
3 months ago
**With direct copy, I would expect closer to 500 MB/s**
Steven Timm
3 months ago
Yes, but it should be noted that np04-srv-002 was under heavy load due to the data challenge.
Ron Rechenmacher
3 months ago
Was the data challenge using /data0 while the copy was using a different /data[1-3]?
Steven Timm
3 months ago
the data challenge rotates between all 4 of the disks
Steven Timm
3 months ago
and it used all of them at some point during the hour
Steven Timm
3 months ago
so you saw spurts of access: some fast, some slow, fast, slow, etc.
Ron Rechenmacher
3 months ago
If the individual files are going to be 4GB or less, I would hope that an attempt to take advantage of buff/cache would be made. That should help.
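A minimal illustration of Ron's buff/cache point, using `dd` with placeholder paths: the buffered run lets the page cache absorb a ~4 GB file, while the direct run is paced by the array itself.

```bash
# Buffered write (default): the page cache absorbs the data and flushes it in
# the background, so the writing process can finish before the array does.
dd if=/dev/zero of=/data2/test/buffered.trash bs=1M count=4096

# Direct write for comparison: bypasses the page cache and runs at array speed.
dd if=/dev/zero of=/data2/test/direct.trash bs=1M count=4096 oflag=direct
```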