# Performance report: SN max download speed (single thread)
The goal here is to measure the maximum possible download speed from a single Storagenode, using only a single thread.
**Motivation:** To achieve decent performance we usually suggest using multiple parallel uploads/downloads.
From an offline conversation with [@littleskunk](https://github.com/littleskunk), I learned that:
* We don't really know the exact limit of single-thread downloads
* A previous performance test showed some kind of bottleneck, as we couldn't fully utilize the download bandwidth

Initial profiling showed that the bottleneck is the small chunk size, as we need a separate RPC call for each chunk (*including a separate signature check/calculation!*).
This test tried to prove that the chunk size needs adjustment.
## Results
* It's proven that there is a strong correlation between chunk size and max download throughput:
  * Downloading with 64 KiB chunks: 250-270 Mbyte/s (34 RPC calls)
  * Downloading with 1 MiB chunks: 480-490 Mbyte/s (3 RPC calls)
  * Downloading with 64 KiB --> 256 KiB chunks (production values): ~280 Mbyte/s (11 RPC calls)
* Our current chunk size algorithm should be adjusted
## What is the chunk size?
* All data is stored in pieces on Storagenodes.
* A piece is one erasure-coded unit of a segment.
* As we have a 64 MiB max segment size and `29/35/80/110-256B` Reed-Solomon parameters (29 parts of useful data), our maximum piece size is `64MiB/29 ~= 2.2MiB`

But we don't download pieces in one part. We download *chunks*, smaller binary blobs. New chunks are requested by sending new orders (a signed pair of the order limit serial number + the amount we are willing to pay).
Using smaller chunks:
* requires less memory on both sides
* requires more RPC calls
* requires more signature calculations
* requires slightly more trust in the storagenodes (we pay in advance with the order itself)

Our current default chunk size is 64 KiB, which is increased with each request (up to 256 KiB).
`private/piecestore/client.go`:
```go
// DefaultConfig are the default params used for upload and download.
var DefaultConfig = Config{
	DownloadBufferSize: 256 * memory.KiB.Int64(),
	InitialStep:        64 * memory.KiB.Int64(),
	MaximumStep:        256 * memory.KiB.Int64(),
	MessageTimeout:     10 * time.Minute,
}
```
The logic to increment it:
`private/piecestore/client.go`:
```go
// next allocation step find the next trusted step.
func (client *Client) nextAllocationStep(previous int64) int64 {
	// TODO: ensure that this is frame idependent
	next := previous * 3 / 2
	if next > client.config.MaximumStep {
		next = client.config.MaximumStep
	}
	return next
}
```
Chunk size is also limited to 1 MiB on the storagenode side:
```go
var maximumChunkSize = 1 * memory.MiB.Int64()
```
## Test method
* Using local storj-up on a strong developer machine (NVMe, big memory, strong CPU)
* storj-up used the dev settings (`"4/6/8/10-256B"` RS configuration)
* To use a piece size similar to the maximum piece size in production (64MiB/29), an 8.82 MB file was uploaded (8.82MB / 4 ~= 64MiB / 29)
* The piece is downloaded directly from the storagenode, without any metadata call
* The satellite identity is used locally to generate order limits
* The same full piece (~2.2 MiB) is downloaded multiple times (usually 1000x)
* Test tool can be found [here](https://github.com/elek/stbb)
Cluster setup:
```bash
export STORJUP_PROJECT_DIR=/home/elek/j
storj-up init standalone db,minimal
direnv allow
# the line
#   export STORJ_DATABASE_OPTIONS_MIGRATION_UNSAFE=snapshot,testdata
# is added to satellite-api.sh
supervisord
```
Test data:
```bash
uplink mb sj://bucket1
dd if=/dev/random of=/tmp/file count=8827586 bs=1
uplink cp /tmp/file sj://bucket1/testfile
uplink ls --encrypted sj://bucket1
```
```bash
stbb piece nodes sj://bucket1/AoBWUW0VDlDB4TzSqV8W41EQyEJ4Ky4CFsGFWe2jz9-rzJvX-0g=
12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
12mRefE33VmgNovxa1uH18ULYVCAKsJsA5LzJM7Pyw4bo1eZDa@127.0.0.1:30091 PR7IULJR63LW4NNKRIXHSZR5ZS2MAZNUI5OLWYK3U32GCN6XR62A 2212352
12sUpfAXxcCBD7E3LPiFnKqmKALsXy91v8nJSMw127iUHYCZQtB@127.0.0.1:30041 Y6CA5FKQXNTEUKCI36Q7HFKMPOEVEULSHU3CVNMA7P7FGEW7BFBQ 2212352
12owbjgnrj4ddzgvJRJ8GcqHNYFrJrUjsqHNvw9a52bcKYqsN1Q@127.0.0.1:30071 VCX7PTP4XZADKO5WT3PHUW4XSG3CQ3X5DLPHMUYXDXMU27UOFM5A 2212352
1b27dxNHp7QE8nq58AyzCzgHzU3dMwxsCD3yZNSN5r7BE6sMwL@127.0.0.1:30021 65PFCMRV7JLPIWBVFVDTQOUHE43Q2B4CHOIS2CTAOE5VP5CG7G5Q 2212352
```
## 64 KiB chunk size
```
stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.950167 sec (with 34 chunk/RPC request in average), which is 265.386025 Mbytes/sec
stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.930422 sec (with 34 chunk/RPC request in average), which is 266.046779 Mbytes/sec
stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 8.158461 sec (with 34 chunk/RPC request in average), which is 258.610460 Mbytes/sec
stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.978887 sec (with 34 chunk/RPC request in average), which is 264.430762 Mbytes/sec
stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 8.296540 sec (with 34 chunk/RPC request in average), which is 254.306413 Mbytes/sec
```
### Full piece chunk size (2212352)
**Please note that chunk size is capped at 1 MiB on the server side!**
The real chunk size used here is 1 MiB (3 RPC calls):
```
1048576
1048576
115200
```
Results:
```
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.385481 sec (with 3 chunk/RPC request in average), which is 481.101861 Mbytes/sec
>stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.356697 sec (with 3 chunk/RPC request in average), which is 484.280470 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.407326 sec (with 3 chunk/RPC request in average), which is 478.717343 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.303351 sec (with 3 chunk/RPC request in average), which is 490.283762 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.327442 sec (with 3 chunk/RPC request in average), which is 487.554419 Mbytes/sec
```
### 64 KiB -> 256 KiB
Using the real-world incremental chunk size (64 KiB * 1.5 * 1.5 ... capped at 256 KiB):
```
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.528563 sec (with 11 chunk/RPC request in average), which is 280.247793 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.488968 sec (with 11 chunk/RPC request in average), which is 281.729517 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.335714 sec (with 11 chunk/RPC request in average), which is 287.615258 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.484290 sec (with 11 chunk/RPC request in average), which is 281.905585 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 7.304354 sec (with 11 chunk/RPC request in average), which is 288.850089 Mbytes/sec
```
### 2.2 MiB chunk size
This test was executed with a patched storagenode:
```git
diff --git storagenode/piecestore/endpoint.go storagenode/piecestore/endpoint.go
index 16cf7b877..19fe03911 100644
--- storagenode/piecestore/endpoint.go
+++ storagenode/piecestore/endpoint.go
@@ -596,7 +596,7 @@ func (endpoint *Endpoint) Download(stream pb.DRPCPiecestore_DownloadStream) (err
group, ctx := errgroup.WithContext(ctx)
group.Go(func() (err error) {
- var maximumChunkSize = 1 * memory.MiB.Int64()
+ var maximumChunkSize = 3 * memory.MiB.Int64()
currentOffset := chunk.Offset
unsentAmount := chunk.ChunkSize
```
Results:
```
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.380212 sec (with 1 chunk/RPC request in average), which is 481.680665 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.445950 sec (with 1 chunk/RPC request in average), which is 474.558435 Mbytes/sec
> stbb piece piece-download 12q4SQZBKKWuk43BgBSe6xpBs7tM7w2jKTR9v2QwRtdJjaeqvug@127.0.0.1:30011 KXMJAAL3RH3BTJXG6FYXQZCXQVLTCF4WVWOEVPWMQC2ICNTNJIJA 2212352
2109 Mbytes are downloaded under 4.645408 sec (with 1 chunk/RPC request in average), which is 454.182514 Mbytes/sec
```
## References
### Versions
* **storj/storj**:
* `59b37db67019590c5a1008d73d23f644cd0dd745 2022-11-09T03:15:57+00:00 storagenode: overhaul QUIC check implementation`
* **storj/up**:
* `290e03daf29d1a8ed312e9258716c36cb98a361c 2022-11-07T11:17:32+01:00 cmd/testdata.go: allow multiple testdata project-usage commands (diff fix)`
* **elek/stbb** (test tool):
* `98aaed14278e9e6dd553a2a7b10d5a59da5fb589 2022-11-09T13:04:05+01:00 initial commit`
### Used hardware
Local developer machine. Negligible load from other services:
```
VENDOR
Manufacturer: ASUS
Product Name: System Product Name
MEMORY
Size: 32 GB
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
Size: 32 GB
Non-Volatile Size: None
Volatile Size: 32 GB
Cache Size: None
Logical Size: None
CPUs
Version: 12th Gen Intel(R) Core(TM) i5-12400
Max Speed: 4400 MHz
Core Enabled: 6
Thread Count: 12
DISKS
SSD 980 PRO 1TB
├─nvme0n1p1 100M part vfat /boot
├─nvme0n1p2 16M part
├─nvme0n1p3 243.4G part ntfs
├─nvme0n1p4 611M part ntfs
└─nvme0n1p5 687.4G part crypto_LUKS
└─root 687.4G crypt ext4
```