# filcollins storage provider setup
## table of contents
* [overview](#overview)
* [ssh config](#ssh-config)
* [firewall](#firewall)
* [hardware](#hardware)
* [required sp processes](#required-sp-processes)
* [long-term storage](#long-term-storage)
* [tailscale](#tailscale)
* [known issues](#known-issues)
## overview
The storage provider (SP) runs on Protocol Labs-managed infrastructure.
* miner address: [f01953925](https://filfox.info/en/address/f01953925)
* public ip: 209.94.92.6
* public peer id: `12D3KooWNSRG5wTShNu6EXCPTkoH7dWsphKAPrbvQchHa5arfsDC`
---
## ssh config
```
Host worker-gpu-9
  User nonsense
  HostName 209.94.92.6
  ForwardAgent yes
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  LogLevel QUIET

# connection details for the hosts below are not recorded here
Host worker-gpu-10
Host worker-cpu-1-1
Host worker-cpu-1-4
```
---
## firewall
ufw is enabled on `worker-gpu-9`, which is our public-facing instance.
inbound traffic is allowed on ports 22, 24001 and 2345.
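for reference, a rule set like the following reproduces that state (a sketch -- the service behind each port should be verified before copying):
```
sudo ufw allow 22/tcp      # ssh
sudo ufw allow 24001/tcp   # lotus libp2p listen port (assumed)
sudo ufw allow 2345/tcp
sudo ufw enable
sudo ufw status numbered   # verify the resulting rule set
```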
---
## hardware
4 instances: 2 CPU instances and 2 GPU instances
### cpu instance type
#### cpu
model name: AMD EPYC 7F32 8-Core Processor
#### storage
5 x 1.8 TB HDDs running as a striped array, mounted at /mnt/hddvol, designated for scratch area
2 x NVMe drives running as a striped array, mounted at /mnt/nvmevol, designated for scratch area
```
nonsense@worker-cpu-1-1:~$ sudo vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  hddgroup    5   1   0 wz--n-  8.73t  33.65g
  nvmegroup   2   1   0 wz--n- <2.62t <71.54g
```
#### memory
1 TB RAM
### gpu instance type
#### cpu
model name: Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz
#### memory
502 GB RAM
#### gpus
4 x NVIDIA Quadro RTX 6000 (24 GB GDDR6)
### gpu instance - worker-gpu-10
#### storage
500 TiB Ceph (RBD) volume mounted at /mnt/ceph -- designated for long-term storage of sealed/unsealed sectors
3 x 900 GB HDDs running as a striped array, mounted at /storage/d1, designated for scratch area
### gpu instance - worker-gpu-9
#### storage
500 TiB Ceph (RBD) volume mounted at /mnt/ceph -- designated for long-term storage of sealed/unsealed sectors
3 x 900 GB HDDs running as a striped array, mounted at /storage/d1
---
## required sp processes
### lotus daemon
* running in a tmux pane on worker-gpu-9
* uses the root (`/`) filesystem for chain storage
```
lotus daemon &>> lotus-daemon.log
```
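to confirm the daemon is healthy after a (re)start, check sync state from the same host:
```
lotus sync status   # current chain sync state
lotus sync wait     # blocks until the node is fully synced
```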
### lotus-miner process
* running in a tmux pane on worker-gpu-9
* Make sure to export the variable below before starting the miner process
```
# restrict the miner to GPUs 0 and 1
export CUDA_VISIBLE_DEVICES="0,1"
lotus-miner run &>> lotus-miner.log
```
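once the miner is up, a quick overview of sector states, balances and deals is available with:
```
lotus-miner info
```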
### lotus-worker processes - sealing
1. running on worker-cpu-1-1 and worker-cpu-1-4
```
# CPU-bound tasks only: AddPiece, PreCommit1 and DataCid
lotus-worker run --no-default --addpiece --precommit1 --data-cid
```
2. running on worker-gpu-10 (tmux GPU)
Sealing disk = /storage/d1/scratch
```
# GPU-bound sealing tasks: PreCommit2, Commit and the snap-deal (replica update) tasks
export CUDA_VISIBLE_DEVICES="0,1,2,3"
LOTUS_WORKER_PATH=/home/nonsense/.lotusworker lotus-worker run --no-default --precommit2 --commit --replica-update --prove-replica-update2 --regen-sector-key --name worker-gpu-10-worker0
```
3. running on worker-gpu-10 (tmux Storage)
Sealing disk = /storage/d1/scratch1
```
# dedicated storage worker with its own repo, listen port and GPU
export CUDA_VISIBLE_DEVICES="4"
LOTUS_WORKER_PATH=/home/nonsense/.lotusworker1 lotus-worker run --no-default --no-local-storage --listen 0.0.0.0:3457 --name worker-gpu-10-STORAGE
```
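after the workers connect, their task assignments can be verified from the miner node:
```
lotus-miner sealing workers   # attached workers and their resources
lotus-miner sealing jobs      # in-flight sealing tasks
```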
### lotus-worker processes - windowpost on worker-gpu-9
```
# dedicate GPU 2 to window PoSt proofs
export CUDA_VISIBLE_DEVICES="2"
LOTUS_WORKER_PATH=/home/nonsense/.lotusworkerpost lotus-worker run --no-default --windowpost --no-local-storage --listen 0.0.0.0:3457 --name worker-gpu-9-wdPost
```
### lotus-worker processes - winningPost on worker-gpu-9
```
# dedicate GPU 3 to winning PoSt proofs
export CUDA_VISIBLE_DEVICES="3"
LOTUS_WORKER_PATH=/home/nonsense/.lotuswinningpost lotus-worker run --no-default --winningpost --no-local-storage --listen 0.0.0.0:3458 --name worker-gpu-9-wnPost
```
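to confirm the PoSt workers are picking up proving work, inspect the proving schedule:
```
lotus-miner proving deadlines   # upcoming window PoSt deadlines
lotus-miner proving faults      # currently faulty sectors, if any
```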
### YugabyteDB docker container
* running on worker-gpu-10
```
sudo docker run -d --name yugabyte \
  -p7000:7000 -p9000:9000 -p15433:15433 -p5433:5433 -p9042:9042 \
  -v /home/nonsense/yb-home:/home/yugabyte \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --base_dir=/home/yugabyte/yb_data --daemon=false
```
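before starting boostd-data, it is worth confirming the database is actually up (a sketch -- assumes ysqlsh ships inside the image and that paths are relative to the container workdir):
```
sudo docker ps --filter name=yugabyte                           # container running?
sudo docker exec -it yugabyte bin/ysqlsh -h 127.0.0.1 -c '\l'   # ysql reachable on 5433?
```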
### boostd-data service
* running in a tmux pane on worker-gpu-10; start it only after the YugabyteDB container is up
```
boostd-data run yugabyte --hosts 127.0.0.1 --connect-string="postgresql://postgres:postgres@127.0.0.1:5433?sslmode=disable" --addr 0.0.0.0:8044
```
### boostd process
* running on worker-gpu-9
* runs as a systemd service - `/etc/systemd/system/boostd.service`
* the incoming staging area is symlinked to Ceph
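for reference, a unit of roughly this shape matches the setup above (a sketch -- the binary path, repo location and flags are assumptions, not a copy of the real unit):
```
[Unit]
Description=boostd markets process
After=network-online.target

[Service]
User=nonsense
Environment=BOOST_PATH=/home/nonsense/.boost   # assumed repo location
ExecStart=/usr/local/bin/boostd --vv run       # assumed binary path and flags
Restart=on-failure

[Install]
WantedBy=multi-user.target
```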
### additional configurations
* `FinalizeEarly` is set to `true`
* new sectors for deals are disabled (see the config sketch below)
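in the miner's config.toml this corresponds to something like the following (a sketch -- `MakeNewSectorForDeals` is our assumption for the "no new sectors for deals" knob; verify the key against the running lotus version):
```
[Sealing]
  # move sealed sectors to long-term storage as early as possible
  FinalizeEarly = true
  # assumed knob for disabling new sectors for deals
  MakeNewSectorForDeals = false
```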
---
## long-term storage
we use Ceph for long-term storage of sealed/unsealed sectors.
* 2 x 500 TiB RBD volumes, each formatted with ext4
* worker-gpu-10 has one ceph device, mounted at `/mnt/ceph`
* worker-gpu-9 has the other ceph device, mounted at `/mnt/ceph`
* each device is attached to a single machine only, because they are currently RBD volumes (RWO)
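a quick sanity check that a host's volume is mapped and mounted:
```
lsblk | grep rbd   # RBD device mapped?
df -h /mnt/ceph    # mounted with the expected ~500T capacity?
```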
---
## tailscale
tailscale (wireguard-based) is installed on all hosts. in order to be able to test out multi-boost / single LID flows, we have set up a VPN between sofiaminer and filcollins.
if the VPN needs to be restarted (e.g. it doesn't come back up when a machine is rebooted) use:
```
$ sudo tailscale logout
$ sudo tailscale login
```
login credentials: boostteam55@gmail.com (the password is in 1Password)
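after logging back in, connectivity to the peer can be verified:
```
tailscale status            # peers and connection state
tailscale ping sofiaminer   # assumes MagicDNS resolves the peer hostname
```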
---
## known issues
### pruning of chain data
at the moment we must manually prune the chain data on worker-gpu-9, because we are not yet running splitstore with the discard store
```
# download the latest minimal snapshot to the Ceph scratch area
cd /mnt/ceph/tmp/ && aria2c -x5 https://snapshots.mainnet.filops.net/minimal/latest
# stop the daemon before touching the datastore
lotus daemon stop
rm -rf /home/nonsense/.lotus/datastore/chain/
rm -rf /home/nonsense/.lotus/datastore/splitstore/
# re-import the chain from the downloaded snapshot
lotus daemon --import-snapshot /mnt/ceph/tmp/<snapshot-name>.car
```
### sealing pipeline gets blocked when trying to AddPiece if there are no extended Available sectors for snap deals
workaround: periodically run the `./new-extend-sectors.sh` script to extend sectors
TODO: run it on a daily cron (see the sketch below)
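a crontab entry along these lines would cover the TODO (the absolute script path and log destination are assumptions):
```
# run the sector-extension script daily at 03:00
0 3 * * * /home/nonsense/new-extend-sectors.sh >> /home/nonsense/extend-sectors.log 2>&1
```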
### lotus-miner almost never shuts down gracefully
when we want to restart `lotus-miner` we have to `kill -9` it
```
ps -ef | grep "lotus-miner run" | head -1 | awk '{print $2}' | xargs kill -9
```
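the same thing is a bit more robust with pkill, which won't accidentally match the grep process itself:
```
pkill -9 -f "lotus-miner run"
```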
### replace username with something more generic
replace the `nonsense` user with `filadmin` or similar
### no backups for repos
we should add periodic backups for all important repos -- starting with the lotus-miner repo
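lotus-miner ships a metadata backup command that could be the starting point here (the destination path under Ceph is an assumption):
```
# back up the miner's metadata datastore to Ceph
lotus-miner backup /mnt/ceph/backups/miner-bak-$(date +%F).cbor
```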