# Content management
## IPFS
### Background
IPFS (the InterPlanetary File System) is a peer-to-peer file system: a distributed system for storing and accessing files, websites, applications, and data. IPFS excels at acting as a CDN for publicly available data.
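To make this concrete, here is what basic content addressing looks like from the CLI (a sketch assuming a locally running daemon; `<CID>` stands for whatever content identifier `ipfs add` prints):
```bash=
# Add a file; IPFS returns a content identifier (CID) derived from the data
echo "hello" > hello.txt
ipfs add hello.txt            # -> added <CID> hello.txt
# Retrieve the same content by CID, via the CLI or the local gateway
ipfs cat <CID>
curl http://127.0.0.1:8080/ipfs/<CID>
```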
### Config
```json=
{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5001",
    "Announce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/8080",
    "NoAnnounce": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/ip4/10.244.0.175/tcp/4001/ipfs/12D3KooWFKir9yJFPzQma7MK45KeYq6Ye8b6iSXQ3ZE23veiEPb7"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 1048576,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "child": {
        "path": "badgerds",
        "syncWrites": false,
        "truncate": true,
        "type": "badgerds"
      },
      "prefix": "badger.datastore",
      "type": "measure"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "180GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false,
      "Interval": 10
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "ShardingEnabled": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWFKir9yJFPzQma7MK45KeYq6Ye8b6iSXQ3ZE23veiEPb7",
    "PrivKey": "YOUR_PRIVATE_KEY"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": [
      "/ip4/10.0.0.0/ipcidr/8",
      "/ip4/100.64.0.0/ipcidr/10",
      "/ip4/169.254.0.0/ipcidr/16",
      "/ip4/172.16.0.0/ipcidr/12",
      "/ip4/192.0.0.0/ipcidr/24",
      "/ip4/192.0.2.0/ipcidr/24",
      "/ip4/192.168.0.0/ipcidr/16",
      "/ip4/198.18.0.0/ipcidr/15",
      "/ip4/198.51.100.0/ipcidr/24",
      "/ip4/203.0.113.0/ipcidr/24",
      "/ip4/240.0.0.0/ipcidr/4",
      "/ip6/100::/ipcidr/64",
      "/ip6/2001:2::/ipcidr/48",
      "/ip6/2001:db8::/ipcidr/32",
      "/ip6/fc00::/ipcidr/7",
      "/ip6/fe80::/ipcidr/10"
    ],
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 2000,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "EnableAutoRelay": false,
    "EnableRelayHop": false,
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}
```
### Usage
We use IPFS as a private CDN. A swarm key protects our peers: only nodes that hold the same swarm key and are listed in the bootstrap / peering configuration can exchange data with our network. We run IPFS as a single node, a setup IPFS does not officially support, and consistently see the following error:
```
2022-01-26T21:19:21.613Z ERROR cmd/ipfs ipfs/daemon.go:567 failed to bootstrap (no peers found): consider updating Bootstrap or Peering section of your config
```
The container crashes, but because its restart policy is `Always`, it reboots and IPFS remains operational - until the next time it tries to replicate data to a peer.
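For reference, the swarm key itself is a small text file. A minimal sketch of generating one, following the private-networks format linked from the bootstrap script below:
```bash=
# Generate a swarm.key for a private IPFS network:
# a fixed header followed by 32 random bytes, hex-encoded
printf '/key/swarm/psk/1.0.0/\n/base16/\n%s\n' \
  "$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')" > swarm.key
```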
### Bootstrap
We use the following script to bootstrap the IPFS configuration:
```bash=
#!/bin/sh
set -e
set -x
# This is a custom entrypoint for k8s designed to run ipfs nodes in an appropriate
# setup for production scenarios.
if [ -f "$IPFS_PATH/repo.lock" ]; then
  rm -f "$IPFS_PATH/repo.lock"
fi
# Remove config if one is already attached and reconfigure
if [ -f "$IPFS_PATH/config" ]; then
  rm -f "$IPFS_PATH/config"
fi
ipfs init --profile="server,badgerds"
ipfs config Datastore.StorageMax 180GB
ipfs config --json Swarm.ConnMgr.HighWater 2000
ipfs config --json Datastore.BloomFilterSize 1048576
ipfs config Addresses.API /ip4/0.0.0.0/tcp/5001
ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080
ipfs bootstrap rm --all
NODE_ID=$(ipfs id -f="<id>")
IPFS_POD_IP=$(hostname -i)
ENDPOINT="/ip4/$IPFS_POD_IP/tcp/4001/ipfs/$NODE_ID"
ipfs bootstrap add "$ENDPOINT"
chown -R ipfs "$IPFS_PATH"
# Always check for secret
[ -f "$IPFS_PATH/swarm.key" ] || {
  echo "No swarm.key found, copying from mounted secret"
  [ -f /etc/ipfs-secrets/swarm.key ] || {
    echo "No swarm.key found in ipfs secret, please see"
    echo "https://github.com/ipfs/go-ipfs/blob/v0.4.13/docs/experimental-features.md#private-networks"
    echo "Then upload secret using something like:"
    echo "kubectl create secret generic ipfs --from-file=./swarm.key"
    exit 1
  }
  cp -v /etc/ipfs-secrets/swarm.key "$IPFS_PATH/swarm.key"
  chmod 600 "$IPFS_PATH/swarm.key"
}
exit 0
```
What it does, in a nutshell:
1. It removes any existing config and lock files.
2. It configures the node to listen on all interfaces, so it is reachable from any IP that can reach the pod.
3. It configures storage parameters (StorageMax, bloom filter size, connection-manager high water).
4. It removes all bootstrap peers. By default the list contains public peers that IPFS would replicate to.
5. It reads the node ID and pod IP, builds an endpoint in IPFS multiaddr format, and adds it to the bootstrap list of peers. The catch: this endpoint (built in the bootstrap container, a sidecar of the IPFS container) is exactly the same as the IPFS node's own endpoint, so the node bootstraps to itself.
6. It checks whether there is a swarm key to protect the cluster and prints instructions if there isn't. If a swarm key is provided (as a k8s secret), it is taken from the secret's volume mount and copied into the IPFS configuration.
After the configure-ipfs container terminates, the ipfs container boots in the same pod.
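Once the ipfs container is up, a quick sanity check that the private network took effect (a sketch using the same pod placeholder and `default` namespace as the backup commands below):
```bash=
# The node should report its own ID and list only itself as bootstrap
kubectl exec -it [IPFS_POD] -n default -- ipfs id
kubectl exec -it [IPFS_POD] -n default -- ipfs bootstrap list
# On a private swarm this lists only peers that share swarm.key
kubectl exec -it [IPFS_POD] -n default -- ipfs swarm peers
```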
### Backup
Current backup procedure:
1. Open a shell in the IPFS pod, e.g. (assuming the `default` namespace used elsewhere in this doc):
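```bash=
kubectl exec -it [IPFS_POD] -n default -- sh
```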
2. Create an archive of the ipfs folder
```bash=
tar -czvf /data/ipfs-backup.tar.gz /data/ipfs
```
3. Copy the archive locally
```bash=
kubectl cp default/[IPFS_POD]:/data/ipfs-backup.tar.gz ipfs-backup.tar.gz
```
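Alternatively, steps 2 and 3 can be collapsed by streaming the archive directly to your machine, without leaving a tarball in the pod (a sketch, same pod placeholder):
```bash=
# Stream a gzipped tar of the IPFS repo straight to a local file
kubectl exec [IPFS_POD] -n default -- tar -czf - /data/ipfs > ipfs-backup.tar.gz
```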
### Restore Backup
1. Navigate to the folder where your backup is
2. Copy the backup to an IPFS pod
```bash=
kubectl cp ipfs-backup.tar.gz default/[IPFS_POD]:/data/ipfs-backup.tar.gz
```
3. Extract the contents of the archive back to /data/ipfs (tar stored the paths without the leading `/`, so extract relative to the root)
```bash=
tar -xzvf /data/ipfs-backup.tar.gz -C /
```
**Note: Make sure there are no extra nested folders inside the tar. `tar -tzf ipfs-backup.tar.gz` should list entries directly under `data/ipfs/`.**
4. Now comes the awkward part of the restore: you have overwritten the configuration files, generated on the fly, with pre-defined values. Your Peer ID and bootstrap endpoint are now wrong, as they were copied from another instance on another cluster. The easiest remedy is to delete the current IPFS deployment (/data/ipfs is persistent, so it won't be removed) and install it again. That re-triggers the bootstrap script, which overrides the node endpoint / bootstrap values with the correct settings.
```bash=
kubectl delete deployment/ipfs-deployment
kubectl apply -f 11-ipfs-deployment.yml  # from the specific environment folder
```
### Issues
1. IPFS is in a constant restart loop because it has no peers.
2. Adding a second node to the same cluster with the current setup is not possible: it leads to a Multi-Attach error, because our PVC is ReadWriteOnce and does not support multiple pods writing to it, and we have multiple mounts into the container.
3. Backing up and restoring IPFS is extremely cumbersome and error-prone.
4. There is no replication between nodes, and we don't have daily backups.
### Remedies
1. Evaluate ipfs-cluster. It is built from the ground up for peering.
2. Expose IPFS externally to the internet and peer between different IPFS nodes (see the sketch below). We wouldn't even need backups then - all our data and hashes would already be replicated across a private swarm.
3. Evaluate an alternative CDN. We are currently abusing a system that is great as a p2p / public CDN and not so great, nor easy to manage, as a private one.
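For remedy 2, go-ipfs already supports static peering via the `Peering.Peers` config section. A sketch of wiring two nodes together (the peer ID and multiaddr below are placeholders, not real values):
```bash=
# Hypothetical example: permanently peer with another node we control.
# Replace the ID and multiaddr with the other node's real values.
ipfs config --json Peering.Peers '[
  {
    "ID": "12D3KooW...REMOTE_PEER_ID",
    "Addrs": ["/ip4/203.0.113.10/tcp/4001"]
  }
]'
```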