# Storage Calculation System
The current storage calculation for web3.storage is a bulky cron job that runs every few hours and iterates over every single user. It has already started to time out as usage has grown.
For each user it loops through all of their pins and uploads, calculates the storage used, and then notifies the customer. A user with a very large number of pins or uploads will eventually cause the job to fail.
This proposal outlines a solution that is simple, scalable and cheap.
> This proposal assumes that we have a trusted file size.
## How it works (briefly)
1. A user uploads a file and it is stored in EIPFS (as normal).
2. Once we have the total size of the file, we emit a message to a new FIFO SQS queue; we also emit a message when a file is marked as successfully pinned (see the example message after this list).
3. The queue message triggers an SST-based Lambda that adds the file size to a running total of the user's storage in DynamoDB.
4. The updated row then triggers an SST-based Lambda that looks up the user's available storage from the web3.storage API (or defaults to 1 TiB for now) and sends an email if the total is over the limit.
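For illustration, here is a minimal sketch of the message that could be put on the FIFO queue and how it might be sent. The event fields, the `emitStorageEvent` helper and the `STORAGE_QUEUE_URL` environment variable are assumptions rather than a finalised contract; the FIFO-specific parts are the message group id (per user, to keep ordering) and the de-duplication id.
```ts
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

// Hypothetical event shape; the exact field names are not finalised.
interface StorageEvent {
  user_id: string;
  file_size: number; // bytes, the trusted file size
  timestamp: string; // ISO 8601
}

const sqs = new SQSClient({});

export async function emitStorageEvent(event: StorageEvent): Promise<void> {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.STORAGE_QUEUE_URL, // assumed env var
      MessageBody: JSON.stringify(event),
      // FIFO queues require a group id; grouping by user keeps per-user ordering.
      MessageGroupId: event.user_id,
      // De-duplicates retries of the same upload/pin event.
      MessageDeduplicationId: `${event.user_id}-${event.timestamp}`,
    })
  );
}
```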
## System Diagram

## Database Schema
Inside DynamoDB, the schema will look like this.
```ts
interface UserStorage {
  user_id: string;       // partition key
  total_storage: number; // running total in bytes
  updated_at: string;    // ISO 8601 timestamp
}
```
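The running total from step 3 can then be maintained with a single atomic update per message. Below is a rough sketch of that consumer Lambda, assuming the message body matches the event shape above and that the table name comes from a `TABLE_NAME` environment variable.
```ts
import { SQSHandler } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, UpdateCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Consumes the FIFO queue and keeps a running total per user.
export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    const { user_id, file_size, timestamp } = JSON.parse(record.body);

    await ddb.send(
      new UpdateCommand({
        TableName: process.env.TABLE_NAME, // assumed env var
        Key: { user_id },
        // ADD is atomic, so concurrent updates for the same user cannot
        // clobber each other, and the row is created if it does not exist.
        UpdateExpression: "ADD total_storage :size SET updated_at = :now",
        ExpressionAttributeValues: {
          ":size": file_size,
          ":now": timestamp,
        },
      })
    );
  }
};
```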
## How to manage historical data
As a quick fix, we can write a script that walks the existing uploads up to a certain date and manually emits queue messages for them.
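A rough sketch of that script, assuming the historical uploads can be read from the existing Postgres database (the `uploads` table and its columns here are placeholders for the real schema):
```ts
import { Client } from "pg";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const pg = new Client({ connectionString: process.env.DATABASE_URL });
const sqs = new SQSClient({});

// One-off backfill: replay historical uploads as queue messages so the
// running totals start from the right place. For simplicity this loads
// everything in one query; a real run would paginate.
async function backfill(cutoff: string): Promise<void> {
  await pg.connect();
  const { rows } = await pg.query(
    "SELECT user_id, size, inserted_at FROM uploads WHERE inserted_at <= $1",
    [cutoff]
  );

  for (const row of rows) {
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: process.env.STORAGE_QUEUE_URL,
        MessageBody: JSON.stringify({
          user_id: row.user_id,
          file_size: Number(row.size),
          timestamp: row.inserted_at,
        }),
        MessageGroupId: row.user_id,
        MessageDeduplicationId: `${row.user_id}-${row.inserted_at}`,
      })
    );
  }

  await pg.end();
}

// Cutoff date is passed as a CLI argument, defaulting to "now".
backfill(process.argv[2] ?? new Date().toISOString()).catch(console.error);
```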
## Tasks
1. Provision a new FIFO SQS queue, DynamoDB and Lambda.
2. Create a new project/workspace in the web3.storage repo.
3. When a file has been uploaded completely and we have its size, emit an SQS message with the userId, file size and timestamp.
4. Create the Lambda that consumes the SQS message, adds the file size to the user's running total and updates DynamoDB.
5. Create the temporary script to resolve historical data.
6. Create the consumer that receives the newly updated user storage record and looks up the user's allowed storage, emailing them if they are over quota.
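The consumer in task 6 would hang off the table's DynamoDB stream. A rough sketch, with `sendOverQuotaEmail` standing in for whichever email integration we choose:
```ts
import { DynamoDBStreamHandler } from "aws-lambda";
import { unmarshall } from "@aws-sdk/util-dynamodb";

const DEFAULT_QUOTA = 1024 ** 4; // 1 TiB placeholder limit for the first iteration

// Triggered by the DynamoDB stream on the UserStorage table: check the
// updated total against the allowed storage and email if it is exceeded.
export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    if (!record.dynamodb?.NewImage) continue;

    const row = unmarshall(record.dynamodb.NewImage as any);
    const allowed = DEFAULT_QUOTA; // second iteration: fetch from the web3.storage API

    if (row.total_storage > allowed) {
      await sendOverQuotaEmail(row.user_id, row.total_storage, allowed);
    }
  }
};

async function sendOverQuotaEmail(userId: string, used: number, allowed: number) {
  // Placeholder: hand off to the external email system (see the FAQ on de-duplication).
  console.log(`user ${userId} is over quota: ${used} of ${allowed} bytes used`);
}
```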
### Second Iteration Tasks
> Now that we have a basic implementation working, we can add support for user pricing plans.
1. Create new machine-to-machine authentication middleware for the web3.storage API (so that the Lambdas can communicate with the API).
2. Create a new endpoint on the web3.storage API to get the allowed storage for a customer based on their auth token.
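To illustrate the shape of this, here is a hedged sketch of the Lambda-side client for the new endpoint. The route, header, response body and environment variables are placeholders, not the final API design:
```ts
// Exchange a shared machine-to-machine token for the user's storage limit.
// WEB3_STORAGE_API_URL and M2M_TOKEN are assumed environment variables.
export async function getAllowedStorage(userId: string): Promise<number> {
  const res = await fetch(
    `${process.env.WEB3_STORAGE_API_URL}/internal/users/${userId}/storage-limit`,
    { headers: { Authorization: `Bearer ${process.env.M2M_TOKEN}` } }
  );

  if (!res.ok) {
    // Fall back to the 1 TiB default from the first iteration.
    return 1024 ** 4;
  }

  const body = (await res.json()) as { allowed_storage: number };
  return body.allowed_storage;
}
```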
## FAQ
### Why SQS?
1. It allows us to scale uploads independently of the rest of the system. If we have a large number of uploads, the consumer can simply churn through the queue with no impact on user-facing systems.
2. It allows for resiliency. As it stands, if the storage cron fails we need to re-run the whole task. With SQS, we can recover from failed messages using a dead-letter queue.
### DynamoDB or Postgres?
The advantage of DynamoDB is that we can use its stream as a trigger for the email notification Lambda. It also keeps the logic completely separate, so it can scale independently of the API and other systems. Being entirely on AWS, it lends itself to a "cloud-first" architecture.
The downsides are that (A) we have two sources of user data, and (B) the data has to be made accessible to the frontend separately.
The advantage of Postgres is that all the data stays in one place and can be joined with other data. The downside is that we would need another mechanism to trigger the email notifications when a record is updated.
### How do we handle reporting?
Reporting on how much data users across the system have used will always be a challenge.
To accomplish this, we will need to get the list of users from Postgres, scan DynamoDB for their records, and then join the two sets in application code.
A single DynamoDB Scan request returns at most 1 MB of data, so with a "row" size of roughly 58 bytes we can expect to fetch only about 17,241 rows at a time.
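In practice that means the reporting code has to page through the Scan results rather than assume one request returns everything, roughly as follows:
```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Pages through the whole UserStorage table (~17k rows per 1 MB page) by
// following LastEvaluatedKey until it is exhausted.
async function scanAllUserStorage() {
  const rows: Record<string, any>[] = [];
  let lastKey: Record<string, any> | undefined;

  do {
    const page = await ddb.send(
      new ScanCommand({
        TableName: process.env.TABLE_NAME, // assumed env var
        ExclusiveStartKey: lastKey,
      })
    );
    rows.push(...(page.Items ?? []));
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);

  return rows; // join against the Postgres user list in application code
}
```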
### How do we handle email de-duplication?
An external email system would ideally handle email de-duplication as well as sending the email to the customer.
Alternatively, we would need to provide an interface over the `email_history` table.
### How to disable over-quota users automatically
There are a few solutions to this problem. My personal preference is solution 2, as it keeps the logic separate.
1. Add a new column to the `users` table recording when the user breached their storage limit. If they upload again after the grace period has elapsed, we check when the limit was breached and disable their account.
2. Send a message to another SQS queue with a visibility timeout set to the allowed grace period. After this period has elapsed, a Lambda receives the message, performs a final check to see whether the user is still over quota, and then disables their account. Messages on this queue can be de-duplicated by userId.
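A sketch of the consumer for solution 2 follows; `getUserStorage`, `getAllowedStorage` and `disableAccount` are placeholders for the lookups described elsewhere in this proposal. (One caveat: a single SQS visibility timeout tops out at 12 hours, so a longer grace period would need the delay applied in stages or the message re-driven.)
```ts
import { SQSHandler } from "aws-lambda";

// Placeholder lookups; real implementations would read DynamoDB and call the API.
declare function getUserStorage(userId: string): Promise<number>;
declare function getAllowedStorage(userId: string): Promise<number>;
declare function disableAccount(userId: string): Promise<void>;

// Consumer for the grace-period queue: once the delayed message becomes
// visible, re-check usage and only disable the account if the user is
// still over quota.
export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    const { user_id } = JSON.parse(record.body);

    const used = await getUserStorage(user_id);
    const allowed = await getAllowedStorage(user_id);

    if (used > allowed) {
      await disableAccount(user_id);
    }
    // If they are back under quota, the message is simply dropped.
  }
};
```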