# Counting Downloads for crates.io
crates.io currently counts downloads in the application backend. Given the high volume of requests to the `/download` endpoint, this is starting to cause performance issues. The crates team would prefer to stop serving download requests through the application and serve them directly from the CDN instead, but this will require a new solution for counting crate downloads.
A proposed approach is to count downloads based on the request logs. This would provide a scalable mechanism that counts downloads in batches, reducing the load on the backend and database.
Request logs are currently uploaded to S3 from both CloudFront and Fastly. The two log formats differ significantly, but both contain the requested URL, which can be used to determine the crate name and version.
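As an illustration, here is a minimal sketch in Rust of extracting the crate name and version from a request path. It assumes download URLs of the form `/crates/{name}/{name}-{version}.crate`; the actual paths in the CloudFront and Fastly logs would need to be checked before relying on this.

```rust
/// Extracts the crate name and version from a download request path.
///
/// Assumed layout: /crates/{name}/{name}-{version}.crate
fn parse_download_path(path: &str) -> Option<(String, String)> {
    let rest = path.strip_prefix("/crates/")?;
    let (name, file) = rest.split_once('/')?;
    let version = file
        .strip_suffix(".crate")?
        .strip_prefix(name)?
        .strip_prefix('-')?;
    Some((name.to_string(), version.to_string()))
}

#[test]
fn parses_a_typical_path() {
    assert_eq!(
        parse_download_path("/crates/serde/serde-1.0.0.crate"),
        Some(("serde".to_string(), "1.0.0".to_string()))
    );
}
```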
A simple mechanism could be to trigger an AWS Lambda function whenever a file is uploaded to the S3 bucket where logs are stored. The function would then parse the file, extract all downloads, count them, and call an API to update the counts in the database.
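A rough skeleton of such a function is sketched below, based on the `lambda_runtime` and `aws_lambda_events` crates. Fetching the log file from S3, parsing the two log formats, and the API call to update the counts are all left as placeholders, since none of them exist yet.

```rust
use aws_lambda_events::event::s3::S3Event;
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use std::collections::HashMap;

/// Invoked by S3 whenever a new log file is uploaded to the bucket.
async fn handler(event: LambdaEvent<S3Event>) -> Result<(), Error> {
    // Aggregated download counts per (crate, version) pair.
    let mut counts: HashMap<(String, String), u64> = HashMap::new();

    for record in event.payload.records {
        let bucket = record.s3.bucket.name.unwrap_or_default();
        let key = record.s3.object.key.unwrap_or_default();
        println!("processing s3://{bucket}/{key}");

        // Placeholder: fetch the object, parse each line (CloudFront or
        // Fastly format), extract the crate name and version from the
        // requested URL, and increment the matching entry in `counts`.
    }

    // Placeholder: push the aggregated counts to the database, e.g. through
    // a new (hypothetical) crates.io API endpoint.
    println!("aggregated counts for {} (crate, version) pairs", counts.len());

    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(handler)).await
}
```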
While I think this is reasonable, I am a bit concerned about the following problems:
- How can the function be developed and tested? We don't have tooling for development and testing in `simpleinfra`, which seems like the wrong place for the code anyway.
- How can the function be deployed? Is it feasible to deploy from GitHub Actions?
- How can we trigger the function manually, e.g. when parsing a file fails and it needs to be processed again?
- How can the crates team debug failures and retry files without access to the S3 bucket?
- How do we monitor the correct execution of the function and any errors that might occur?