# Copying a huge AWS S3 bucket between accounts
----
## The Challenge
Total Size: 3.1 TB of Data
Count: 906.4 million files
Estimated avg. file size: ~3 KB
---
## The Price
----
### The baseline cost of a copy
- API calls: ~906 Mio PUT/COPY requests: ~5,000 USD
- API calls: ~906 Mio GET requests: ~400 USD
- Traffic: 3.1 TB: ~60 USD (cost arithmetic sketched below)
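
A rough sketch of where those figures come from. The per-unit prices below are assumed standard S3 list prices (0.005 USD per 1,000 PUT/COPY requests, 0.0004 USD per 1,000 GET requests, 0.02 USD per GB transferred) and are not quoted from the slides:

```python
# Back-of-the-envelope cost estimate using assumed S3 list prices (see above).
objects = 906_400_000
put_copy_cost = objects / 1_000 * 0.005   # ~4,532 USD -> "~5,000 USD"
get_cost = objects / 1_000 * 0.0004       # ~363 USD   -> "~400 USD"
transfer_cost = 3_100 * 0.02              # 3.1 TB ~= 3,100 GB -> ~62 USD
print(f"PUT/COPY ~{put_copy_cost:,.0f} USD, "
      f"GET ~{get_cost:,.0f} USD, traffic ~{transfer_cost:,.0f} USD")
```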
---
## The Approaches
----
### Plain copy
Single-threaded, at 1 s per file
=> ~29 years to copy
Who pays the bill for such a long-running EC2 instance?
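
A quick check of that number, assuming exactly one object per second and no parallelism:

```python
# Plain single-threaded copy: one object per second, no parallelism assumed.
objects = 906_400_000
seconds = objects * 1.0
years = seconds / (3600 * 24 * 365)
print(f"~{years:.1f} years")  # ~28.7 years
```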
----
### Distributed copy: S3DistCp (s3-dist-cp)
This spins up an EMR cluster, which needs to be right-sized and paid for!
----
### AWS S3 Batch Operations
- S3 Batch Operations to the rescue ;)
- Adds ~1,000 USD for the service and
- ~906 Mio HEAD requests to the bill (job-creation sketch below)
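
A minimal boto3 sketch of creating such a copy job. The account ID, bucket names, role ARN, and S3 Inventory manifest location are hypothetical placeholders; the manifest and the IAM role must already exist:

```python
import boto3

s3control = boto3.client("s3control", region_name="eu-central-1")

response = s3control.create_job(
    AccountId="111122223333",        # placeholder: account that runs the job
    ConfirmationRequired=True,       # job waits for manual confirmation
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/batch-ops-copy-role",
    Operation={
        "S3PutObjectCopy": {
            # Bucket in the destination account that receives the copies.
            "TargetResource": "arn:aws:s3:::destination-bucket",
        }
    },
    Manifest={
        "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
        "Location": {
            # manifest.json produced by S3 Inventory on the source bucket.
            "ObjectArn": "arn:aws:s3:::inventory-bucket/manifest.json",
            "ETag": "replace-with-manifest-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-copy-reports",
        "ReportScope": "FailedTasksOnly",
    },
)
print("Created Batch Operations job:", response["JobId"])
```

`ConfirmationRequired=True` lets you review the job (and its estimated object count) in the console before it actually starts copying.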
----
#### Some hard limits
> * 3,500 PUT/COPY/POST/DELETE req/s/prefix
> * 5,500 GET/HEAD req/s/prefix
> * ~85-90 MB/s throughput per request
----
#### Some calculations
----
##### Resulting minimum duration (by API request limits):
906.4 million objects / 3,500 COPY ops per second ≈ 259,000 s ≈ 72 hours
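
The same calculation spelled out; this lower bound assumes all copies go through a single prefix:

```python
# Minimum duration dictated by the 3,500 COPY req/s/prefix limit.
objects = 906_400_000
hours = objects / 3_500 / 3_600
print(f"~{hours:.0f} hours")  # ~72 hours
```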
----
##### Resulting minimum duration (by throughput):
3,100,000 MB (3.1 TB) / 85 MB/s ≈ 36,500 s ≈ 10 hours
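
And the throughput bound, assuming a sustained 85 MB/s and that bandwidth rather than request rate is the limit:

```python
# Minimum duration dictated by ~85 MB/s per-request throughput.
total_mb = 3_100_000          # 3.1 TB expressed in MB
hours = total_mb / 85 / 3_600
print(f"~{hours:.1f} hours")  # ~10.1 hours
```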
----
##### From the team ....
> In the prepare phase, you can count on as much as 750K objects/minute, so 10 million objects would require 10-15 mins to be prepared. During the active step (the actual copy), you can observe as much as 900 to 3,000 TPS.
>
{"metaMigratedAt":"2023-06-15T09:43:02.176Z","metaMigratedFrom":"Content","title":"Copying a huge AWS S3 bucket between accounts","breaks":true,"contributors":"[{\"id\":\"130d8f78-250e-42fe-888c-6850e4fc2347\",\"add\":1413,\"del\":49}]"}