# Copying a huge AWS S3 bucket between accounts

----

## The Challenge

- Total size: 3.1 TB of data
- Count: 906.4 million files
- Estimated avg. file size: ~3 KB

---

## The Price

----

### The base cost of a copy

- API calls: 906 Mio PUT requests: ~ 5,000 USD
- API calls: 906 Mio GET requests: ~ 400 USD
- Traffic: 3.1 TB: ~ 60 USD

---

## The Approaches

----

### Plain copy

Single-threaded, at 1 s per file => ~29 years to copy

Who pays the bill for such a long-running EC2 instance?

----

### Distributed copy: S3DistCp

This spins up an EMR cluster, which needs to be right-sized and paid for!

----

### AWS S3 Batch Operations - ASBO

To the rescue ;)

Adds to the bill:

- ~1,000 USD for the service and
- 906 Mio HEAD requests

(A job-creation sketch follows in the appendix.)

----

#### Some hard limits

> * 3,500 PUT/COPY/POST/DELETE req/s per prefix
> * 5,500 GET/HEAD req/s per prefix
> * 85-90 MB/s per request

----

#### Some calculations

(A quick sanity check of these numbers is in the appendix.)

----

##### Resulting min. duration (by API limits)

906.4 million objects / 3,500 COPY ops per second ≈ 72 hours

----

##### Resulting min. duration (by throughput)

3,100,000 MB (3.1 TB) / 85 MB/s ≈ 10 hours

----

##### From the team ...

> In the prepare phase, you could count as many as 750K objects/minute, so 10 million objects would require 10-15 minutes to be prepared. During the active step (the actual copy), you could observe as much as 900 to 3,000 TPS.
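
----

#### Appendix: sanity-checking the numbers

A minimal Python sketch that reproduces the cost and duration figures in this deck. The unit prices are approximate S3 Standard list prices and are assumptions here; check the current AWS pricing page for your region and storage class.

```python
# Rough sanity check of the cost and duration figures in this deck.
# Prices are approximate S3 Standard list prices (assumed, not from the deck);
# always check the current pricing page for your region.

OBJECTS = 906_400_000          # ~906.4 million files
TOTAL_MB = 3_100_000           # ~3.1 TB

PUT_PER_1000 = 0.005           # USD per 1,000 PUT/COPY requests (assumed rate)
GET_PER_1000 = 0.0004          # USD per 1,000 GET/HEAD requests (assumed rate)
BATCH_PER_MILLION = 1.00       # USD per million objects, Batch Operations (assumed rate)

put_cost = OBJECTS / 1000 * PUT_PER_1000               # ~4,532 USD ("~5,000 USD")
get_cost = OBJECTS / 1000 * GET_PER_1000               # ~363 USD  ("~400 USD")
batch_cost = OBJECTS / 1_000_000 * BATCH_PER_MILLION   # ~906 USD  ("~1,000 USD")

copy_limit_hours = OBJECTS / 3_500 / 3600              # ~72 h at 3,500 COPY req/s/prefix
throughput_hours = TOTAL_MB / 85 / 3600                # ~10 h at 85 MB/s
single_threaded_years = OBJECTS / 3600 / 24 / 365      # ~29 years at 1 s per file

print(f"PUT:   {put_cost:,.0f} USD")
print(f"GET:   {get_cost:,.0f} USD")
print(f"Batch: {batch_cost:,.0f} USD")
print(f"Min duration (API limit):  {copy_limit_hours:.0f} h")
print(f"Min duration (throughput): {throughput_hours:.0f} h")
print(f"Single-threaded copy:      {single_threaded_years:.0f} years")
```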
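
----

#### Appendix: creating the ASBO copy job

A minimal boto3 sketch of what creating an S3 Batch Operations copy job can look like. The account ID, role ARN, bucket names, manifest location and ETag below are placeholders, not values from this deck; the manifest would typically come from an S3 Inventory report of the source bucket.

```python
# Sketch: create an S3 Batch Operations (ASBO) copy job with boto3.
# All ARNs, bucket names, the account ID and the manifest ETag are placeholders.
import boto3

s3control = boto3.client("s3control", region_name="eu-central-1")

response = s3control.create_job(
    AccountId="111111111111",
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::111111111111:role/s3-batch-copy-role",
    Description="Cross-account bucket copy",
    Operation={
        "S3PutObjectCopy": {
            # Destination bucket in the target account
            "TargetResource": "arn:aws:s3:::destination-bucket",
        }
    },
    Manifest={
        # Manifest listing all objects to copy, e.g. from S3 Inventory
        "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
        "Location": {
            "ObjectArn": "arn:aws:s3:::inventory-bucket/source-bucket/manifest.json",
            "ETag": "example-etag-of-manifest",
        },
    },
    Report={
        # Completion report; FailedTasksOnly keeps it small for huge jobs
        "Bucket": "arn:aws:s3:::report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-copy-reports",
        "ReportScope": "FailedTasksOnly",
    },
)
print("JobId:", response["JobId"])
```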
{"metaMigratedAt":"2023-06-15T09:43:02.176Z","metaMigratedFrom":"Content","title":"Copying a huge AWS S3 bucket between accounts","breaks":true,"contributors":"[{\"id\":\"130d8f78-250e-42fe-888c-6850e4fc2347\",\"add\":1413,\"del\":49}]"}
    199 views