###### tags: `multi_az`, `redis`

# Redis Multi-AZ Configuration Plan (Production)

## Current Application in CC

Listed below are the main Redis applications in the CC project. Main features for each application will be validated during the migration (see *Checklist for Validation* section).
- Tickers
- Socket
- Chat
- Rate
- Default Cache Storage for Rails
- Low-level cache
- Partial cache
- Sidekiq Job Queue

## Spec Change

### Before

| Environment | Engine Version | Availability Zone | Node Type | Primary Endpoint |
|----|----|----|----|----|
| Staging | 3.2.10 | ap-northeast-1a | cache.t2.micro | stg-cc-redis.dt7t0e.0001.apne1.cache.amazonaws.com:6379 |
| Production | 3.2.10 | ap-northeast-1a | cache.t2.micro | prd-cc-redis.dt7t0e.0001.apne1.cache.amazonaws.com:6379 |

### After

| Environment | Engine Version | Availability Zone | Node Type | Primary Endpoint |
|----|----|----|----|----|
| Staging | 5.0.5 | ap-northeast-1a | cache.t2.micro | stg-cc-redis.dt7t0e.ng.0001.apne1.cache.amazonaws.com:6379 |
| Production | 5.0.5 | ap-northeast-1a | cache.t2.micro | Endpoint will be updated after *step 2* |
| Production | 5.0.5 | ap-northeast-1c | cache.t2.micro | Endpoint will be updated after *step 2* |

## Migration Flow (via AWS Console)

### (Prior to migration)

- [ ] Continuous write to Redis to monitor status

```ruby
redis = Redis.new(
  host: 'prd-cc-redis',
  read_timeout: 0.2,
  write_timeout: 0.5,
  connect_timeout: 0.2,
  reconnect_delay: 1,
  reconnect_delay_max: 1
)
# Write a short-lived key once per second; print 'DOWN' while Redis is unreachable.
loop do
  status = redis.set('migration_monitoring', Time.current, ex: 10) rescue 'DOWN'
  puts "#{status} - #{Time.current}"
  sleep(1)
end
```

### 1. Create Backup

Under `Redis` in ElastiCache Dashboard:

- [ ] Check the box of cluster `prd-cc-redis` -> click `Backup` -> click `Create Backup`

### 2. Add Replication

Under `Redis` in ElastiCache Dashboard:

- [ ] Click on cluster name `prd-cc-redis` -> click `Add Replication` -> click `Create`

### 3. Update Primary Endpoint for Tickers/Socket/Rails in _Route 53_

Under `Hosted zones/cc-local` in the _Route 53_ Dashboard:

- [ ] Update alias `prd-cc-redis.cc-local` (Redis endpoint) to the one shown on the Redis cluster pane (see illustration below)

![](https://i.imgur.com/b6ptXRU.png)

### 4. Deploy Rails / Tickers / Socket
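Before redeploying, it helps to confirm that the Route 53 alias from step 3 already resolves to the new primary endpoint. A minimal sketch using Ruby's stdlib `resolv`, assuming the alias is a CNAME record (the helper name and hostnames are illustrative):

```ruby
require 'resolv'

# Illustrative helper: true when the private-zone alias CNAME points at the
# expected ElastiCache primary endpoint. `dns` is injectable for testing.
def alias_updated?(alias_name, expected_endpoint, dns: Resolv::DNS.new)
  record = dns.getresource(alias_name, Resolv::DNS::Resource::IN::CNAME)
  record.name.to_s == expected_endpoint
rescue Resolv::ResolvError
  false
end

# e.g. alias_updated?('prd-cc-redis.cc-local',
#                     'prd-cc-redis.dt7t0e.ng.0001.apne1.cache.amazonaws.com')
```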
- [ ] Capistrano deploy (update `/500.html` with the maintenance announcement)

```sh
$ cap production deploy
# at the branch prompt, enter: view/500_page#1814
```

- [ ] Restart Ticker ECS Cluster `prd-cc-ticker`
  1. Go to `Task Definitions` on the ECS Dashboard
  2. Check `prd-cc-services-restarter`
  3. Click `Actions`
  4. Select `Run task`
- [ ] Restart Socket ECS Cluster `prd-cc-socket`
  1. Go to `Clusters` on the ECS Dashboard
  2. Click on `prd-cc-socket`
  3. Check `prd-cc-socket`
  4. Click `Update`
  5. Check `Force new deployment`
  6. Click `Next Step`
  7. Click `Next Step`
  8. Click `Next Step`
  9. Inspect details of the update
  10. Click `Update Service`
- [ ] Check whether [/500.html](https://cc.minkabu.jp/500.html) and the new endpoint are updated

### 5. Validate Changes

The *Checklist for Validation* section includes the key Redis applications in CC. Each item will be examined to validate the change of primary endpoint in step 3 before moving forward.

### 6. Create Replica

Under `Redis` in ElastiCache Dashboard, click on cluster `prd-cc-redis`:

- [ ] Create replica `prd-cc-redis-1c`

> The same node type, `cache.t2.micro`, will be applied to the new replica.

### 7. Enable Multi-AZ

Under `Redis` in ElastiCache Dashboard:

- [ ] Check the box of cluster `prd-cc-redis` -> click `Modify` -> check `Multi-AZ` & `Apply immediately` -> click `Update`

### 8. Execute Version Upgrade & Automatic Failover*

Under `Redis` in ElastiCache Dashboard:

- [ ] Check the box of cluster `prd-cc-redis` -> click `Modify` -> under `Engine Version Compatibility`, select `5.0.5` -> click `Update`
- [ ] Observe and log downtime* via the timestamp monitor

> Known downtime: [READ/WRITE] while failing over to the other node

Engine upgrade flow:

1. Rolling upgrade starts from the replica(s)
2. Replica upgrade finishes
3. Replica becomes Primary (auto-failover)*
4. Old Primary upgrades
5. Upgrade finishes on all nodes

*Note: version downgrade is NOT available*

### 9. Validate Changes
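Before walking the checklist, the engine upgrade itself can be spot-checked from a Rails console. A sketch assuming redis-rb's `INFO` reply shape (a string-keyed hash); `client` is any object answering `#info` the same way, and the helper name is illustrative:

```ruby
# Illustrative helper: true when a node reports the target engine version
# after step 8. `client` duck-types redis-rb's Redis#info (string-keyed hash).
def upgrade_complete?(client, expected_version: '5.0.5')
  client.info['redis_version'] == expected_version
end

# From the Rails console (assumes network access to the cluster):
# upgrade_complete?(Redis.new(host: 'prd-cc-redis', connect_timeout: 0.2))
```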
The *Checklist for Validation* section includes the key Redis applications in CC. Each item will be examined to validate the Multi-AZ setup and engine upgrade (steps 7 & 8).

### 10. Restore 500.html

- [ ] Capistrano deploy (remove the maintenance announcement from `/500.html`)

```sh
$ cap production deploy
# at the branch prompt, enter: master
```

- [ ] Check whether [/500.html](https://cc.minkabu.jp/500.html) is restored

### 11. Remove Replica & Backup for Staging

- [ ] Remove node `stg-cc-redis-1c`
- [ ] Remove backup of `stg-cc-redis-1c`

## Downtime Estimate

Actual downtime will be updated after the test on Staging.

## Cost Estimate

Monthly cost will increase by **$18.72** (cost for data transfer NOT included) should the read replica be added to `ap-northeast-1c` for production. There could be an additional **$9.36** from a 15-day (at most) test of the above setup in Staging, as it requires setting up a replica in `ap-northeast-1c`.

### Pricing Table for Multi-AZ Setup

| Environment | Engine Version | Availability Zone | Node Type | Memory | Pricing | Cost (30 days) |
|----|----|----|----|----|----|----|
| Staging | 5.0.4 | ap-northeast-1a | cache.t2.micro | 555 MB | $0.026 / hour | $18.72 |
| Production | 5.0.4 | ap-northeast-1a | cache.t2.micro | 555 MB | $0.026 / hour | $18.72 |
| Production | 5.0.4 | ap-northeast-1c | cache.t2.micro | 555 MB | $0.026 / hour | $18.72 |

## Checklist for Validation

- Partial Cache
  - [ ] [Mini-chart](https://cc.minkabu.jp)
- Low-Level Cache
  - [ ] [Bar Graph](https://cc.minkabu.jp/pair/BTC_JPY)
  - [ ] TrendingArticlesService: `$ rake articles:update_trending_news`
- Tickers
  ```ruby
  # Exchanges whose newest tick is older than 30 seconds (epoch milliseconds)
  Exchange.joins(:currency_pairs_exchanges).uniq.select do |e|
    latest_tick = e.currency_pairs_exchanges
                   .joins(:currency_pair)
                   .order('currency_pairs.order ASC')
                   .first.latest_tick
    Time.at(latest_tick['time'] / 1000) < 30.seconds.ago
  end
  ```
- [Socket](https://cc.minkabu.jp/pair/BTC_JPY)
  - [ ] BTC/JPY rate on Header
  - [ ] Chat
  - pair#show
    - [ ] [Sample 1](https://cc.minkabu.jp/pair/BTC_JPY)
    - [ ] [Sample 2](https://cc.minkabu.jp/pair/BNT_BTC)
    - [ ] [Sample 3](https://cc.minkabu.jp/pair/BTC_USDT)
- API
  - [ ] [pair#index (ticks in rate table)](https://cc.minkabu.jp/pair)
- Sidekiq
  - [ ] SitemapRefreshWorker: update a column from the backend to trigger the worker
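The Tickers console check above boils down to a freshness cutoff on epoch-millisecond tick times. The same predicate in isolation (the helper name is illustrative), handy when eyeballing a single exchange:

```ruby
# True when a tick timestamp (epoch milliseconds, as in latest_tick['time'])
# is older than `threshold` seconds relative to `now`.
def stale_tick?(tick_time_ms, now: Time.now, threshold: 30)
  Time.at(tick_time_ms / 1000) < now - threshold
end
```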