###### tags: `multi_az`, `redis`
# Redis Multi-AZ Configuration Plan (Production)
## Current Applications in CC
Listed below are the main Redis applications in the CC project. The main features of each application will be validated during the migration (see the *Checklist for Validation* section).
- Tickers
- Socket
- Chat
- Rate
- Default Cache Storage for Rails
- Low-level cache
- Partial cache
- Sidekiq Job Queue
## Spec Change
### Before
| Environment | Engine Version | Availability Zone | Node Type | Primary Endpoint |
|----|----|----|----|----|
| Staging | 3.2.10 | ap-northeast-1a | cache.t2.micro | stg-cc-redis.dt7t0e.0001.apne1.cache.amazonaws.com:6379 |
| Production | 3.2.10 | ap-northeast-1a | cache.t2.micro | prd-cc-redis.dt7t0e.0001.apne1.cache.amazonaws.com:6379 |
### After
| Environment | Engine Version | Availability Zone | Node Type | Primary Endpoint |
|----|----|----|----|----|
| Staging | 5.0.5 | ap-northeast-1a | cache.t2.micro | stg-cc-redis.dt7t0e.ng.0001.apne1.cache.amazonaws.com:6379 |
| Production | 5.0.5 | ap-northeast-1a | cache.t2.micro | Endpoint will be updated after *step 2* |
| Production | 5.0.5 | ap-northeast-1c | cache.t2.micro | Endpoint will be updated after *step 2* |
## Migration Flow (via AWS Console)
### (Prior to migration)
- [ ] Start a continuous write loop against Redis to monitor availability throughout the migration
```ruby
# Run in a Rails console (Time.current needs ActiveSupport); prints "OK - <time>" each second, or "DOWN - <time>" while Redis is unreachable.
redis = Redis.new(host: 'prd-cc-redis', read_timeout: 0.2, write_timeout: 0.5, connect_timeout: 0.2, reconnect_delay: 1, reconnect_delay_max: 1)
loop { puts "#{redis.set('migration_monitoring', Time.current, ex: 10) rescue 'DOWN'} - #{Time.current}"; sleep(1) }
```
### 1. Create Backup
Under `Redis` in ElastiCache Dashboard:
- [ ] Check the box of cluster `prd-cc-redis` -> click `Backup` -> click `Create Backup`
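For reference, the same backup can be taken with the AWS CLI; a sketch, where the snapshot name is illustrative:
```sh
# Take a manual backup of the existing single-node cluster; the snapshot name is hypothetical.
$ aws elasticache create-snapshot \
    --cache-cluster-id prd-cc-redis \
    --snapshot-name prd-cc-redis-pre-migration
```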
### 2. Add Replication
Under `Redis` in ElastiCache Dashboard:
- [ ] Click on cluster name of `prd-cc-redis` -> click `Add Replication` -> click `Create`
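The console's `Add Replication` turns the standalone cluster into a replication group. A rough CLI equivalent, assuming a hypothetical group id `prd-cc-redis-rg` and description (the console may name these differently):
```sh
# Wrap the existing node in a replication group so replicas can be attached later.
# prd-cc-redis-rg and the description are hypothetical, not taken from the console.
$ aws elasticache create-replication-group \
    --replication-group-id prd-cc-redis-rg \
    --replication-group-description "CC production Redis" \
    --primary-cluster-id prd-cc-redis
```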
### 3. Update Primary Endpoint for Tickers/Socket/Rails in _Route 53_
Under `Hosted zones/cc-local` in _Route 53_ Dashboard:
- [ ] Update alias `prd-cc-redis.cc-local` (Redis endpoint) to the primary endpoint shown on the Redis cluster pane (see illustration below)

### 4. Deploy Rails / Tickers / Socket
- [ ] Capistrano deploy (update `/500.html` with the maintenance announcement)
```sh
$ cap production deploy
# when prompted for a branch, enter:
view/500_page#1814
```
- [ ] Restart Ticker ECS Cluster `prd-cc-ticker` (CLI equivalents for both restarts are sketched after this checklist)
1. Go to `Task Definitions` on ECS Dashboard
2. Check `prd-cc-services-restarter`
3. Click `Actions`
4. Select `Run task`
- [ ] Restart Socket ECS Cluster `prd-cc-socket`
1. Go to `Clusters` on ECS Dashboard
2. Click on `prd-cc-socket`
3. Check `prd-cc-socket`
4. Click `Update`
5. Check `Force new deployment`
6. Click `Next Step`
7. Click `Next Step`
8. Click `Next Step`
9. Inspect details of the update
10. Click `Update Service`
- [ ] Check whether [/500.html](https://cc.minkabu.jp/500.html) and the new endpoint are updated
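Both restarts can also be forced from the AWS CLI; a sketch, assuming the cluster and service names shown in the console steps above:
```sh
# One-off run of the restarter task definition against the ticker cluster.
$ aws ecs run-task \
    --cluster prd-cc-ticker \
    --task-definition prd-cc-services-restarter

# Redeploy the socket service in place without changing its task definition.
$ aws ecs update-service \
    --cluster prd-cc-socket \
    --service prd-cc-socket \
    --force-new-deployment
```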
### 5. Validate Changes
The *Checklist for Validation* section includes the key Redis applications in CC. Each item will be examined to validate the primary endpoint change from step 3 before moving forward.
### 6. Create Replica
Under `Redis` in ElastiCache Dashboard, click on cluster `prd-cc-redis`:
- [ ] Create replica `prd-cc-redis-1c`
> Same node type, `cache.t2.micro`, will be applied to the new replica.
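A CLI sketch of the same operation, reusing the hypothetical replication group id from step 2:
```sh
# Add a replica node in ap-northeast-1c; if the node type is omitted, it matches the primary.
$ aws elasticache create-cache-cluster \
    --cache-cluster-id prd-cc-redis-1c \
    --replication-group-id prd-cc-redis-rg \
    --preferred-availability-zone ap-northeast-1c
```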
### 7. Enable Multi-AZ
Under `Redis` in ElastiCache Dashboard:
- [ ] Check the box of cluster `prd-cc-redis` -> click `Modify` -> check `Multi-AZ` & `Apply immediately` -> click `Update`
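A CLI sketch of the same change (hypothetical group id again); on this engine generation, enabling automatic failover is what the console's Multi-AZ checkbox toggles:
```sh
# Enable automatic failover (Multi-AZ) on the replication group, applied immediately.
$ aws elasticache modify-replication-group \
    --replication-group-id prd-cc-redis-rg \
    --automatic-failover-enabled \
    --apply-immediately
```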
### 8. Execute Version Upgrade & Automatic Failover*
Under `Redis` in ElastiCache Dashboard:
- [ ] Check the box of cluster `prd-cc-redis` -> click `Modify` -> on `Engine Version Compatibility`, select `5.0.5` -> click `Update`
- [ ] Observe and log downtime* via timestamp
> Known downtime: [READ/WRITE] on failing over to the other node
Engine upgrade flow:
1. Rolling upgrade from replica(s)
2. Replica upgrade finishes
3. Replica becomes Primary (auto-failover)*
4. Old Primary upgrades
5. Upgrade finishes on all nodes
*Note: version downgrade NOT available*
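The upgrade above can also be requested from the CLI (hypothetical group id again); as noted, there is no way to downgrade afterwards:
```sh
# Request the rolling engine upgrade described above; this cannot be rolled back.
$ aws elasticache modify-replication-group \
    --replication-group-id prd-cc-redis-rg \
    --engine-version 5.0.5 \
    --apply-immediately
```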
### 9. Validate Changes
The *Checklist for Validation* section includes the key Redis applications in CC. Each item will be examined to validate the Multi-AZ setup and engine upgrade (steps 7 & 8).
### 10. Restore 500.html
- [ ] Capistrano deploy (remove the maintenance announcement from `/500.html`)
```sh
$ cap production deploy
# when prompted for a branch, enter:
master
```
- [ ] Check whether [/500.html](https://cc.minkabu.jp/500.html) is restored
### 11. Remove Replica & Backup for Staging
- [ ] Remove node `stg-cc-redis-1c`
- [ ] Remove backup of `stg-cc-redis-1c`
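A CLI sketch for the staging cleanup; the snapshot name is a placeholder:
```sh
# Remove the staging replica node, then its leftover backup.
$ aws elasticache delete-cache-cluster --cache-cluster-id stg-cc-redis-1c
$ aws elasticache delete-snapshot --snapshot-name <stg-cc-redis-1c-snapshot>
```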
## Downtime Estimate
Actual downtime will be updated after the test on Staging.
## Cost Estimate
Monthly cost will increase by **$18.72** (data transfer NOT included) should a read replica be added in `ap-northeast-1c` for production: $0.026/hour × 24 hours × 30 days = $18.72.
A test of the same setup on Staging could add up to **$9.36** more, as it also requires a replica in `ap-northeast-1c` for at most 15 days: $0.026/hour × 24 hours × 15 days = $9.36.
### Pricing Table for Multi-AZ Setup
| Environment | Engine Version | Availability Zone | Node Type | Memory | Pricing | Cost (30 days) |
|----|----|----|----|----|----|----|
| Staging | 5.0.5 | ap-northeast-1a | cache.t2.micro | 555 MB | $0.026 / hour | $18.72 |
| Production | 5.0.5 | ap-northeast-1a | cache.t2.micro | 555 MB | $0.026 / hour | $18.72 |
| Production | 5.0.5 | ap-northeast-1c | cache.t2.micro | 555 MB | $0.026 / hour | $18.72 |
## Checklist for Validation
- Partial Cache
- [ ] [Mini-chart](https://cc.minkabu.jp)
- Low-Level Cache
    - [ ] [Bar Graph](https://cc.minkabu.jp/pair/BTC_JPY)
    - [ ] TrendingArticlesService: `$ rake articles:update_trending_news`
- Tickers
```ruby
# Expect an empty array: exchanges whose latest tick is older than 30 seconds (stalled tickers).
Exchange.joins(:currency_pairs_exchanges).uniq.select { |e| Time.at(e.currency_pairs_exchanges.joins(:currency_pair).order('currency_pairs.order ASC').first.latest_tick['time'] / 1000) < 30.seconds.ago }
```
- [Socket](https://cc.minkabu.jp/pair/BTC_JPY)
- [ ] BTC/JPY rate on Header
- Chat
- pair#show
- [ ] [Sample 1](https://cc.minkabu.jp/pair/BTC_JPY)
- [ ] [Sample 2](https://cc.minkabu.jp/pair/BNT_BTC)
- [ ] [Sample 3](https://cc.minkabu.jp/pair/BTC_USDT)
- API
- [ ] [pair#index (ticks in rate table)](https://cc.minkabu.jp/pair)
- Sidekiq
- [ ] SitemapRefreshWorker: update a column from backend to trigger the worker