# TiDB Switchover SOP
## PD Settings
### Step 1: Adjust the PD leader priority so the switchover side is higher than the original side
```
pd-ctl -u <pd-ip>:<port> member leader_priority <member-name> <value>
```
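For illustration, a minimal sketch with hypothetical member names (pd-dc1-1 and pd-dc1-2 on the original side, pd-dc2-1 on the target side) and example priority values; a higher value makes PD prefer that member as leader:
```
# Hypothetical example: raise the target-side member's priority above the
# original-side members so the leader prefers to move to dc2.
pd-ctl -u <pd-ip>:<port> member leader_priority pd-dc2-1 5
pd-ctl -u <pd-ip>:<port> member leader_priority pd-dc1-1 1
pd-ctl -u <pd-ip>:<port> member leader_priority pd-dc1-2 1
```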
### Step 2: Transfer the leader
```
pd-ctl -u <pd-ip>:<port> member leader transfer <member-name>
```
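For example, using the same hypothetical member names, transfer the leader to the target side and then confirm where it landed:
```
# Transfer the PD leader to the hypothetical target-side member, then confirm.
pd-ctl -u <pd-ip>:<port> member leader transfer pd-dc2-1
pd-ctl -u <pd-ip>:<port> member leader show
```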
### Step 3: Delete PD members so the switchover side holds the majority
```
pd-ctl -u <pd-ip>:<port> member delete name <member-name>
```
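For example, remove a hypothetical original-side member by name and then list the remaining members to confirm the target side now holds the majority:
```
# Delete an original-side member by name, then list the remaining members.
pd-ctl -u <pd-ip>:<port> member delete name pd-dc1-1
pd-ctl -u <pd-ip>:<port> member
```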
## TiKV Settings
### Check Region State
Method 1: Use pd-ctl
```
pd-ctl -u <pd-ip>:<port> region check [miss-peer | down-peer | pending-peer]
```
Method 2: Use Grafana

The Region metrics above are explained as follows:
- miss-peer-region-count: the number of Regions with missing replicas. It should not stay above 0 for long.
- extra-peer-region-count: the number of Regions with extra replicas; these are produced during scheduling.
- empty-region-count: the number of empty Regions, usually caused by TRUNCATE TABLE / DROP TABLE statements. If the count is high, consider enabling cross-table Region merge.
- pending-peer-region-count: the number of Regions whose Raft log is lagging behind. A small number of pending peers produced by scheduling is normal, but if the count stays high for a long time (more than 30 minutes), there may be a problem.
- down-peer-region-count: the number of Regions with unresponsive peers as reported by the Raft leader.
- offline-peer-region-count: the number of Regions whose peers are in the process of going offline.

In principle, it is expected for these panels to occasionally show data. If they show data for an extended period, investigate whether there is a problem.
:::info
Normally, miss-peer, down-peer, and pending-peer should all be zero (a quick check is sketched below).
:::
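A small pre-switch check, a sketch assuming jq is installed and that region check returns a JSON object with a count field; all three states should report 0 before proceeding:
```
# Print the Region count for each check; all three should be 0 before switching.
for state in miss-peer down-peer pending-peer; do
  count=$(pd-ctl -u <pd-ip>:<port> region check "$state" | jq '.count')
  echo "$state: $count"
done
```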
### Switch to DR
#### Export the current TiKV placement rules
```
pd-ctl -u <pd-ip>:<port> config placement-rules rule-bundle load --out="rules.json"
```
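Optionally, inspect the exported bundle before editing it (assuming jq is available) to confirm which dc each rule currently targets:
```
# Show each rule's id, role, count, and dc constraint from the exported bundle.
jq '.[].rules[] | {id, role, count, constraints: .label_constraints}' rules.json
```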
#### Modify rules.json <Example>
:::danger
Adjust according to your actual environment.
:::
Original
```
[
  {
    "group_id": "pd",
    "group_index": 0,
    "group_override": false,
    "rules": [
      {
        "group_id": "pd",
        "id": "voters",
        "start_key": "",
        "end_key": "",
        "role": "voter",
        "count": 2,
        "location_labels": ["dc", "host"],
        "label_constraints": [{"key": "dc", "op": "in", "values": ["dc1"]}]
      },
      {
        "group_id": "pd",
        "id": "voterswoleader",
        "start_key": "",
        "end_key": "",
        "role": "follower",
        "count": 1,
        "location_labels": ["dc", "host"],
        "label_constraints": [{"key": "dc", "op": "in", "values": ["dc2"]}]
      },
      {
        "group_id": "pd",
        "id": "learners",
        "start_key": "",
        "end_key": "",
        "role": "learner",
        "count": 1,
        "location_labels": ["dc", "host"],
        "label_constraints": [{"key": "dc", "op": "in", "values": ["dc2"]}]
      }
    ]
  }
]
```
Modified
```
[
  {
    "group_id": "pd",
    "group_index": 0,
    "group_override": false,
    "rules": [
      {
        "group_id": "pd",
        "id": "voters",
        "start_key": "",
        "end_key": "",
        "role": "voter",
        "count": 2,
        "location_labels": ["dc", "host"],
        "label_constraints": [{"key": "dc", "op": "in", "values": ["dc2"]}]
      },
      {
        "group_id": "pd",
        "id": "voterswoleader",
        "start_key": "",
        "end_key": "",
        "role": "follower",
        "count": 1,
        "location_labels": ["dc", "host"],
        "label_constraints": [{"key": "dc", "op": "in", "values": ["dc1"]}]
      },
      {
        "group_id": "pd",
        "id": "learners",
        "start_key": "",
        "end_key": "",
        "role": "learner",
        "count": 1,
        "location_labels": ["dc", "host"],
        "label_constraints": [{"key": "dc", "op": "in", "values": ["dc1"]}]
      }
    ]
  }
]
```
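Instead of editing the file by hand, the dc1/dc2 values can be swapped in place; a sketch using GNU sed with a temporary placeholder (back up the file first and review the diff before loading it):
```
# Swap every "dc1" and "dc2" in rules.json via a temporary token, keeping a backup.
cp rules.json rules.json.bak
sed -i 's/"dc1"/"__DC_TMP__"/g; s/"dc2"/"dc1"/g; s/"__DC_TMP__"/"dc2"/g' rules.json
diff rules.json.bak rules.json
```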
#### Load the new configuration
```
pd-ctl -u <pd-ip>:<port> config placement-rules rule-bundle save --in="rules.json"
```
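To confirm PD accepted the new bundle, re-export the active rules and compare them with the file that was just loaded (jq -S normalizes key order so only real differences show up):
```
# Re-export the active placement rules and compare against the saved file.
pd-ctl -u <pd-ip>:<port> config placement-rules rule-bundle load --out="rules-check.json"
diff <(jq -S . rules.json) <(jq -S . rules-check.json)
```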
#### Switch the DR configuration
```
pd-ctl -u <pd-ip>:<port> config set replication-mode dr-auto-sync primary dc2
pd-ctl -u <pd-ip>:<port> config set replication-mode dr-auto-sync dr dc1
pd-ctl -u <pd-ip>:<port> config set replication-mode dr-auto-sync dr-replicas 0
```
#### Verify the status
```
curl http://<pd-ip>:<port>/pd/api/v1/replication_mode/status
```
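To watch only the sync state, the dr-auto-sync object in the response can be filtered with jq (field names assume the shape shown in the DR Auto-Sync documentation; adjust if your PD version reports a different structure):
```
# Extract only the DR sync state (e.g. sync / async / sync_recover).
curl -s http://<pd-ip>:<port>/pd/api/v1/replication_mode/status | jq -r '."dr-auto-sync".state'
```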
## DR Failover
### Recover PD <Method 1>
#### Step 1: Get the Cluster ID
```
cat {{/path/to}}/pd.log | grep "init cluster id"
```
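To extract just the numeric ID (assuming the PD log records it as cluster-id=<number> inside the "init cluster id" line):
```
# Pull only the numeric cluster ID out of the "init cluster id" log line.
grep "init cluster id" {{/path/to}}/pd.log | grep -oE 'cluster-id=[0-9]+' | head -n 1 | cut -d'=' -f2
```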
#### Step 2: Get the maximum Alloc ID
##### Method 1: From the Grafana dashboard
##### Method 2: From the PD leader log
```
cat {{/path/to}}/pd*.log | grep "idAllocator allocates a new id" | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
```
#### Step 3: Recover
```
./pd-recover -endpoints http://127.0.0.1:2379 -cluster-id <cluster_ID> -alloc-id <max_alloc_id>
```
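A filled-in sketch with placeholder values; per pd-recover usage, the -alloc-id passed in should be safely larger than the maximum allocated ID found in Step 2, and the PD cluster needs to be restarted after the recovery succeeds:
```
# Hypothetical example (placeholder IDs, not real values). -alloc-id is set
# above the max allocated ID from Step 2 to leave a safety margin.
./pd-recover -endpoints http://127.0.0.1:2379 \
  -cluster-id 6747551640615446306 \
  -alloc-id 100000000
```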
### Recover PD <Method 2>
Keep one surviving pd-server and shut down the others.
Add the following startup flag to the surviving pd-server:
```
--force-new-cluster
```
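A minimal sketch of starting the surviving pd-server manually with the flag (all names, paths, and URLs below are placeholders; reuse the instance's existing data directory and URLs, and drop the flag again once the single-member cluster is healthy):
```
# Start the surviving PD with --force-new-cluster, reusing its existing data dir.
bin/pd-server \
  --name=pd-dc2-1 \
  --data-dir=/path/to/pd-data \
  --client-urls="http://<pd-ip>:2379" \
  --peer-urls="http://<pd-ip>:2380" \
  --force-new-cluster
```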
### Remove failed nodes
```
tikv-ctl --db /path/to/tikv-data/db unsafe-recover remove-fail-stores -s <s1,s2> --all-regions
```
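This runs tikv-ctl in local mode, so the TiKV process on the node must be stopped first; a sketch of how it is typically executed on every surviving TiKV node (store IDs and the service name below are placeholders; newer tikv-ctl versions take --data-dir instead of --db):
```
# On EVERY surviving TiKV node, with the local tikv-server stopped:
# remove the failed stores (hypothetical IDs 4 and 5) from all Region metadata,
# then start TiKV again.
systemctl stop tikv-20160.service
tikv-ctl --db /path/to/tikv-data/db unsafe-recover remove-fail-stores -s 4,5 --all-regions
systemctl start tikv-20160.service
```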