# 4. RNN and Hybrid Models
**Testing the GPU**
```python=
import tensorflow as tf
print("Num GPUs Available: ", tf.test.gpu_device_name())
```
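Alternatively, on TensorFlow 2.1 or later you can count the visible GPUs directly (a minimal sketch, assuming only `tensorflow` is installed):
```python
import tensorflow as tf

# List the physical GPUs visible to TensorFlow; an empty list means CPU only
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available:", len(gpus))
```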
GPU Group 1:
1. Bob [Work Space](http://172.18.38.12:1002/?token=282997a54104b90778c273cc0844ab14b9198d066a2aa1dc)
2. Eunice [Work Space](http://172.18.38.12:1004/?token=07a307dece1eb5ca15061ed9bf66e88dd376dad8a3a30e52)
3. Mia [Work Space](http://172.18.38.12:1002/?token=4582bfc88f15179027c71543672099cd843b6f9a634e7bc0)
4. YANG [Work Space](http://172.18.38.12:1001/?token=54adfb3463c1e30f45777f4bf1bd0ccfb3a13e008f3c82cb)
GPU Group 2:
1. RAY [Work Space](http://172.18.38.13:1003/?token=161759abc2ea0d978e74845e59e2e0cc17c01cfd171835e3)
2. EEer [Work Space](http://172.18.38.13:1001/?token=6940117f04216b4dd263bf18905cf288fe7501ed0fdd45a7)
3. 博文 [Work Space](http://172.18.38.13:1004/?token=d4dcb106bf46533af3434b1bfbb852e9b188857c3eba87bf)
4. Andrea [Work Space](http://172.18.38.13:1002/?token=1acd5b88de9c28240c3afbd562d6dde4925f9ab2f9741687)
- AUG [Work Space](http://172.18.38.14:1001/?token=8651274b94785055710a9fafcf562de4631f5439cdc20014)
## Sec-01 Research Ethics
http://covid-19.nchc.org.tw/

## Task 1: MNIST RNN (works, but slow)
model: https://github.com/keras-team/keras/blob/master/examples/mnist_irnn.py
fix code: https://hackmd.io/G1z7JV_wQj2QXgVegO9wSQ#Task-1-MNIST-MLP :point_left:

After applying the fix, the model is trainable, but training is slow.
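The linked mnist_irnn.py example feeds each image to a `SimpleRNN` as a sequence of 784 single-pixel steps, which is why training is slow. A minimal sketch of that setup for `tf.keras`; the 100 units, the optimizer, and the 1000-sample subsets here are illustrative, see the linked example for the exact hyperparameters:
```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras import initializers
from tensorflow.keras.utils import to_categorical

# Each 28x28 image becomes a sequence of 784 single-pixel time steps
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 784, 1).astype('float32') / 255
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    # IRNN trick: identity recurrent kernel + ReLU activation
    SimpleRNN(100, activation='relu',
              kernel_initializer=initializers.RandomNormal(stddev=0.001),
              recurrent_initializer=initializers.Identity(gain=1.0),
              input_shape=(784, 1)),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train[:1000], y_train[:1000], batch_size=32, epochs=1,
          validation_data=(x_test[:1000], y_test[:1000]))
```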
## Task 2: IMDB LSTM
model: https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py
Install pandas first:
```python
!pip install pandas
```
Change the training call to use a 1000-sample subset and plot the accuracy history:
```python=
history = model.fit(x_train[:1000], y_train[:1000],
                    batch_size=batch_size,
                    epochs=5,
                    validation_data=(x_test[:1000], y_test[:1000]))

import pandas as pd
%matplotlib inline
df = pd.DataFrame(history.history)
df[['acc', 'val_acc']].plot()
```
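Note that newer Keras/TensorFlow versions record the metrics as `accuracy`/`val_accuracy` instead of `acc`/`val_acc`; a version-agnostic variant of the plotting step above, reusing the same `df`:
```python
# Plot whichever accuracy columns this Keras version recorded
acc_cols = [c for c in df.columns if 'acc' in c]
df[acc_cols].plot()
```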
### discuss
train_set = 1000, test_set = 1000

train_set = 2000, test_set = 1000

Understanding RNN, LSTM, and GRU:
{%youtube UNmqTiOnRfg%}



#### comparing different recurrent cells
**LSTM train_set: 5000, test_set:1000**

**GRU train_set: 5000, test_set:1000**

**SimpleRNN train_set: 5000, test_set:1000**
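The three runs above differ only in the recurrent layer. A minimal sketch of the swap, reusing the settings from the imdb_lstm example (max_features, maxlen, and the 5000/1000 subsets as above); the `build_model` helper is just for illustration:
```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, SimpleRNN, Dense

max_features, maxlen = 20000, 80
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

def build_model(cell):
    # Only the recurrent layer changes between the three comparisons
    model = Sequential()
    model.add(Embedding(max_features, 128))
    model.add(cell(128, dropout=0.2, recurrent_dropout=0.2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

for cell in (LSTM, GRU, SimpleRNN):
    print('Training', cell.__name__)
    model = build_model(cell)
    model.fit(x_train[:5000], y_train[:5000], batch_size=32, epochs=5,
              validation_data=(x_test[:1000], y_test[:1000]))
```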

#### comparing multi-recurrent layers
```python
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 64))  # embedding size matches the GRU width below
model.add(GRU(64, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))  # return_sequences=True is required so the next GRU layer receives the full sequence
model.add(GRU(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
```
Two stacked GRU layers of 64 units each (128 recurrent units in total).

Three stacked GRU layers are noticeably slower to train.

#### faster model using CuDNNGRU or CuDNNLSTM
```python
# CuDNNGRU/CuDNNLSTM run only on a GPU and do not accept the dropout arguments;
# they come from keras.layers (Keras 2.x) or tf.compat.v1.keras.layers (TF 2.x).
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 64))
model.add(CuDNNGRU(64, return_sequences=True))
model.add(CuDNNGRU(64))
model.add(Dense(1, activation='sigmoid'))
```
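If the notebook runs on TensorFlow 2.x, the `CuDNN*` layers no longer exist as separate classes; a sketch of the equivalent model, relying on the fact that the standard `GRU` layer picks the fused cuDNN kernel automatically on a GPU when its default activations are kept and `recurrent_dropout` is 0:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

max_features = 20000
model = Sequential()
model.add(Embedding(max_features, 64))
# On TF 2.x, GRU/LSTM use the fused cuDNN kernel automatically on GPU
# as long as the default activations are kept and recurrent_dropout == 0
model.add(GRU(64, return_sequences=True))
model.add(GRU(64))
model.add(Dense(1, activation='sigmoid'))
```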

## Task 3: using Johns Hopkins's COVID-19 data
- dataset from [CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19)
Get the dataset with:
```
git clone https://github.com/CSSEGISandData/COVID-19
```

```python=
import pandas as pd

df = pd.read_csv("./COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
df_by_country = df.groupby(['Country/Region']).agg(['sum'])
# df_by_country.sort_values(by=('3/27/20', 'sum'), ascending=False)

def difference(dataset, interval=1):
    """Convert cumulative counts into daily new cases."""
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return pd.Series(diff)

def get_country(name="Taiwan*", last_days=21):
    """Return the last `last_days` of daily new confirmed cases for one country."""
    df_tmp = df_by_country.loc[name][2:]   # skip the aggregated Lat/Long columns
    df_tmp_diff = difference(df_tmp)
    df_tmp_diff.index = pd.to_datetime([x[0] for x in df_tmp.index])[1:]
    return df_tmp_diff[-1*last_days:]

def plot_country(name):
    get_country(name).plot(title=name)

plot_country("China")
```
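Usage example for the helpers above (country names follow the dataset's spelling, e.g. `"Taiwan*"` and `"US"`):
```python
# daily new confirmed cases for the last 30 days, indexed by date
print(get_country("US", last_days=30))
plot_country("Taiwan*")
```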
**Calculate correlation coefficients**
https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html
* Get a single correlation value between two countries:

```python=
import numpy as np

def corelate(cntr1="United Kingdom", cntr2="Taiwan*"):
    print("%s and %s correlation coefficients %.4f" % (cntr1, cntr2,
          np.corrcoef(get_country(cntr1), get_country(cntr2))[1, 0]))

corelate("Italy")
```
* Calculate pairwise correlations between all 176 countries:
```python=
import numpy as np

def corelate(cntr1="United Kingdom", cntr2="Taiwan*"):
    return np.corrcoef(get_country(cntr1), get_country(cntr2))[1, 0]

corelate("US", "United Kingdom")

# correlate every pair of countries once
results = []
for idx in range(len(df_by_country.index)):
    for idy in range(idx, len(df_by_country.index)):
        if not idx == idy:
            cntr1 = df_by_country.index[idx]
            cntr2 = df_by_country.index[idy]
            results.append((cntr1, cntr2, corelate(cntr1, cntr2)))

# collect countries that appear in at least one pair with correlation > 0.8
cntrs = []
for (cntr1, cntr2, corrlet) in results:
    if corrlet > 0.8:
        cntrs.append(cntr1)
        cntrs.append(cntr2)
        # print(cntr1, cntr2, corrlet)

# most correlated countries
print(len(set(cntrs)), ", ".join(set(cntrs)))
# least correlated countries
print(", ".join(set(df_by_country.index) - set(cntrs)))
```
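To see which pairs drive that list, a small sketch that sorts `results` by coefficient (only the variables defined above are used):
```python
import numpy as np

# Ten most strongly correlated country pairs (skip NaN coefficients, which occur
# when a country's daily counts are constant over the 21-day window)
valid = [r for r in results if not np.isnan(r[2])]
for cntr1, cntr2, coef in sorted(valid, key=lambda r: r[2], reverse=True)[:10]:
    print("%-20s %-20s %.4f" % (cntr1, cntr2, coef))
```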
### visualize data
```python=
import json

# build a node-link graph: one node per country, one weighted link per correlated pair
links = []
for (cntr1, cntr2, corelat) in results:
    links.append({"source": cntr1, "target": cntr2, "value": corelat * 10})
nodes = []
for cntr in df_by_country.index:
    nodes.append({"id": cntr, "group": 1})
data = {"nodes": nodes, "links": links}
open("covid-19.json", 'w').write(json.dumps(data, sort_keys=True, indent=4))
```
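A quick sanity check that simply reloads the exported file:
```python
import json

# reload the exported file to confirm the node/link counts
data = json.load(open("covid-19.json"))
print(len(data["nodes"]), "nodes,", len(data["links"]), "links")
```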
* [pairwise correlations](https://seaborn.pydata.org/examples/many_pairwise_correlations.html)
```python=
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="white")

# Build a DataFrame with one column of daily new cases per country
all_cntr_data = {}
for x in df_by_country.index:
    all_cntr_data[x] = get_country(x)
d = pd.DataFrame(all_cntr_data)

# Compute the correlation matrix
corr = d.corr()

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(21, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# Draw a clustered heatmap instead of the plain heatmap from the seaborn example
# sns.heatmap(corr, mask=mask, cmap=cmap, vmax=1, center=.5,
#             square=True, linewidths=.5, cbar_kws={"shrink": .5})
sns.clustermap(corr, mask=mask, cmap=cmap, vmax=1, center=.8)
```


### predict tomorrow's case count in Taiwan


:point_right: [Download Program](https://drive.google.com/file/d/1NDn7m-T8tedyTFUp9YMsor8ln6fdLrGJ/view?usp=sharing)
:::info
Based on the last 30 days of infection counts in Taiwan, the model's prediction is 0.02214, i.e. tomorrow's number of infections in Taiwan is expected to be equal to or lower than today's, suggesting that the outbreak in Taiwan is gradually slowing down.
The model uses Johns Hopkins's COVID-19 data [CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19). After a preliminary statistical analysis, we selected the countries whose infection-count series over the past 65 days correlate with each other above 0.93, 53 countries in total. These data were turned into training samples that map each 30-day window of infection counts to whether the count increases afterwards, and used to train a bidirectional GRU network. With a 256-unit bidirectional GRU architecture, after 20 training epochs we obtained a test accuracy of 0.9336. The training and test accuracy per epoch are shown in the figure below:

:::
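The downloaded program is not reproduced here; below is a minimal sketch of the approach described above (30-day windows of daily counts labeled with whether the next day's count increases, fed to a bidirectional GRU). It reuses `get_country()` from Task 3; the window helper, the three-country subset, the 128-units-per-direction split, and the other hyperparameters are illustrative, not the exact values in the downloaded notebook:
```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, GRU, Dense

WINDOW = 30  # days per input sequence, as described above

def make_windows(series, window=WINDOW):
    """Turn daily new-case counts into (window, 1) sequences plus binary labels:
    1 if the day after the window has more cases than the window's last day."""
    xs, ys = [], []
    for i in range(len(series) - window):
        xs.append(series[i:i + window])
        ys.append(1 if series[i + window] > series[i + window - 1] else 0)
    return np.array(xs, dtype='float32')[..., None], np.array(ys, dtype='float32')

# Build training data from a few of the selected countries (illustrative subset)
x_list, y_list = [], []
for cntr in ["Taiwan*", "Italy", "US"]:
    daily = get_country(cntr, last_days=65).values
    xs, ys = make_windows(daily)
    x_list.append(xs)
    y_list.append(ys)
x_all, y_all = np.concatenate(x_list), np.concatenate(y_list)

model = Sequential([
    # 128 units per direction, roughly matching the "256 bidirectional GRU neurons" above
    Bidirectional(GRU(128), input_shape=(WINDOW, 1)),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_all, y_all, epochs=20, validation_split=0.2)

# Predict tomorrow's direction for Taiwan from the most recent 30 days;
# a value near 0 means "no increase expected"
recent = get_country("Taiwan*", last_days=WINDOW).values.astype('float32')[None, :, None]
print(model.predict(recent))
```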