# lecture8_note
###### tags: `Lecture8`
## Deep learning hardware
### CPU:
Fewer cores, but each core is much faster and much more capable; great at sequential tasks.
### GPU:
More cores, but each core is much slower and "dumber"; great for parallel tasks.
### TPU:
Specialized hardware for deep learning


**Reference** (FLOPS, floating-point operations per second): https://zh.wikipedia.org/wiki/%E6%AF%8F%E7%A7%92%E6%B5%AE%E9%BB%9E%E9%81%8B%E7%AE%97%E6%AC%A1%E6%95%B8
### Example: GPU vs. CPU

For a matrix multiplication like this, a GPU blasts out all of the output elements in parallel: the work is split across its many cores, each computing its own piece, so it is very efficient.
A CPU, by contrast, computes the elements one at a time (sequentially), so it ends up much slower than a GPU on this kind of workload.
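To make the contrast concrete, here is a minimal sketch (not from the lecture; it uses PyTorch, which is introduced below, and only runs the GPU half if a CUDA device is available) that times the same large matrix multiplication on CPU and GPU:
```
import time
import torch

A = torch.randn(4096, 4096)
B = torch.randn(4096, 4096)

# CPU: the multiply is computed by a handful of cores
start = time.time()
C_cpu = A.mm(B)
print("CPU time: %.3f s" % (time.time() - start))

# GPU: the same multiply is spread across thousands of cores
if torch.cuda.is_available():
    A_gpu, B_gpu = A.cuda(), B.cuda()
    torch.cuda.synchronize()          # make sure previous work is done before timing
    start = time.time()
    C_gpu = A_gpu.mm(B_gpu)
    torch.cuda.synchronize()          # wait for the kernel to finish
    print("GPU time: %.3f s" % (time.time() - start))
```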
## Deep learning software
Let's start with an example.

If you write a network in pure numpy, you still have to write the backward pass yourself, which is painful (we did exactly that in the homework).
On top of that, numpy cannot run on a GPU; numpy is definitely CPU only.
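As a reminder, here is roughly what that looks like: a minimal two-layer network in pure numpy (same sizes as the PyTorch examples below), where every line of the backward pass has to be derived and written by hand.
```
import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: two-layer net with ReLU
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Backward pass: every gradient derived and coded by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Gradient descent update
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```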
Question: Is there some kind of framework that runs on the GPU, lets you write forward-pass code that looks similar to numpy, and computes the gradients for you automatically?
That is exactly what most deep learning frameworks are for. Here is the first one.
### PyTorch
To use it, you have to install it first; grab it from the [official site](https://pytorch.org/).

Its forward-pass code looks almost exactly like numpy.

And by setting a single flag, it hands you back every gradient you need!
Magic!
Before the full example, here are the main building blocks PyTorch provides.

### Tensor (see code)
A Tensor is essentially the same thing as a numpy array.
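A quick sketch (the values are arbitrary) showing that Tensors support the same kinds of operations as numpy arrays, convert back and forth, and, unlike numpy arrays, can live on the GPU:
```
import torch
import numpy as np

a = torch.randn(3, 4)              # a random 3x4 Tensor, like np.random.randn(3, 4)
b = torch.ones(4, 2)

c = a.mm(b)                        # matrix multiply, like a.dot(b) in numpy
c_np = c.numpy()                   # convert to a numpy array
d = torch.from_numpy(np.eye(3))    # and back again

if torch.cuda.is_available():
    a = a.to(torch.device("cuda"))  # unlike numpy, Tensors can be moved to the GPU
```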
### Autograd (see code)
1. Creating Tensors with `requires_grad=True` enables autograd.
2. Operations on Tensors with `requires_grad=True` cause PyTorch to build a computational graph.
```
import torch

device = torch.device("cpu")
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
#=============================================
# Creating Tensors with requires_grad=True enables autograd
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)
#=============================================
learning_rate = 1e-6
for t in range(500):
    # Forward pass looks exactly the same as before, but we
    # don't need to track intermediate values -
    # PyTorch keeps track of them for us in the graph
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()
    #=============================================
    # The hand-written backward pass used to be this messy:
    # grad_y_pred = 2.0 * (y_pred - y)
    # grad_w2 = h_relu.t().mm(grad_y_pred)
    # grad_h_relu = grad_y_pred.mm(w2.t())
    # grad_h = grad_h_relu.clone()
    # grad_h[h < 0] = 0
    # grad_w1 = x.t().mm(grad_h)
    #=============================================
    # With PyTorch it is a single line: compute the gradient
    # of loss with respect to w1 and w2
    loss.backward()
    #=============================================
    # Update the weights with gradients disabled, then zero
    # the gradients so they don't accumulate across iterations
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```
### New autograd functions (see figure)
You rarely need this in practice, but you can define a brand-new operation yourself:
1. Define your own autograd function by writing forward and backward functions for Tensors.
2. Define a helper function to make the new function easy to use.
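A minimal sketch of the idea from the slide, using ReLU as the custom operation (the names `MyReLU` and `my_relu` are just illustrative):
```
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Save the input for use in the backward pass
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient of ReLU: pass gradients through only where the input was positive
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x

def my_relu(x):
    # Helper so the new function is as easy to call as torch.relu
    return MyReLU.apply(x)
```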

### nn (see code)
Much like Keras on top of TensorFlow, the `torch.nn` package gives you a higher-level API for building models.
```
import torch

device = torch.device("cpu")
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
#===============================================
# Define our model as a sequence of layers; each
# layer is an object that holds learnable weights
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)
#===============================================
learning_rate = 1e-2
for t in range(500):
    #===============================================
    # Forward pass: feed data to model, and compute loss.
    # torch.nn.functional has useful helpers like loss functions:
    # https://pytorch-cn.readthedocs.io/zh/latest/package_references/functional/#_1
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)
    #===============================================
    # Backward pass: compute gradient with respect to all
    # model weights (they have requires_grad=True)
    loss.backward()
    #===============================================
    # Make gradient step on each model parameter
    # (with gradients disabled), then zero the gradients
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()
```
### Optimizer (see code)
```
import torch

device = torch.device("cpu")
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)
learning_rate = 1e-4
#===============================================
# Use an optimizer for different update rules
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate)
#===============================================
for t in range(500):
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)
    loss.backward()
    #==============================================
    # After computing gradients, use optimizer to
    # update params and zero gradients
    optimizer.step()
    optimizer.zero_grad()
    #==============================================
```
### New Modules (see code)
You can define your own Modules using autograd!
Modules can contain weights or other Modules.
```
import torch

#==============================================
# Define our whole model as a single Module
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Define forward pass using child modules.
        # No need to define backward - autograd will handle it
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
#==============================================
device = torch.device("cpu")
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
#==============================================
# Construct and train an instance of our model
model = TwoLayerNet(D_in, H, D_out)
learning_rate = 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
for t in range(500):
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
#==============================================
```
### DataLoader (see figure)
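The figure referenced here is about `torch.utils.data.DataLoader`, which wraps a Dataset and handles minibatching, shuffling, and multi-process loading for you. A minimal sketch (not from the note; it reuses the same random regression data as the examples above):
```
import torch
from torch.utils.data import TensorDataset, DataLoader

N, D_in, D_out = 64, 1000, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Wrap the tensors in a Dataset, then let the DataLoader
# handle minibatching and shuffling
loader = DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

for epoch in range(2):
    for x_batch, y_batch in loader:
        # x_batch has shape (8, D_in), y_batch has shape (8, D_out);
        # run the usual forward / backward / optimizer step here
        pass
```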


### Pretrained models
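torchvision ships standard architectures (AlexNet, VGG, ResNet, ...) with weights pretrained on ImageNet. A minimal sketch, assuming torchvision is installed (the `pretrained=True` flag is the classic API; newer versions use a `weights=` argument instead):
```
import torch
import torchvision

# Download a ResNet-18 with ImageNet-pretrained weights
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# Run it on a dummy batch of one 224x224 RGB image
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    scores = model(x)          # shape (1, 1000): one score per ImageNet class
print(scores.argmax(dim=1))
```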

### Visdom
Visdom is a visualization tool commonly used with PyTorch.
References:
https://github.com/facebookresearch/visdom
https://zhuanlan.zhihu.com/p/32025746
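A rough sketch of how a loss curve might be pushed to a running Visdom server (start one with `python -m visdom.server` first; exact option names may vary across versions):
```
import numpy as np
import visdom

vis = visdom.Visdom()  # assumes a Visdom server is listening on localhost:8097

# Plot a fake loss curve; in real training you would log one point per iteration
losses = np.exp(-0.01 * np.arange(500)) + 0.05 * np.random.rand(500)
vis.line(X=np.arange(500), Y=losses, opts=dict(title="training loss"))
```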
### Static vs. dynamic graphs
#### Dynamic:
The computational graph is built on the fly, as each operation executes (PyTorch's default). This lets you use plain Python control flow, such as loops and conditionals, inside the forward pass; see the sketch below.

#### Static:
Build the graph once, then run it many times. Because the whole graph is known up front, the framework can optimize it before execution (the classic TensorFlow 1.x style).
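A toy sketch (not from the note) illustrating the dynamic case: a regular Python conditional picks which weight matrix to use, and the graph for each iteration is built as the code runs.
```
import torch

w1 = torch.randn(10, 10, requires_grad=True)
w2 = torch.randn(10, 10, requires_grad=True)

for t in range(5):
    x = torch.randn(1, 10)
    # Plain Python control flow decides the graph structure for this iteration
    w = w1 if x.sum() > 0 else w2
    loss = x.mm(w).clamp(min=0).sum()
    loss.backward()
```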

PS: This note focuses only on PyTorch, not TensorFlow.