# lecture8_note
###### tags: `Lecture8`

## Deep learning hardware

### CPU: Fewer cores, but each core is much faster and much more capable; great at sequential tasks
### GPU: More cores, but each core is much slower and “dumber”; great for parallel tasks
### TPU: Specialized hardware for deep learning

![](https://i.imgur.com/at84msH.png)
![](https://i.imgur.com/9YIFcBI.png)

**reference**: https://zh.wikipedia.org/wiki/%E6%AF%8F%E7%A7%92%E6%B5%AE%E9%BB%9E%E9%81%8B%E7%AE%97%E6%AC%A1%E6%95%B8

### Example: GPU vs. CPU
![](https://i.imgur.com/jWcB5Du.png)

For a matrix multiplication like this, a GPU blasts out all of the output elements in parallel: the work is split across many cores, each computing its own piece, so it is very efficient. A CPU, by contrast, works through the elements sequentially, so it is much slower than the GPU here.

## Deep learning software

Start with an example:

![](https://i.imgur.com/g8bEr4s.png)

If you write the network in numpy, you still have to derive and write the backward pass yourself, which is painful (we did exactly that in the homework). On top of that, numpy cannot run on a GPU; numpy is definitely CPU only.

Question: is there a "framework" that can run on a GPU, lets you write forward-pass code that looks similar to numpy, and computes the gradients for you automatically?

That is the purpose of most deep learning frameworks. The first one introduced here is PyTorch.

### PyTorch
To use it you first have to install it from the [official site](https://pytorch.org/).

![](https://i.imgur.com/yDA9mIM.png)

The forward pass looks almost the same as in numpy:

![](https://i.imgur.com/5qR6U6z.png)

And by setting a single flag, PyTorch hands back every gradient you need. Like magic! Before the code, here is an overview of the main pieces of the PyTorch API:

![](https://i.imgur.com/sU8vyJq.png)

### Tensor (see code)
Essentially PyTorch's version of a numpy array.

### Autograd (see code)
1. Creating Tensors with `requires_grad=True` enables autograd.
2. Operations on Tensors with `requires_grad=True` cause PyTorch to build a computational graph.

```
import torch

device = torch.device("cpu")

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Creating Tensors with requires_grad=True enables autograd
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass looks exactly the same as before, but we don't
    # need to track intermediate values - PyTorch keeps track of
    # them for us in the graph
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # The original hand-written backward pass was complicated:
    # grad_y_pred = 2.0 * (y_pred - y)
    # grad_w2 = h_relu.t().mm(grad_y_pred)
    # grad_h_relu = grad_y_pred.mm(w2.t())
    # grad_h = grad_h_relu.clone()
    # grad_h[h < 0] = 0
    # grad_w1 = x.t().mm(grad_h)

    # With PyTorch it is much simpler:
    loss.backward()  # compute gradient of loss with respect to w1 and w2

    # Gradient step on the weights, then reset the gradients
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```

### New autograd (see figure)
You rarely need this in practice, but it lets you write a new autograd operator yourself:

1. Define your own autograd functions by writing forward and backward functions for Tensors.
2. Define a helper function to make the new function easy to use.

![](https://i.imgur.com/Y5hc0Vh.jpg)
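Since the slide above is only included as an image, here is a minimal sketch of the idea, assuming we reimplement ReLU by hand; the names `MyReLU` and `my_relu` are my own choices, not necessarily what the slide uses:

```
import torch

# Custom autograd function: a hand-written ReLU with an explicit
# forward and backward (illustrative sketch, not the slide's code)
class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Save the input so the backward pass can use it
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through only where the input was positive
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x

def my_relu(x):
    # Helper so the new function is as easy to call as an ordinary one
    return MyReLU.apply(x)

# Usage: behaves like the built-in ReLU inside autograd
x = torch.randn(5, requires_grad=True)
y = my_relu(x).sum()
y.backward()
print(x.grad)
```

`MyReLU.apply` is what actually inserts the custom operation into the computational graph; the helper just makes call sites look like a normal function.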
### nn (see code)
Just like Keras on top of TensorFlow, the `nn` package provides a higher-level API:

```
import torch

device = torch.device("cpu")

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Define our model as a sequence of layers; each layer is an
# object that holds learnable weights
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)

learning_rate = 1e-2
for t in range(500):
    # Forward pass: feed data to the model, and compute the loss.
    # torch.nn.functional has useful helpers like loss functions
    # reference: https://pytorch-cn.readthedocs.io/zh/latest/package_references/functional/#_1
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)

    # Backward pass: compute gradients with respect to all model
    # weights (they have requires_grad=True)
    loss.backward()

    # Make a gradient step on each model parameter
    # (with gradients disabled), then reset the gradients
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()
```

### optimizer (see code)
```
import torch

device = torch.device("cpu")

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)

# Use an optimizer for different update rules
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)
    loss.backward()

    # After computing gradients, use the optimizer to update the
    # parameters and zero the gradients
    optimizer.step()
    optimizer.zero_grad()
```

### New module (see code)
You can define your own Modules using autograd! Modules can contain weights or other modules.

```
import torch

# Define our whole model as a single Module
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Define the forward pass using child modules; no need to
        # define backward - autograd will handle it
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

device = torch.device("cpu")

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Construct and train an instance of our model
model = TwoLayerNet(D_in, H, D_out)
learning_rate = 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for t in range(500):
    y_pred = model(x)
    loss = torch.nn.functional.mse_loss(y_pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

### DataLoader (see figures)
![](https://i.imgur.com/IOfpSuG.png)
![](https://i.imgur.com/e8N1emL.png)

### Pretrained models
![](https://i.imgur.com/8IYKDQc.png)

### Visdom
A visualization tool for PyTorch.

reference:
https://github.com/facebookresearch/visdom
https://zhuanlan.zhihu.com/p/32025746

### Static vs. dynamic graphs
#### dynamic:
![](https://i.imgur.com/wEYCyxR.png)
A dynamic graph is rebuilt on every forward pass, so ordinary Python control flow can change what the graph looks like from iteration to iteration (see the sketch at the end of this note).

#### static:
![](https://i.imgur.com/gmcgLSC.jpg)
A static graph is built once up front, which lets the framework optimize it before running, and is then reused for every iteration.

PS: This note only focuses on PyTorch rather than TensorFlow.
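As a closing illustration of the dynamic-graph point above, here is a small sketch; it is my own toy example built on the same two-layer network as earlier, not code taken from the slides. Because the graph is rebuilt on every forward pass, plain Python control flow decides which operations end up in it:

```
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2a = torch.randn(H, D_out, requires_grad=True)
w2b = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
prev_loss = 5.0
for t in range(10):
    # Plain Python control flow: which weight matrix is used this
    # iteration depends on the previous loss, and only the chosen
    # branch is recorded in this iteration's graph
    w2 = w2a if prev_loss < 5.0 else w2b
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    loss.backward()
    prev_loss = loss.item()

    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```

In a static-graph framework the graph is built once and then reused, so this kind of branching has to be expressed with graph-level control-flow operators instead of an ordinary Python `if`.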