###### tags: `Lecture6`
# question

1. Data augmentation and dropout… (note3)

   > Remember to turn off dropout/augmentations. When performing gradient check, remember to turn off any non-deterministic effects in the network, such as dropout, random data augmentations, etc. Otherwise these can clearly introduce huge errors when estimating the numerical gradient. The downside of turning off these effects is that you wouldn't be gradient checking them (e.g. it might be that dropout isn't backpropagated correctly). **Therefore, a better solution might be to force a particular random seed before evaluating both f(x+h) and f(x−h), and when evaluating the analytic gradient**.

   Could you explain what this passage is actually trying to say? What does dropout have to do with a random seed? Thank you. (A small sketch of the fixed-seed trick appears at the end of this note.)

2. How can we tell whether the distribution a piece of data exhibits after passing through n layers is a good distribution? :thinking_face: :thinking_face: (See the activation-statistics sketch at the end of this note.)

3. ![](https://i.imgur.com/wmLrZnu.png)

   The activation functions introduced so far are mostly 0 or negative on the negative side. Could an activation function like the one in the figure above ever be used? Or is the data handled appropriately during preprocessing, so that this kind of function is never needed?

---

$$
\begin{aligned}
Var(W_{i} X_{i}) &= E(W_{i}^2 X_{i}^2) - E(W_{i} X_{i})^2 \\
&= E(W_{i}^2) E(X_{i}^2) - E(W_{i})^2 E(X_{i})^2 \\
&= E(X_{i})^2 [E(W_{i}^2) - E(W_{i})^2] + E(W_{i})^2 [E(X_{i}^2) - E(X_{i})^2] \\
&\quad + [E(W_{i}^2) E(X_{i}^2) - E(W_{i})^2 E(X_{i}^2) - E(W_{i}^2) E(X_{i})^2 + E(W_{i})^2 E(X_{i})^2] \\
&= E(X_{i})^2 Var(W_{i}) + E(W_{i})^2 Var(X_{i}) + Var(W_{i}) Var(X_{i})
\end{aligned}
$$

(The second equality uses the assumption that $W_{i}$ and $X_{i}$ are independent. A quick numerical check of this identity appears at the end of this note.)

---

![](https://i.imgur.com/gCjUbFc.png)
![](https://i.imgur.com/ugZornk.png)
![](https://i.imgur.com/IBGUJEl.png)
![](https://i.imgur.com/QPAQcfc.png)
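---

A minimal sketch for question 1 (the toy one-layer "network", the function names, and the shapes are all invented for illustration, not taken from the lecture). The point of the note3 passage: dropout draws a fresh random mask on every forward pass, so $f(x+h)$ and $f(x-h)$ would otherwise be computed with *different* masks, and their difference would mostly measure mask noise rather than the gradient. Re-seeding the RNG before each of the three evaluations forces all of them to see the identical mask:

```python
import numpy as np

def dropout_forward(x, w, p=0.5):
    # Toy one-layer "net": f(w) = sum(dropout(x) * w).
    # The mask is random, so two calls differ unless the RNG state matches.
    mask = (np.random.rand(*x.shape) < p) / p   # inverted dropout
    return np.sum(x * mask * w)

def dropout_backward(x, w, p=0.5):
    # Analytic gradient of f with respect to w; it uses the same mask as the
    # forward pass *only if* the RNG has been re-seeded identically.
    mask = (np.random.rand(*x.shape) < p) / p
    return x * mask

def grad_check(x, w, h=1e-5, seed=0):
    num_grad = np.zeros_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp.flat[i] += h
        wm.flat[i] -= h
        np.random.seed(seed)              # same mask for f(w+h) ...
        fp = dropout_forward(x, wp)
        np.random.seed(seed)              # ... the same mask for f(w-h) ...
        fm = dropout_forward(x, wm)
        num_grad.flat[i] = (fp - fm) / (2 * h)
    np.random.seed(seed)                  # ... and for the analytic gradient
    ana_grad = dropout_backward(x, w)
    return np.max(np.abs(num_grad - ana_grad) /
                  np.maximum(1e-8, np.abs(num_grad) + np.abs(ana_grad)))

np.random.seed(42)
x, w = np.random.randn(10), np.random.randn(10)
print(grad_check(x, w))  # tiny (~1e-10); remove the re-seeding and it explodes
```

With the re-seeding in place, the check also exercises the dropout path itself, which is exactly what turning dropout off would sacrifice.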
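A sketch for question 2, along the lines of the activation-distribution experiment in the lecture slides (the layer sizes and initialization below are my assumptions for the demo, not the lecture's exact numbers). One practical diagnostic is to push a batch through the layers and watch the per-layer activation statistics or histograms: a "good" distribution keeps a reasonable spread at every depth, while a standard deviation collapsing toward 0, or activations piling up at the saturation points of tanh, signals that gradients will vanish.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(1000, 500)    # a batch of 1000 inputs with 500 features
h = x
for layer in range(10):
    W = np.random.randn(500, 500) / np.sqrt(500)   # Xavier-style scaling
    h = np.tanh(h @ W)
    # Healthy: std stays roughly constant across layers.
    # Unhealthy: std -> 0 (activations collapse) or |h| -> 1 (tanh saturates).
    print(f"layer {layer}: mean = {h.mean():+.4f}, std = {h.std():.4f}")
```

Replacing `1 / np.sqrt(500)` with a much smaller constant (e.g. `0.01`) reproduces the collapsing-std failure mode shown in the slides.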
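A quick Monte Carlo sanity check of the variance identity derived above (the distributions and their parameters are arbitrary choices for the check; the identity holds for any independent $W_{i}$, $X_{i}$ with finite variance):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Arbitrary independent distributions with nonzero means (hypothetical choice).
W = rng.normal(0.3, 1.5, size=n)     # E(W) = 0.3,  Var(W) = 2.25
X = rng.normal(-0.7, 0.8, size=n)    # E(X) = -0.7, Var(X) = 0.64

lhs = np.var(W * X)
rhs = X.mean()**2 * W.var() + W.mean()**2 * X.var() + W.var() * X.var()
print(lhs, rhs)   # agree up to sampling noise (~2.60 for these parameters)
```

Note that when $E(W_{i}) = E(X_{i}) = 0$ the identity reduces to $Var(W_{i} X_{i}) = Var(W_{i}) Var(X_{i})$, which is the step behind the $1/n$ (Xavier) scaling rule when summing over $n$ inputs.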