{%hackmd SybccZ6XD %}
###### tags: `paper`
# Rectifier Nonlinearities Improve Neural Network Acoustic Models
## Rectifier Nonlinearities
Problem: sigmoidal DNNs can suffer from the **vanishing gradient problem**.
> When the magnitude of the input is large, a tanh unit saturates (its output is close to 1 or -1), so its gradient is nearly zero and little error signal reaches the lower layers.
Problem: ReLU avoids saturation for positive inputs, but it is completely flat on the negative side.
> The gradient is 0 whenever the input is negative, so a unit that stays in that regime receives no error signal and stops learning.
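
A minimal NumPy sketch (my own illustration, not from the paper) makes both failure modes concrete: the tanh gradient vanishes as the input grows, while the ReLU gradient is exactly zero for every negative input.

```python
import numpy as np

def tanh_grad(z):
    """d/dz tanh(z) = 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    """d/dz max(z, 0): 1 for positive inputs, 0 otherwise."""
    return (z > 0).astype(float)

z = np.array([-10.0, -2.0, -0.5, 0.5, 2.0, 10.0])
print("tanh grad:", np.round(tanh_grad(z), 4))  # tiny at |z| = 10 (saturation)
print("relu grad:", relu_grad(z))               # hard 0 for all negative z
```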
Why does ReLU set negative inputs to zero?
> The hard zero makes the hidden representation sparse, which can act as a kind of denoising: weakly-driven units contribute nothing to the output.
Solution to ReLU's zero-gradient problem:
> leaky ReLU, which replaces the hard zero with a small negative slope so the gradient never fully vanishes.
Comparison between the activation functions:
> (Figure: plots of the tanh, ReLU, and leaky ReLU activation functions.)
The formula of leaky ReLU:
> $$h_i = \max(w_i^\top x,\, 0) = \begin{cases} w_i^\top x, & w_i^\top x > 0 \\ 0.01\, w_i^\top x, & \text{otherwise} \end{cases}$$
> The leak coefficient 0.01 gives negative pre-activations a small but non-zero gradient.
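
A minimal sketch of the piecewise formula above in NumPy, assuming the paper's leak coefficient of 0.01 (the function names are mine):

```python
import numpy as np

def leaky_relu(z, leak=0.01):
    """Leaky ReLU: identity for positive inputs, small slope `leak` otherwise."""
    return np.where(z > 0, z, leak * z)

def leaky_relu_grad(z, leak=0.01):
    """Gradient is 1 on the positive side and `leak` on the negative side."""
    return np.where(z > 0, 1.0, leak)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(z))       # [-0.03  -0.005  0.5  3. ]
print(leaky_relu_grad(z))  # [ 0.01   0.01   1.   1. ]
```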
## Results
Speech recognition task
> DNN acoustic models trained on the 300-hour Switchboard conversational speech (LVCSR) task; networks with ReLU and leaky ReLU units achieve lower word error rates than tanh networks of the same architecture.
Empirical activation probability
> For each unit in the last hidden layer, the fraction of 10,000 input samples on which the unit is active (produces a non-zero output).
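
A sketch of how this statistic could be computed, assuming a ReLU hidden layer; the random `pre_activations` here stand in for a trained model's real pre-activations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the last hidden layer on 10,000 samples:
# rows = samples, columns = hidden units. In the paper these would come
# from a trained DNN acoustic model, not random data.
pre_activations = rng.normal(size=(10_000, 512))
last_hidden = np.maximum(pre_activations, 0)  # ReLU hidden layer

# Empirical activation probability: fraction of samples on which
# each unit produces a non-zero output.
activation_prob = (last_hidden > 0).mean(axis=0)

print(activation_prob.shape)  # (512,) -- one probability per unit
print(activation_prob[:5])    # ~0.5 each for this random stand-in
```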