{%hackmd SybccZ6XD %}

###### tags: `paper`

# Mish: A Self Regularized Non-Monotonic Activation Function

$f(x) = x\tanh(\ln(1 + e^x))$

Problem: ReLU has a weakness: whenever the input is negative, the gradient is 0.

> Leaky ReLU [32], ELU [6], SELU [23], Swish [37], and Mish (this paper) address this.

A possible reason why this method is effective:

> The output landscape of Mish is smoother than that of ReLU.
> ![](https://i.imgur.com/xKud1sW.png)

How is the landscape generated?

> The landscapes were generated by passing in the co-ordinates to a five-layered randomly initialized neural network which outputs the corresponding scalar magnitude.

Another possible reason why this method is effective: the loss landscape.

> ![](https://i.imgur.com/od3skvn.png)

Why does the accuracy of ReLU decrease sharply?

> ![](https://i.imgur.com/KcmDLUt.png)
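
To make the formula concrete, here is a minimal sketch of Mish as a PyTorch module. The class name `Mish` and the use of `F.softplus` are my own choices, not taken from the paper's reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: f(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus(x) = ln(1 + e^x); tanh of it keeps the function smooth and non-monotonic
        return x * torch.tanh(F.softplus(x))
```

Recent PyTorch releases also ship `torch.nn.Mish` built in, so this hand-written module is mainly a reference for the formula.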
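
The quoted description of the output landscape can be sketched roughly as below: a five-layer, randomly initialized MLP maps 2-D coordinates to a scalar, and the same weights are evaluated with different activations. The layer width, coordinate range, and grid resolution here are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

def output_landscape(act: nn.Module, grid_size: int = 200, hidden: int = 64) -> torch.Tensor:
    """Evaluate a randomly initialized five-layer MLP on a 2-D coordinate grid.

    `act` is the activation used between layers; the returned tensor holds the
    scalar output at each grid point. Widths and grid range are assumptions.
    """
    torch.manual_seed(0)  # identical random weights for every activation compared
    net = nn.Sequential(
        nn.Linear(2, hidden), act,
        nn.Linear(hidden, hidden), act,
        nn.Linear(hidden, hidden), act,
        nn.Linear(hidden, hidden), act,
        nn.Linear(hidden, 1),
    )
    xs = torch.linspace(-5, 5, grid_size)
    ys = torch.linspace(-5, 5, grid_size)
    gx, gy = torch.meshgrid(xs, ys, indexing="ij")
    coords = torch.stack([gx.reshape(-1), gy.reshape(-1)], dim=1)
    with torch.no_grad():
        z = net(coords).reshape(grid_size, grid_size)
    return z
```

Calling `output_landscape(Mish())` and `output_landscape(nn.ReLU())` with the same seed yields the two surfaces whose smoothness the figure compares.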