{%hackmd SybccZ6XD %}
###### tags: `paper`
# Mish: A Self Regularized Non-Monotonic Activation Function
$f(x) = x\tanh(\ln(1 + e^x))$
Problem: ReLU has weaknesses. Whenever the input is negative, the gradient is 0, so those units stop updating.
> Alternatives: Leaky ReLU [32], ELU [6], and SELU [23]; Swish [37] and Mish in this paper.
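
A minimal sketch of the Mish formula above, assuming PyTorch (this is not the paper's reference implementation); the gradient check illustrates why Mish avoids ReLU's zero-gradient problem:

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # f(x) = x * tanh(softplus(x)),  where softplus(x) = ln(1 + e^x)
    return x * torch.tanh(F.softplus(x))

# Unlike ReLU, Mish keeps a non-zero gradient for negative inputs.
x = torch.tensor([-2.0, -0.5, 0.0, 1.5], requires_grad=True)
mish(x).sum().backward()
print(x.grad)  # no entry is exactly 0, even for the negative inputs
```

Recent PyTorch versions also ship this activation directly as `torch.nn.Mish` / `F.mish`.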
A possible reason why Mish is effective:
> The output landscape of Mish is smoother than that of ReLU.
How were the landscapes generated?
> The landscapes were generated by passing in the co-ordinates to a five-layered randomly initialized neural network which outputs the corresponding scalar magnitude
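
A rough sketch of that procedure, under my own assumptions about layer widths, grid range, and initialization (the paper does not give this exact code): 2-D grid coordinates are passed through a randomly initialized five-layer network that outputs one scalar per coordinate, once with ReLU and once with Mish, so the two output landscapes can be compared.

```python
import numpy as np

def mish(x):
    return x * np.tanh(np.logaddexp(0.0, x))  # logaddexp(0, x) = ln(1 + e^x), numerically stable

def relu(x):
    return np.maximum(x, 0.0)

def output_landscape(act, layers=5, hidden=64, grid=200, seed=0):
    """Scalar output of a randomly initialized `layers`-layer MLP over a 2-D grid."""
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.linspace(-3, 3, grid), np.linspace(-3, 3, grid))
    h = np.stack([xs.ravel(), ys.ravel()], axis=1)            # (grid*grid, 2) coordinates
    dims = [2] + [hidden] * (layers - 1) + [1]
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        h = h @ (rng.standard_normal((d_in, d_out)) / np.sqrt(d_in))
        if d_out != 1:                                         # activation on hidden layers only
            h = act(h)
    return h.reshape(grid, grid)                               # scalar magnitude per coordinate

z_relu = output_landscape(relu)   # piecewise-linear surface with creases at the ReLU kinks
z_mish = output_landscape(mish)   # visibly smoother surface
```

Plotting `z_relu` and `z_mish` as surfaces should reproduce the qualitative difference the paper describes.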
Another possible reason why Mish is effective:
> The loss landscape of a network using Mish is smoother, with wider minima, than with ReLU, which eases optimization and improves generalization.
Why does the accuracy of ReLU decrease sharply as the number of layers increases?
> 