
Activation Functions in Neural Networks

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

GELU

GELU is currently one of the best-performing activation functions, at least in NLP, where it is the default choice in Transformer models such as BERT and GPT.

$$\mathrm{GELU}(x) = x \Phi(x),$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.
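As a sketch, the exact GELU can be computed from the standard normal CDF via the error function (the function name here is just for illustration):

```python
import math

def gelu(x):
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF.
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large positive inputs GELU approaches the identity, and for large negative inputs it approaches zero, giving a smooth alternative to ReLU. Frameworks also offer a faster tanh-based approximation.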

ReLU

ELU

Swish

$$f(x) = x \cdot \sigma(x),$$

where $\sigma(x)$ is the sigmoid function.
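A minimal sketch of Swish (the function name is illustrative):

```python
import math

def swish(x):
    # Swish: f(x) = x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))
```

Like GELU, Swish is smooth and non-monotonic near zero, and behaves like the identity for large positive inputs.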

Sigmoid

References

https://mp.weixin.qq.com/s/LEPalstOc15CX6fuqMRJ8Q