The notes on this page are fragmentary and immature thoughts of the author; please read with your own judgement!
GELU
GELU is arguably the best activation function currently available (at least in NLP); it is the default activation in Transformer models such as BERT and GPT-2.
$\mathrm{GELU}(x) = x\,\Phi(x)$,
where $\Phi$ is the cumulative distribution function of the standard normal distribution.
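The exact form can be computed with the error function, since $\Phi(x) = \tfrac{1}{2}\left(1 + \mathrm{erf}(x/\sqrt{2})\right)$. A minimal plain-Python sketch (function names are mine, not from any particular library), together with the common tanh-based approximation from the original GELU paper:

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # computed via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh-based approximation of GELU (Hendrycks & Gimpel).
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

For most inputs the approximation agrees with the exact form to about three decimal places, which is why many frameworks offer both variants.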
ReLU
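For reference, ReLU simply zeroes out negative inputs:

```latex
\mathrm{ReLU}(x) = \max(0, x)
```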
ELU
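For reference, ELU keeps the identity for positive inputs but saturates smoothly to $-\alpha$ on the negative side:

```latex
\mathrm{ELU}(x) =
\begin{cases}
x, & x > 0,\\
\alpha\,(e^{x} - 1), & x \le 0,
\end{cases}
```

where $\alpha > 0$ is a hyperparameter, commonly set to $1$.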
Swish
$\mathrm{Swish}(x) = x\,\sigma(\beta x)$, where $\sigma$ is the sigmoid function and $\beta$ is a constant or trainable parameter (setting $\beta = 1$ gives SiLU).
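A minimal plain-Python sketch (function names are mine): as $\beta \to \infty$, Swish approaches ReLU, since the sigmoid becomes a step function.

```python
import math

def sigmoid(x: float) -> float:
    # Standard logistic sigmoid: 1 / (1 + e^{-x}).
    return 1.0 / (1.0 + math.exp(-x))

def swish(x: float, beta: float = 1.0) -> float:
    # Swish: x * sigmoid(beta * x); beta = 1 recovers SiLU.
    return x * sigmoid(beta * x)
```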