ReLU or not ReLU?
The ReLU activation function is very popular in neural networks, but many more activation functions have been proposed since its introduction. This paper analyzes them from a theoretical point of view:
Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. Shijun Zhang, Jianfeng Lu, Hongkai Zhao
Journal of Machine Learning Research 25 (2024) 1-39
There is a real zoo of activation functions: LeakyReLU, ReLU², ELU, CELU, SELU, Softplus, GELU, SiLU, Swish, Mish, Sigmoid, Tanh, Arctan, Softsign, dSiLU, SRS… The paper defines families of activation functions, called A1,k, A2 and A3, that together contain all of the functions above, plus everything you can obtain from them by translation or reflection. Surprisingly, the authors prove that you can approximate the output of a ReLU network to arbitrary precision with a network built from any of the listed activation functions, provided you can afford to make it 3 times wider and 2 times deeper. More formally:

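Paraphrasing the theorem, and keeping the factors of 3 and 2 quoted above, it reads more or less as follows: for any activation function $\varrho$ in these families, any target accuracy $\varepsilon > 0$, and any ReLU network $\phi$ of width $N$ and depth $L$, there exists a $\varrho$-activated network $\widetilde{\phi}$ of width $3N$ and depth $2L$ such that

$$ \bigl|\widetilde{\phi}(x) - \phi(x)\bigr| \le \varepsilon \qquad \text{for all } x \text{ in a given bounded set.} $$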
Scared? Me too, but at least you know that you can play a bit with the activation function and there is a bound on the damage you can do. Moreover, if you stick to functions very similar to ReLU (ELU, CELU, SELU, Softplus, GELU, SiLU, Swish, and Mish), the bound is much tighter.
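If you want to play with the zoo yourself, most of these activations ship with PyTorch, and making the activation a constructor argument keeps the swap to a one-line change. Here is a minimal sketch, with made-up layer sizes, and nothing to do with the 3×-wider/2×-deeper construction from the paper:

import torch
import torch.nn as nn

# A small MLP where the activation is a constructor argument,
# so swapping ReLU for one of its relatives is trivial.
class MLP(nn.Module):
    def __init__(self, activation: nn.Module, width: int = 64, depth: int = 3):
        super().__init__()
        layers = [nn.Linear(2, width), activation]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), activation]
        layers.append(nn.Linear(width, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

x = torch.randn(8, 2)
# A few members of the zoo available in torch.nn (Swish is called SiLU there).
for act in [nn.ReLU(), nn.GELU(), nn.SiLU(), nn.Mish(), nn.Softplus()]:
    model = MLP(act)
    print(act.__class__.__name__, model(x).shape)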