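The swish function is a family of mathematical functions defined as follows:
$$\operatorname{swish}_\beta(x) = x \operatorname{sigmoid}(\beta x) = \frac{x}{1 + e^{-\beta x}}$$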
where $\beta$ can be a constant (usually set to 1) or a trainable parameter.
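The swish family was designed to smoothly interpolate between a linear function and the ReLU function: as $\beta \to 0$ it approaches the linear function $x/2$, and as $\beta \to \infty$ it approaches ReLU.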
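The interpolation can be checked numerically with a minimal Python sketch. The function names `swish` and `relu` are chosen here for illustration and are not taken from any particular library; the split on the sign of βx is only one possible way to avoid overflow in the exponential.

```python
import math


def swish(x: float, beta: float = 1.0) -> float:
    """Return x * sigmoid(beta * x), i.e. x / (1 + exp(-beta * x))."""
    z = beta * x
    if z >= 0:
        # exp(-z) <= 1 here, so this form cannot overflow.
        return x / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form x * e^z / (1 + e^z).
    ez = math.exp(z)
    return x * ez / (1.0 + ez)


def relu(x: float) -> float:
    """Rectified linear unit, max(0, x)."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(
            f"x={x:+.1f}  "
            f"beta=0: {swish(x, 0.0):+.4f}  "   # equals x / 2
            f"beta=1: {swish(x, 1.0):+.4f}  "
            f"beta=50: {swish(x, 50.0):+.4f}  "  # close to ReLU
            f"relu: {relu(x):+.4f}"
        )
```

With β = 0 the sigmoid factor is exactly 1/2, so the output is x/2; with a large β such as 50 the values are numerically close to those of ReLU away from the origin.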
When considering positive values, swish is a particular case of the doubly parameterized sigmoid shrinkage function (Eq. 3 of the cited source). Variants of the swish function include Mish.