Activation functions and Iverson brackets

by John, from John D. Cook (#6CN3R)

Neural network activation functions transform the output of one layer of the neural net into the input for another layer. These functions are nonlinear because the universal approximation theorem, which says roughly that a two-layer neural net can approximate any continuous function as closely as you like, requires them to be nonlinear.
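To see why a linear activation would not do, here is a small worked equation. The symbols W_1, W_2, b_1, and b_2 are notation assumed here for the weights and biases of two layers; they are not part of the theorem's statement. If the activation function were the identity, two layers would compose to

```latex
W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2)
```

which is a single affine map, so stacking layers would add nothing.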

[Figure: plot of the Heaviside function]

Activation functions often have two-part definitions, defined one way for negative inputs and another way for positive inputs, and so they're ideal for Iverson notation. For example, the Heaviside function plotted above is defined to be

\[ f(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \]

Kenneth Iverson's bracket notation, first developed for the APL programming language but adopted more widely, uses brackets around a Boolean expression to indicate the function that is 1 when the expression is true and 0 otherwise. With this notation, the Heaviside function can be written simply as

\[ f(x) = [x > 0] \]
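The bracket translates almost directly into code. Here is a minimal Python sketch; the names `iverson` and `heaviside` are chosen for illustration, and the sketch assumes the convention that the Heaviside function is 0 at the origin (other conventions use 1/2 or 1 there).

```python
def iverson(condition: bool) -> int:
    """Iverson bracket: 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0


def heaviside(x: float) -> int:
    # Assumption: H(0) = 0. Swap in x >= 0 for the convention H(0) = 1.
    return iverson(x > 0)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.5):
        print(x, heaviside(x))
```

The whole definition fits in one expression, which is the point of the notation.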

Iverson notation is fairly common, but not quite so common that I feel like I can use it without explanation. I find it very handy and would like to popularize it. The rest of the post will give more examples.

ReLU

[Figure: plot of the ReLU function]

The popular ReLU (rectified linear unit) function is defined as

\[ f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \]

and with Iverson bracket notation as

\[ f(x) = x \, [x > 0] \]

The ReLU activation function is the identity function multiplied by the Heaviside function. It's not the best example of the value of bracket notation since it could be written simply as max(0, x). The next example is better.
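As a quick check that the bracket form and max(0, x) agree, here is a short Python sketch. It relies on the fact that a Python comparison yields a bool, which multiplies as 1 or 0 and so can stand in for the Iverson bracket. The function name `relu` is just a label for this example.

```python
def relu(x: float) -> float:
    # Bracket form: x [x > 0]. The comparison (x > 0) acts as 1 or 0.
    # For negative float x the product is -0.0, which compares equal to 0.0.
    return x * (x > 0)


if __name__ == "__main__":
    samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
    # Compare the bracket form against max(0, x) on a few sample points.
    print(all(relu(x) == max(0.0, x) for x in samples))
```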

ELU

[Figure: plot of the ELU function]

The ELU (exponential linear unit) is a variation on the ReLU that, unlike the ReLU, is differentiable at 0.

\[ f(x) = \begin{cases} x & \text{if } x > 0 \\ e^x - 1 & \text{if } x \le 0 \end{cases} \]

The ELU can be described succinctly in bracket notation.

\[ f(x) = x \, [x > 0] + (e^x - 1) \, [x \le 0] \]
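Here is a rough Python sketch of the bracket form, along with a numerical check of differentiability at 0. It assumes the parameter-free ELU, equal to x for positive x and e^x - 1 otherwise; versions with a scale parameter on the exponential branch are differentiable at 0 only when that parameter is 1. The helper names are invented for this example.

```python
import math


def elu(x: float) -> float:
    # Bracket form, assuming the parameter-free ELU:
    #   x [x > 0] + (e^x - 1) [x <= 0]
    # Both terms are always evaluated, so math.expm1 can overflow for
    # very large positive x even though its bracket is 0 there.
    return x * (x > 0) + math.expm1(x) * (x <= 0)


def one_sided_slopes(f, x: float, h: float = 1e-6):
    # Forward and backward difference quotients at x.
    right = (f(x + h) - f(x)) / h
    left = (f(x) - f(x - h)) / h
    return left, right


if __name__ == "__main__":
    left, right = one_sided_slopes(elu, 0.0)
    print(left, right)  # both should be close to 1
```

The two one-sided slopes agree at 0, whereas the same check on the ReLU would give 0 on the left and 1 on the right.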

PReLU

[Figure: plot of the PReLU function]

The PReLU (parametric rectified linear unit) depends on a small positive parameter a. This parameter must not equal 1, because then the function would be linear and the universal approximation theorem would not apply.

\[ f(x) = \begin{cases} x & \text{if } x > 0 \\ ax & \text{if } x \le 0 \end{cases} \]

In Iverson notation:

\[ f(x) = x \, [x > 0] + ax \, [x \le 0] \]
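The same pattern carries over to code. In this Python sketch the function name `prelu` and the default value a = 0.25 are arbitrary choices for illustration, not values taken from the post. The last lines show why a = 1 is excluded: the function collapses to the identity.

```python
def prelu(x: float, a: float = 0.25) -> float:
    # Bracket form: x [x > 0] + a x [x <= 0].
    # The default a = 0.25 is an illustrative assumption.
    return x * (x > 0) + a * x * (x <= 0)


if __name__ == "__main__":
    samples = [-4.0, -1.0, 0.0, 1.0, 4.0]
    print([prelu(x) for x in samples])
    # With a = 1 the function is the identity, hence linear.
    print(all(prelu(x, a=1.0) == x for x in samples))
```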

The post Activation functions and Iverson brackets first appeared on John D. Cook.