
A simpler GELU activation function approximation

by John D. Cook

The GELU (Gaussian Error Linear Unit) activation function was proposed in [1]. This function is x Φ(x), where Φ is the CDF of a standard normal random variable. As you might guess, the motivation for the function involves probability. See [1] for details.
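
As a concrete illustration (not from the original post), here is a minimal Python sketch of the exact GELU, computing Φ with math.erf; the function names Phi and gelu are my own.

    from math import erf, sqrt

    def Phi(x):
        """CDF of a standard normal random variable."""
        return 0.5 * (1 + erf(x / sqrt(2)))

    def gelu(x):
        """Exact GELU: x times the standard normal CDF of x."""
        return x * Phi(x)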

[Figure: plot comparing the GELU and ReLU functions]

The GELU function is not too far from the more familiar ReLU, but it has advantages that we won't get into here. In this post I wanted to look at approximations to the GELU function.

Since an implementation of Φ is not always available, the authors provide the following approximation:

\[ \operatorname{GELU}(x) \approx \tfrac{1}{2}\, x \left(1 + \tanh\!\left[\sqrt{2/\pi}\,\left(x + 0.044715\, x^3\right)\right]\right) \]
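
In code, this approximation might look like the following sketch (the function name gelu_tanh is my own, not from [1]):

    from math import tanh, sqrt, pi

    def gelu_tanh(x):
        """Tanh approximation to GELU given in [1]."""
        return 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x**3)))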

I wrote about a similar but simpler approximation for Φ a while back, and multiplying that approximation by x gives

\[ \operatorname{GELU}(x) \approx \tfrac{1}{2}\, x \left(1 + \tanh\!\left(\sqrt{2/\pi}\; x\right)\right) \]
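
A corresponding sketch of the simpler approximation, assuming it amounts to dropping the cubic term inside the tanh (the name gelu_tanh_simple is my own):

    from math import tanh, sqrt, pi

    def gelu_tanh_simple(x):
        """Simpler tanh approximation: omit the 0.044715 x**3 term."""
        return 0.5 * x * (1 + tanh(sqrt(2 / pi) * x))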

The approximation in [1] is more accurate, though the difference between the exact values of GELU(x) and those of the simpler approximation is hard to see in a plot.

[Figure: plot of the exact GELU function and the simpler approximation]

Since model weights are not usually needed to high precision, the simpler approximation may be indistinguishable in practice from the more accurate approximation.
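
One rough way to check this is to compare both approximations against the exact GELU on a grid; the grid endpoints and the use of NumPy/SciPy below are my own choices, not from the original post.

    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-4, 4, 2001)
    exact  = x * norm.cdf(x)  # exact GELU: x * Phi(x)
    full   = 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
    simple = 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * x))

    print("max |exact - full|  :", np.max(np.abs(exact - full)))
    print("max |exact - simple|:", np.max(np.abs(exact - simple)))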


[1] Dan Hendrycks, Kevin Gimpel. Gaussian Error Linear Units (GELUs). Available on arXiv.
