A simpler GELU activation function approximation
The GELU (Gaussian Error Linear Units) activation function was proposed in [1]. This function is x Φ(x) where Φ is the CDF of a standard normal random variable. As you might guess, the motivation for the function involves probability. See [1] for details.
The GELU function is not too far from the more familiar ReLU, but it has advantages that we won't get into here. In this post I wanted to look at approximations to the GELU function.
Since an implementation of Φ is not always available, the authors provide the following approximation:

GELU(x) ≈ 0.5x (1 + tanh(√(2/π) (x + 0.044715x³)))
I wrote about a similar but simpler approximation for Φ a while back, and multiplying by x gives the approximation

GELU(x) ≈ x / (1 + exp(−1.702x))
The approximation in [1] is more accurate, though the difference between the exact values of GELU(x) and those of the simpler approximation is hard to see in a plot.
Since model weights are not usually needed to high precision, the simpler approximation may be indistinguishable in practice from the more accurate approximation.
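A quick way to check this claim is to compute the maximum error of both approximations over a range of inputs. Here is a minimal sketch in Python (function names are mine, not from the post or the paper):

```python
import math

def gelu_exact(x):
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF,
    # computed here via the error function.
    return x * 0.5 * (1 + math.erf(x / math.sqrt(2)))

def gelu_tanh(x):
    # The tanh-based approximation given in [1].
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))

def gelu_sigmoid(x):
    # The simpler approximation: x times a logistic approximation to Phi.
    return x / (1 + math.exp(-1.702 * x))

# Maximum absolute error of each approximation on [-5, 5].
xs = [i / 100 for i in range(-500, 501)]
err_tanh = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
err_sigmoid = max(abs(gelu_exact(x) - gelu_sigmoid(x)) for x in xs)
print(f"tanh approximation max error:    {err_tanh:.6f}")
print(f"sigmoid approximation max error: {err_sigmoid:.6f}")
```

Both errors come out small in absolute terms, with the tanh-based approximation noticeably more accurate, consistent with the discussion above.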
[1] Dan Hendrycks, Kevin Gimpel. Gaussian Error Linear Units (GELUs). Available on arXiv.
The post A simpler GELU activation function approximation first appeared on John D. Cook.