mygrad.nnet.activations.glu

mygrad.nnet.activations.glu(x: ArrayLike, axis: int = -1, *, constant: bool | None = None) -> Tensor

Returns the Gated Linear Unit A * σ(B), where A and B are split from x.

Parameters:
x : ArrayLike

The input.

axis : int, optional (default=-1)

The axis along which to split the input in half and apply the GLU.

constant : Optional[bool]

If True, the returned tensor is a constant (it does not back-propagate a gradient).

Returns:
mygrad.Tensor

The result of applying the Gated Linear Unit to the input; its size along axis is half that of x.

Notes

The Gated Linear Unit was proposed in the paper

“Language Modeling with Gated Convolutional Networks” by Yann Dauphin, Angela Fan, Michael Auli, and David Grangier,

available at https://arxiv.org/abs/1612.08083

The GLU operation splits the input x in half along axis, storing the first half in A and the second in B. The return value is then A ⊙ σ(B), where ⊙ is elementwise multiplication and σ is the sigmoid function.
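As a point of reference, the forward computation can be sketched in plain NumPy (the helper name glu_reference below is purely illustrative, not part of MyGrad); it also shows how the choice of axis determines which dimension is halved:

>>> import numpy as np
>>> def glu_reference(x, axis=-1):
...     # split the input in half along axis; this requires an even size there
...     A, B = np.split(np.asarray(x), 2, axis=axis)
...     # elementwise product of the first half with the sigmoid of the second half
...     return A * (1 / (1 + np.exp(-B)))
...
>>> glu_reference(np.arange(-5., 5.))
array([-2.5       , -2.92423431, -2.64239123, -1.90514825, -0.98201379])
>>> x2 = np.arange(12.).reshape(2, 6)
>>> glu_reference(x2, axis=-1).shape  # the size of the last axis is halved
(2, 3)
>>> glu_reference(x2, axis=0).shape   # split along the leading axis instead
(1, 6)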

Examples

>>> import mygrad as mg
>>> from mygrad.nnet.activations import glu
>>> x = mg.arange(-5., 5.)
>>> x
Tensor([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])
>>> y = glu(x); y
Tensor([-2.5       , -2.92423431, -2.64239123, -1.90514825, -0.98201379])
>>> y.backward()
>>> x.grad
array([ 0.5       ,  0.73105858,  0.88079708,  0.95257413,  0.98201379,
       -1.25      , -0.78644773, -0.31498076, -0.09035332, -0.01766271])
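For reference, these gradient values can be reproduced by hand. Assuming that calling backward() on a non-scalar tensor back-propagates a gradient of ones, the chain rule sends σ(B) to the first half of x and A·σ(B)·(1 − σ(B)) to the second half:

>>> import numpy as np
>>> a, b = np.split(np.arange(-5., 5.), 2)   # the same A and B as above
>>> s = 1 / (1 + np.exp(-b))                 # sigmoid of B
>>> s                                        # gradient reaching the first half of x
array([0.5       , 0.73105858, 0.88079708, 0.95257413, 0.98201379])
>>> a * s * (1 - s)                          # gradient reaching the second half of x
array([-1.25      , -0.78644773, -0.31498076, -0.09035332, -0.01766271])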