mygrad.nnet.activations.glu
- mygrad.nnet.activations.glu(x: ArrayLike, axis: int = -1, *, constant: bool | None = None) → Tensor
Returns the Gated Linear Unit A * σ(B), where A and B are split from x.
- Parameters:
- x : ArrayLike
The input.
- axis : int, optional (default=-1)
The axis along which to split the input in half and apply the GLU.
- constant : Optional[bool]
If True, the returned tensor is a constant (it does not back-propagate a gradient).
- Returns:
- mygrad.Tensor
The result of applying the Gated Linear Unit to the input; its extent along axis is half that of x.
Notes
- The Gated Linear Unit was proposed in the paper
“Language Modeling with Gated Convolutional Networks” by Yann Dauphin, Angela Fan, Michael Auli, and David Grangier,
available at https://arxiv.org/abs/1612.08083
The GLU operation splits the input x in half along axis, storing the first half in A and the second in B. The return value is then A ⊙ σ(B), where ⊙ is elementwise multiplication and σ is the sigmoid function.
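A minimal NumPy sketch of this split-and-gate computation may help make the formula concrete; the helper name glu_reference is hypothetical, and this is an illustration of A ⊙ σ(B) under the split described above, not MyGrad's internal implementation.

import numpy as np

def glu_reference(x, axis=-1):
    # Split x in half along `axis`: the first half A passes through,
    # the second half B is squashed by the sigmoid and acts as the gate.
    A, B = np.split(np.asarray(x), 2, axis=axis)
    return A * (1.0 / (1.0 + np.exp(-B)))

# Reproduces the worked example below:
# glu_reference(np.arange(-5., 5.)) ≈ [-2.5, -2.924, -2.642, -1.905, -0.982]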
Examples
>>> import mygrad as mg
>>> from mygrad.nnet.activations import glu
>>> x = mg.arange(-5., 5.)
>>> x
Tensor([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])
>>> y = glu(x); y
Tensor([-2.5       , -2.92423431, -2.64239123, -1.90514825, -0.98201379])
>>> y.backward()
>>> x.grad
array([ 0,  0,  0,  0,  0, -1,  0,  0,  0,  0])
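For a multidimensional input, the axis argument selects which dimension is split in half, so the output's extent along that axis is half that of x. A small shape check illustrates this; the (3, 8) input here is an assumed example, not part of the original docstring.

>>> x2 = mg.ones((3, 8))
>>> glu(x2, axis=-1).shape  # the split halves the size along axis
(3, 4)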