mygrad.nnet.activations.glu
- mygrad.nnet.activations.glu(x: ArrayLike, axis: int = -1, *, constant: bool | None = None) → Tensor
Returns the Gated Linear Unit A * σ(B), where A and B are split from x.
- Parameters:
- x : ArrayLike
The input.
- axis : int, optional (default=-1)
The axis along which to split the input in half and apply the GLU.
- constant : Optional[bool]
If True, the returned tensor is a constant (it does not back-propagate a gradient).
- Returns:
- mygrad.Tensor
The result of applying the Gated Linear Unit to the input; its extent along axis is half that of x.
Notes
- The Gated Linear Unit was proposed in the paper
“Language Modeling with Gated Convolutional Networks” by Yann Dauphin, Angela Fan, Michael Auli, and David Grangier,
available at https://arxiv.org/abs/1612.08083
The GLU operation splits the input x in half along axis, storing the first half in A and the second in B. The return value is then A ⊙ σ(B), where ⊙ is elementwise multiplication and σ is the sigmoid function.
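A minimal NumPy sketch of this split-and-gate computation may help make the formula concrete; the helper name glu_reference is hypothetical, and this is an illustration of A ⊙ σ(B) under the split described above, not MyGrad's internal implementation.

import numpy as np

def glu_reference(x, axis=-1):
    # Split x in half along `axis`: the first half A passes through,
    # the second half B is squashed by the sigmoid and acts as the gate.
    A, B = np.split(np.asarray(x), 2, axis=axis)
    return A * (1.0 / (1.0 + np.exp(-B)))

# Reproduces the worked example below:
# glu_reference(np.arange(-5., 5.)) ≈ [-2.5, -2.924, -2.642, -1.905, -0.982]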
Examples
>>> import mygrad as mg
>>> from mygrad.nnet.activations import glu
>>> x = mg.arange(-5., 5.)
>>> x
Tensor([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])
>>> y = glu(x); y
Tensor([-2.5       , -2.92423431, -2.64239123, -1.90514825, -0.98201379])
>>> y.backward()
>>> x.grad
array([ 0,  0,  0,  0,  0, -1,  0,  0,  0,  0])
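For a multidimensional input, the axis argument selects which dimension is split in half, so the output's extent along that axis is half that of x. A small shape check illustrates this; the (3, 8) input here is an assumed example, not part of the original docstring.

>>> x2 = mg.ones((3, 8))
>>> glu(x2, axis=-1).shape  # the split halves the size along axis
(3, 4)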