mygrad.nnet.layers.gru
- mygrad.nnet.layers.gru(X, Uz, Wz, bz, Ur, Wr, br, Uh, Wh, bh, s0=None, bp_lim=None, dropout=0.0, constant=None)
Performs a forward pass of sequential data through a Gated Recurrent Unit layer, returning the ‘hidden-descriptors’ computed from the trainable parameters as follows:
Z_{t} = sigmoid(X_{t} Uz + S_{t-1} Wz + bz)
R_{t} = sigmoid(X_{t} Ur + S_{t-1} Wr + br)
H_{t} = tanh(X_{t} Uh + (R_{t} * S_{t-1}) Wh + bh)
S_{t} = (1 - Z_{t}) * H_{t} + Z_{t} * S_{t-1}
- Parameters:
- X : array_like, shape=(T, N, C)
The sequential data to be passed forward.
- Uz : array_like, shape=(C, D)
The weights used to map sequential data to its hidden-descriptor representation.
- Wz : array_like, shape=(D, D)
The weights used to map a hidden-descriptor to a hidden-descriptor.
- bz : array_like, shape=(D,)
The biases used to scale a hidden-descriptor.
- Ur : array_like, shape=(C, D)
The weights used to map sequential data to its hidden-descriptor representation.
- Wr : array_like, shape=(D, D)
The weights used to map a hidden-descriptor to a hidden-descriptor.
- br : array_like, shape=(D,)
The biases used to scale a hidden-descriptor.
- Uh : array_like, shape=(C, D)
The weights used to map sequential data to its hidden-descriptor representation.
- Wh : array_like, shape=(D, D)
The weights used to map a hidden-descriptor to a hidden-descriptor.
- bh : array_like, shape=(D,)
The biases used to scale a hidden-descriptor.
- s0 : Optional[array_like], shape=(N, D)
The ‘seed’ hidden descriptors to feed into the RNN. If None, a Tensor of zeros of shape (N, D) is created.
- bp_lim : Optional[int]
This feature is experimental and is currently untested. The (non-zero) limit of the depth of back-propagation through time to be performed. If None, back-propagation is performed through the entire sequence.
E.g. bp_lim=3 will propagate gradients only up to 3 steps backward through the recursive sequence.
- dropout : float (default=0.0), 0 <= dropout < 1
If non-zero, the dropout scheme described in [1] is applied. See Notes for more details.
- constant : bool, optional (default=False)
If True, the resulting Tensor is a constant.
- Returns:
- mygrad.Tensor, shape=(T+1, N, D)
The sequence of ‘hidden-descriptors’ produced by the forward pass of the RNN.
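For illustration, a minimal usage sketch is given below. The shapes follow the signature documented above; the random arrays are purely hypothetical stand-ins for trained parameters:

>>> import numpy as np
>>> import mygrad as mg
>>> from mygrad.nnet.layers import gru
>>> T, N, C, D = 5, 2, 3, 4  # sequence length, batch size, datum length, hidden size
>>> X = mg.Tensor(np.random.rand(T, N, C))
>>> Uz, Ur, Uh = (mg.Tensor(np.random.rand(C, D)) for _ in range(3))
>>> Wz, Wr, Wh = (mg.Tensor(np.random.rand(D, D)) for _ in range(3))
>>> bz, br, bh = (mg.Tensor(np.zeros(D)) for _ in range(3))
>>> S = gru(X, Uz, Wz, bz, Ur, Wr, br, Uh, Wh, bh)
>>> S.shape  # T+1 descriptors: the seed state plus one per time step
(6, 2, 4)
>>> S[1:].sum().backward()  # back-propagate through the recurrence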
Notes
\(T\) : Sequence length
\(N\) : Batch size
\(C\) : Length of single datum
\(D\) : Length of ‘hidden’ descriptor
The GRU system of equations is given by:
\[ \begin{align}\begin{aligned}Z_{t} = \sigma (X_{t} U_z + S_{t-1} W_z + b_z)\\R_{t} = \sigma (X_{t} U_r + S_{t-1} W_r + b_r)\\H_{t} = \tanh(X_{t} U_h + (R_{t} * S_{t-1}) W_h + b_h)\\S_{t} = (1 - Z_{t}) * H_{t} + Z_{t} * S_{t-1}\end{aligned}\end{align} \]

Following the dropout scheme specified in [1], the hidden-hidden weights (Wz/Wr/Wh) randomly have their weights dropped prior to forward/back-prop. The input connections (via Uz/Ur/Uh) have variational dropout ([2]) applied to them with a common dropout mask across all \(t\). That is, three static dropout masks, each with shape-(N, D), are applied to
\[ \begin{align}\begin{aligned}X_{t} U_z\\X_{t} U_r\\X_{t} U_h\end{aligned}\end{align} \]

respectively, for all \(t\).
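As a concrete sketch of the equations and dropout masks described above, the following pure-NumPy function computes a single GRU time-step. It is illustrative only and is not MyGrad's implementation; the name gru_step and the explicit mask arguments are hypothetical, and the random dropping of the hidden-hidden weights (Wz/Wr/Wh) from [1] is omitted for brevity.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, Uz, Wz, bz, Ur, Wr, br, Uh, Wh, bh,
             mask_z=1.0, mask_r=1.0, mask_h=1.0):
    # x_t: shape-(N, C) input at time t; s_prev: shape-(N, D) prior hidden state.
    # mask_z/mask_r/mask_h: static shape-(N, D) variational-dropout masks that
    # scale X_t Uz, X_t Ur, X_t Uh; the same masks are reused for every t
    # (1.0 means no dropout).
    z = sigmoid(mask_z * (x_t @ Uz) + s_prev @ Wz + bz)        # update gate Z_t
    r = sigmoid(mask_r * (x_t @ Ur) + s_prev @ Wr + br)        # reset gate R_t
    h = np.tanh(mask_h * (x_t @ Uh) + (r * s_prev) @ Wh + bh)  # candidate state H_t
    return (1 - z) * h + z * s_prev                            # new hidden state S_t

The full forward pass simply scans such a step over t = 1 ... T, starting from the seed state s0 and collecting each S_t into the returned shape-(T+1, N, D) sequence.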
References