mygrad.nnet.activations.softmax
- mygrad.nnet.activations.softmax(x: ArrayLike, axis: None | int | Tuple[int, ...] = -1, *, constant: bool | None = None) -> Tensor
Applies the softmax activation function:
f(x) = exp(x) / sum( exp(x) )
Computes the softmax over one or more axes of an ND-tensor.
- Parameters:
- x : array_like
- axis : Union[None, int, Tuple[int, ...]], optional (default=-1)
The axis/axes over which to compute the softmax. By default, the softmax is computed over the trailing axis.
- constant : bool, optional (default=False)
If True, the returned tensor is a constant (it does not back-propagate a gradient).
- Returns:
- mygrad.Tensor
Notes
\(N\) is the number of samples in the batch.
\(C\) is the number of possible classes for which scores are provided.
This implements a numerically-stable version of softmax; however, log-softmax remains the more numerically stable activation function.
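The usual way to make softmax numerically stable is to subtract the maximum score before exponentiating. The NumPy sketch below illustrates that standard trick; `stable_softmax` is a hypothetical helper written for illustration only, not MyGrad's actual internal routine.

>>> import numpy as np
>>> def stable_softmax(scores, axis=-1):
...     # Subtracting the max leaves the softmax unchanged (the factor
...     # exp(-max) cancels in numerator and denominator) but keeps
...     # exp() from overflowing when the scores are very large.
...     shifted = scores - scores.max(axis=axis, keepdims=True)
...     exp_scores = np.exp(shifted)
...     return exp_scores / exp_scores.sum(axis=axis, keepdims=True)
>>> # Huge scores such as 2E50 would overflow a naive exp(x) / sum(exp(x)),
>>> # but the shifted form stays finite.
>>> stable_softmax(np.array([[2., 2., 2.], [2e50, 2e50, 1e50]]))
array([[0.33333333, 0.33333333, 0.33333333],
       [0.5       , 0.5       , 0.        ]])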
Given the shape-\((N, C)\) tensor of scores, x, the softmax classification probabilities are computed. That is, the score for class-\(k\) of a given datum (\(s_{k}\)) is normalized using the 'softmax' transformation:

\[p_{k} = \frac{e^{s_k}}{\sum_{i=1}^{C}{e^{s_i}}}\]

Examples
>>> import mygrad as mg
>>> from mygrad.nnet import softmax
>>> x = mg.Tensor([[ 2.,   2.,   2.],
...                [2E50, 2E50, 1E50]])
>>> softmax(x)
Tensor([[0.33333333, 0.33333333, 0.33333333],
        [0.5       , 0.5       , 0.        ]])
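As a further illustration (not part of the original docstring), the sketch below computes the softmax over a non-default axis and produces a constant output via the `constant` keyword; the printed values follow from the formula above, and the `.constant` check assumes MyGrad's Tensor exposes that attribute.

>>> x = mg.Tensor([[1., 2.],
...                [3., 4.]])
>>> softmax(x, axis=0)  # normalize down each column instead of across each row
Tensor([[0.11920292, 0.11920292],
        [0.88079708, 0.88079708]])
>>> out = softmax(x, constant=True)  # `out` will not back-propagate a gradient
>>> out.constant
True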