mygrad.nnet.activations.logsoftmax
- mygrad.nnet.activations.logsoftmax(x: ArrayLike, axis: None | int | Tuple[int, ...] = -1, *, constant: bool | None = None) → Tensor
Applies the log-softmax activation function:
f(x) = log ( exp(x) / sum( exp(x) ) )
Computes the log-softmax over one or more axes of an ND-tensor.
- Parameters:
- x : ArrayLike
Input data.
- axis : Union[None, int, Tuple[int, ...]], optional (default=-1)
The axis/axes over which to compute the log-softmax. By default, the log-softmax is computed over the trailing axis (illustrated in the example below).
- constant : Optional[bool]
If True, the returned tensor is a constant (it does not back-propagate a gradient); see the final example below.
- Returns:
- log_softmax : mygrad.Tensor
Tensor with the same shape as x.
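A minimal sketch of the axis behavior described above (illustrative values; the use of Tensor.data and numpy.allclose for the check is an assumption about mygrad's standard tensor interface, not something documented on this page):

>>> import numpy as np
>>> import mygrad as mg
>>> from mygrad.nnet import logsoftmax
>>> x = mg.Tensor([[0., 1., 2.],
...                [3., 4., 5.]])
>>> out = logsoftmax(x, axis=0)  # normalize down each column instead of the trailing axis
>>> np.allclose(mg.exp(out).data.sum(axis=0), 1.0)  # exponentiated scores sum to 1 along axis-0
True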
Notes
In the following, \(N\) is the number of samples in the batch and \(C\) is the number of possible classes for which scores are provided.
This implements a numerically-stable version of log-softmax, compared to the naive implementation using mygrad.log, mygrad.exp, and mygrad.sum.
Given the shape-\((N, C)\) tensor of scores, x, the log-softmax classification scores are computed. That is, the score for class-\(k\) of a given datum (\(s_{k}\)) is normalized using the ‘softmax’ transformation, and its logarithm is taken:

\[p_{k} = \log{\frac{e^{s_k}}{\sum_{i=1}^{C}{e^{s_i}}}}\]
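To make the stability note concrete, here is a rough sketch contrasting the naive composition mentioned above with the standard log-sum-exp (max-subtraction) trick; it is illustrative only and does not depict mygrad's actual internal implementation:

>>> import numpy as np
>>> import mygrad as mg
>>> from mygrad.nnet import logsoftmax
>>> x = mg.Tensor([1000., 1000., 999.])
>>> # naive composition: exp(1000.) overflows to inf in float64, yielding nan everywhere
>>> naive = mg.log(mg.exp(x) / mg.sum(mg.exp(x)))
>>> naive.data
array([nan, nan, nan])
>>> # log-sum-exp trick: subtract the max before exponentiating so every term stays finite
>>> shifted = x - x.data.max()
>>> stable = shifted - mg.log(mg.sum(mg.exp(shifted)))
>>> np.allclose(stable.data, logsoftmax(x).data)
True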
Examples

>>> import mygrad as mg
>>> from mygrad.nnet import logsoftmax
>>> x = mg.Tensor([[  2.,   2.,   2.],
...                [2E50, 2E50, 1E50]])
>>> logsoftmax(x)
Tensor([[-1.09861229e+00, -1.09861229e+00, -1.09861229e+00],
        [ 0.00000000e+00,  0.00000000e+00, -1.00000000e+50]])
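As a rough illustration of the constant flag, the following sketch assumes mygrad's usual Tensor.backward, Tensor.grad, and Tensor.constant attributes, which are not documented on this page:

>>> x = mg.Tensor([1., 2., 3.])
>>> mg.sum(logsoftmax(x)).backward()  # back-propagate from a scalar through the log-softmax
>>> x.grad is None  # a gradient was propagated back to x
False
>>> logsoftmax(x, constant=True).constant  # with constant=True, no gradient will flow back
True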