mygrad.nnet.losses.softmax_crossentropy
- mygrad.nnet.losses.softmax_crossentropy(x: ArrayLike, y_true: ArrayLike, *, constant: bool | None = None) → Tensor
Given the classification scores of C classes for N pieces of data, computes the NxC softmax classification probabilities. The cross-entropy is then computed using the true classification labels. Log-softmax is used for improved numerical stability.
- Parameters:
- x : ArrayLike, shape=(N, C)
The C class scores for each of the N pieces of data.
- y_true : ArrayLike, shape=(N,)
The correct class-indices, in [0, C), for each datum.
- constant : bool, optional (default=False)
If True, the returned tensor is a constant (it does not back-propagate a gradient); a brief illustration follows the Raises section below.
- Returns:
- The average softmax loss
- Raises:
- ValueError
Bad dimensionalities for x or y_true
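For a quick look at what the constant flag does, here is a minimal sketch; it assumes that the returned tensor's constant attribute reflects the flag, and the specific scores/labels are made up for illustration:

>>> import mygrad as mg
>>> from mygrad.nnet import softmax_crossentropy
>>> scores = mg.Tensor([[1., 2., 3.]])  # shape-(1, 3) scores, made up for illustration
>>> labels = mg.Tensor([2])             # the true class for this datum
>>> loss = softmax_crossentropy(scores, labels, constant=True)
>>> loss.constant  # no gradient will be back-propagated from this tensor
True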
Notes
\(N\) is the number of samples in the batch.
\(C\) is the number of possible classes for which scores are provided.
Given the shape-\((N, C)\) tensor of scores, x, the softmax classification probabilities are computed. That is, the score for class-\(k\) of a given datum (\(s_{k}\)) is normalized using the ‘softmax’ transformation:

\[p_{k} = \frac{e^{s_k}}{\sum_{i=1}^{C}{e^{s_i}}}\]

This produces the “prediction probability distribution”, \(p\), for each datum. The cross-entropy loss for that datum is then computed according to the true class-index for that datum, as reported in y_true. That is, the “true probability distribution”, \(t\), for the datum is \(1\) for the correct class-index and \(0\) elsewhere.

The cross-entropy loss for that datum is thus:

\[l = - \sum_{k=1}^{C}{t_{k} \log{p_{k}}}\]

Having computed each per-datum cross-entropy loss, this function then returns the loss averaged over all \(N\) pieces of data:

\[L = \frac{1}{N}\sum_{i=1}^{N}{l_{i}}\]
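For reference, the formulas above can be reproduced with plain NumPy. The following is a minimal sketch, not part of MyGrad's API; the helper name softmax_crossentropy_np and the max-shift used for numerical stability are illustrative choices:

>>> import numpy as np
>>> def softmax_crossentropy_np(scores, labels):
...     # log-softmax, shifted by the row-max for numerical stability
...     shifted = scores - scores.max(axis=1, keepdims=True)
...     log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
...     # negative log-probability of the true class, averaged over the batch
...     return (-log_p[np.arange(len(labels)), labels]).mean()
>>> float(softmax_crossentropy_np(np.array([[2., 2., 2.]]), np.array([0])))  # log(3)
1.0986122886681098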
Examples

>>> import mygrad as mg
>>> from mygrad.nnet import softmax_crossentropy
Let’s take a simple case where N=1 and C=3. We’ll thus make up classification scores for a single datum. Suppose the scores are identical for the three classes and that the true class is class-0:
>>> x = mg.Tensor([[2., 2., 2.]])  # a shape-(1, 3) tensor of scores
>>> y_true = mg.Tensor([0])  # the correct class for this datum is class-0
Because the scores are identical for all three classes, the softmax normalization will simply produce \(p = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]\). Because class-0 is the “true” class, \(t = [1., 0., 0.]\). Thus our softmax cross-entropy loss should be:
\[-(1 \times \log{\frac{1}{3}} + 0 \times \log{\frac{1}{3}} + 0 \times \log{\frac{1}{3}}) = \log(3) \approx 1.099\]

Let’s see that this is what softmax_crossentropy returns:

>>> softmax_crossentropy(x, y_true)
Tensor(1.09861229)
Similarly, suppose a datum’s scores are \([0, 0, 10^6]\); then the softmax normalization will return \(p \approx [0., 0., 1.]\). If the true class for this datum is class-2, then the loss should be nearly 0, since \(p\) and \(t\) are essentially identical:
\[-(0 \times \log{0} + 0 \times \log{0} + 1 \times \log{1}) = -\log(1) = 0\]

Now, let’s construct x and y_true so that they incorporate the scores/labels for both of the data that we have considered:

>>> x = mg.Tensor([[2., 2., 2.],    # a shape-(2, 3) tensor of scores
...                [0., 0., 1E6]])
>>> y_true = mg.Tensor([0, 2])  # the class IDs for the two data
softmax_crossentropy(x, y_true) will return the average loss of these two data, \(\frac{1}{2}(1.099 + 0) \approx 0.55\):

>>> softmax_crossentropy(x, y_true)
Tensor(0.54930614)
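Because the returned loss is a Tensor in MyGrad's computational graph, calling .backward() on it back-propagates \(\partial L / \partial x\) into x.grad; for the averaged loss this gradient is \((p - t)/N\) for each datum. Below is a minimal sketch of that usage, reusing the data above (only the shape of the gradient is shown, to keep the doctest output version-independent):

>>> x = mg.Tensor([[2., 2., 2.],
...                [0., 0., 1E6]])
>>> y_true = mg.Tensor([0, 2])
>>> loss = softmax_crossentropy(x, y_true)
>>> loss.backward()  # populates x.grad with dL/dx
>>> x.grad.shape     # one gradient entry per class score
(2, 3)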