mygrad.Tensor.backward
- Tensor.backward(grad: ArrayLike | None = None)
Trigger backpropagation and compute the derivatives of this tensor.
Designating this tensor as ℒ, compute dℒ/dx for every (non-constant) tensor x that preceded ℒ in its computational graph, and store each derivative in x.grad. Once back-propagation is finished, the present tensor is removed from all computational graphs, and the preceding graph is cleared.
If ℒ is a non-scalar tensor (i.e. ℒ.ndim is greater than 0), then calling ℒ.backward() will behave as if ℒ were first reduced to a scalar via summation; i.e. it will behave identically to ℒ.sum().backward(). This ensures that each element of any dℒ/dx represents the derivative of a scalar-valued function.
- Parameters:
- grad : Optional[array_like] (must be broadcast-compatible with self)
  By default, the present tensor is treated as the terminus of the computational graph (ℒ). Otherwise, one can specify a "downstream" derivative, representing dℒ/d(self). This can be used to effectively connect otherwise separate computational graphs.
Examples
>>> import mygrad as mg
>>> x = mg.tensor(2)
>>> y = mg.tensor(3)
>>> w = x * y
>>> ℒ = 2 * w
>>> ℒ.backward()  # computes dℒ/dℒ, dℒ/dw, dℒ/dy, and dℒ/dx
>>> ℒ.grad  # dℒ/dℒ == 1 by identity
array(1.)
>>> w.grad  # dℒ/dw
array(2.)
>>> y.grad  # dℒ/dy = dℒ/dw * dw/dy
array(4.)
>>> x.grad  # dℒ/dx = dℒ/dw * dw/dx
array(6.)
Calling ℒ.backward() on a non-scalar tensor is equivalent to first summing that tensor.
>>> tensor = mg.tensor([2.0, 4.0, 8.0])
>>> ℒ = tensor * tensor[::-1]  # [x0*x2, x1*x1, x2*x0]
>>> ℒ.backward()  # behaves like ℒ = x0*x2 + x1*x1 + x2*x0
>>> tensor.grad
array([16., 8., 4.])
>>> tensor = mg.Tensor([2.0, 4.0, 8.0])
>>> ℒ = tensor * tensor[::-1]
>>> ℒ.sum().backward()
>>> tensor.grad
array([16., 8., 4.])
Specifying a value for grad:
>>> x = mg.Tensor(1.)
>>> x.backward(2.)
>>> x.grad  # would normally be dℒ/dℒ == 1
array(2.)