Source code for mygrad.nnet.initializers.he_uniform

import numpy as np

from mygrad.nnet.initializers.uniform import uniform


def he_uniform(*shape, gain=1, dtype=np.float32, constant=None):
    r"""Initialize a :class:`mygrad.Tensor` according to the uniform
    initialization procedure described by He et al.

    Parameters
    ----------
    shape : Sequence[int]
        The shape of the output Tensor. Note that ``shape`` must be
        at least two-dimensional.

    gain : Real, optional (default=1)
        The gain (scaling factor) to apply.

    dtype : data-type, optional (default=float32)
        The data type of the output tensor; must be a floating-point type.

    constant : bool, optional (default=None)
        If ``True``, the returned tensor is a constant (it does not
        back-propagate a gradient).

    Returns
    -------
    mygrad.Tensor, shape=``shape``
        A Tensor, with values initialized according to the He uniform
        initialization.

    Notes
    -----
    He, Zhang, Ren, and Sun put forward this initialization in the paper
    "Delving Deep into Rectifiers: Surpassing Human-Level Performance
    on ImageNet Classification"
    https://arxiv.org/abs/1502.01852

    A Tensor :math:`W` initialized in this way is drawn from the uniform
    distribution

    .. math::
        U\left[-\sqrt{\frac{6}{(1+a^2)n_l}}, \sqrt{\frac{6}{(1+a^2)n_l}}\right]

    where :math:`a` is the slope of the rectifier following this layer,
    which is incorporated via the ``gain`` argument above.

    The guidance put forward in that paper is that this initialization
    procedure should be preferred over the
    ``mygrad.nnet.initializers.glorot_*`` functions, especially when
    rectifiers (e.g. ReLU, PReLU, leaky_relu) are used in very deep
    (roughly 20 or more layers) networks.

    Examples
    --------
    >>> from mygrad.nnet.initializers import he_uniform
    >>> he_uniform(2, 3)
    Tensor([[-0.97671795,  0.85518736, -0.8187388 ],
            [ 0.7599437 ,  0.94951814, -0.96755147]], dtype=float32)

    >>> he_uniform(4, 2, gain=5/3, dtype="float64", constant=True)
    Tensor([[-1.10372799, -0.16472136],
            [-1.32614867,  1.14142637],
            [ 0.78044471,  0.20562334],
            [-1.23968259,  1.0057054 ]])

    >>> he_uniform(2, 1, 2, dtype="float16")
    Tensor([[[-0.1233,  0.1023]],
            [[ 0.3845,  0.1003]]], dtype=float16)
    """
    if len(shape) == 1:
        shape = shape[0]
    if len(shape) < 2:
        raise ValueError("He Uniform initialization requires at least two dimensions")

    # fan-in is the number of inputs feeding each unit: shape[1] times the
    # product of any trailing (e.g. kernel) dimensions
    fan_in = shape[1] * (np.prod(shape[2:]) if len(shape) > 2 else 1)

    # A uniform distribution on [-b, b] has variance b**2 / 3, so
    # b = gain * sqrt(3 / fan_in) yields the variance prescribed by He et al.
    bound = gain * np.sqrt(3 / fan_in)
    return uniform(
        shape, lower_bound=-bound, upper_bound=bound, dtype=dtype, constant=constant
    )
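As a sanity check on the bound used above, the same computation can be sketched with plain NumPy (no mygrad). The helper name ``he_uniform_bound`` is mine, not part of mygrad; the ReLU gain :math:`\sqrt{2}` follows from :math:`\sqrt{2/(1+a^2)}` with slope :math:`a = 0`:

```python
import numpy as np


def he_uniform_bound(shape, gain=1.0):
    """Return the He-uniform bound b = gain * sqrt(3 / fan_in),
    where fan_in = shape[1] * prod(shape[2:])."""
    # np.prod of an empty slice is 1, so this also covers 2-D shapes
    fan_in = shape[1] * int(np.prod(shape[2:]))
    return gain * np.sqrt(3.0 / fan_in)


rng = np.random.default_rng(0)
shape = (64, 32, 3, 3)  # e.g. a conv layer: 32 input channels, 3x3 kernels
bound = he_uniform_bound(shape, gain=np.sqrt(2))  # sqrt(2): ReLU gain

# draw from U[-bound, bound]; every sample lies within the bound, and the
# empirical variance approaches bound**2 / 3 = gain**2 / fan_in
w = rng.uniform(-bound, bound, size=shape)
assert np.all(np.abs(w) <= bound)
```

For a 2-D ``shape=(2, 3)`` with ``gain=1``, the bound is ``sqrt(3 / 3) = 1``, consistent with the first docstring example, whose entries all fall inside ``[-1, 1]``.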