Trying to list some of the defaults used in fastai. I thought it would be a good idea to have them in one place to refer to in case we cannot use fastai for some project, and in general to know what the state-of-the-art practices are.
- `torch.backends.cudnn.benchmark = True`
- `(1e-7, 1-1e-7)` for clamping.
- Normalization layers: `bias=1e-3` and `weight=0 or 1` depending on where the norm layer occurs.
- `Linear + activation + BatchNorm + Dropout` ordering.
- Sigmoid clamped to `(1e-7, 1-1e-7)`.
- Leaky ReLU: `negative_slope=0.3`.
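The sigmoid clamping above can be sketched in plain Python (the function name `clamped_sigmoid` and the `eps` parameter are my own; only the `(1e-7, 1-1e-7)` range comes from the notes):

```python
import math

def clamped_sigmoid(x, eps=1e-7):
    """Sigmoid whose output is clamped to (eps, 1 - eps).

    Clamping keeps the output away from exactly 0 or 1, which avoids
    log(0) when the result feeds a BCE-style loss.
    """
    s = 1.0 / (1.0 + math.exp(-x))
    return min(max(s, eps), 1.0 - eps)
```

For very large inputs the raw sigmoid saturates to exactly 1.0 in floating point, so the clamp is what keeps the value strictly inside the open interval.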
- Initialization used for a given activation function:

| Activation func | Initialization |
|---|---|
| ReLU, ReLU6, LeakyReLU, Swish, Mish | kaiming_uniform |
| Sigmoid, Tanh | xavier_uniform |
- Initialization for `Linear` layer (conv layers included):
  - `bias = normal(mean=0, std=0.01)`
  - `weight`: the above table is used.
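A minimal sketch of this scheme with `torch.nn.init` (the helper `init_linear` and the lookup dict `INIT_FOR_ACT` are my own names, not fastai API; the activation-to-init mapping is the table above):

```python
import torch.nn as nn

# Map activation class -> weight init, per the table above.
INIT_FOR_ACT = {
    nn.ReLU: nn.init.kaiming_uniform_,
    nn.ReLU6: nn.init.kaiming_uniform_,
    nn.LeakyReLU: nn.init.kaiming_uniform_,
    nn.Sigmoid: nn.init.xavier_uniform_,
    nn.Tanh: nn.init.xavier_uniform_,
}

def init_linear(layer, act_cls):
    """Init weight from the activation table; bias ~ N(0, 0.01)."""
    init_fn = INIT_FOR_ACT.get(act_cls, nn.init.kaiming_uniform_)
    init_fn(layer.weight)
    if layer.bias is not None:
        nn.init.normal_(layer.bias, mean=0.0, std=0.01)
    return layer

lin = init_linear(nn.Linear(8, 4), nn.ReLU)
```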
- `Conv` layer:
  - `kernel_size=3`
  - `padding = (kernel_size-1)//2`; zero for transposed conv.
  - `BatchNorm`, `ReLU` used. Output is `Convolution + BatchNorm + ReLU`.
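The conv defaults above can be sketched as a small factory function (the name `conv_block` is mine; with `kernel_size=3` and stride 1, `padding=(kernel_size-1)//2` preserves the spatial size):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel_size=3):
    """Convolution + BatchNorm + ReLU with 'same'-style padding."""
    padding = (kernel_size - 1) // 2
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

block = conv_block(3, 16)
out = block(torch.randn(1, 3, 32, 32))  # spatial size is preserved
```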
- `Embedding` layer initialized as `weight.data.normal_(mean=0, std=1).fmod_(2)`.
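Applied directly to a PyTorch embedding, this draws from a standard normal and then takes the remainder modulo 2, so every weight ends up strictly inside (-2, 2) (a cheap stand-in for a truncated normal):

```python
import torch.nn as nn

emb = nn.Embedding(100, 8)
# N(0, 1) samples, folded into (-2, 2) by fmod.
emb.weight.data.normal_(mean=0, std=1).fmod_(2)
```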
- For merging skip-connections, addition is used.
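Merging by addition (as opposed to concatenation) looks like this in a minimal residual block sketch (the class `ResBlock` is my own illustration, not a fastai class):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Minimal residual block: merge the skip connection by addition."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv(x)  # addition, not torch.cat

y = ResBlock(4)(torch.randn(2, 4, 8, 8))
```

Addition requires the skip and the branch to have identical shapes, which is why the block keeps the channel count and spatial size unchanged.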