Understanding Fast.ai magic: added layers

fredguth · May 29, 2018, 3:19am

Fast.ai is smart enough to understand from my datasets that I want a category classification and automagically define the model adding layers to a my chosen cnn architecture.

What I don’t know is what exactly has it added. When I print learn.model, what was there originally and what was added?

Besides, is it possible to generate a graph drawing of this model?

crcrpar · May 29, 2018, 4:31am

Hi,

I think this (Visualizing your network in PyTorch) or this (GitHub - lanpa/tensorboardX: tensorboard for pytorch (and chainer, mxnet, numpy, ...)) will work.

fredguth · May 29, 2018, 11:31am

Thanks @crcrpar

I will try those. I still need to know which are the added layers though, so that I only draw those layers added.

bny6613 · May 29, 2018, 2:45pm

If you just want to understand how it builds the network, you can look at https://github.com/fastai/fastai/blob/master/fastai/conv_learner.py. In brief - it cuts top layers of pretrained model based on architecture, then adds few fully connected layers with relu activations and Softmax or Sigmoid(if it is a multilabel classification) as an output layer.

fredguth · May 30, 2018, 9:05pm

@bny6613 I didn’t understand the code. Is all the model bellow added by fast.ai or is it the arch? If it is mainly the arch, where does the customization starts?

Here is the output of learn.summary():
Sequential(
(0): Conv2d (3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(2): ReLU(inplace)
(3): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1))
(4): Sequential(
(0): Bottleneck(
(conv1): Conv2d (64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(downsample): Sequential(
(0): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): Bottleneck(
(conv1): Conv2d (256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(2): Bottleneck(
(conv1): Conv2d (256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
)
(5): Sequential(
(0): Bottleneck(
(conv1): Conv2d (256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(downsample): Sequential(
(0): Conv2d (256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): Bottleneck(
(conv1): Conv2d (512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(2): Bottleneck(
(conv1): Conv2d (512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(3): Bottleneck(
(conv1): Conv2d (512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
)
(6): Sequential(
(0): Bottleneck(
(conv1): Conv2d (512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(downsample): Sequential(
(0): Conv2d (512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): Bottleneck(
(conv1): Conv2d (1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(2): Bottleneck(
(conv1): Conv2d (1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(3): Bottleneck(
(conv1): Conv2d (1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(4): Bottleneck(
(conv1): Conv2d (1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(5): Bottleneck(
(conv1): Conv2d (1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
)
(7): Sequential(
(0): Bottleneck(
(conv1): Conv2d (1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(downsample): Sequential(
(0): Conv2d (1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): Bottleneck(
(conv1): Conv2d (2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
(2): Bottleneck(
(conv1): Conv2d (2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(conv2): Conv2d (512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(conv3): Conv2d (512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
)
)
(8): AdaptiveConcatPool2d(
(ap): AdaptiveAvgPool2d(output_size=(1, 1))
(mp): AdaptiveMaxPool2d(output_size=(1, 1))
)
(9): Flatten(
)
(10): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True)
(11): Dropout(p=0.25)
(12): Linear(in_features=4096, out_features=512)
(13): ReLU()
(14): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True)
(15): Dropout(p=0.5)
(16): Linear(in_features=512, out_features=102)
(17): LogSoftmax()
)

bny6613 · May 30, 2018, 9:16pm

This is the custom part added by library

fredguth · May 30, 2018, 9:21pm

Thanks

bholmer · October 23, 2018, 11:04pm

I would like to add layers at the “foot” of a pre-trained network, and fine tune the parameters for those layers. The motivation for this is to turn arbitrary numerical data into an “image” which then a pre-trained CNN can classify. A specific application would be to turn audio data into an image. Rather than doing the usual FFTs to create a spectrogram, why not learn the optimal transformation to a 2D representation?

The main issue that I’m not sure of is whether gradients can be propagated backwards through the normalization of the pixel values.

Thanks

fredguth · October 24, 2018, 1:32pm

This seems to be a preprocessing stage which should not be in the network. Audio data is normally transformed to image using a Fast Fourier Transform , if I am not wrong. FFT is avaliable in opencv and I saw once there is a CUDA implementation.