Cannot use SqueezeNet with create_cnn

It seems that the squeezenet1_1 architecture is not compatible with the create_cnn function:

Traceback (most recent call last):
  File "doodles/doodles_dataset.py", line 283, in <module>
    main()
  File "doodles/doodles_dataset.py", line 74, in main
    learn = create_cnn(bunch, args['network'])
  File "/home/ck/code/fastai_v1/repo/fastai/vision/learner.py", line 52, in create_cnn
    nf = num_features_model(body) * 2
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

The num_features_model function returns None:

assert num_features_model(create_body(squeezenet1_1())) is None

I guess it somehow can't find the top-most layer. Is it a bug, or do I need to create this kind of model using a different API? I think I can just construct the model manually, but wanted to clarify.

It's not supported by fastai v1 yet. If you want to, make a PR out of it, for the reason you just discovered :wink:

Oh, got it.

Sure, I'll definitely give it a try! Will post a link here if/when I create something useful :smile:

It's probably as simple as adding a suitable entry to the meta dict.
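For anyone following along, such an entry presumably looks something like this (a sketch only; the right cut index and split function for SqueezeNet are exactly what needs working out):

import fastai.vision
from fastai.vision import *
from torchvision.models import squeezenet1_1

# Hypothetical sketch: each meta entry tells create_cnn where to cut the
# pretrained model ('cut') and how to split it into layer groups ('split').
def _squeezenet_split(m:nn.Module): return (m[1],)  # placeholder: body/head only
fastai.vision.learner.model_meta[squeezenet1_1] = {'cut':-1, 'split':_squeezenet_split}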


Not sure if I am going in the right direction, but here is my fork with a basic attempt to bring the SqueezeNet arch into the create_cnn func:

I have a question about this line:

nf = num_features_model(body) * 2

It doesn't work with SqueezeNet because during the training process an exception is raised:

    def batch_norm(input, running_mean, running_var, weight=None, bias=None,
                   training=False, momentum=0.1, eps=1e-5):
        r"""Applies Batch Normalization for each channel across a batch of data.
    
        See :class:`~torch.nn.BatchNorm1d`, :class:`~torch.nn.BatchNorm2d`,
        :class:`~torch.nn.BatchNorm3d` for details.
        """
        if training:
            size = list(input.size())
            if reduce(mul, size[2:], size[0]) == 1:
                raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
        return torch.batch_norm(
            input, weight, bias, running_mean, running_var,
>           training, momentum, eps, torch.backends.cudnn.enabled
        )
E       RuntimeError: running_mean should contain 1024 elements not 512

So I've introduced an additional meta parameter called mult to account for this issue and multiply by 4 instead. Also, SqueezeNet has a slightly different layout, which is why I've created a specific function to detach the body:

def _squeezenet_body(m:nn.Module): return m.features

Finally, I am not sure which values to use here:

def _squeezenet_split(m:nn.Module): return (m[1],)

My goal was to get fit_one_cycle working, so I was patching here and there to make it run :smile: Could you please tell me if these changes look reasonable? I would like to use this arch in my experiments, which is why I am trying to make it work with fastai.


The fork contains a very basic draft; I will arrange it into the required format with tests and notebooks if you would like to bring this stuff (when it is ready and approved) into master.

The basic approach is sound, but you need to figure out why this is happening and fix it, rather than multiplying by 4.

The reason for the mult by 2 is that concat pooling concatenates both max- and avg-pooled versions of the previous layer's filters.

So the bug here is that we're not correctly determining the number of outputs from the squeezenet conv layers, or that you've cut at the wrong spot. So if you can debug why that's happening, then we should be fine to use the existing code.
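To see where that 2 comes from, here is the concat-pooling idea in plain PyTorch (a minimal sketch, not fastai's actual AdaptiveConcatPool2d):

import torch
import torch.nn as nn

x = torch.randn(2, 512, 7, 7)             # e.g. 512 feature maps out of the body
avg = nn.AdaptiveAvgPool2d(1)(x)          # -> (2, 512, 1, 1)
mx = nn.AdaptiveMaxPool2d(1)(x)           # -> (2, 512, 1, 1)
pooled = torch.cat([mx, avg], dim=1)      # -> (2, 1024, 1, 1): channels doubled
assert pooled.shape[1] == 2 * x.shape[1]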


Shouldn't we use hook_output to figure out the number of features anyway?
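For example, the underlying idea in plain PyTorch (a sketch; num_out_features is a made-up helper, and hook_output wraps the same forward-hook mechanism):

import torch
import torch.nn as nn
from torchvision.models import squeezenet1_1

def num_out_features(body:nn.Module, size:int=64) -> int:
    "Run a dummy batch through `body` and read the output channels via a forward hook."
    shapes = []
    handle = body.register_forward_hook(lambda m, inp, out: shapes.append(out.shape))
    body.eval()
    with torch.no_grad(): body(torch.zeros(1, 3, size, size))
    handle.remove()
    return shapes[-1][1]

print(num_out_features(squeezenet1_1().features))  # 512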


I wondered about that. It's certainly an option if it turns out to be non-trivial.

Yes, the main reason I added this multiplier is that I wasn't really sure why the multiplication by 2 exists. Now I see the reason: the constant does not come from that specific architecture, so my change is not a solution.

Sure, I guess that I just need to trace the code to see what is going on.


The problem seems to be that num_features_model checks whether a particular model has batch norm, and if so determines the number of features from it. So it fails for models without batch norm; for SqueezeNet it returns a NoneType object. So I just hardcoded nf to be 512 * 2 and cut the model at '-2'.
So we have to change num_features_model to properly determine the features.
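The diagnosis is easy to confirm: squeezenet1_1 simply contains no batch norm layers to read a feature count from (a quick check, assuming torchvision is installed):

import torch.nn as nn
from torchvision.models import squeezenet1_1

# 0: nothing for a batch-norm-based num_features_model to find
bns = [m for m in squeezenet1_1().modules() if isinstance(m, nn.BatchNorm2d)]
print(len(bns))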

The following code works for me to create a SqueezeNet learner.


from fastai.vision import *

def create_cnn_snet(data:DataBunch,
                    lin_ftrs:Optional[Collection[int]]=None, ps:Floats=0.5,
                    custom_head:Optional[nn.Module]=None, split_on:Optional[SplitFuncOrIdxList]=None,
                    classification:bool=True, **kwargs:Any):
    body = create_body(snet, -1)  # snet: pretrained SqueezeNet, defined below
    nf = 512 * 2                  # hardcoded: 512 output channels, doubled by concat pooling
    head = custom_head or create_head(nf, data.c, lin_ftrs, ps)
    model = nn.Sequential(body, head)
    learn = ClassificationLearner(data, model, **kwargs)
    def _default_split(m:nn.Module): return (m[1],)
    meta = {'cut':-2, 'split':_default_split}  # note: 'cut' is unused here; the body was cut at -1 above
    learn.split(ifnone(split_on, meta['split']))
    learn.freeze()
    apply_init(model[1], nn.init.kaiming_normal_)
    return learn

where snet = torchvision.models.squeezenet1_1(True)

Note that the change to num_features_model is now implemented, and it should work with any model. Therefore you can easily add new models to fastai by just specifying their metadata.

There may be a bug in the new function.

def num_features_model(m:nn.Module)->int:
    "Return the number of output features for a `model`."
    return model_sizes(tst_model, full=False)[-1][1]

I think the input to model_sizes should be 'm', not 'tst_model'.

Pardon me if I am wrong about this.

It's fixed, thanks for flagging!
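For reference, the corrected version should now read:

def num_features_model(m:nn.Module)->int:
    "Return the number of output features for a `model`."
    return model_sizes(m, full=False)[-1][1]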

Awesome.

OK, great! As I can see, you're using an output hook to precisely detect the output size, as was proposed. So I think that it is now simple to introduce SqueezeNet (and others). I am going to update my code then, and make a PR using the new method.


Eventually, I've added the SqueezeNet metadata in this PR. Please let me know if you would like me to add a notebook showing the new behavior, or to modify something in the PR's code to make it ready to merge into the lib.

@sgugger @jeremy I've been able to make squeezenet1_1 work with the updated num_features_model. I did not supply a new meta for this model and relied on default_meta (cut: -1), and have been able to get 92.67% accuracy on a subset of the AWA2 dataset. Let me know if you see any issues with my approach.
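In other words, something as simple as this now works (a sketch; data is assumed to be an existing ImageDataBunch):

from fastai.vision import *
from torchvision.models import squeezenet1_1

# With the fixed num_features_model, the default meta (cut: -1) is enough.
learn = create_cnn(data, squeezenet1_1, metrics=accuracy)
learn.fit_one_cycle(4)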

I'm trying to use this model in a Stanford course project that I'm taking this year.


This is how I am loading non-resnet torchvision models with fastai v1.0.

from fastai import *
from fastai.vision import *
import fastai.vision
from torchvision.models import densenet201 # or whatever

def _densenet201_split(m:nn.Module): return (m[0][0][7], m[1]) # or whatever
_densenet201_meta = {'cut':-1, 'split': _densenet201_split} # or whatever

# register the meta for the new arch, updating the dict rather than
# replacing it so the built-in resnet entries stay intact
fastai.vision.learner.model_meta[densenet201] = {**_densenet201_meta}
arch = densenet201
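With the meta registered, create_cnn picks it up as usual (a usage sketch; data is assumed to be an existing DataBunch):

learn = create_cnn(data, arch)
# the custom split provides the layer groups used by discriminative learning rates
learn.fit_one_cycle(1, max_lr=slice(1e-5, 1e-3))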

You can use the default cut/meta/features, but then you forgo control over layer groups for discriminative learning rates.

NB: I'm not convinced I am able to achieve the same results as I was getting with fastai v0.7, and am still investigating why.


Hi @devforfu,

In the SqueezeNet implementation, I saw this comment in the code for _squeezenet_split:

Split squeezenet model on maxpool layers

However, with _squeezenet_split, the model won't be split on the max pool layers. The model's body (or "features", in terms of the SqueezeNet class) was:

['Conv2d',
 'ReLU',
 'MaxPool2d',
 'Fire',
 'Fire',
 'Fire',
 'MaxPool2d',
 'Fire',
 'Fire',
 'Fire',
 'Fire',
 'MaxPool2d',
 'Fire']

And it'll be split into:

 1. ['Conv2d', 'ReLU', 'MaxPool2d', 'Fire', 'Fire'],
 2. ['Fire', 'MaxPool2d', 'Fire'],
 3. ['Fire', 'Fire', 'Fire', 'MaxPool2d', 'Fire'],

It probably doesn't matter that much, but that comment makes me wonder whether _squeezenet_split works as expected, since the model is not split on the max pool layers. Thanks!

@PPPW Yes, makes sense! I guess I've messed up the comment a bit, and probably the implementation as well. Actually, I had in mind something like this:

['Conv2d', 'ReLU', 'MaxPool2d', 'Fire', 'Fire', 'Fire', 'MaxPool2d']
['Fire', 'Fire', 'Fire', 'Fire', 'MaxPool2d']
['Fire'] + custom_head_layers

But I forgot to properly test the splitting code. So it definitely makes sense to use a better default split. I'll submit a new PR to adjust this behavior to something more meaningful. Or you can do it as well, if you manage to do it faster than I do :slight_smile:

Actually, my choice of default splitting was a bit arbitrary, so maybe there is a better solution :smile:
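For instance, splitting on the inner max pool boundaries might look like this (untested; the indices are read off the layer list above, where MaxPool2d sits at positions 2, 6, and 11 of the body):

# Hypothetical split: new layer groups start right after the second and third
# MaxPool2d (m[0] is the body, m[1] the head); the final Fire stays grouped
# with the head, matching the grouping sketched earlier.
def _squeezenet_split(m:nn.Module): return (m[0][7], m[0][12])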