Cannot use SqueezeNet with create_cnn

It’s not supported by fastai v1 yet, for the reason you just discovered :wink: Feel free to make a PR out of it if you want.

Oh, got it.

Sure, I’ll definitely give it a try! I’ll post a link here if/when I create something useful :smile:

It’s probably as simple as adding a suitable entry to the meta dict.

Not sure if I am going in the right direction but here is my fork with a basic attempt to bring SqueezeNet arch into create_cnn func:

I have a question about line:

nf = num_features_model(body) * 2

It doesn’t work with SqueezeNet: during training, the following exception is raised:

    def batch_norm(input, running_mean, running_var, weight=None, bias=None,
                   training=False, momentum=0.1, eps=1e-5):
        r"""Applies Batch Normalization for each channel across a batch of data.
        See :class:`~torch.nn.BatchNorm1d`, :class:`~torch.nn.BatchNorm2d`,
        :class:`~torch.nn.BatchNorm3d` for details.
        if training:
            size = list(input.size())
            if reduce(mul, size[2:], size[0]) == 1:
                raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
        return torch.batch_norm(
            input, weight, bias, running_mean, running_var,
>           training, momentum, eps, torch.backends.cudnn.enabled
E       RuntimeError: running_mean should contain 1024 elements not 512

So I’ve introduced an additional meta parameter called mult to account for this issue and multiply by 4 instead. Also, SqueezeNet has a slightly different layout, which is why I’ve created a specific function to detach the body:

def _squeezenet_body(m:nn.Module): return m.features

Finally, I am not sure which values to use here:

def _squeezenet_split(m:nn.Module): return (m[1],)

My goal was to get fit_one_cycle working, so I was patching here and there to make it work :smile: Could you please tell me if these changes look reasonable? I would like to use this arch in my experiments, which is why I am trying to make it work with fastai.

The fork contains a very basic draft and will be arranged into the required format with tests and notebooks if you would like to bring this stuff (when it is ready and approved) into master.

The basic approach is sound, but you need to figure out why this is happening and fix it, rather than mult by 4.

The reason for the mult by 2 is that concat pooling has both max and avg pooled versions of previous # filters.
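To illustrate why the head expects twice the body's filter count, here is a minimal PyTorch sketch of concat pooling. `ConcatPool` is a hypothetical stand-in written for this example, not the library's own layer, and the 512-channel tensor is just an assumed body output.

```python
import torch
import torch.nn as nn

class ConcatPool(nn.Module):
    """Concatenate adaptive max pooling and adaptive average pooling on the channel axis."""
    def __init__(self):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        # Each pooled tensor keeps the input's channel count, so the result has twice as many.
        return torch.cat([self.max_pool(x), self.avg_pool(x)], dim=1)

# Assumed output of a conv body with 512 filters (batch of 4, 7x7 feature maps).
body_out = torch.randn(4, 512, 7, 7)
pooled = ConcatPool()(body_out)
pooled_shape = tuple(pooled.shape)
print(pooled_shape)
```

The channel dimension goes from 512 to 1024, which is where the factor of 2 in `nf` comes from; the factor is a property of the pooling layer, not of any particular architecture.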

So the bug here is that we’re not correctly determining the number of outputs from the squeezenet conv layers, or that you’ve cut at the wrong spot. So if you can debug why that’s happening, then we should be fine to use the existing code.

Shouldn’t we use a hook_output to figure out that number of features anyway?

I wondered about that. It’s certainly an option if it turns out to be non-trivial.
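A rough sketch of the hook idea in plain PyTorch, for readers who want to see what it could look like. `count_output_features` and the toy body are made up for this example, and the dummy input size is an assumption; the library's own helper may be implemented differently.

```python
import torch
import torch.nn as nn

def count_output_features(body: nn.Module, size=(64, 64)) -> int:
    """Run a dummy batch through `body` and read the channel count via a forward hook."""
    captured = {}

    def hook(module, inputs, output):
        captured["shape"] = output.shape

    handle = body.register_forward_hook(hook)
    was_training = body.training
    body.eval()  # avoid updating batch norm statistics with the dummy batch
    try:
        with torch.no_grad():
            body(torch.zeros(1, 3, *size))
    finally:
        handle.remove()
        body.train(was_training)
    return captured["shape"][1]

# Toy body without any batch norm layers, ending in 512 channels.
toy_body = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 512, 3, padding=1), nn.ReLU(),
)
n_features = count_output_features(toy_body)
print(n_features)
```

Because the count is read from an actual forward pass, this does not depend on the model containing any particular layer type.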

Yes, the main reason I added this multiplier is that I wasn’t really sure why the multiplication by 2 exists. Now I see the reason: the constant does not come from any specific architecture, so changing it is not a solution.

Sure, I guess that I just need to trace the code to see what is going on.

The problem seems to be that num_features_model checks whether a model has batch norm and, if so, determines the features from it. So it fails for models without batch norm: for SqueezeNet it returns None. I just hardcoded nf to be 512 * 2 and cut the model at ‘-2’.
So we have to change num_features_model to determine the features properly.
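The failure mode described above can be reproduced with a small sketch. The heuristic below is a guess at the behaviour described in this post (reading `num_features` from the last batch norm layer); `num_features_via_batchnorm` is a hypothetical name, not the library's function.

```python
import torch.nn as nn

def num_features_via_batchnorm(m: nn.Module):
    """Hypothetical heuristic: return `num_features` of the last batch norm layer, if any."""
    for layer in reversed(list(m.modules())):
        if isinstance(layer, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            return layer.num_features
    return None  # no batch norm layer found

with_bn = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64), nn.ReLU())
without_bn = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU())

bn_result = num_features_via_batchnorm(with_bn)
no_bn_result = num_features_via_batchnorm(without_bn)
print(bn_result, no_bn_result)
```

With a result of `None`, an expression like `nf = num_features * 2` cannot work, which is why hardcoding was needed as a stopgap.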

The following code works for me to create squeeze net learner.

def create_cnn_snet(data:DataBunch, 
                lin_ftrs:Optional[Collection[int]]=None, ps:Floats=0.5,
                custom_head:Optional[nn.Module]=None, split_on:Optional[SplitFuncOrIdxList]=None,
                classification:bool=True, **kwargs:Any):
    body = create_body(snet, -1)
    nf = 512* 2
    head = custom_head or create_head(nf, data.c, lin_ftrs, ps)
    model = nn.Sequential(body, head)
    learn = ClassificationLearner(data, model, **kwargs)
    def _default_split(m:nn.Module): return (m[1],)
    meta = {'cut':-2, 'split':_default_split}
    apply_init(model[1], nn.init.kaiming_normal_)
    return learn

where snet=torchvision.models.squeezenet1_1(True)

Note that a change to num_features_model has been implemented, and it should now work with any model. You can therefore easily add new models to fastai by just specifying their metadata.

There may be a bug in the new function.

def num_features_model(m:nn.Module)->int:
    "Return the number of output features for a `model`."
    return model_sizes(tst_model, full=False)[-1][1]

I think the input to model_sizes should be ‘m’ not ‘tst_model’

Pardon me if I am wrong about this.

It’s fixed, thanks for flagging!


Ok, great! As I can see, you’re using an output hook to precisely detect the output size, as was proposed. So I think it is now simple to introduce SqueezeNet (and others). I am going to update my code and make a PR using the new method.

I’ve finally added the SqueezeNet metadata in this PR. Please let me know if you would like me to add a notebook showing the new behavior or modify something in the PR’s code to make it ready to merge into the lib.

@sgugger @jeremy I’ve been able to make Squeezenet1_1 work with the updated num_features_model. I did not supply a new meta for this model and relied on default_meta (cut: -1). I was able to get 92.67% accuracy on a subset of the AWA2 dataset. Let me know if you see any issues with my approach.

I’m trying to use this model in a project for a Stanford course that I’m taking this year.

This is how I am loading non-resnet torchvision models with fastai v1.0.

from fastai import *
from import *
from torchvision.models import densenet201 # or whatever
def _densenet201_split(m:nn.Module): return (m[0][0][7], m[1]) # or whatever 
_densenet201_meta  = {'cut':-1, 'split': _densenet201_split} # or whatever
 = { densenet201:{**_densenet201_meta} }
arch = densenet201

You can use the default cut/meta/features but forgo control over layer groups for discriminative learning rates.

NB: I’m not convinced I am able to achieve the same results as I was with fastai v0.7 and am still investigating why.


Hi @devforfu,

In the SqueezeNet implementation, I saw this comment in the code for _squeezenet_split:

Split squeezenet model on maxpool layers

However, with the _squeezenet_split, the model won’t be split on the max pool layer. The model’s body (or “features”, in terms of the SqueezeNet class) was:


And it’ll be split into:

 1. ['Conv2d', 'ReLU', 'MaxPool2d', 'Fire', 'Fire'],
 2. ['Fire', 'MaxPool2d', 'Fire'],
 3. ['Fire', 'Fire', 'Fire', 'MaxPool2d', 'Fire'],

It probably doesn’t matter that much, but that comment makes me wonder whether _squeezenet_split works as expected, since the model is not split at the max pool layers. Thanks!

@PPPW Yes, makes sense! I guess I’ve messed up my comment a bit, and probably the implementation as well. Actually, I had something like this in mind:

['Conv2d', 'ReLU', 'MaxPool2d', 'Fire', 'Fire', 'Fire', 'MaxPool2d']
['Fire', 'Fire', 'Fire', 'Fire', 'MaxPool2d']
['Fire'] + custom_head_layers

But I forgot to properly test the splitting code. So it definitely makes sense to use a better default split. I’ll submit a new PR to adjust this behavior to something more meaningful. Or you can do it as well, if you manage to do it faster than I do :slight_smile:

Actually, my choice of default splitting was a bit arbitrary so maybe there is a better solution :smile:
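The two splits discussed above can be compared with a small pure-Python sketch that works on layer names only. The `features` list is reconstructed by concatenating the groups quoted in this thread, the cut indices 5 and 8 are inferred from the group sizes, and both helper functions are hypothetical names written for this example.

```python
from typing import List, Sequence

def split_at(layers: Sequence[str], idxs: Sequence[int]) -> List[List[str]]:
    """Split `layers` into consecutive groups, starting a new group at each index in `idxs`."""
    bounds = [0, *idxs, len(layers)]
    return [list(layers[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]

def starts_after_maxpool(layers: Sequence[str], skip_first: int = 1) -> List[int]:
    """Indices right after each 'MaxPool2d', ignoring the first `skip_first` occurrences."""
    after = [i + 1 for i, name in enumerate(layers) if name == "MaxPool2d"]
    return after[skip_first:]

# Body layout reconstructed from the groups listed in the posts above.
features = ["Conv2d", "ReLU", "MaxPool2d", "Fire", "Fire",
            "Fire", "MaxPool2d", "Fire",
            "Fire", "Fire", "Fire", "MaxPool2d", "Fire"]

current = split_at(features, [5, 8])                            # split reported by PPPW
intended = split_at(features, starts_after_maxpool(features))   # groups end on a max pool

for group in intended:
    print(group)
```

Deriving the cut points from the positions of the max pool layers, rather than hardcoding indices, keeps the split consistent with the comment in the code.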

Hi @devforfu, thanks for double checking! Perhaps you can merge your layer groups 2 and 3? If people use lr_range to get the learning rates, then layer group 2 will have a much smaller lr than group 3, and likewise group 3 compared to the custom head layers. It doesn’t quite make sense for the last “Fire” layer to be so special in this split.

Probably it doesn’t matter that much since people can set learning rates to be whatever they want
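To make the learning-rate point concrete, here is a small sketch of how a (start, stop) range might be spread over layer groups. It assumes the rates are spaced geometrically between the two ends; `even_mults` below is a standalone reimplementation for illustration, not the library's code, and the values 1e-5 and 1e-3 are arbitrary.

```python
import math
from typing import List

def even_mults(start: float, stop: float, n: int) -> List[float]:
    """Return `n` log-spaced values from `start` to `stop` (inclusive)."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

three_groups = even_mults(1e-5, 1e-3, 3)  # one rate per layer group
four_groups = even_mults(1e-5, 1e-3, 4)   # an extra group changes every intermediate rate

print(three_groups)
print(four_groups)
```

Under this assumption, each additional layer group gets its own step on the log scale, so a group containing a single layer ends up with a noticeably different rate from its neighbours.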