How to change dropout in partially trained model

iNLyze · May 25, 2018, 12:11pm

With a model trained with ps=0., how can you change dropout for further training?
If you just go learn.models.ps=0.3, it doesn’t seem to use this new value.
However, if you instantiate a new model, e.g. learn=ConvLearner.pretrained(..., ps=0.3, ...), and go learn.load('my-model') (a model that was saved with ps=0.), you actually get an error.
Is there a better way than starting all over with a different dropout?

digitalspecialists · May 25, 2018, 2:52pm

What is the error? I am 95% sure you could previously create a new learner and set a new ps.

iNLyze · May 25, 2018, 3:03pm

Here goes:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-137-cb30743083f8> in <module>()
----> 1 dumb.load('resnet34-RGB-128')

/modules/fastai/learner.py in load(self, name)
     94 
     95     def load(self, name):
---> 96         load_model(self.model, self.get_model_path(name))
     97         if hasattr(self, 'swa_model'): load_model(self.swa_model, self.get_model_path(name)[:-3]+'-swa.h5')
     98 

/modules/fastai/torch_imports.py in load_model(m, p)
     25 def children(m): return m if isinstance(m, (list, tuple)) else list(m.children())
     26 def save_model(m, p): torch.save(m.state_dict(), p)
---> 27 def load_model(m, p): m.load_state_dict(torch.load(p, map_location=lambda storage, loc: storage))
     28 
     29 def load_pre(pre, f, fn):

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    719         if len(error_msgs) > 0:
    720             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 721                                self.__class__.__name__, "\n\t".join(error_msgs)))
    722 
    723     def parameters(self):

RuntimeError: Error(s) in loading state_dict for Sequential:
	Missing key(s) in state_dict: "0.bias", "0.running_mean", "0.running_var", "3.weight", "3.bias", "3.running_mean", "3.running_var", "4.weight", "4.bias". 
	Unexpected key(s) in state_dict: "6.0.conv1.weight", "6.0.bn1.weight", "6.0.bn1.bias", "6.0.bn1.running_mean", "6.0.bn1.running_var", "6.0.conv2.weight", "6.0.bn2.weight", "6.0.bn2.bias", "6.0.bn2.running_mean", "6.0.bn2.running_var", "6.0.downsample.0.weight", "6.0.downsample.1.weight", "6.0.downsample.1.bias", "6.0.downsample.1.running_mean", "6.0.downsample.1.running_var", "6.1.conv1.weight", "6.1.bn1.weight", "6.1.bn1.bias", "6.1.bn1.running_mean", "6.1.bn1.running_var", "6.1.conv2.weight", "6.1.bn2.weight", "6.1.bn2.bias", "6.1.bn2.running_mean", "6.1.bn2.running_var", "6.2.conv1.weight", "6.2.bn1.weight", "6.2.bn1.bias", "6.2.bn1.running_mean", "6.2.bn1.running_var", "6.2.conv2.weight", "6.2.bn2.weight", "6.2.bn2.bias", "6.2.bn2.running_mean", "6.2.bn2.running_var", "6.3.conv1.weight", "6.3.bn1.weight", "6.3.bn1.bias", "6.3.bn1.running_mean", "6.3.bn1.running_var", "6.3.conv2.weight", "6.3.bn2.weight", "6.3.bn2.bias", "6.3.bn2.running_mean", "6.3.bn2.running_var", "6.4.conv1.weight", "6.4.bn1.weight", "6.4.bn1.bias", "6.4.bn1.running_mean", "6.4.bn1.running_var", "6.4.conv2.weight", "6.4.bn2.weight", "6.4.bn2.bias", "6.4.bn2.running_mean", "6.4.bn2.running_var", "6.5.conv1.weight", "6.5.bn1.weight", "6.5.bn1.bias", "6.5.bn1.running_mean", "6.5.bn1.running_var", "6.5.conv2.weight", "6.5.bn2.weight", "6.5.bn2.bias", "6.5.bn2.running_mean", "6.5.bn2.running_var", "7.0.conv1.weight", "7.0.bn1.weight", "7.0.bn1.bias", "7.0.bn1.running_mean", "7.0.bn1.running_var", "7.0.conv2.weight", "7.0.bn2.weight", "7.0.bn2.bias", "7.0.bn2.running_mean", "7.0.bn2.running_var", "7.0.downsample.0.weight", "7.0.downsample.1.weight", "7.0.downsample.1.bias", "7.0.downsample.1.running_mean", "7.0.downsample.1.running_var", "7.1.conv1.weight", "7.1.bn1.weight", "7.1.bn1.bias", "7.1.bn1.running_mean", "7.1.bn1.running_var", "7.1.conv2.weight", "7.1.bn2.weight", "7.1.bn2.bias", "7.1.bn2.running_mean", "7.1.bn2.running_var", "7.2.conv1.weight", "7.2.bn1.weight", "7.2.bn1.bias", 
"7.2.bn1.running_mean", "7.2.bn1.running_var", "7.2.conv2.weight", "7.2.bn2.weight", "7.2.bn2.bias", "7.2.bn2.running_mean", "7.2.bn2.running_var", "10.weight", "10.bias", "10.running_mean", "10.running_var", "12.weight", "12.bias", "14.weight", "14.bias", "14.running_mean", "14.running_var", "16.weight", "16.bias", "1.running_mean", "1.running_var", "4.0.conv1.weight", "4.0.bn1.weight", "4.0.bn1.bias", "4.0.bn1.running_mean", "4.0.bn1.running_var", "4.0.conv2.weight", "4.0.bn2.weight", "4.0.bn2.bias", "4.0.bn2.running_mean", "4.0.bn2.running_var", "4.1.conv1.weight", "4.1.bn1.weight", "4.1.bn1.bias", "4.1.bn1.running_mean", "4.1.bn1.running_var", "4.1.conv2.weight", "4.1.bn2.weight", "4.1.bn2.bias", "4.1.bn2.running_mean", "4.1.bn2.running_var", "4.2.conv1.weight", "4.2.bn1.weight", "4.2.bn1.bias", "4.2.bn1.running_mean", "4.2.bn1.running_var", "4.2.conv2.weight", "4.2.bn2.weight", "4.2.bn2.bias", "4.2.bn2.running_mean", "4.2.bn2.running_var", "5.0.conv1.weight", "5.0.bn1.weight", "5.0.bn1.bias", "5.0.bn1.running_mean", "5.0.bn1.running_var", "5.0.conv2.weight", "5.0.bn2.weight", "5.0.bn2.bias", "5.0.bn2.running_mean", "5.0.bn2.running_var", "5.0.downsample.0.weight", "5.0.downsample.1.weight", "5.0.downsample.1.bias", "5.0.downsample.1.running_mean", "5.0.downsample.1.running_var", "5.1.conv1.weight", "5.1.bn1.weight", "5.1.bn1.bias", "5.1.bn1.running_mean", "5.1.bn1.running_var", "5.1.conv2.weight", "5.1.bn2.weight", "5.1.bn2.bias", "5.1.bn2.running_mean", "5.1.bn2.running_var", "5.2.conv1.weight", "5.2.bn1.weight", "5.2.bn1.bias", "5.2.bn1.running_mean", "5.2.bn1.running_var", "5.2.conv2.weight", "5.2.bn2.weight", "5.2.bn2.bias", "5.2.bn2.running_mean", "5.2.bn2.running_var", "5.3.conv1.weight", "5.3.bn1.weight", "5.3.bn1.bias", "5.3.bn1.running_mean", "5.3.bn1.running_var", "5.3.conv2.weight", "5.3.bn2.weight", "5.3.bn2.bias", "5.3.bn2.running_mean", "5.3.bn2.running_var". 
	While copying the parameter named "0.weight", whose dimensions in the model are torch.Size([1024]) and whose dimensions in the checkpoint are torch.Size([64, 3, 7, 7]).
	While copying the parameter named "1.weight", whose dimensions in the model are torch.Size([512, 1024]) and whose dimensions in the checkpoint are torch.Size([64]).
	While copying the parameter named "1.bias", whose dimensions in the model are torch.Size([512]) and whose dimensions in the checkpoint are torch.Size([64]).

iNLyze · May 25, 2018, 9:26pm

This is odd. The issue also appears to affect loading in general, even when dropout stays the same.

Model was created the usual way with:

arch = resnet34
#aug_tfms = transforms_top_down
aug_tfms = [RandomRotateZoom(20, 6.0, 0.15, ps=[0.4, 0.3, 0.1, 0.2]), RandomLighting(0.1, 0.1), RandomDihedral()]
def get_data(sz, bs=bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=aug_tfms)
    return ImageClassifierData_sep.from_csv(PATH, 'train_rgb', csv_fname=LABELS, tfms=tfms, suffix='.png', bs=bs, cat_separator=',', test_name='valid_rgb')

data = get_data(sz)

learn = ConvLearner.pretrained(arch, data, ps=ps, precompute=True)

Now the model no longer loads the previously saved weights. While I was building and training it, saving and loading worked many times over, but after restarting the kernel it stopped working. Has anyone seen this before?

iNLyze · May 25, 2018, 9:31pm

Btw - ImageClassifierData_sep() is just a small modification I made to pass cat_separator through to parse_csv_labels(), like so:

```python
import os
import pandas as pd
from fastai.dataset import ImageClassifierData, dict_source  # fastai 0.7

def parse_csv_labels(fn, skip_header=True, cat_separator=' '):
    df = pd.read_csv(fn, index_col=0, header=0 if skip_header else None, dtype=str)
    fnames = df.index.values
    # Split the label column on the custom separator instead of a hard-coded space.
    df.iloc[:,0] = df.iloc[:,0].str.split(cat_separator)
    return sorted(fnames), list(df.to_dict().values())[0]

def csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False, **kwargs):
    # Extra keyword arguments (e.g. cat_separator) are forwarded to parse_csv_labels.
    fnames,csv_labels = parse_csv_labels(csv_file, skip_header, **kwargs)
    return dict_source(folder, fnames, csv_labels, suffix, continuous)

class ImageClassifierData_sep(ImageClassifierData):
    @classmethod
    def from_csv(cls, path, folder, csv_fname, bs=64, tfms=(None,None),
           val_idxs=None, suffix='', test_name=None, continuous=False, skip_header=True, num_workers=8, **kwargs):
        assert not (tfms[0] is None or tfms[1] is None), "please provide transformations for your train and validation sets"
        assert not (os.path.isabs(folder)), "folder needs to be a relative path"
        fnames,y,classes = csv_source(folder, csv_fname, skip_header, suffix, continuous=continuous, **kwargs)
        return cls.from_names_and_array(path, fnames, y, classes, val_idxs, test_name,
                num_workers=num_workers, suffix=suffix, tfms=tfms, bs=bs, continuous=continuous)
```
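The cat_separator change only affects how the label column is split. As a quick standalone check, here is a simplified, hypothetical variant (parse_csv_labels_demo, not part of fastai) that needs only pandas and an in-memory CSV; it builds the file-to-labels dictionary directly instead of writing lists back into the DataFrame. When the separator is a comma, multi-label cells have to be quoted in the CSV so the reader does not treat them as extra columns.

```python
import io

import pandas as pd

def parse_csv_labels_demo(fn, skip_header=True, cat_separator=' '):
    """Return (sorted file names, {file name: [labels]}) from a two-column CSV."""
    df = pd.read_csv(fn, index_col=0, header=0 if skip_header else None, dtype=str)
    # Split each label cell on the custom separator; every cell becomes a list.
    label_lists = df.iloc[:, 0].str.split(cat_separator)
    return sorted(df.index), dict(zip(df.index, label_lists))

# Hypothetical CSV: the multi-label cell is quoted because it contains commas.
csv_text = 'id,tags\nimg_2,"cat,dog"\nimg_1,bird\n'
fnames, labels = parse_csv_labels_demo(io.StringIO(csv_text), cat_separator=',')
print(fnames)
print(labels)
```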

iNLyze · May 28, 2018, 4:10pm

Sorry if this has been posted elsewhere, but I couldn’t find it.
I seem to have a pretty serious problem here (I can’t save and load properly), which might come from my not understanding something basic but important.
What happens is that I “occasionally” get the error mentioned above (Error(s) in loading state_dict for Sequential), not only when I change dropout through ps, but even when I leave dropout unchanged.
I tried to reproduce it, but it sometimes happens and sometimes doesn’t.

So, what I did is:

1. Created a model as shown above.
2. Trained a couple of epochs with the weights frozen, then unfrozen (using SGD with restarts).
3. Saved using learn.save() or the best_save_name parameter in .fit().
4. Changed sz.
5. Repeated the above steps.

Then, all of a sudden, the error appears and I don’t know how to get back to my previously trained state.
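The save/load steps above go through the two fastai helpers visible in the traceback earlier in this thread (save_model and load_model in fastai/torch_imports.py). The sketch below copies those two one-liners and runs them on a small, hypothetical stand-in head in plain PyTorch. It illustrates two points: after a kernel restart only the file on disk survives, so the same architecture has to be rebuilt before loading; and since nn.Dropout has no weights or buffers, changing only its rate p does not change the saved keys.

```python
import os
import tempfile

import torch
import torch.nn as nn

# The two helpers as shown in fastai/torch_imports.py in the traceback above.
def save_model(m, p): torch.save(m.state_dict(), p)
def load_model(m, p): m.load_state_dict(torch.load(p, map_location=lambda storage, loc: storage))

def build_head(p):
    # Hypothetical stand-in for a classifier head; only the dropout rate varies.
    return nn.Sequential(nn.BatchNorm1d(16), nn.Dropout(p), nn.Linear(16, 4))

path = os.path.join(tempfile.mkdtemp(), 'head.h5')  # fastai 0.7 uses the .h5 suffix
trained = build_head(0.0)
save_model(trained, path)

# After a restart, rebuild the same architecture (here with a new dropout rate), then load.
fresh = build_head(0.3)
load_model(fresh, path)

# Dropout stores no tensors, so p is not part of the checkpoint.
same_keys = list(trained.state_dict()) == list(fresh.state_dict())
weights_match = torch.equal(trained[2].weight, fresh[2].weight)
print(same_keys, weights_match)
```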

It seems to me that many people would run into an issue like this. So, sorry again in case I double-posted, but I just couldn’t find this exact problem.

I went through the relevant parts of the fastai library for loading and saving, as well as for instantiating new ConvLearners, and I just can’t figure out what I should be doing differently.

Any ideas greatly appreciated.

sgugger · May 28, 2018, 5:07pm

Do you have a notebook to share? With just the error message and your instruction that created the error, it’s hard to see what’s going on.

As for your first question, locate the dropout layer you want to change by typing learn.model; you can then access it with children(learn.model)[idx_of_the_layer], or, if the layer has a specific name, with learn.model.that_name.

To change the value of dropout, change the parameter p of this module, in the first case:

children(learn.model)[idx_of_the_layer].p = what_you_want
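The same idea in plain PyTorch, as a minimal sketch: the model below is a hypothetical stand-in whose layer sizes are copied from the learn.model printout posted later in this thread, and set_dropout is an illustrative helper, not a fastai function. Presumably ps is only read when the head is built, which would explain why setting learn.models.ps afterwards has no effect; changing p on the existing Dropout modules does work, because nn.Dropout reads self.p on every forward pass.

```python
import torch
import torch.nn as nn

# Stand-in for the fastai head; sizes copied from the learn.model printout.
head = nn.Sequential(
    nn.BatchNorm1d(1024), nn.Dropout(p=0.0), nn.Linear(1024, 512), nn.ReLU(),
    nn.BatchNorm1d(512), nn.Dropout(p=0.0), nn.Linear(512, 9), nn.LogSoftmax(dim=1),
)

# Option 1: change a single layer by index, like children(learn.model)[idx].p.
list(head.children())[1].p = 0.5

# Option 2: change every Dropout module in the model at once.
def set_dropout(model, p):
    """Set p on every nn.Dropout inside `model`; return how many were changed."""
    changed = 0
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = p
            changed += 1
    return changed

n_changed = set_dropout(head, 0.3)

# The new rate is used on the next forward pass in training mode
# (eval mode disables dropout regardless of p).
head.train()
out = head(torch.randn(4, 1024))
```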

iNLyze · May 28, 2018, 9:11pm

Good point. I’ll put something together that tries to reproduce this problem. And thanks for your help with dropout.

iNLyze · May 29, 2018, 9:26am

So, here I created a gist for reproducing the issue. I appreciate your taking the time to look into it. Never mind the accuracies; I didn’t train fully, just far enough to reach the point where it happens. With my most successful run I got 98.6% accuracy so far (but I can’t load the trained model any more). Unfortunately, I’m not able to share the data at this time. I could perhaps create some dummy data, though I feel the issue should be reproducible with any image dataset in .from_csv format.

iNLyze · May 29, 2018, 1:55pm

There is one clue I got from the error message above, but I don’t know what to do with it:
The error is, essentially, that load_model(), which calls load_state_dict(), does not find the model architecture it expects in learn. This is probably related to another observation I made and haven’t really understood: if you print the model using the property learn.model, you get a printout of the new head that fastai created for you (see output below). These layers probably correspond to the missing keys reported in the RuntimeError message. However, if you run learn.summary(), you get the full model, including the pretrained architecture (e.g. resnet34). This probably corresponds to the unexpected keys in the RuntimeError message. So, apparently, all the layers and their weights were saved, but the current model only seems to know its custom head (again, see output below).
I am sure I am missing something super basic and simple here; I just can’t figure out what it is.
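One way to check this reading of the error is to compare the key sets directly instead of reading the full RuntimeError. The sketch below uses plain PyTorch with hypothetical stand-ins: full plays the role of the saved network (backbone plus head) and head_only the role of the head-only model that learn.model prints. It reproduces the same three symptoms as the traceback: missing keys, unexpected keys, and a shape mismatch on "0.weight". compare_state_dict_keys is an illustrative helper, not a fastai or PyTorch function.

```python
import torch
import torch.nn as nn

def compare_state_dict_keys(model, state_dict):
    """Return (missing, unexpected, mismatched) keys between a model and a checkpoint."""
    own = model.state_dict()
    missing = sorted(set(own) - set(state_dict))
    unexpected = sorted(set(state_dict) - set(own))
    mismatched = sorted(k for k in set(own) & set(state_dict)
                        if own[k].shape != state_dict[k].shape)
    return missing, unexpected, mismatched

def make_head():
    # Hypothetical head: BatchNorm -> Dropout -> Linear, like the learn.model printout.
    return [nn.BatchNorm1d(8), nn.Dropout(0.3), nn.Linear(8, 3)]

# `full` stands in for backbone + head (what was saved);
# `head_only` stands in for the head-only model that tries to load it.
full = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), *make_head())
head_only = nn.Sequential(*make_head())

checkpoint = full.state_dict()
missing, unexpected, mismatched = compare_state_dict_keys(head_only, checkpoint)
print('missing:', missing)
print('unexpected:', unexpected)
print('shape mismatch:', mismatched)

# Strict loading refuses the checkpoint, as in the traceback above.
try:
    head_only.load_state_dict(checkpoint)
    load_failed = False
except RuntimeError:
    load_failed = True
```

On the real learner, something like compare_state_dict_keys(learn.model, torch.load(learn.get_model_path('resnet34-RGB-128'), map_location='cpu')) should show the same pattern. One plausible cause, not verified against the gist: in fastai 0.7, a ConvLearner created with precompute=True exposes only the head as learn.model, so learn.load would try to put full-network weights into the head. Setting learn.precompute = False before calling learn.load(...) may be worth trying.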

Here is the output of learn.model:

Sequential(
  (0): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): Dropout(p=0.3)
  (2): Linear(in_features=1024, out_features=512, bias=True)
  (3): ReLU()
  (4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): Dropout(p=0.3)
  (6): Linear(in_features=512, out_features=9, bias=True)
  (7): LogSoftmax()
)

And here is the output of learn.summary():

OrderedDict([('Conv2d-1',
              OrderedDict([('input_shape', [-1, 3, 128, 128]),
                           ('output_shape', [-1, 64, 64, 64]),
                           ('trainable', False),
                           ('nb_params', tensor(9408))])),
             ('BatchNorm2d-2',
              OrderedDict([('input_shape', [-1, 64, 64, 64]),
                           ('output_shape', [-1, 64, 64, 64]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-3',
              OrderedDict([('input_shape', [-1, 64, 64, 64]),
                           ('output_shape', [-1, 64, 64, 64]),
                           ('nb_params', 0)])),
             ('MaxPool2d-4',
              OrderedDict([('input_shape', [-1, 64, 64, 64]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-5',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(36864))])),
             ('BatchNorm2d-6',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-7',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-8',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(36864))])),
             ('BatchNorm2d-9',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-10',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('BasicBlock-11',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-12',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(36864))])),
             ('BatchNorm2d-13',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-14',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-15',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(36864))])),
             ('BatchNorm2d-16',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-17',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('BasicBlock-18',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-19',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(36864))])),
             ('BatchNorm2d-20',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-21',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-22',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(36864))])),
             ('BatchNorm2d-23',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('trainable', False),
                           ('nb_params', tensor(128))])),
             ('ReLU-24',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('BasicBlock-25',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 64, 32, 32]),
                           ('nb_params', 0)])),
             ('Conv2d-26',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(73728))])),
             ('BatchNorm2d-27',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-28',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-29',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-30',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('Conv2d-31',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(8192))])),
             ('BatchNorm2d-32',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-33',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('BasicBlock-34',
              OrderedDict([('input_shape', [-1, 64, 32, 32]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-35',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-36',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-37',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-38',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-39',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-40',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('BasicBlock-41',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-42',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-43',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-44',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-45',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-46',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-47',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('BasicBlock-48',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-49',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-50',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-51',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-52',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(1.4746e+05))])),
             ('BatchNorm2d-53',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('trainable', False),
                           ('nb_params', tensor(256))])),
             ('ReLU-54',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('BasicBlock-55',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 128, 16, 16]),
                           ('nb_params', 0)])),
             ('Conv2d-56',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(2.9491e+05))])),
             ('BatchNorm2d-57',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-58',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-59',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-60',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('Conv2d-61',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(32768))])),
             ('BatchNorm2d-62',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-63',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('BasicBlock-64',
              OrderedDict([('input_shape', [-1, 128, 16, 16]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-65',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-66',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-67',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-68',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-69',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-70',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('BasicBlock-71',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-72',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-73',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-74',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-75',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-76',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-77',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('BasicBlock-78',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-79',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-80',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-81',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-82',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-83',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-84',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('BasicBlock-85',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-86',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-87',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-88',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-89',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-90',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-91',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('BasicBlock-92',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-93',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-94',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-95',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-96',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(5.8982e+05))])),
             ('BatchNorm2d-97',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('trainable', False),
                           ('nb_params', tensor(512))])),
             ('ReLU-98',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('BasicBlock-99',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 256, 8, 8]),
                           ('nb_params', 0)])),
             ('Conv2d-100',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1.1796e+06))])),
             ('BatchNorm2d-101',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('ReLU-102',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('Conv2d-103',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(2.3593e+06))])),
             ('BatchNorm2d-104',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('Conv2d-105',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1.3107e+05))])),
             ('BatchNorm2d-106',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('ReLU-107',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('BasicBlock-108',
              OrderedDict([('input_shape', [-1, 256, 8, 8]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('Conv2d-109',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(2.3593e+06))])),
             ('BatchNorm2d-110',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('ReLU-111',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('Conv2d-112',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(2.3593e+06))])),
             ('BatchNorm2d-113',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('ReLU-114',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('BasicBlock-115',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('Conv2d-116',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(2.3593e+06))])),
             ('BatchNorm2d-117',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('ReLU-118',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('Conv2d-119',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(2.3593e+06))])),
             ('BatchNorm2d-120',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('trainable', False),
                           ('nb_params', tensor(1024))])),
             ('ReLU-121',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('BasicBlock-122',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 4, 4]),
                           ('nb_params', 0)])),
             ('AdaptiveMaxPool2d-123',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 1, 1]),
                           ('nb_params', 0)])),
             ('AdaptiveAvgPool2d-124',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 512, 1, 1]),
                           ('nb_params', 0)])),
             ('AdaptiveConcatPool2d-125',
              OrderedDict([('input_shape', [-1, 512, 4, 4]),
                           ('output_shape', [-1, 1024, 1, 1]),
                           ('nb_params', 0)])),
             ('Flatten-126',
              OrderedDict([('input_shape', [-1, 1024, 1, 1]),
                           ('output_shape', [-1, 1024]),
                           ('nb_params', 0)])),
             ('BatchNorm1d-127',
              OrderedDict([('input_shape', [-1, 1024]),
                           ('output_shape', [-1, 1024]),
                           ('trainable', True),
                           ('nb_params', tensor(2048))])),
             ('Dropout-128',
              OrderedDict([('input_shape', [-1, 1024]),
                           ('output_shape', [-1, 1024]),
                           ('nb_params', 0)])),
             ('Linear-129',
              OrderedDict([('input_shape', [-1, 1024]),
                           ('output_shape', [-1, 512]),
                           ('trainable', True),
                           ('nb_params', tensor(5.2480e+05))])),
             ('ReLU-130',
              OrderedDict([('input_shape', [-1, 512]),
                           ('output_shape', [-1, 512]),
                           ('nb_params', 0)])),
             ('BatchNorm1d-131',
              OrderedDict([('input_shape', [-1, 512]),
                           ('output_shape', [-1, 512]),
                           ('trainable', True),
                           ('nb_params', tensor(1024))])),
             ('Dropout-132',
              OrderedDict([('input_shape', [-1, 512]),
                           ('output_shape', [-1, 512]),
                           ('nb_params', 0)])),
             ('Linear-133',
              OrderedDict([('input_shape', [-1, 512]),
                           ('output_shape', [-1, 9]),
                           ('trainable', True),
                           ('nb_params', tensor(4617))])),
             ('LogSoftmax-134',
              OrderedDict([('input_shape', [-1, 9]),
                           ('output_shape', [-1, 9]),
                           ('nb_params', 0)]))])
```

sgugger · May 29, 2018, 2:06pm

You found the problem by yourself ;-).

Why does the printout of the model only show the head? Because you used precompute=True in your first learner, so the fastai library precomputes all the activations of your backbone model (here resnet34), stores them in a tmp directory, and only treats the last layers as the model, to speed things up.
At no point did you set precompute=False (which you should do before unfreezing), so when you saved the model, only the custom head was saved.
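A rough plain-PyTorch picture of what this means (a sketch, not fastai's actual implementation): the backbone runs once under `torch.no_grad()`, its activations are cached, and only the head is trained and then saved. The small `backbone` is a hypothetical stand-in for resnet34. The head copies the layer sizes of layers 127-134 in the summary above, and its dropout rate of 0.25 is an arbitrary assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the frozen backbone (resnet34 + concat pooling + flatten):
# it maps a 3x128x128 image to a 1024-dim feature vector, as in the summary.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1024),
)
for param in backbone.parameters():
    param.requires_grad = False  # frozen, like the 'trainable': False layers

# Head with the layer sizes of layers 127-134 in the summary (dropout rate assumed).
head = nn.Sequential(
    nn.BatchNorm1d(1024), nn.Dropout(0.25), nn.Linear(1024, 512), nn.ReLU(),
    nn.BatchNorm1d(512), nn.Dropout(0.25), nn.Linear(512, 9), nn.LogSoftmax(dim=1),
)

images = torch.randn(32, 3, 128, 128)  # fake batch of images
labels = torch.randint(0, 9, (32,))

# "Precompute": run the backbone once and cache its activations.
backbone.eval()
with torch.no_grad():
    cached = backbone(images)

# Training only touches the head, which is fed from the cache.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
head.train()
loss = nn.functional.nll_loss(head(cached), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Saving "the model" in this state captures the head's weights only.
saved = head.state_dict()
```

Every tensor in `saved` has a head-sized shape; none of the backbone's convolution weights are in it, which is why a model that expects the full network cannot load it directly.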

When you later come back to your notebook and try to load the model into a new learner where you don’t specify precompute=True (it’s False by default), it doesn’t work because that learner’s model includes the backbone and expects more weights than were saved.

If you want to retrieve your saved models, create a learner with precompute=True, load the model, then type learn.precompute = False, then resave your model.
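The same failure and fix can be reproduced in plain PyTorch. This is a sketch under assumptions: the toy backbone is hypothetical, and nesting it in front of the head as `nn.Sequential(backbone, head)` is a simplification, since fastai lays out its layers its own way. In fastai itself the steps are the ones above: a learner with precompute=True, `learn.load(...)`, `learn.precompute = False`, then save again.

```python
import io

import torch
import torch.nn as nn


def make_head() -> nn.Sequential:
    # Same layer sizes as the custom head in the summary (1024 -> 512 -> 9).
    return nn.Sequential(
        nn.BatchNorm1d(1024), nn.Dropout(0.25), nn.Linear(1024, 512), nn.ReLU(),
        nn.BatchNorm1d(512), nn.Dropout(0.25), nn.Linear(512, 9), nn.LogSoftmax(dim=1),
    )


def make_full_model() -> nn.Sequential:
    # Hypothetical backbone stand-in, nested in front of the head (an assumption).
    backbone = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1024),
    )
    return nn.Sequential(backbone, make_head())


# What got saved while precompute=True: the head's weights only.
head_only_checkpoint = make_head().state_dict()

# Loading it into the full model fails: the full model's keys look like
# "0.0.weight" (backbone) and "1.0.weight" (head), not "0.weight".
full = make_full_model()
try:
    full.load_state_dict(head_only_checkpoint)  # strict=True by default
    load_error = None
except RuntimeError as err:
    load_error = str(err)  # lists missing and unexpected keys

# Fix: load the checkpoint into the sub-module it came from, then save the whole model.
full[1].load_state_dict(head_only_checkpoint)
buffer = io.BytesIO()  # stands in for a file on disk
torch.save(full.state_dict(), buffer)

# A fresh full model now loads the re-saved weights without errors.
buffer.seek(0)
restored = make_full_model()
restored.load_state_dict(torch.load(buffer))
```

Re-saving the complete `state_dict` is what makes later loads work without repeating the precompute trick.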

iNLyze · May 29, 2018, 2:10pm

Oh, I see! I have to be more careful about this. I’ll try that right away.

iNLyze · May 29, 2018, 2:21pm

It works. The error is gone, though the weights seem to be pretty garbled because I had created some confusion. Well, now I know how to do it, thanks a lot!

alessa · July 12, 2018, 3:30pm

I don’t know if this is the right place to ask this question.
I’m trying to make the model generalize better, and I’ve searched for examples and notebooks I can follow, but found nothing so far, just theory.
The plan is to:

1/ train until overfit
2/ data augmentation
3/ add dropout (this is why I found your post)
4/ add weight_decay
5/ test different optimizers/sizes/bs etc

What I have for now is a saved model for two classes (happy/not_happy, with 2000 examples each from EmotioNet, images in the wild). After unfreezing it and training for many epochs, I get the following results:

```
epoch      trn_loss   val_loss   accuracy
    x      0.044441   0.107906   0.9625
```

And TTA 0.96375

The validation loss doesn’t go any lower, and when I apply the model to random pictures of faces collected in the lab, the results are nonsense.

This is why I want to make the model generalize better.

My confusion comes from the fact that steps 2-5 are all defined when initializing the learner:

```
ConvLearner.pretrained(arch, data, precompute=False, xtra_fc=[1024, 512], ps=[0, 0, 0], opt_fn=optimizer)
```

Do you have any insights, workflows, or suggestions on best practices for testing the model and making it generalize better?
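On the point that dropout and weight decay are fixed when the learner is created (and on the original question in this thread): in plain PyTorch, `nn.Dropout` reads its `p` attribute on every forward pass, so the existing modules can be updated in place, and an optimizer's weight decay lives in its `param_groups`. The helpers `set_dropout` and `set_weight_decay` below are hypothetical names for a sketch that works on any `nn.Module` and PyTorch optimizer; they are not a fastai API. The head copies the layer sizes from the summary above. Because `p` is neither a parameter nor a buffer, it does not appear in the `state_dict`, so changing it does not affect saving or loading. If a model was built with no `nn.Dropout` modules at all (a builder may skip them when p=0), there is nothing to update, and rebuilding may also shift layer indices.

```python
import torch
import torch.nn as nn


def set_dropout(model: nn.Module, p: float) -> int:
    """Set the drop probability of every nn.Dropout inside `model`.

    Returns the number of modules changed (0 means the model has no nn.Dropout).
    """
    changed = 0
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p  # nn.Dropout uses self.p on each forward pass
            changed += 1
    return changed


def set_weight_decay(optimizer: torch.optim.Optimizer, wd: float) -> None:
    """Change weight decay for every parameter group of an existing optimizer."""
    for group in optimizer.param_groups:
        group["weight_decay"] = wd


# Head with the layer sizes from the summary, built with dropout 0 and no weight decay.
head = nn.Sequential(
    nn.BatchNorm1d(1024), nn.Dropout(0.0), nn.Linear(1024, 512), nn.ReLU(),
    nn.BatchNorm1d(512), nn.Dropout(0.0), nn.Linear(512, 9), nn.LogSoftmax(dim=1),
)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3, weight_decay=0.0)

# Later, without rebuilding anything:
n_changed = set_dropout(head, 0.3)
set_weight_decay(optimizer, 1e-4)
```

The trained weights stay untouched; only the regularization used in subsequent training steps changes.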