Difference in results between using Learner and cnn_learner

I am trying to figure out the difference in results between using Learner and cnn_learner. For example, using this example:

path = untar_data(URLs.IMAGEWOOF)
path_im = path/'train'

batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
item_tfms = Resize(224)

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=parent_label,
                 item_tfms=item_tfms,
                 batch_tfms=batch_tfms)

dbs = pets.dataloaders(path_im, bs=16, path=path, num_workers=0)

Trained with the same lr and number of epochs.
If I use Learner:

net = resnet50(pretrained=True)
learn = Learner(dbs, net, loss_func=LabelSmoothingCrossEntropy(), metrics=[error_rate])

I get pretty lousy results:

But if I use:

arch = resnet50
learn = cnn_learner(dbs, arch, pretrained=True, loss_func=LabelSmoothingCrossEntropy(), metrics=[error_rate])

I get much better results:

I could not figure out why there is such a big difference. Aren't the two theoretically the same? Thanks

It’s actually not! There’s quite a bit happening in the background if we look at the source code for cnn_learner here. First, we pass in a configuration. From this configuration, the model is split into two parameter groups that we fine-tune on (first frozen, then we can unfreeze), just like we did in lesson 1. What’s different? Learner never splits layer groups or freezes anything; you have to do that yourself outside of Learner (it’s the base class, so it doesn’t assume you will always be transfer learning). Along with this, we see that cnn_learner calls create_cnn_model, which then does two things:

  1. Makes a body from our architecture (removes the final layer group so we can transfer learn)
  2. Creates a fastai2 head for our model, which is more than just one final linear layer, and randomly initializes the weights of that new head.

This behavior is the same in v1 and v2 (v2 will also normalize if it’s asked to, among other things that differ, but the main bits are the same). Also, you most likely still have the original 1000 ImageNet classes as the output of your model (check by doing learn.model[-1]; if the output dimension is 1000, that’s the problem). This can be another cause too.

Hope this helps :slight_smile:

Thanks for the explanation @muellerzr. :+1:

I’m going to have to spend some time understanding this. For cnn_learner, learn.model[-1] results in 10 classes:

For Learner, I get the following error:

Cheers!

That’s because we have one full model without any splits yet (duh). Do learn.model and scroll to the bottom of it to see what that last layer is :slight_smile:

Yes, you’re absolutely right, 1000 classes.

I guess resnet does not have a c_out option, only xresnet?

IIRC it does actually. It may be n_out or out instead of c_out (but I may be wrong)
