Lesson 2: further discussion ✅

This is a place to talk about more advanced or tangential topics related to the Lesson 2 lecture. This will not be monitored during class, but we will read it afterwards.

Feel free to discuss anything you like, as long as it’s at least somewhat related to what’s happening in class.


Two Questions about LR Finding:

Let us say we have a NN: P1->P2->N.
Here P1, P2 are pre-trained and N is the new layer.

The standard pipeline:
Stage 1: Freeze P1, P2. Find the LR for N using Leslie Smith’s LR finder. This is straightforward.

Stage 2: Unfreeze all and use discriminative LRs. When we provide a slice, we are controlling the LR applied to N. Are the LRs for P1 and P2 then some fixed factor of the LR of N?
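
A minimal sketch of how a slice could be spread over the layer groups. It assumes (my understanding of fastai v1's behaviour, not confirmed in this thread) that slice(lo, hi) gives lo to the first group, hi to the last, and geometrically spaced values in between. The helper name spread_lrs is hypothetical.

```python
def spread_lrs(lo, hi, n_groups):
    """Return n_groups learning rates spaced geometrically from lo to hi."""
    if n_groups == 1:
        return [hi]
    # Constant multiplicative step between neighbouring groups
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio ** i for i in range(n_groups)]

# Three groups standing in for P1, P2 and N
lrs = spread_lrs(1e-6, 1e-4, 3)
print(lrs)  # roughly 1e-6 for P1, 1e-5 for P2, 1e-4 for N
```

Under this assumption the LRs for P1 and P2 are not a fixed factor of N's LR; they depend on both ends of the slice and on the number of layer groups.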


Does it make sense to use any optimizer other than SGD during LR finding? If we are using Adam, say, would the procedure be “Find the LR using SGD and pass the estimated LR to Adam”?


Q: To use the ResNet architecture without pre-trained weights, I would set pretrained=False. Do I need to unfreeze, or is this done automatically? Do I need to initialize random weights, etc.?


Not sure about fastai calling unfreeze(), but as far as PyTorch is concerned, setting pretrained=False is all you need to do. Initialization is handled in PyTorch (i.e. torchvision): https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L112-L117


I think when you create a model the initialization is taken care of.

def create_cnn(data:DataBunch, arch:Callable, cut:Union[int,Callable]=None, pretrained:bool=True,
               lin_ftrs:Optional[Collection[int]]=None, ps:Floats=0.5,
               custom_head:Optional[nn.Module]=None, split_on:Optional[SplitFuncOrIdxList]=None,
               classification:bool=True, **kwargs:Any)->None:
    "Build convnet style learners."
    assert classification, 'Regression CNN not implemented yet, bug us on the forums if you want this!'
    meta = cnn_config(arch)
    body = create_body(arch(pretrained), ifnone(cut,meta['cut']))
    nf = num_features_model(body) * 2
    head = custom_head or create_head(nf, data.c, lin_ftrs, ps)
    model = nn.Sequential(body, head)
    learn = ClassificationLearner(data, model, **kwargs)
    if pretrained: learn.freeze()
    apply_init(model[1], nn.init.kaiming_normal_)
    return learn

If you set pretrained=False, the layers are not frozen, so there is no need to unfreeze. Similarly, the model's weights are initialized for you.


Q: I want the model to generalize well, even to ‘test’ images that look ‘different’ in some ways from the images it was trained on. The ‘test’ images are not available; they are just expected to look ‘different’. Are the ‘standard’ anti-overfitting techniques good enough, or do I need to do something extra, like higher dropout, more aggressive data augmentation transforms, fewer cycles/epochs, or a higher learning rate, even at the cost of the validation score?

Q: Sometimes (especially after training the head layers and unfreezing) the learning rate finder doesn’t show a characteristic downslope:
(this image is taken from the lesson 1 nb https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson1-pets.ipynb). From last year’s course we are used to looking for a figure like:
(this image is taken from https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson2-planet.ipynb)

My question: How should I interpret the learning rate finder results if the plot doesn’t have a downslope?


In DL there is usually only a vague understanding of what we mean by generalization. We do want the model to be invariant to all kinds of transforms and even some distortions, but there are limits to this. One of the most crucial assumptions when we develop DL models is that the train, val and test sets come from the same distribution, which in practice means the same dataset.

Although what you suggested might make the model more robust to such variations, I wonder whether it would lead to better generalization.

One other thing to mention: most real-world examples are noisy, while the datasets we usually work with have been created through careful curation, which is easy to overlook. So to make the model robust to noise as well, adversarial training can be done.

What if we wanted, for example, to take a model trained on images from the ‘south’ and use it in the ‘north’, or from ‘east’ to ‘west’? We could expect some differences in the images. How do we train a model that works on both image sets without having access to both? We can only train on, say, the ‘south’ dataset but must run inference on the ‘north’ dataset. Are there techniques for that?

I didn’t understand inference. Why are we creating a new data bunch instead of using the previous one?

Can’t we just use the older data bunch and input our image?

empty_data = ImageDataBunch.single_from_classes(path, data.classes, tfms=get_transforms()).normalize(imagenet_stats)

I have a question for after class regarding the “delete photos from dataset” concept (new widget) introduced:

In which cases does it make sense to delete images that “don’t belong”?
In which cases is it better to create a new “other” category in order for the network to be able to discern between the actual classes and random bullshit (“none of the above”)?

Especially in “real-world” multiclass settings involving real people, you will always get those (people uploading hotdog photos to the cat/dog classifier app, etc.).

I have wondered this e.g. in the google quickdraw dataset. There is no “none” /“other”/“random” category, although clearly a lot of times people just doodle random stuff not belonging to any of the 340/345 categories. Would it not be helpful to distinguish this instead of predicting one of the existing known classes? Or would this hinder the network from learning the actual classes?

Is it better to train only on the correct categories and then have a mechanism that, based on very low probabilities across categories, will say “none of the above”? (Isn’t this difficult when using softmax, because that will still give you some “winner” category most of the time?)
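
To make the “none of the above” mechanism concrete, here is a minimal sketch that rejects a prediction when the top softmax probability is below a cutoff. The function name predict_or_reject, the class names, and the 0.6 threshold are all hypothetical choices for illustration, not something from the lesson.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_reject(logits, classes, threshold=0.6):
    """Return (label, top_probability); the label is 'none of the above'
    when the most likely class is below the (hypothetical) threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "none of the above", probs[best]
    return classes[best], probs[best]

classes = ["cat", "dog", "bird"]
confident = predict_or_reject([4.0, 0.5, 0.2], classes)  # one score dominates
unsure = predict_or_reject([1.0, 0.9, 0.8], classes)     # scores nearly tied
print(confident[0], unsure[0])
```

Softmax outputs always sum to 1, so there is always a “winner”; the cutoff only catches cases where the scores are close. A network can still be very confident on an image unlike anything it has seen, so this is a rough heuristic rather than a reliable detector.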


You can, as long as you are still in the same notebook and session and have everything initialized anyway.

This method is for the case where you have trained a model, that “phase” of the project is finished, and now you just want to run the model as part of an app (most likely not within a notebook). You don’t want to load any training or validation data then; you just want to reload your trained model and weights and do inference, meaning making predictions using a learned model.


I want to add a note about downloading images. This is the process I did:

  • Download some images from Google using this tool to my laptop.
  • Then clean out bad images by hand.
  • Resize them and create a tarball.
  • Upload it into a GitHub release.
  • Then use that dataset in the notebook as usual.

Here’s the whole process with some more details.

Unfortunately, I couldn’t use untar_data due to an issue. So, I had to come up with a replacement function.
But I’ll try to fix it and do a PR this week.
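
The “create a tarball” step in the list above can be sketched with the standard library alone. Resizing is left out here, and the folder and file names (bears, teddy, 0001.jpg) are placeholders for illustration, not the ones from the post.

```python
import tarfile
import tempfile
from pathlib import Path

def make_tarball(image_dir, out_path):
    """Pack image_dir into a gzip-compressed tarball.

    The folder name is kept as the top-level entry, so extracting the
    archive recreates the same class-per-folder layout.
    """
    image_dir = Path(image_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(image_dir, arcname=image_dir.name)
    return Path(out_path)

# Demo with a placeholder file standing in for a cleaned, resized image
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bears"
    (root / "teddy").mkdir(parents=True)
    (root / "teddy" / "0001.jpg").write_bytes(b"not a real image")
    archive = make_tarball(root, Path(tmp) / "bears.tgz")
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(names)
```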


Deleting images that don’t belong is a part of data curation. In the download-images example, we downloaded the images from the web and had no filter to check whether the data actually belonged to the classes we wanted.
So when we “deleted” the images which didn’t belong, we were simply curating the data. Another reason to delete them is that these images are ultimately assigned to some class in our dataset, and if we keep them, two things can happen: the network might learn a wrong representation of that class, and it might misclassify. None of this usually happens, since the number of such examples is very low, but I think it’s more of a “better safe than sorry” practice.

I think having an “other” category is more of a choice than a necessity: if you are sure that you’ll only input images belonging to your classes, it makes little sense to have additional classes. On the flip side, not breaking the model when you input images outside those classes is a big reason to have an “other” category.

In the case of the quickdraw dataset, I wonder if such random doodles were actually included. I think the dataset was preprocessed and these outliers were removed before it was released, but I’m not entirely sure.

If I have several instances of the fastai.vision.image.Image class, what is the best way to display them in a grid?

For example:

x1 = open_image('tmp_027.jpg')
x2 = open_image('tmp_029.jpg')

will place the two images vertically, but I’d like to put them side by side. If I have more images, then I’ll want to put them in a grid.

Although the above example uses open_image to create the image, in general, the images I want to display are calculated rather than read from disk.

Can plt’s subplot be used? An example would help. Thanks.
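
Yes, plt.subplots can be used. Below is a minimal sketch with NumPy arrays standing in for the images; the helper name show_grid is hypothetical. For fastai Image objects I believe img.show(ax=ax) can replace ax.imshow(img), but treat that as an assumption and check the fastai source.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

def show_grid(images, ncols=2, cell_size=3):
    """Draw images row by row on a grid with ncols columns."""
    nrows = math.ceil(len(images) / ncols)
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(ncols * cell_size, nrows * cell_size),
                             squeeze=False)  # always return a 2-D array of axes
    flat_axes = axes.ravel()
    for ax in flat_axes:
        ax.axis("off")  # hide ticks everywhere, including unused cells
    for ax, img in zip(flat_axes, images):
        ax.imshow(img)  # assumption: for a fastai Image, use img.show(ax=ax)
    return fig, axes

# Three random RGB arrays standing in for computed images
images = [np.random.rand(32, 32, 3) for _ in range(3)]
fig, axes = show_grid(images, ncols=2)
```

With three images and two columns this gives a 2x2 grid whose last cell stays empty.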

Which data loader did you use for Quickdraw dataset?

Question: I’m excited to deploy a previous model I wrote and create a web app around it after yesterday’s class. I wrote my Data Preparation code in Apache Spark. What is the recommended way to prepare the data during inference for realtime predictions?

I’m not sure Spark is the right choice, since at inference time we won’t have huge batches of data, only single data points to transform for prediction. At the same time, if we choose some other data processing engine, I have to rewrite the data processing code for it.

I might be completely wrong here, but in Leslie Smith’s paper it was said that the loss reaches a minimum and then shoots up, so if we trace back from the shoot-up we should find the point of lowest loss. For the first figure that point is at 1e-4, and the most recent bulge is at 1e-5, so the slice should be slice(1e-5,1e-4), but I don’t know why the nb has slice(1e-6,1e-4). If the initial part before the overshoot is completely flat, it would be better if we could zoom in. Can we do that in a Jupyter notebook?
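
A small sketch of the reading described above: find the LR where the recorded loss is lowest, then back off by some factor. The helper name suggest_lr, the 10x backoff, and the numbers are all made up for illustration; this is one rule of thumb, not necessarily how the notebook's slice was chosen.

```python
def suggest_lr(lrs, losses, backoff=10.0):
    """Return (lr_at_min_loss, lr_at_min_loss / backoff).

    lrs and losses are parallel lists as recorded during an LR range test.
    The backoff factor is a hypothetical safety margin.
    """
    i_min = min(range(len(losses)), key=losses.__getitem__)
    return lrs[i_min], lrs[i_min] / backoff

# Synthetic curve: almost flat, a shallow dip, then the loss blows up
lrs = [1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3]
losses = [0.30, 0.30, 0.29, 0.28, 0.27, 0.40, 1.50]

lr_min, lr_pick = suggest_lr(lrs, losses)
print(lr_min, lr_pick)
```

To look closer at a flat region, the same lists can be sliced before plotting so that the blow-up at the end does not dominate the y-axis.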

Take a look at the source code for show_batch and tell us what you can figure out from that. Let us know if you get stuck!

I’m currently participating in the Human Protein Atlas Image Classification challenge on Kaggle and trying to use FastAI V1 for it.

The challenge is that instead of a single 3-channel RGB image, you have 4 grayscale images of the same subcellular structure under different filters (i.e. different chemicals). Each image highlights a different part of the cell, shown below: the protein (green), microtubules (red), nucleus (blue), endoplasmic reticulum (yellow)

The green image is the one that needs to be classified, and the rest are for reference (but surely useful!). It’s a multilabel classification problem with 28 classes (like “Cytosol” and “Plasma membrane” above).

I have 2 questions:

  1. How to load the 4 images together into a single 4-channel image using FastAI’s ImageDataBunch?
  2. How can we do transfer learning using Resnet34, since the backbone expects a 3-channel RGB image, but here there are 4?
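
One possible approach to both questions, sketched in plain NumPy so it does not depend on any fastai API: stack the four grayscale planes into one 4-channel array, and widen the pretrained first-layer weights from 3 to 4 input channels. Initialising the extra channel with the mean of the RGB filters is an assumption (one option among several), and the function names are hypothetical. Wiring this into ImageDataBunch would need a custom image-opening function, which is not shown.

```python
import numpy as np

def stack_channels(red, green, blue, yellow):
    """Stack four HxW grayscale arrays into one 4xHxW array (channels first)."""
    return np.stack([red, green, blue, yellow], axis=0)

def expand_first_conv(weight_rgb, n_in=4):
    """Extend conv weights of shape (out, 3, kh, kw) to (out, n_in, kh, kw).

    The pretrained RGB filters are kept unchanged. Each extra input channel
    starts as the mean of the RGB filters (an assumption for illustration).
    """
    in_ch = weight_rgb.shape[1]
    extra = weight_rgb.mean(axis=1, keepdims=True)
    extra = np.repeat(extra, n_in - in_ch, axis=1)
    return np.concatenate([weight_rgb, extra], axis=1)

# Random arrays standing in for the four filter images of one sample
planes = [np.random.rand(8, 8).astype(np.float32) for _ in range(4)]
image = stack_channels(*planes)

# Random stand-in for the first conv layer of ResNet34: 64 filters of 3x7x7
w_rgb = np.random.randn(64, 3, 7, 7).astype(np.float32)
w_4 = expand_first_conv(w_rgb)
print(image.shape, w_4.shape)
```

The widened weights would then be copied into a new first conv layer with 4 input channels, while the rest of the pretrained backbone is reused as is.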