Lesson 1 In-Class Discussion ✅

The batch size is typically chosen between 1 and a few hundred, e.g. batch_size = 32 is a good default value, with values above 10 taking advantage of the speedup of matrix-matrix products over matrix-vector products.
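The matrix-matrix vs. matrix-vector point can be seen in a small NumPy sketch (shapes here are illustrative, not from the paper): processing a batch turns 32 separate matrix-vector products into one matrix-matrix product.

```python
import numpy as np

W = np.random.randn(128, 64)      # a layer's weight matrix (illustrative sizes)
x = np.random.randn(64)           # one sample: a matrix-vector product
batch = np.random.randn(32, 64)   # batch_size = 32: a matrix-matrix product

single_out = W @ x                # shape (128,) - one sample at a time
batch_out = batch @ W.T           # shape (32, 128) - all 32 samples in one pass

print(single_out.shape, batch_out.shape)
```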

Here is a paper:
[Revisiting Small Batch Training for Deep Neural Networks](https://arxiv.org/abs/1804.07612)


Really Interested.

We take the mean/std of each channel over all the images in the dataset. For instance, for ImageNet, [0.485, 0.456, 0.406] are the means of the R, G, B channels over the millions of images in the training set.
Then, before passing the images through the model, we subtract this mean and divide by this std (always the same ones), so that each channel of the dataset ends up with mean 0 and std 1.
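In code, the per-channel normalization described above looks roughly like this (a NumPy sketch; the std values [0.229, 0.224, 0.225] are the standard ImageNet ones that pair with the means quoted above):

```python
import numpy as np

# ImageNet per-channel statistics (R, G, B), as quoted above.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """img: float array of shape (H, W, 3) with values in [0, 1].
    Subtract the dataset mean and divide by the dataset std, per channel."""
    return (img - mean) / std

img = np.random.rand(224, 224, 3)   # a stand-in for a real image
normed = normalize(img)
print(normed.shape)
```

fastai/PyTorch transforms do the same thing, just on tensors.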


I see a bunch of questions re: normalization of data, and all of that totally makes sense, thanks for the insight! I have two quick related questions though:

  1. Why is the data normalized with the mean/std of the ImageNet data as opposed to the new cat/dog breed image data? Is this typical of transfer learning: should you always normalize with the starting/pretrained model’s input data, as opposed to the new data you are receiving?

  2. Is there a reason this isn’t baked into the ImageDataBunch initialization, or is it separated because there are times where you perhaps wouldn’t want to normalize? Or maybe this is related to the first question…since you aren’t normalizing with the new data bunch you are feeding in, you need to separately specify how you choose to normalize the data?


Hi Aditya, I think:

  1. The statistics should be the same, so it makes sense to use the pretrained model's metrics.
  2. Standardization matters when there is a good amount of variation, so I would want to keep it optional. Also, ImageNet models aren't the only pretrained models we will use, so baking it in would make the function less flexible.
    Hope this helps.
    I would ask others to comment as well.

If you use a pretrained model, you should use the normalization that was used to pretrain it, yes. If we were to train a model from scratch, we would compute the means/stds for this specific dataset and use those.

And as you pointed out, this is why it isn't baked in: you might want different normalizations depending on your situation. Some pretrained models use a normalization that just scales the pixel values from -1 to 1 (Inception models, IIRC), even on ImageNet.
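Both options mentioned above can be sketched in NumPy: computing a dataset's own per-channel statistics for training from scratch, and the simpler [-1, 1] scaling some pretrained models use (function names here are just illustrative):

```python
import numpy as np

def channel_stats(images):
    """images: array of shape (N, H, W, 3), values in [0, 1].
    Returns per-channel (mean, std) over the whole dataset."""
    return images.mean(axis=(0, 1, 2)), images.std(axis=(0, 1, 2))

def scale_minus1_1(img):
    """The simpler normalization: map [0, 1] pixel values to [-1, 1]."""
    return img * 2.0 - 1.0

data = np.random.rand(10, 8, 8, 3)    # a tiny stand-in dataset
mean, std = channel_stats(data)
print(mean.shape, std.shape)          # (3,) (3,)
```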


I’ve updated the wiki to share the notebook (see the top of the post). Basically, I just reduced the batch size to avoid out-of-memory issues.

Check which version of fastai you have installed, because `error_rate` arrived in 1.0.7: https://github.com/fastai/fastai/blob/master/CHANGES.md

Please refer to the FAQ thread. Your question is covered there.

I have a doubt regarding the lesson 1 Jupyter notebook. If fastai’s models.resnet34 is pretrained on the ImageNet dataset, then it predicts 1,000 classes, i.e. probabilities for 1,000 classes for a given input image, but in the notebook we have 37 classes. My question is: aren’t we going to modify the last FC layer of resnet34 to change the model’s output to 37 classes instead of 1,000?

fastai automatically removes the final layers of the original architecture and replaces them with something suitable for your problem. Later in the course we will also learn how to fit in our own custom layers at the end, something called a ‘custom head’.
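Schematically, that replacement is just swapping the final fully connected layer. A NumPy sketch of the idea (shapes illustrative: ResNet-34's body produces 512 features; a real fastai head is richer, with pooling and dropout):

```python
import numpy as np

features = np.random.randn(4, 512)    # body output for a batch of 4 images

# Pretrained head: maps 512 features to 1000 ImageNet classes.
w_imagenet = np.random.randn(512, 1000)
logits_1000 = features @ w_imagenet
print(logits_1000.shape)              # (4, 1000)

# fastai discards that layer and creates a fresh one for your 37 classes.
w_pets = np.random.randn(512, 37)
logits_37 = features @ w_pets
print(logits_37.shape)                # (4, 37)
```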


You need to pass the exact file name along with the path, and the folder to untar the data into, like below.

> path = untar_data(URLs.PETS, fname='/content/drive/My Drive/FastAIPart1V3/data/oxford-iiit-pet.tgz', dest='/content/drive/My Drive/FastAIPart1V3/data/oxford-iiit-pet')
> path
> path

Note that adding .tgz is important. Also make sure that the folder the data needs to be untarred/unzipped into has already been created (just being careful).
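Creating that destination folder first can be done with pathlib (the path below is just an example, not the Drive path from the post):

```python
from pathlib import Path

# Make sure the destination folder exists before untarring into it.
dest = Path("data/oxford-iiit-pet")        # example path
dest.mkdir(parents=True, exist_ok=True)    # no error if it already exists
print(dest.exists())
```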


How do I know which version we are working with right now? Any shortcut command? Thanks.
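One generic way (not fastai-specific; `import fastai; print(fastai.__version__)` also works) is to query the installed distribution metadata from the standard library:

```python
import importlib.metadata

def package_version(name):
    """Return the installed version of a package, or None if it's absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None

# e.g. package_version("fastai") -> "1.0.x" if fastai is installed
print(package_version("this-package-does-not-exist"))  # None
```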

The ConvLearner does it automatically for you, based on the number of classes in your data. If you call learn.model, you will see that the last FC layer has 37 activations.


Ok Thanks @dreambeats and @joshfp


Wanted to share this: I made a utility program that sends a remote email notification, with model parameters and graphs, after training is over.


Look at the data download part in this blog post.

Added where Jeremy takes over in the video :slight_smile: I thought it would help skip the long wait at the beginning.

@jeremy In your notebook output, resnet34 learn.fit_one_cycle(4) took around 2 minutes in the main video notebook (using SageMaker), while I can see it took you around 1 minute in the GitHub lesson 1 notebook.

Can you please mention what the specs were for the video lesson notebook training, and what they were for the GitHub notebook?

This could be useful for all of us as a baseline to check whether our local setup is working well.

For me, I am getting a much lower speed (12 min) with a GTX 1080 Ti and an old CPU (Xeon W3503) that is running at 100%, while the GPU utilization only occasionally pops up from 2% to 70-90%.

The first thing I should do is get rid of the GPU risers, which drop the PCIe 2.0 x16 slot down to x1; they are the only way for me to install multiple GPUs.