I see a bunch of questions re: normalization of data; all of that totally makes sense, thanks for the insight! I have two quick related questions though:
Why is the data normalized with the mean/std of the ImageNet data as opposed to the new cat/dog breed image data? Is this typical of transfer learning: should you always normalize with the pretrained model's original input statistics, as opposed to the new data you are feeding in?
Is there a reason this isn't baked into the ImageDataBunch initialization, or is it separated because there are times when you wouldn't want to normalize? Or maybe this is related to the first question: since you aren't normalizing with the new data bunch you are feeding in, you need to specify separately how you want to normalize the data?
The normalization statistics should match the pretrained model's, so it makes sense to use the pretrained data's metrics.
Standardization matters most when there is a good amount of variation in the data, so I would want it to remain optional. Also, ImageNet isn't the only pretrained model we will use, so baking the ImageNet stats into the function would make it less flexible.
Hope this helps.
I'd invite others to comment as well.
If you use a pretrained model, you should use the normalization that was used to pretrain it, yes. If we were to train a model from scratch, we would compute the mean/std of this specific dataset and use those.
And as you pointed out, that's why it isn't baked in: you might want different normalizations depending on your situation. Some pretrained models use a normalization that just scales the pixel values to [-1, 1] (the Inception models, IIRC), even on ImageNet.
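To make the two options concrete, here is a minimal NumPy sketch (not the fastai implementation) contrasting the two normalization choices discussed above. The `IMAGENET_MEAN`/`IMAGENET_STD` values are the widely published ImageNet constants; the batch itself is random stand-in data:

```python
import numpy as np

# Standard ImageNet per-channel mean/std (RGB, pixel values in [0, 1]);
# these are the commonly published constants used by most pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(batch, mean, std):
    """Channel-wise normalize a batch of images shaped (N, H, W, 3)."""
    return (batch - mean) / std

# A fake batch standing in for the cat/dog breed images (hypothetical data).
rng = np.random.default_rng(0)
batch = rng.uniform(0.0, 1.0, size=(8, 64, 64, 3))

# Option 1: transfer learning -> reuse the pretrained model's statistics.
pretrained_norm = normalize(batch, IMAGENET_MEAN, IMAGENET_STD)

# Option 2: training from scratch -> compute statistics from this dataset.
data_mean = batch.mean(axis=(0, 1, 2))
data_std = batch.std(axis=(0, 1, 2))
scratch_norm = normalize(batch, data_mean, data_std)

# Only option 2 is guaranteed to be exactly zero-mean/unit-std on this data;
# option 1 matches what the pretrained weights were trained to expect.
print(scratch_norm.mean(), scratch_norm.std())
```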
I have a doubt regarding the lesson 1 Jupyter notebook. If fastai's models.resnet34 is pretrained on the ImageNet dataset, then it will predict 1000 classes, i.e. probabilities for 1000 classes for a given input image, but in the notebook we have 37 classes. My question is: aren't we going to modify the last FC layer of resnet34 to change the model's output to 37 classes instead of 1000?
Fast.ai automatically removes the final layers of the original architecture and replaces them with something suitable for your problem. Later in the course we will also learn how to fit our own custom layers at the end, something called a 'custom head'.
The ConvLearner does it automatically for you based on the number of classes in your data. If you call learn.model you will see that the last FC layer has 37 activations.
@jeremy In your notebook output, resnet34's learn.fit_one_cycle(4) took around 2 minutes in the main video notebook (using SageMaker), while in the GitHub lesson 1 notebook it took you around 1 minute.
Could you please mention the specs used for the video lesson notebook training and for the GitHub notebook?
This could be useful for all of us as a baseline to check whether our local setups are working well.
As for me, I am getting a much slower run (12 min) with a GTX 1080 Ti and an old CPU (Xeon W3503) that sits at 100%, while the GPU utilization only occasionally pops up from 2% to 70-90%.
The first thing I should do is get rid of the GPU risers, which drop the PCIe 2.0 x16 slot down to PCIe 2.0 x1 and are the only way for me to install multiple GPUs.