Lesson 1 In-Class Discussion ✅

A small (and probably harmless) error I found, but I thought I should point it out anyway: it should be Ctrl+Shift+J to summon the JS console, not Ctrl+Shift+I.

2 Likes

No, there’s no performance impact from using smaller numbers.

4 Likes

As you know, you need to calculate the mean and standard deviation to normalize your dataset (the training set). Since we are using transfer learning (a pre-trained network), we need to use the exact numbers that were used to train that network.

2 Likes

Yes, we are using ResNets trained on ImageNet, so in effect we take as a starting point that our images are like ImageNet images and have to treat them in the same way by normalizing with the same summary statistics and then training. Then we fine-tune our model by unfreezing and training some more because our images are in some way different to ImageNet images. The idea is that the early layers are probably fine but the later layers are likely to need some adjustment.

We don’t have a huge amount of data (we just use a sample of the MNIST data); not enough to train a deep network from scratch, but enough if we transfer what we know works on ImageNet and then train.

I have a few questions about freezing/unfreezing models and comparing results.

Is there a way to tell whether a ConvLearner is currently frozen or unfrozen? The only way I know is to actually call freeze() or unfreeze() before some operation that matters.
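(One way to check, as a minimal sketch: freezing works by toggling requires_grad on the parameters of each layer group, so inspecting that flag should reveal the current state. This assumes a fastai v1 Learner exposing layer_groups.)

for i, group in enumerate(learn.layer_groups):
    # freeze() turns requires_grad off for all groups except the last
    trainable = any(p.requires_grad for p in group.parameters())
    print('layer group', i, ':', 'unfrozen' if trainable else 'frozen')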

Also, in the lesson 1 notebook, it seems that the comparison between frozen and unfrozen models isn’t actually fair. For example:

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=48)
data.normalize(imagenet_stats)
learn = ConvLearner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(5)

learn.save('stage-1-50')
learn.unfreeze()
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4))

This part of the notebook trains the resnet50 model on the dog/cat breeds dataset. After that training is done, the model is unfrozen and trained again with a custom learning rate.

Then the comment “In this case it doesn’t, so let’s go back to our previous model.” is made, indicating that the result isn’t better… It does have a higher error rate, but it was trained for only 1 epoch and is derived from an already-trained breed model.

Wouldn’t it be more accurate to do this instead to compare the unfrozen model?

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=48)
data.normalize(imagenet_stats)
learn = ConvLearner(data, models.resnet50, metrics=error_rate)
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-6,1e-4))

Also, perhaps I’m missing something when comparing the two? Is it only the error rate that really matters, or do training loss and validation loss also need careful consideration?

1 Like

Thank you @Mirodil and @AlisonDavey.

What argument should I use in normalize if I wish to train ResNet on identifying images containing only digits:

  • Would it be imagenet_stats because we are using ResNet training on ImageNet? OR
  • Would it be mnist_stats because our training data is very similar to MNIST dataset?

Basically, I am trying to understand whether the argument depends on our training data, the pre-trained model, or both.

@akschougule I believe you can calculate the normalization statistics yourself if you are not using pre-trained network weights. This is how it’s calculated: for each feature dimension, compute the mean of the feature and subtract it from the dataset, storing the mean value. Next, compute the standard deviation of each feature and divide each feature by its standard deviation, storing the standard deviation. The same mean and std are then used for the validation set and test set.

When you are using pre-trained network weights, you need to use the means and stds that were used to train that network.
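A minimal sketch of that computation, assuming a tensor train_images of shape [N, C, H, W] scaled to [0, 1]:

import torch

# Per-channel mean and std computed on the training set only.
mean = train_images.mean(dim=(0, 2, 3))
std = train_images.std(dim=(0, 2, 3))

# Normalize with broadcasting; the same (mean, std) pair is then
# reused for the validation and test sets, e.g. via
# data.normalize((mean, std)) in fastai v1.
normalized = (train_images - mean[None, :, None, None]) / std[None, :, None, None]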

As you know, ResNet is a type of neural network architecture. It can be initialized randomly or with pre-trained weights.

2 Likes

If you call normalize() without any argument it does all that for you 🙂

6 Likes

No, you need to fine-tune the last layers first (i.e. without unfreezing) to get good results. We’ll be learning all about this in the coming lessons. And you can’t use more epochs in the last stage, otherwise it overfits.
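To make the order concrete, here is the two-stage pattern as a minimal sketch (mirroring the lesson code quoted above; the epoch counts are placeholders):

learn.fit_one_cycle(4)                            # stage 1: train only the head, body frozen
learn.unfreeze()                                  # stage 2: unfreeze everything
learn.fit_one_cycle(1, max_lr=slice(1e-6, 1e-4))  # discriminative LRs, smallest for early layers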

1 Like

Yes, @insoluble. I once had both the CPU and GPU versions on the same machine, but in different envs. So if you use different environments, you can have both.

@jeremy what is the limit on the number of epochs we should use? I was trying the 120 dog breeds dataset and, without unfreezing, I got an error rate of 15% or more, and I don’t understand why the same accuracy/error rate can’t be guaranteed for every execution. Am I missing something?

Very interested! Please add me to any group

2 Likes

@jpramos Were you able to figure this out?

Looking forward to the discussions on this later in the course, but in the meantime parts of this article might be of interest: https://arxiv.org/pdf/1608.08614

5 Likes

My best guess is that different mini-batches were used when executing lr_find. Nevertheless, the first plot is still quite troubling to me. I guess what it says is that the model is already close to the best weights.

Would be interesting to maybe turn this into a computer vision problem - for example chord recognition from a live video recording!

1 Like

Hello,
I had a few hours to study and set up a working Google Cloud fastai v1 environment running the first lesson notebook.

My idea was to use this “learning time” to work on problems impacting people’s lives.

So I looked around for a dataset and came across this one (https://rdm.inesctec.pt/dataset/nis-2017-003/resource/df04ea95-36a7-49a8-9b70-605798460c35) after reading the paper behind it (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177544). In doing so, I ran into several questions, also drawing on learnings from other inspiring classmates; maybe we can get a better understanding of how things work.

A - Can we use a different approach rather than transfer learning?
I mean, is there any benefit in defining a set of custom layers of our own, as we can do with Keras/Tensorflow? I find this interesting, for instance, for replicating network architectures described in papers or for exploring different problems (the likes of classifying galaxies, x-rays, etc.).

B - How can we define and use custom normalisation criteria rather than the ones used on ImageNet?
Is there any best practice and/or scientific advice on how and when to use such a custom approach?
For instance, I found a different set of criteria in this paper, and I got curious about it.

C - Where is the “resnet” weights file saved locally?
Can I define the folder in which to save such “weights”?
From reading other forum posts, it seems there is a default “~/.fastai/data/oxford-iiit-pet/images/models/” folder (https://forums.fast.ai/t/lesson-1-chat/27332/636).

D - Can we save our “trained” net weights and re-use them in other projects?
And where is that file saved?

E - Which “models” are available inside fastai other than ResNet?
E.g. models.resnet34 and models.resnet50 are already available. Which other pre-trained models are there?

F - How do we define the local file system path where the data backing the DataBunch object is stored?
Is it the “project folder” or the “fast.ai default” one?

G - How can we use AWS S3 (or the Google equivalent) to store files (e.g. datasets to be used for training) rather than the local file system? Does it even make sense?

H - Can we store (and later retrieve) our “trained” model in an AWS S3 bucket?
If so, how can we do that?

I - Can we define a custom split of the data to be used as train/validation/test, or does fast.ai not allow that?

J - Once we are happy with our model, how can we use it in real life?
I saw someone using the “eval” method, but I didn’t understand whether we can make predictions in real time (using the stored/saved weights) or whether we need to run our model once again.

Thanks!

3 Likes

Hi
I’ve few questions. I’d be very grateful if someone helps.

Is cuda92 a strict requirement for successfully running the first lesson?

I’ve installed both fastai and pytorch v1. Everything was going right, but then I found out that it was running only on the CPU when training started.
I think it is because my Nvidia driver is 384.81, but the requirement for cuda92 is 396.26.

Last questions:
Do I need to install a new driver (>=396.26) and then install cuda92 to set everything up?
Can I change the Nvidia driver for one conda environment without disturbing the base system?

@insoluble You don’t need to install cuda92; PyTorch comes bundled with its own CUDA runtime. You just need to uninstall your Nvidia drivers and install a driver > 396.xx.

This post discussed the same issue. https://forums.fast.ai/t/setting-up-gpu-for-fastai-v3/27678/5?u=magnieet

Not sure you can install Nvidia drivers for a particular environment, but you can have a different CUDA version for each environment.
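A quick sanity check for whether PyTorch actually sees the GPU (plain PyTorch calls, nothing fastai-specific):

import torch

print(torch.cuda.is_available())  # False -> training runs on the CPU
print(torch.version.cuda)         # CUDA runtime version bundled with PyTorch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))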

1 Like

Is there a way to get the filenames of the images returned via interp.top_losses(9)?

That method returns the indexes of the top 9 worst predictions, but I don’t see how I can tie them back to the actual filenames on my file system. I’d love to be able to do this because in my dataset I’m finding that many of the top losses are actually labeled incorrectly, and I’d like to move them to the correct folders.
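(A sketch of one possible way, assuming fastai v1, where top_losses returns (losses, indices) into the validation set and the underlying item list holds the file paths:)

losses, idxs = interp.top_losses(9)
# In fastai v1 the validation file paths live in data.valid_ds.x.items.
top_files = [data.valid_ds.x.items[i] for i in idxs]
for f, l in zip(top_files, losses):
    print(f, l.item())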

1 Like