Nice observation! This has to do with the so-called dropout layers in the model. In these layers, information is “thrown away” at training time to avoid overfitting. The higher training loss means that the model makes better predictions on the validation set (without dropout) than on the training set (with dropout). This will all be explained later in the course.
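A minimal, framework-free sketch of the mechanism (illustrative plain Python, not fastai’s actual implementation): activations are randomly zeroed during training and rescaled so the expected value stays the same, while evaluation passes everything through.

```python
import random

def dropout(activations, p=0.5, training=True, seed=0):
    """Inverted dropout: during training, zero each activation with
    probability p and scale the survivors by 1/(1-p) so the expected
    activation is unchanged; at inference time everything passes through."""
    if not training:
        return list(activations)
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(acts, training=True)    # some values zeroed, the rest doubled
eval_out = dropout(acts, training=False)    # identical to the input
```

Because the training-time network is handicapped this way, its loss can sit above the validation loss even when nothing is wrong.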
Question from lesson-1 (codeblock 9 in the original notebook):
data.normalize(imagenet_stats)
imagenet_stats has the following values: ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
It is defined in fastai/vision/data.py. In fact there are a bunch of these: mnist_stats, cifar_stats. These numbers seem specific to particular image datasets. Does anybody know how this works?
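As far as I know, these are the per-channel (R, G, B) means and standard deviations of the ImageNet training images, with pixel values scaled to [0, 1], and normalize subtracts the mean and divides by the std for each channel. A rough sketch of the arithmetic:

```python
# Per-channel (R, G, B) mean and std of ImageNet pixel values in [0, 1].
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=imagenet_mean, std=imagenet_std):
    """Per pixel, per channel: (value - channel mean) / channel std."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

# A pixel exactly at the dataset mean ends up at zero in every channel:
print(normalize_pixel([0.485, 0.456, 0.406]))  # → [0.0, 0.0, 0.0]
```

Using the pretraining dataset’s stats matters because the pretrained weights expect inputs distributed the way ImageNet’s were.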
Thanks Jeremy. I’ve updated the notebook and 20 epochs is definitely better, getting me to 74% accuracy with the smaller model.
I tried a higher learning rate, but that didn’t seem to help with either model. Am I applying it correctly? I’ll read up on this some more, as I don’t exactly understand what it’s doing under the hood.
Would love to see a deeper dive on this dataset in class - it’s available at https://storage.googleapis.com/bstovold-fastai/galaxies.tgz
that you’re still underfitting. So you can keep trying to push those down further. In general, as long as your training loss is higher than (or equal to) your validation loss while you keep reducing the error rate, you’re doing fine.
While most images will be public domain, if “published” as a dataset (whatever that means), I’d imagine they should definitely be cleared and/or attributed. Within the context of this course and forum, however, I’d be surprised if it’s a problem.
This will all be made clear! You’re now overfitting. Try 10 epochs, then unfreeze, then 4 epochs.
I am currently working on a binary classification dataset. I used the ImageDataBunch.from_folder() function to load all the images. I created this as a custom dataset which comprises 100 images per class. The problem is that whenever I try to run learn.recorder.plot(), it gives me a blank graph, even though learn.lr_find() doesn’t generate any errors.
Where does learn.save() store the data? I can’t locate it in my GCP instance. I found the ResNet model in the .torch folder, but I don’t think the trained weights are there.
Definitely very interested. Let me know if you all form a group.
In your data directory, which by default is ~/.fastai/data/(folder)
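If I remember right, learn.save('stage-1') writes a .pth file to a models subfolder under the learner’s path, which defaults to the data path. A sketch of where to look, with a hypothetical dataset folder name standing in for yours:

```python
from pathlib import Path

# 'my-dataset' is a hypothetical placeholder for whatever folder your
# ImageDataBunch was loaded from; learn.save('stage-1') should end up at
# <learner path>/models/stage-1.pth under that folder.
data_path = Path.home() / '.fastai' / 'data' / 'my-dataset'
weights = data_path / 'models' / 'stage-1.pth'
print(weights)
```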
I believe your “10 epochs, then unfreeze, then 4 epochs” still has me at 74% accuracy on both models.
Ah still overfitting! Try just 5 epochs in the first stage.
Hoping someone can help me understand what an epoch is.
A quick Google search on what an epoch is led me here: https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks
An epoch describes the number of times the algorithm sees the entire data set. So, each time the algorithm has seen all samples in the dataset, an epoch has completed.
So my question is: why would we run the same data through a neural net “x” times? Shouldn’t one pass through the data be enough to obtain the weights?
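Not an expert, but my understanding is that one pass usually isn’t enough because each mini-batch gradient step only nudges the weights slightly, so we revisit the data many times. A small sketch of the epoch/iteration bookkeeping (plain Python, illustrative numbers):

```python
def training_schedule(n_samples, batch_size, n_epochs):
    """One epoch = one full pass over the dataset.  With mini-batch
    training, each epoch is split into ceil(n_samples / batch_size)
    iterations (one weight update each), and we repeat for several
    epochs because each update only moves the weights a little."""
    iters_per_epoch = -(-n_samples // batch_size)  # ceiling division
    return iters_per_epoch, iters_per_epoch * n_epochs

# e.g. 200 images, batch size 64, 10 epochs:
per_epoch, total = training_schedule(200, 64, 10)
print(per_epoch, total)  # → 4 40
```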
I tried searching, but couldn’t find if this question was already answered somewhere (it was asked, but didn’t get enough likes):
What was the thought/insight behind changing the image size to 299 when using resnet50 (instead of the 224 that was used with resnet34)?
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=48)
Bigger images generally get better accuracy, since they have more detail. Combining them with a bigger model is often a good idea.
Got it, thanks @jeremy .
Follow up Q:
You gave some intuition on how we arrived at the number 224 (the final layer being 7x7, and us wanting 7 times a power of 2).
Was there a specific thought behind 299? (It isn’t 7 times a power of 2 in this case.)
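For what it’s worth, the arithmetic behind the 224 intuition: a ResNet halves the spatial size five times, a 32x reduction overall, so the input needs to be 7 times a power of 2 to land on a clean 7x7 final grid. A quick check (assuming that 32x downsampling factor):

```python
# ResNets downsample the input by a factor of 2 five times, i.e. 32x overall,
# so the final feature map is roughly (size / 32) on each side.
downsample = 2 ** 5  # 32

print(224 // downsample, 224 % downsample)  # → 7 0   (clean 7x7 grid)
print(299 // downsample, 299 % downsample)  # → 9 11  (not an exact multiple)
```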
I guess your dataset is quite small for your current batch size, so lr_find doesn’t go through enough iterations. On top of that, learn.recorder.plot skips some iterations when plotting. You should consider:
- Using learn.recorder.plot(skip_start=0, skip_end=0) so that no iterations are dropped from the plot.
- Reducing your batch size, in order to increase the number of batches.
Hope it helps.
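To illustrate the first suggestion: skip_start/skip_end drop points from the two ends of the recorded curve (they tend to spike), and with very few recorded batches the defaults can drop nearly everything, leaving a blank plot. A rough plain-Python sketch of that slicing (the default values here are illustrative, not necessarily fastai’s exact ones):

```python
def points_to_plot(lrs, losses, skip_start=10, skip_end=5):
    """Drop skip_start points from the beginning and skip_end from the
    end of the recorded curve before plotting.  With a tiny dataset the
    lr finder records few points, so the defaults can skip almost all
    of them, which looks like a blank graph."""
    end = len(losses) - skip_end if skip_end else None
    return lrs[skip_start:end], losses[skip_start:end]

lrs = list(range(12))                 # pretend we only recorded 12 batches
losses = [float(x) for x in lrs]
kept_default = points_to_plot(lrs, losses)                       # nothing left
kept_all = points_to_plot(lrs, losses, skip_start=0, skip_end=0) # all 12 points
print(len(kept_default[0]), len(kept_all[0]))  # → 0 12
```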
For some reason some ImageNet networks were pretrained at 299, and using the same size as the original can help a little. I can’t recall if rn50 was one of those; I guess it’s kind of a habit for me to try it.
ok