Lesson 1 & 2 doubts

Hey guys,

I have completed the first two lessons and I have some doubts about them. Please help, and thanks in advance.

  1. Every time I run the code from the start I get different answers. (Note: I'm using Colab and I'm resetting the runtime every time I start.)

  2. Why do we do 4 epochs while frozen and 2 epochs after unfreezing? Why don't we just unfreeze the model at the start and do 3-4 epochs?

  3. How many images per class do I need in general?

  4. When I was running the given Lesson 2 notebook, the error rate was constant for epochs 2, 3, and 4, and I don't understand why.

  5. I tried to replicate Lesson 2 with exactly the same type of dataset, but of flowers, with 5 classes and about 700-1000 images per class.
    (I downloaded the dataset from Kaggle.)

I got about 92% accuracy, and my confusion matrix showed the model has trouble distinguishing tulips from roses (both look similar to each other).

How do I solve this problem? And what's the max accuracy I can aim for?

  6. After completing the first two lessons I tried to implement the same thing on a similar dataset. What should I do next?
    Should I practice with more datasets and dig deeper, or should I move on to the next lesson?

Thank you.

Every time I run the code from the start I get different answers. (Note: I'm using Colab)

That sounds normal; it could be because of shuffling of the dataset, for example, or the randomness in the transforms you're applying. You didn't specify what exactly "different" means, but unless you're getting wildly varying model performance, that sounds about normal.
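If you want runs to be more repeatable, you can fix the random seed before the train/validation split is created. A minimal sketch, assuming the fastai v1 API from the 2019 course and a hypothetical `path` pointing at your image folder:

```python
import numpy as np
from fastai.vision import *

# Fixing numpy's seed before building the DataBunch makes the
# train/validation split the same on every run. GPU kernels and the
# random transforms still introduce some run-to-run variance, though.
np.random.seed(42)

data = ImageDataBunch.from_folder(
    path,                      # hypothetical: the folder with your images
    train=".",
    valid_pct=0.2,             # hold out 20% for validation
    ds_tfms=get_transforms(),  # default augmentation transforms
    size=224,
).normalize(imagenet_stats)
```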

and I'm resetting the runtime every time I start

I don’t think that’s necessary, unless you’re constantly getting CUDA Out Of Memory errors?

Why do we do 4 epochs while frozen and 2 epochs after unfreezing? Why don't we just unfreeze the model at the start and do 3-4 epochs?

The idea is that when you're using a pre-trained model for a similar task, you don't need to modify the pre-trained weights a whole lot. Remember, using a pre-trained model essentially means that all you did was replace the last layer. Experimentation has shown that it's usually best to first train only that last layer, since it has a random initialisation, and leave the rest of the model intact; this helps with faster convergence. Once the new layer has "gotten used" to predicting these images, it can be trained together with the earlier layers. It's quite an intuitive approach. If you're still having trouble grasping it, read up a bit more on differential learning rates and transfer learning.
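To make the two-stage schedule concrete, here's a sketch of what the lesson notebooks do, assuming the fastai v1 API and a `data` DataBunch you've already built:

```python
from fastai.vision import *

# Stage 1: the pre-trained ResNet body is frozen; only the new,
# randomly initialised head gets trained.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)

# Stage 2: unfreeze everything and fine-tune, with discriminative
# learning rates so the early pre-trained layers are only nudged gently.
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```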

How many images per class do I need in general?

Really depends on how easy it is to distinguish between the classes that you are trying to predict. If you are trying to distinguish between red cars and blue cars, you probably need very few (50 of each?), while if you're trying to distinguish between identical twins, you'll need far more. It also depends on whether you're using pre-trained models. All in all, if the task is relatively straightforward, I think there are fastai students who built models with as few as 50-75 images per class (Jeremy covers this at the start of Lesson 2, I think). It's hard to generalise here, though.

When I was running the given Lesson 2 notebook, the error rate was constant for epochs 2, 3, and 4, and I don't understand why.

That really depends on context; we'd need to see a bit more of your code to understand what you were doing. It's possible the model just found a local minimum and couldn't "learn" further (at the current learning rate).
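One thing worth trying when the error rate plateaus is re-running the learning rate finder and picking a new rate. A sketch, again assuming the fastai v1 API and a trained `learn`:

```python
# Plot loss against learning rate; look for the steepest downward
# slope before the loss blows up, and pick a rate from that region.
learn.lr_find()
learn.recorder.plot()

# Continue training with the rate you read off the plot.
# 1e-4 here is just a placeholder, not a recommendation.
learn.fit_one_cycle(2, max_lr=1e-4)
```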

it's having trouble distinguishing tulips from roses. How do I solve this problem?

I don't entirely agree with your assessment. Misclassifying 11 tulips when your average is around 3 misclassifications still sounds on the low side of things. I agree that it's more confused than usual, but not to a dysfunctional level. Have you tried checking the "most confused" images to get a feeling for why it's misclassifying? Jeremy shows how to do this in Lesson 1 or 2 as well; see the sketch below. Before diving into "fixing" this misclassification problem, first try to understand why the machine is getting it wrong.
Generally speaking, I'd argue that, quite simply, more data is likely to help, so the model learns to distinguish the features better. Alternatively, have you tried Test Time Augmentation? That might be an option too.
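For reference, the interpretation calls Jeremy demonstrates look roughly like this in fastai v1, assuming a trained `learn`:

```python
from fastai.vision import *

interp = ClassificationInterpretation.from_learner(learn)

# Class pairs the model confuses most often, e.g. ('tulip', 'rose', 11)
print(interp.most_confused(min_val=2))

# The 9 validation images with the highest loss, annotated with
# prediction, actual label, loss and probability: great for spotting
# mislabeled or genuinely ambiguous flowers.
interp.plot_top_losses(9, figsize=(11, 11))
```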

And what's the max accuracy I can aim for?

Take a look at the other kernels to get a feel for the current best scores of others playing around. I think around 90% is a good start, but I'm sure you could squeeze out a bit more with some tricks.

After completing the first two lessons I tried to implement the same thing on a similar dataset. What should I do next?

I've read quite a few threads on "how to fastai", and going by my own experience, I would argue the most important thing is to continue watching the lessons. Most people watch the whole class 2-3 times. The first time is more of a global pass, the second time they start to experiment, and the third time is to really flesh out the details. So my advice would be to continue the lessons, and just experiment a bit when you come across something you think is cool.


Hey there!! Thanks for reaching out and giving such detailed explanations for everything.
It really motivates me a lot!!

Still, a few more questions based on your answers. Hope you can answer. Thanks!

That sounds normal; it could be because of shuffling of the dataset, for example, or the randomness in the transforms you're applying. You didn't specify what exactly "different" means, but unless you're getting wildly varying model performance, that sounds about normal.

The values are not changing much, just by 0.5 or so. I have no idea what you mean by 'randomness in your transforms'; can you explain what that means?

I don’t think that’s necessary, unless you’re constantly getting CUDA Out Of Memory errors?

No, I'm using Google's GPU. I was resetting because I thought it would affect the final answers.

The idea is that when you're using a pre-trained model for a similar task, you don't need to modify the pre-trained …

Ohh, OK, I understand it now.

Really depends on how easy it is to distinguish between the classes that you are trying to predict. If you are trying to distinguish between red cars and blue cars…

So I decide by 'feel', right? (What Jeremy used to say.)

That really depends on context; we'd need to see a bit more of your code to understand what you were doing. It's possible the model just found a local minimum and couldn't "learn" further (at the current learning rate).

This is the given code for Lesson 2; I didn't change anything, I was just running all the code cells.

I don't entirely agree with your assessment. Misclassifying 11 tulips when your average is around 3 misclassifications still sounds on the low side of things…

How can I know where the machine is getting it wrong?

Test Time Augmentation: I haven't come across it yet. What exactly is it? I'll definitely look into it.

And regarding getting more data: I can't do that if I'm participating in a Kaggle competition, right? I have to work with the existing data.

Take a look at the other kernels to get a feel for the current best scores of others playing around. I think around 90% is a good start, but I'm sure you could squeeze out a bit more with some tricks.

I'm new to Kaggle and data science; I didn't know I could check others' work too. Thanks for the info.

I've read quite a few threads on "how to fastai", and going by my own experience, I would argue the most important thing is to continue watching the lessons. Most people watch the whole class 2-3 times…

Yeah, thanks for the guidance.


The values are not changing much, just by 0.5 or so.

You'll have to define a bit more which "values" you mean. Do you mean loss or accuracy? If so, then that's expected, I think, to a degree anyway.

I have no idea what you mean by 'randomness in your transforms'; can you explain what that means?

So when you're creating a DataBunch out of your images, you're passing it a `get_transforms()` call. What this does, essentially, is transform each image a bit before feeding it to the model: it flips it, rotates it a bit, changes the lighting a bit, etc. This is called data augmentation, and it helps the model be more robust. At any rate, the transforms applied to your images are random: they randomly decide whether to rotate, whether to flip, whether to change the lighting, and by how much. Because that's random, it affects which images the model is seeing, which changes the output, and so on. That's the randomness factor I meant; see the sketch below. Does that help?
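Concretely, the randomness comes from parameters like these. A sketch of the fastai v1 call with (roughly) its default arguments spelled out; you pass the result as `ds_tfms` when building your DataBunch:

```python
from fastai.vision import get_transforms

# Each training image gets a fresh random draw from these augmentations
# every epoch, so the model never sees exactly the same pixels twice.
tfms = get_transforms(
    do_flip=True,      # randomly flip horizontally
    flip_vert=False,   # but not vertically (flowers have an "up"!)
    max_rotate=10.0,   # rotate by up to 10 degrees either way
    max_zoom=1.1,      # zoom in by up to 10%
    max_lighting=0.2,  # random brightness/contrast changes
    max_warp=0.2,      # slight perspective warping
)
```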

As for resetting the runtime: I don't think that's necessary, at least not in my experience.

Essentially, yes. My favourite quote from Jeremy in that direction is:

The answer to “should I do ‘blah’?” is often, “I don’t know, maybe? Run some experiments with ‘blah’ and let us know how it went!”

How can I know where the machine is getting it wrong?

In Lesson 1 Jeremy discusses interpreting the confusion matrix.
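In code form, that's roughly (fastai v1, assuming the trained `learn` from your notebook):

```python
from fastai.vision import *

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()  # rows = actual class, columns = predicted
```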

Test Time Augmentation: I haven't come across it yet. What exactly is it? I'll definitely look into it.

A very brief description: we take a single test image and augment it in 8 different ways, feed all 8 versions to the model, and combine the predictions (e.g. by averaging the probabilities, which is what fastai does, or by taking the most frequently predicted class). It's a form of ensembling.
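In fastai v1 this is a one-liner on an already-trained learner; a minimal sketch:

```python
from fastai.vision import *

# Averages the model's predictions over several augmented versions of
# each validation image; often a small accuracy bump for free.
preds, y = learn.TTA()
print(accuracy(preds, y))
```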

And regarding getting more data: I can't do that if I'm participating in a Kaggle competition, right? I have to work with the existing data.

Indeed! You'll have to find other ways to do data augmentation, apply tricks, etc. I think that's intuition you usually develop as you do more Kaggle competitions (and read the blog posts the winners usually publish!).

I'm new to Kaggle and data science; I didn't know I could check others' work too. Thanks for the info.

As I wrote above, very often the top 3 finishers of a competition will do a write-up on how they won, what their approach was, etc. They're often accessible to anyone with a bit of DL background, so make sure to read them! There are always interesting hacks.


Hi, I'm facing an issue while downloading the images using JavaScript. I am able to download the images, but they are not line-separated. In @jeremy's lecture all the URLs for the images were line-separated, but in my case they're all bunched together, not even space-separated. How do I solve this issue?

Hey, it would be best if you could create a separate thread that details the issue, and provide some code or a notebook on GitHub so we can understand how to reproduce the issue you're running into and what you're trying to do.

As a side note, it's best not to mention Jeremy directly unless you need him for something critical; it creates a lot of notification noise otherwise :wink:
