Lesson 2 discussion - beginner

Not really – I understand that process: we take the average prediction of the image plus 4 transformations. I don’t understand why we need to do so. TTA seems like just testing to me – instead of testing 1 image, we’ve now tested 5 images. We’re calling TTA to see whether the model can handle transformations. We’re testing how good the model is – not making it better, just giving it different augmentations to see how good it is. I just don’t see how testing 5 images vs 1 image has any effect if the model has already been trained (after all, augmentation is done during training).

Analogy: teach a baby to recognize a chair. Show baby 3 different chairs. For each, show them from the front, and from the back. That’s training augmentation.

Now ask the baby to recognize a chair they haven’t seen before (that’s testing). Regular version: just show them the front of that one chair. TTA version: show them both the front and the back of that chair.

You would expect them to be better at recognizing it, if you show them both the front and the back.
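To make the averaging concrete, here is a toy sketch of the TTA idea – not fastai’s actual implementation; the `toy_model`, the 2×2 image, and the flip transforms are all made up for illustration. The point is just that one of the transformed views can reveal a feature the original view hides, and averaging the probabilities lets that view contribute:

```python
def tta_predict(model, image, transforms):
    """Average class probabilities over the original image and each
    transformed copy -- the core of test-time augmentation."""
    versions = [image] + [t(image) for t in transforms]
    preds = [model(v) for v in versions]
    n_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]

def toy_model(img):
    """Toy stand-in "model": classifies a 2x2 image by its top-left pixel."""
    p = 0.9 if img[0][0] > 0.5 else 0.1
    return [p, 1.0 - p]

def fliplr(img):  # mirror left/right
    return [list(reversed(row)) for row in img]

def flipud(img):  # mirror top/bottom
    return list(reversed(img))

image = [[0.0, 0.0],
         [0.0, 1.0]]  # the telltale pixel is in the bottom-right

single = toy_model(image)  # sees only a dark top-left corner
averaged = tta_predict(toy_model, image,
                       [fliplr, flipud, lambda x: flipud(fliplr(x))])
```

Here the double-flipped view moves the telltale pixel into the position the toy model looks at, so the averaged prediction shifts toward the evidence that the single view misses.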

11 Likes

Ah! Thank you. So to potentially get better classification results, I can take an image along with its various augmentations and the model may have a better chance to classify the image.

I’m thinking in terms of a Web App where a user uploads an image to be classified (dog breed, for example). The uploaded image will be transformed in various ways (sideways, for example), and the model, seeing the same image under different augmentations, might be able to classify it better.

I hope this is correct.

5 Likes

Exactly correct :slight_smile:

2 Likes

The validation loss starts increasing… What should I do now? @jeremy

That means you’re overfitting! So try the techniques we learned in class:

  • Data augmentation
  • Dropout
  • SGDR
  • Differential learning rates
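As a sketch of that last item: differential learning rates just mean each layer group takes SGD steps of a different size, with the early (more general) layers updated most gently. The layer groups, gradients, and rates below are invented toy numbers, not fastai code:

```python
def sgd_step(weights_by_group, grads_by_group, lrs):
    """One SGD update where layer group i uses learning rate lrs[i]."""
    return [
        [w - lr * g for w, g in zip(ws, gs)]
        for ws, gs, lr in zip(weights_by_group, grads_by_group, lrs)
    ]

weights = [[1.0, 1.0],   # early conv layers
           [1.0],        # middle layers
           [1.0]]        # newly added classifier head
grads = [[0.5, 0.5], [0.5], [0.5]]
lrs = [1e-4, 1e-3, 1e-2]  # smallest rate for the earliest layers

updated = sgd_step(weights, grads, lrs)
```

With the same gradient everywhere, the head moves 100× further than the early layers – the idea being that pretrained early layers are already close to what you want.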

I’ve searched for the Dog Breed notebook but I’m not able to find it at all. I even tried git pull, but it seems my repo wasn’t updated with anything new. Can anyone provide me a way to obtain the notebook?

I’m pretty sure you have to create it yourself. It’s a homework problem. :slight_smile: I started from the lesson 2 workbook and modified it to use the dogbreeds data.

1 Like

Hi. I was curious about learning rates for the dog breeds problem. I ran the learning rate finder and it looks like the steepest point in the curve is around 1e-1. My understanding is that we should pick a learning rate somewhere between the steepest point and the minimum. However, when I use 1e-2 (what Jeremy used) I get a result close to Jeremy’s. When I use 1e-1 it is slightly worse. Anyone have an explanation for this?

There’s generally not much point training more than a couple of epochs with precompute=True. Try unfreezing and setting precompute=False and see how that impacts your learning rate experiments.

Hello :slight_smile: Can you provide an explanation for the choice of the learning rate of 0.01 in the dog breed notebook? When I run the learning rate finder as suggested, it seems like a choice of 0.1 or 0.2 would be a good choice - so why the learning rate of 0.01?

A good learning rate is not a single deterministic number. A smaller learning rate will converge more slowly: the update is w = w - learning_rate * gradient, so a smaller rate just means updates in smaller increments. Jeremy has provided good intuition for the learning rate via the finder, plus a rule of thumb (IMO) to help decide: choose the rate just before the steep fall in the cost function.

In most cases, I just choose a rate of 0.01 and it’s usually OK. This is not a big dataset, so it doesn’t take too long to run an epoch. So experiment with a few different ranges like 0.001, 0.01, 0.1, see what works for you, and compare that with the results of the learning rate finder.

You will find that a cyclical learning rate is much more powerful at jumping out of local minima.
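A minimal sketch of that kind of schedule – cosine annealing with warm restarts, as in SGDR. The max_lr and cycle length below are arbitrary placeholder values; the shape is the point: the rate decays smoothly within a cycle, then jumps back up at each restart, which is what gives the optimizer a chance to hop out of a sharp minimum:

```python
import math

def sgdr_lr(step, max_lr=0.01, cycle_len=100):
    """Cosine-annealed learning rate with warm restarts: decay from
    max_lr toward 0 over cycle_len steps, then jump back to max_lr."""
    pos = (step % cycle_len) / cycle_len   # position within the current cycle
    return 0.5 * max_lr * (1 + math.cos(math.pi * pos))

schedule = [sgdr_lr(s) for s in range(300)]  # three full cycles
```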

1 Like

Hello everyone!

After running through the Dog Breeds example, I have three questions:

  1. I tried a batch size of 64. It seems the model trained more slowly compared to the chosen 58. Is there a way to estimate the right batch size by looking at the train/val error?
  2. I used get_data once, assigned the ‘learn’ model, and trained a few epochs. If I keep training by feeding it more epochs subsequently, will it take more memory from my GPU?
  3. After increasing the image size to 299, I attempted to unfreeze my network and continue to fit, but an out-of-memory error occurred immediately. Why?

Thanks for clarifying!

Thanks for the lesson! I went through the dog breeds competition exercise and was able to successfully submit a score on kaggle!

A gotcha that I had as a Windows user is that my csv submission was saved with line endings CRLF instead of LF which resulted in an error when uploading to kaggle:

ERROR: Unable to find 10357 require key values in the 'id' column
ERROR: Unable to find the required key value '000621fb3cbb32d8935728e48679680e' in the 'id' column
...
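In case it helps anyone else hitting this, one way to fix the file after the fact is to rewrite it with bare LF endings. This is a small sketch (the demo CSV contents below are made up): reading in text mode normalizes CRLF to '\n', and writing with newline='\n' keeps it that way.

```python
import os
import tempfile

def crlf_to_lf(path):
    """Rewrite a text file with bare LF line endings."""
    with open(path, "r") as f:        # universal-newline read: \r\n -> \n
        text = f.read()
    with open(path, "w", newline="\n") as f:
        f.write(text)                 # written back with LF only

# Demo on a throwaway CSV written with Windows-style CRLF endings.
path = os.path.join(tempfile.mkdtemp(), "submission.csv")
with open(path, "wb") as f:
    f.write(b"id,breed\r\nabc123,husky\r\n")

crlf_to_lf(path)
with open(path, "rb") as f:
    fixed = f.read()
```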

I followed Jeremy’s steps with some minor deviations (used a learning rate of 1e-1, didn’t use as many iterations) and got a score of 0.23567.

I noticed that Jeremy used an additional parameter ps=0.5 when creating the initial learner. What was the reason for this? (Perhaps I missed it during the lecture video.)
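For context, ps sets the dropout probability on the classifier head. Here is a pure-Python sketch of inverted dropout with p=0.5 – the activation values are made up, and this stands in for what a dropout layer does, not fastai’s code:

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1-p) so the expected value
    is unchanged; at test time it is a no-op."""
    if not training or p == 0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                     # seeded for reproducibility
acts = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(acts, p=0.5, rng=rng)    # roughly half the units zeroed
at_test = dropout(acts, training=False)    # unchanged at inference
```

Higher ps means more regularization, which is one way to fight the train/validation gap you’re seeing.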

I also noticed that my training loss was a lot lower than my validation loss:

[screenshot: last epoch, with training loss well below validation loss]

What would be the suggested approach to dealing with this?

Thanks!

For those interested in learning about the code under the hood of fastai:

I recommend starting with the PyTorch 60 Minute Blitz, then heading over to Lesson_1 on PyTorch.

Hey guys and @jeremy, I still struggle with the rationale behind starting to train the model with precompute = True, then turning off precompute and training the net again with augmentations, and after that unfreezing the shallow layers and training for another round.
My question is: why are we taking a detour through precompute = True? Why not start from scratch by creating augmentations and feeding them into an unfrozen net with precompute = False right away, since we have to do that either way? Is it only for saving time? Thanks for helping!
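A toy sketch of why precompute=True is mostly a time saver (the `frozen_body`/`head` functions are made-up stand-ins, not fastai code): while the body is frozen, its activations for a given image never change, so they can be computed once and cached, and every head-training epoch then skips the expensive part entirely.

```python
body_runs = 0

def frozen_body(x):
    """Stand-in for the frozen pretrained layers (the expensive part)."""
    global body_runs
    body_runs += 1
    return x * 2  # pretend feature extraction

def head(feat, w):
    """Stand-in for the small trainable classifier on top."""
    return feat * w

images = [1.0, 2.0, 3.0]

# precompute=True: run the frozen body once per image, cache the result,
# and let every head-training epoch reuse the cache.
cache = [frozen_body(x) for x in images]
for epoch in range(10):
    preds = [head(f, w=0.5) for f in cache]
runs_with_precompute = body_runs

# precompute=False: the body runs again in every epoch.
body_runs = 0
for epoch in range(10):
    preds = [head(frozen_body(x), w=0.5) for x in images]
runs_without_precompute = body_runs
```

The catch: once data augmentation is on, the inputs change every epoch, so the cache is useless – which is why precompute has to be turned off before training with augmentations or unfreezing.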

1 Like

Thanks for the explanation. Is it necessary to have precompute=True while saving weights ?

Precompute = True is a way to fast-track computation. Once the model is trained, with or without it, you can save the weights. Hope that helps!

1 Like

Thanks for your reply. I seem to be having a problem – can you help me with it? I’ve posted it on the forums.

What does the f' actually do in label_csv = f'{PATH}train_v2.csv'? I realize that it has to do with files and the {PATH} expansion, but what does it do exactly? Are there differences in python versions / platforms?
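The f prefix makes it a formatted string literal, an “f-string” (Python 3.6+): expressions inside the braces are evaluated and spliced into the string. It has nothing to do with files, and it behaves the same on all platforms – but it is a syntax error on Python 3.5 and earlier. A quick demo (the PATH value here is made up):

```python
PATH = "data/dogbreeds/"

label_csv = f'{PATH}train_v2.csv'             # f-string: {PATH} evaluated inline
equivalent = '{}train_v2.csv'.format(PATH)    # same result, works on older Pythons
concatenated = PATH + 'train_v2.csv'          # plain concatenation, same result
```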