Lesson 2 In-Class Discussion

Thanks @jeremy.
For point 3, if we get a curve like the one shown in the original question, do we ignore it, or is there a way to gain more insight from it?

  1. The curve isn’t smooth, so if we were to find the optimal value of lr, do we consider the sharp increases and decreases?
  2. The values of lr themselves are too small.

Some variation is expected, although 99.1% is lower than I’ve seen in my testing. It might be worth checking that you’re going through all the correct steps.

Yup! There’s a significant difference between 99.65 and 99.1. I’ll rerun everything in the default notebook and check the results.

The above result is from a notebook I wrote after going through the code, so I may have skipped some overlapping steps.

@jeremy thank you.
That clarifies a lot. So finding the lr through code won’t always give a number we can trust blindly?

So we probably still rely on our past experience in this field, knowledge about the dataset, and trying variations?

I’d say so, for now - at least for later in the process. For the initial run, however, I think my rule of thumb should work fairly reliably.
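For reference, here’s a rough sketch of how the learning rate finder is usually run in the fastai (0.7-style) notebooks; the variable names arch and data are assumptions standing in for whatever the notebook already defines:

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.lr_find()      # trains briefly while increasing lr after every mini-batch
learn.sched.plot()   # plot loss vs lr; pick a value where the loss is still clearly decreasing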


I cannot download using kaggle-cli, as the value I am trying for the competition name, “Dog Breed Identification”, is not recognized. Does anyone know the name that should be used for downloading?


I think the best way to check the competition name is to go to the competition page on Kaggle’s website and use what is in the URL. For this one, https://www.kaggle.com/c/dog-breed-identification, try using dog-breed-identification.


Perfect. Thanks - it was perhaps obvious, but I am a newbie at this!


Question from ‘resnext50’ notebook.

Can anyone tell me if Jeremy discussed the ‘num_workers’ and ‘ps’ parameters in the lecture? If not, please share your knowledge, or should I wait for future lectures?

Cropping strategy

I found an interesting description of a cropping strategy in the following (now rather old) paper which introduced InceptionNet: https://arxiv.org/pdf/1409.4842.pdf

“During testing, we adopted a more aggressive cropping approach than that of Krizhevsky et al… Specifically, we resize the image to 4 scales where the shorter dimension (height or width) is 256, 288, 320 and 352 respectively, take the left, center and right square of these resized images (in the case of portrait images, we take the top, center and bottom squares). For each square, we then take the 4 corners and the center 224×224 crop as well as the square resized to 224×224, and their mirrored versions. This results in 4×3×6×2 = 144 crops per image. A similar approach was used by Andrew Howard in the previous year’s entry, which we empirically verified to perform slightly worse than the proposed scheme. We note that such aggressive cropping may not be necessary in real applications, as the benefit of more crops becomes marginal after a reasonable number of crops are present (as we will show later on).”
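As a quick sanity check of that arithmetic, here is a small sketch (the labels are just illustrative, not from the paper’s code) that enumerates the crops described above:

from itertools import product

scales  = [256, 288, 320, 352]                               # shorter side resized to each scale
squares = ['left', 'center', 'right']                        # or top/center/bottom for portrait images
crops   = ['tl', 'tr', 'bl', 'br', 'center', 'full_resized'] # 224x224 crops taken from each square
mirrors = [False, True]                                      # plus a horizontal mirror of each crop

all_crops = list(product(scales, squares, crops, mirrors))
print(len(all_crops))  # 4 * 3 * 6 * 2 = 144 crops per image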

This strategy is in relation to the original training of the InceptionNet, so not really the same as our post-hoc augmentation process. But I wonder if @jeremy or anyone else experienced in this cares to comment on this approach of creating a great many crops - can it be applied to the augmentation approach?

Inception-ResNet-v2, Inception-v4, or ResNet101 64?

Has anyone got any advice about whether the Inception ResNet may work better than the ResNet101 64?


Why only 2 cycles after setting ps=0.5?

Can anyone clarify something I saw in Jeremy’s notebook for the Dog Breeds competition: after setting the ps parameter to 0.5, he only did 2 cycles. Was this just to save time while evaluating the change, or is something else going on? I remember him saying that too many cycles lead to overfitting - is that what is being done here?

# ps=0.5 applies dropout with probability 0.5 to the added fully connected (head) layers
learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)
# train for 2 cycles (epochs here, since no cycle_len is given) at a learning rate of 1e-2
learn.fit(1e-2, 2)

ps and the use of 2 cycles vs more is something we’ll talk about later, but basically we’re trying to avoid overfitting, since this architecture is a lot bigger.

num_workers just says how many CPU cores to use for preprocessing - it’s not a big deal and doesn’t affect anything except speed.
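For context, here is a sketch of where these two parameters typically show up in the 0.7-style API; PATH, label_csv, tfms and arch are assumed to be defined earlier in the notebook (with the usual fastai imports), and the values are just placeholders:

data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms,
                                    bs=64, num_workers=8)            # num_workers: CPU processes for preprocessing
learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)  # ps: dropout probability on the added head layers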


This is basically what we do, except we do it dynamically with more randomness.

Thanks Jeremy, yes I understand that we are randomly cropping, but this strategy involves rather severe cropping, including corners, and 144 crops per image. It was in relation to this extreme approach that I was asking my question.

In this lesson 2 notebook, what’s the rationale for defining a size “sz” and immediately increasing it by a factor of 1.3?


OK. But can you please explain why this warning is appearing? I tried to search for it on Google, but couldn’t find a concrete explanation.

A (mini) batch vs a cycle / epoch and learning rate

Can anyone explain what @jeremy means around the 38 minute mark of lesson 2 about changing the learning rate every mini-batch? What is a mini-batch? In our learn.fit() method we pass the learning rate, number of epochs, and cycle length - is a batch an amalgamation of these settings?

We have batch size as part of our model, so is a batch how many images we feed to the model at a time within an epoch? If so, is Jeremy saying that the learning rate changes after each bunch of images is passed in?

If the learning rate changes with each new batch, what influence does cycle length have on changing the learning rate? Jeremy talks there about resetting it.

So, changing vs resetting the learning rate - what is the difference?

Iterations

What is an iteration?

Instead of going through all of the items one by one in a single loop, we process items in small batches, e.g. bs = 64 (the default). This is the power of the GPU. In one epoch, we process ‘X’ mini-batches. The total number of mini-batches is given by:

total_mini_batches = total_items / batch_size

So, as we continue to process these mini-batches, we continue to reduce ‘lr’. An iteration would then be going once through the loop, which I believe means processing one mini-batch.

Note: This is my understanding.
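To make that concrete, here is a small sketch; the numbers are made up for illustration, and the cosine schedule is only an approximation of the idea, not the library’s exact implementation:

import math

total_items = 23000                                      # e.g. images in the training set (made-up number)
batch_size  = 64                                         # bs
iters_per_epoch = math.ceil(total_items / batch_size)    # 360 iterations (mini-batches) per epoch

def cosine_schedule(lr_max, n_iters):
    # lr is nudged down a little after every mini-batch within one cycle
    return [lr_max * (1 + math.cos(math.pi * i / n_iters)) / 2 for i in range(n_iters)]

cycle_len = 1                                            # cycle length measured in epochs
lrs = cosine_schedule(1e-2, iters_per_epoch * cycle_len)
# When the next cycle starts, the schedule is reset, i.e. lr jumps back up to lr_max.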

I noticed that if I run the lr finder in the ‘differential learning rate’ step, this happens.