Lesson 1 - Non-beginner discussion

If I have control of the room, I’ll try to. (Anyone who does can push record :wink: )

3 Likes

Ranger already does a gradual LR warm-up. So you should generally use Ranger+flat_cos or Adam+one_cycle.
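
For anyone newer to the fastai2 API, that pairing looks roughly like this (a sketch: `dls`, the architecture, and the hyperparameters are placeholders, not a recommendation):

from fastai.vision.all import *

# assuming `dls` is a DataLoaders you've already built
learn = cnn_learner(dls, resnet34, opt_func=ranger, metrics=accuracy)
learn.fit_flat_cos(5, 1e-3)    # Ranger pairs with the flat + cosine-anneal schedule

learn = cnn_learner(dls, resnet34, metrics=accuracy)   # opt_func defaults to Adam
learn.fit_one_cycle(5, 1e-3)   # Adam pairs with the one-cycle policy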

5 Likes

@muellerzr Is it possible to record the questions asked as well? I’m watching through the video, which is very helpful as is, but I’m having to infer each question from the answer you gave in the walkthrough! :sweat_smile:

1 Like

I wasn’t able to look it over, but it sounds like it didn’t pick up everyone else. In the future I’ll use the built-in Zoom recording, so that should be better. Apologies!

I’ve been working on reconstructing Hyperspectral Imagery from an RGB input using the NoGAN approach, and for me switching to Ranger + OneCycle cut the MRAE (Mean Relative Absolute Error) metric roughly in half (~0.11 to ~0.0575). But this is just one instance; I haven’t used it anywhere else at this point.
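
For anyone unfamiliar with the metric, MRAE is just the absolute error normalised by the ground-truth value, averaged over every pixel and band. A rough PyTorch sketch (the eps guard is my own addition to avoid division by zero):

import torch

def mrae(pred, targ, eps=1e-8):
    # Mean Relative Absolute Error: mean of |pred - targ| / targ over all elements
    return (torch.abs(pred - targ) / (targ + eps)).mean()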

3 Likes

That’s really cool. And now that I think about it, most of the benefit I got using Ranger + fit_one_cycle comes from the RAdam part, and less so the LookAhead optimizer. So I might try running only RAdam + fit_one_cycle and see if I can get a speedup!

Currently I’m running some FastGarden tests and it looks like Ranger + fit_flat_cos blows the one-cycle learning policy out of the water (by ~5-8%).
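
If anyone wants to try the same ablation, something like this should isolate the RAdam part (a sketch; `dls`, the architecture, and the hyperparameters are placeholders):

from fastai.vision.all import *

# Ranger = RAdam wrapped in Lookahead, so passing RAdam directly as the
# opt_func drops the Lookahead part
learn = cnn_learner(dls, resnet34, opt_func=RAdam, metrics=accuracy)
learn.fit_one_cycle(5, 1e-3)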

1 Like

Maybe a dumb question, but currently Keras, PyTorch, and fastai all reduce the learning rate after hitting the patience limit on a plateau, right? Why not roll back to the last best state, reduce the LR, and continue, instead of just reducing the LR and continuing further?
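
For reference, this is the pattern I mean, using fastai’s version of the callback (a sketch; `learn` is assumed to exist, and the patience/factor values are just illustrative):

from fastai.callback.tracker import ReduceLROnPlateau

# when valid_loss hasn't improved for `patience` epochs, divide the LR by
# `factor` and simply keep training from the current weights
learn.fit(20, lr=1e-3, cbs=ReduceLROnPlateau(monitor='valid_loss', patience=2, factor=10.))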

2 Likes

Not quite. If you look at the source code for these functions, you’re passing in a percentage to work off of. For instance, with fit_flat_cos we start reducing the LR after 75% of the batches. The same parameter is in fit_one_cycle:

def fit_one_cycle(self:Learner, n_epoch, lr_max=None, div=25., div_final=1e5,
                  pct_start=0.25,  # <- HERE
                  wd=None, moms=None, cbs=None, reset_opt=False):

def fit_flat_cos(self:Learner, n_epoch, lr=None, div_final=1e5,
                 pct_start=0.75,  # <- HERE
                 wd=None, cbs=None, reset_opt=False):
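
So you can move that point yourself by passing your own pct_start (a sketch, assuming an existing `learn`):

learn.fit_flat_cos(5, lr=1e-3, pct_start=0.5)       # stay flat for 50% of training, then cosine-anneal
learn.fit_one_cycle(5, lr_max=1e-3, pct_start=0.1)  # spend only 10% of training on the LR warm-up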
3 Likes

Ok I got that, but I was talking about the ReduceLROnPlateau callback. :sweat_smile:

2 Likes

Ah shoot, my bad :slight_smile: In that case I don’t know, but I am anticipating the answer too :slight_smile:

1 Like

Why not roll back to the last best state, reduce the LR, and continue

I am by no means an expert on any of this, but I can take a guess. Sometimes the extra training beyond the “plateau” is useful for deep models, even though this training would go into the “overfitting” regime by classical machine learning standards. Empirically, there is a second descent phase of training in deep learning.

The figure I have in mind is Figure 2 from the paper about “Deep Double Descent” (arXiv:1912.02292), which does a great job of explaining these concepts (and much more) in detail. Relevant to our discussion is this result: training beyond the plateau causes the validation/test error to rise, and then miraculously fall again. In other words, extra training can undo overfitting! (The same can be accomplished by using larger models.)

Now, is this why deep learning practitioners (and ReduceLROnPlateau) don’t roll back to an earlier epoch? Probably not; I’m guessing the callback was written before fastai made the idea of callbacks as popular as it is now. But perhaps it has had unintentional benefits!

EDIT – accidentally included the wrong figure.

3 Likes

A less advanced question ;).

Reading the notebooks, I was wondering if we still need to normalize the images (Normalize.from_stats(*imagenet_stats)) in v2? I couldn’t find any information regarding normalization.

Florian

Yes you do, normalization should always be done on your data! However, fastai2 has made it a bit easier with the pre-built functions (like cnn_learner and unet_learner). If you use them and, say, accidentally forget to tack on normalize(), it’ll normalize based on your model’s data.
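
If you build your DataLoaders by hand instead, you can still add it explicitly. A sketch (`path`, the labelling function, and the transforms are placeholders):

from fastai.vision.all import *

dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   get_y=parent_label,
                   item_tfms=Resize(224),
                   batch_tfms=[*aug_transforms(),
                               Normalize.from_stats(*imagenet_stats)])
dls = dblock.dataloaders(path)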

This video is so much clearer than mine. Can you please share your OBS settings with me? :slight_smile:

1 Like

Of course! Also for anyone else interested (these are the settings I use for Walk with fastai2 as well!):

  • Base (Canvas) Resolution: 1920x1080
  • Output (Scaled) Resolution: 1920x1080
  • Downscale Filter: Bicubic (Sharpened scaling, 16 samples)
  • Common FPS Values: 30
2 Likes

Thank you! :slight_smile:

This is a great question, I am wondering “why” too :slight_smile:

I’ll run some experiments and check empirically

2 Likes

While this isn’t directly L1-related per se, I’m going to be discussing some multi-label ideas that can apply to Lesson 1, such as how to let your model say “I don’t know” in image classification. This will be done on the Zoom chat in ~10 minutes or so.

I wanted to post here, as I don’t think there’s quite a forum topic where such an idea would fit well.
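
The gist, for anyone who can’t make it (a rough sketch with placeholder paths and labels): treat the single-label problem as multi-label, so the model uses a sigmoid + threshold per class instead of a softmax, and an image where no class clears the threshold comes back with no label at all, i.e. “I don’t know”.

from fastai.vision.all import *

def label_as_list(fname): return [parent_label(fname)]   # wrap the single label in a list

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_items=get_image_files,
                   get_y=label_as_list,
                   item_tfms=Resize(224))
dls = dblock.dataloaders(path)
learn = cnn_learner(dls, resnet34, metrics=accuracy_multi)  # loss defaults to BCE with logits here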

1 Like

I am coming! See you there :slight_smile:

1 Like

Just to add to what @muellerzr said:
"
You need to pass to this transform the mean and standard deviation that you want to use; fastai comes with the standard ImageNet mean and standard deviation already defined. (If you do not pass any statistics to the Normalize transform, fastai will automatically calculate them from a single batch of your data.) "
From the book: https://github.com/fastai/fastbook/blob/master/07_sizing_and_tta.ipynb (Just under Normalization).
I have a question about this part:
"(If you do not pass any statistics to the Normalize transform, fastai will automatically calculate them from a single batch of your data.) "
Why are we doing it on a single batch and not the whole data?
is this because we are doing it on the fly?
Is in’t the imagenet stats calculated on the whole data? or is that on a single batch too?
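
To illustrate what I mean (a sketch, assuming `dls` is an existing DataLoaders without a Normalize transform):

from fastai.vision.all import *

x, y = dls.one_batch()
batch_stats = x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3])  # per-channel stats of ONE batch
print(batch_stats)     # roughly what Normalize() would estimate from during setup
print(imagenet_stats)  # vs. the fixed stats computed over the whole of ImageNet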

2 Likes