No, the learning rate after unfreezing is very often different.
This is the idea of progressive resizing. One example from the literature: Progressive Growing of GANs for Improved Quality, Stability, and Variation.
https://www.fast.ai/2018/04/30/dawnbench-fastai/
Instead, we turned to a method we’d developed at fast.ai, and teach in lessons 1 & 2 of our deep learning course: progressive resizing. Variations of this technique have shown up in the academic literature before (Progressive Growing of GANs and Enhanced Deep Residual Networks) but have never to our knowledge been applied to image classification.
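To make the idea concrete, here is a minimal sketch of progressive resizing in plain PyTorch. The model (`TinyNet`) and the size schedule are hypothetical stand-ins, not fastai's actual implementation; the key point is that a model with adaptive pooling before its head can be trained on small images first and then on progressively larger ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy classifier that accepts any input size (adaptive pooling)."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.head = nn.Linear(8, n_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # size-agnostic pooling
        return self.head(x)

model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

# Train in phases, growing the image size each phase.
for size in (64, 128, 224):
    x = torch.randn(4, 3, size, size)   # stand-in for a resized batch
    y = torch.randint(0, 10, (4,))
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

In practice you would re-create the DataLoader with larger images for each phase rather than faking batches as above; the early small-image phases are much faster per epoch, which is where the DAWNBench speedup came from.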
I am trying to update fastai but I get permission denied. Anything I need to add specifically? I tried conda and pip and both had permission issues.
I’ve been reading Leslie Smith’s paper and he provides some guidance on batch size with one cycle training. Basically, I think you should go as large as you can, up to the point where you get diminishing returns.
3e-3 == 0.003
1e-3 == 0.001
Isn’t the Dice score a more relevant metric for segmentation problems?
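For reference, the Dice coefficient for binary masks is 2·|A∩B| / (|A|+|B|). A minimal sketch in PyTorch (the function name and `eps` smoothing term are my own choices for illustration):

```python
import torch

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient for binary masks: 2*|A ∩ B| / (|A| + |B|)."""
    pred = pred.float().flatten()
    target = target.float().flatten()
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

# Identical masks score 1.0; disjoint masks score ~0.
mask = torch.tensor([[1, 0], [1, 1]])
```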
What about the loss function for multi-label classification? Does the same loss function (cross entropy) work for multi-label classification as well?
In Keras, I used to use binary cross entropy + sigmoid for the last layer; it’s not clear how fastai takes care of this.
How can we ignore a specific pixel value in an image while training? i.e. ignore pixel value = 255
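One standard way to do this in PyTorch is the `ignore_index` argument of `nn.CrossEntropyLoss`: pixels whose target equals that value contribute nothing to the loss or gradients. A minimal sketch (the tensor shapes are illustrative):

```python
import torch
import torch.nn as nn

# Pixels labelled 255 (a common "void"/unlabelled sentinel in
# segmentation datasets) are excluded from the loss entirely.
loss_fn = nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(1, 3, 4, 4)        # (batch, classes, H, W)
target = torch.randint(0, 3, (1, 4, 4)) # class index per pixel
target[0, 0, 0] = 255                   # this pixel is ignored
loss = loss_fn(logits, target)
```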
Hey @rachel I finally have 8 votes on this
This is the metric that was used in the paper introducing CamVid; that’s why Jeremy is using it.
It almost feels like some form of data augmentation
Given it usually takes a minute or so to train a whole epoch, how is lr_find so fast when looking at lots of different learning rates? Does it run just a few iterations for each learning rate? I looked at the documentation but still don’t quite understand how it works.
It does it for you. If you want to know more, you should ask on the advanced forum for now, this will be covered later in the course.
Any recommendations for making sense of cutting edge academic papers? I often see an interesting-looking paper on something I’m generally familiar with, but the jargon in academic papers can be overwhelming.
As per my understanding, it plots the loss for different learning rates across different mini-batches. That’s why it doesn’t take long.
It does 100 iterations from 1e-5 to 10, growing the learning rate exponentially.
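The schedule described above (100 steps from 1e-5 to 10, growing exponentially) can be sketched as follows; the function name and exact endpoints are illustrative, not fastai's internals:

```python
# Exponential learning-rate sweep: multiply by a constant ratio each
# step so that n steps span start_lr to end_lr.
def lr_schedule(start_lr=1e-5, end_lr=10.0, n=100):
    ratio = (end_lr / start_lr) ** (1 / (n - 1))
    return [start_lr * ratio ** i for i in range(n)]

lrs = lr_schedule()
```

Each mini-batch is trained once at the next learning rate in this list, so the whole sweep costs only ~100 iterations, far less than a full epoch.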
You’d want to use BCEWithLogitsLoss in PyTorch; it’s binary cross entropy + sigmoid combined.
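A small sketch showing the fused loss and its (less numerically stable) two-step equivalent; the example values are arbitrary:

```python
import torch
import torch.nn as nn

# BCEWithLogitsLoss fuses sigmoid + binary cross entropy in one op,
# which is more numerically stable than applying them separately.
loss_fn = nn.BCEWithLogitsLoss()

logits = torch.tensor([[2.0, -1.0, 0.5]])  # raw model outputs
targets = torch.tensor([[1.0, 0.0, 1.0]])  # multi-label targets
fused = loss_fn(logits, targets)

# Equivalent two-step version for comparison.
manual = nn.BCELoss()(torch.sigmoid(logits), targets)
```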
Welcome to Microsoft’s GitHub.
Some tips around reading (and implementing) new papers are covered in part 2 of the course.