Wiki: Lesson 2

nminhptnk · February 27, 2018, 11:50am

If you are trying a new dataset like me, check whether your training set is very small. If it is, you may need to set a lower batch size, e.g. bs = 2.
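A quick way to see why batch size matters on a tiny dataset: a data loader that keeps the last partial batch produces ceil(n / bs) mini-batches per epoch, and each mini-batch is one weight update. The plain-Python sketch below uses a hypothetical 10-image training set; the helper name and the candidate batch sizes are illustrative assumptions, not values from the lesson.

```python
import math

def batches_per_epoch(n_images: int, bs: int) -> int:
    """Number of mini-batches (and weight updates) per epoch,
    assuming the final partial batch is kept rather than dropped."""
    return math.ceil(n_images / bs)

# Hypothetical tiny training set of 10 images.
n_images = 10
for bs in (64, 8, 2):
    print(f"bs={bs}: {batches_per_epoch(n_images, bs)} batch(es) per epoch")
```

With bs=64 every epoch is a single update, and a learning-rate finder that sweeps over one epoch would also get only a single point to look at.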

GregFet · February 27, 2018, 2:52pm

Just ignored. Works fine.

alessa · February 27, 2018, 4:08pm

Python accepts multiple assignment statements, for example:

x1,y1 = 2,3 # point one
x2,y2 = 6,8 # point two
# slope m and intercept b of the line through both points, assigned in one statement
m,b = float(y1-y2)/(x1-x2), y1-float(y1-y2)/(x1-x2)*x1
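As a quick check of the example: the whole right-hand side is evaluated before any name is bound, so both expressions see the original values. The two points give a slope of 5/4 and an intercept of 0.5, and both lie on the resulting line. In Python 3, `/` already performs true division, so the `float()` calls are only needed on Python 2. The swap at the end is an extra illustration of the same rule, not part of the original example.

```python
x1, y1 = 2, 3  # point one
x2, y2 = 6, 8  # point two

# Both right-hand expressions are evaluated before m and b are bound.
m, b = (y1 - y2) / (x1 - x2), y1 - (y1 - y2) / (x1 - x2) * x1
print(m, b)  # 1.25 0.5

# Both points satisfy y = m*x + b.
assert m * x1 + b == y1 and m * x2 + b == y2

# The same rule makes a swap work without a temporary variable.
x1, x2 = x2, x1
print(x1, x2)  # 6 2
```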

If you type ??data.resize(), or data.resize() followed by Shift+Tab, you will see that the second argument is new_path, so I guess tmp is the folder where the resized images will be stored.
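Outside Jupyter, the standard-library `inspect` module gives the same signature information as Shift+Tab. The function below is a hypothetical stand-in written only to mimic the parameter order described above (target size first, then `new_path`); it is not fastai's code.

```python
import inspect

def resize_stub(targ_sz, new_path):
    """Hypothetical stand-in for a resize method: target size, then output folder."""
    return f"resized to {targ_sz}px, saved under {new_path!r}"

sig = inspect.signature(resize_stub)
print(sig)                       # (targ_sz, new_path)
print(list(sig.parameters)[1])   # new_path
print(resize_stub(300, "tmp"))   # resized to 300px, saved under 'tmp'
```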

If you type ??accuracy() as well, you will see the function's code. Accuracy compares the predicted labels with the ground-truth labels and reports what fraction of them were correct, so an accuracy of 90% means that 90% of the data was classified correctly.
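A minimal pure-Python sketch of that idea, assuming the model outputs one row of class scores per image; the predicted label is the highest-scoring class. This is an illustration, not fastai's implementation, and the scores are made up.

```python
def accuracy(probs, targets):
    """Fraction of examples whose highest-scoring class matches the ground-truth label.

    probs:   one list of class scores per example
    targets: one integer class label per example
    """
    # argmax over each row gives the predicted label
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

# Hypothetical scores for 4 images and 2 classes (e.g. cat=0, dog=1).
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
targets = [0, 1, 1, 1]
print(accuracy(probs, targets))  # 0.75
```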

jk23541 · February 27, 2018, 9:57pm

Thanks so much!

balnazzar · March 1, 2018, 12:27pm

That’s very strange. Can you share your notebook?

balnazzar · March 1, 2018, 7:47pm

OK, but CUDA out-of-memory errors should be unrelated to poor accuracy…

sidereal · March 4, 2018, 2:01pm

The fastai-part1v2-p2 AMI is not available in Community AMIs or the Amazon Marketplace. What should I do?

reshama · March 4, 2018, 4:35pm

I see it in the N. Virginia region. Did you try a region other than your default?

rush86999 · March 5, 2018, 11:52pm

Did anyone get this error:
No such file or directory: 'data/planet/train-jpg/train_0.jpg'

This is in the lesson 2 notebook. I'm using Paperspace. Do I have to download this from somewhere?

Kornel · March 7, 2018, 3:26pm

I'm a bit confused: how can using TTA improve accuracy?

In the validation step we are not modifying any weights.
For each image, TTA returns the mean prediction over the original and augmented images.
A mean of values can be better or worse than a single value,
so a mean of predictions can give better or worse results as well.
So we can get a lower or higher accuracy, depending on the image set and the applied transforms?
And even if we get better results, they can actually be misleading, because we may think that our model got better.

Where did I make a mistake?
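A minimal sketch of the averaging described above, in plain Python. `predict` and the augmentation functions are hypothetical placeholders for a trained model and its transforms; the point is only that TTA averages class probabilities over several views of the same image and then takes the argmax, without touching any weights.

```python
from typing import Callable, List, Sequence

Probs = List[float]

def tta_predict(predict: Callable[[object], Probs],
                image: object,
                augmentations: Sequence[Callable[[object], object]]) -> Probs:
    """Average the class probabilities of the original image and its augmented copies."""
    views = [image] + [aug(image) for aug in augmentations]
    all_probs = [predict(v) for v in views]
    n_views = len(all_probs)
    # element-wise mean over views, one value per class
    return [sum(col) / n_views for col in zip(*all_probs)]

# Hypothetical model: returns fixed probabilities depending on which view it sees.
fake_outputs = {"orig": [0.45, 0.55], "flip": [0.80, 0.20], "crop": [0.70, 0.30]}
predict = fake_outputs.__getitem__
augs = [lambda img: "flip", lambda img: "crop"]

avg = tta_predict(predict, "orig", augs)
print([round(p, 2) for p in avg])  # [0.65, 0.35]
```

Here the original view alone would predict class 1, while the average predicts class 0. Whether that helps depends on which class is actually correct, which is exactly the question raised above.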

emilmelnikov · March 7, 2018, 5:47pm

I guess it’s because of the softmax/sigmoid activations: even one good crop can give you a final activation of ≈1, and, given that the original image consists mostly of the target class, that’s good enough.
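A small numeric illustration of that argument, with made-up logits for a two-class problem where class 0 is correct: the original view is borderline and leans slightly the wrong way, while one good crop produces a large logit, so its softmax output is close to 1 and pulls the average to the right class. The logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

original = softmax([0.0, 0.2])    # borderline, leans slightly towards the wrong class 1
good_crop = softmax([5.0, 0.0])   # confident and correct: p(class 0) is close to 1
mean = [(a + b) / 2 for a, b in zip(original, good_crop)]

print([round(p, 3) for p in original])
print([round(p, 3) for p in good_crop])
print([round(p, 3) for p in mean])
```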

pspenano · March 8, 2018, 6:25am

Hello,

I’m working through the 8-step process to train a world-class model described in this lesson. For reference, the steps are:

1. Enable data augmentation, and precompute=True
2. Use lr_find() to find highest learning rate where loss is still clearly improving
3. Train last layer from precomputed activations for 1-2 epochs
4. Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
5. Unfreeze all layers
6. Set earlier layers to 3x-10x lower learning rate than next higher layer
7. Use lr_find() again
8. Train full network with cycle_mult=2 until over-fitting
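Step 6 can be made concrete with a little arithmetic. The learning rates are given as one value per layer group, earliest group first (such as [1e-4, 1e-3, 1e-2] for three groups). The helper below is a hypothetical sketch that builds such a list from the top learning rate and a factor in the 3x-10x range; it is not a fastai function.

```python
def differential_lrs(top_lr: float, n_groups: int = 3, factor: float = 3.0) -> list:
    """Learning rates for layer groups, earliest group first.

    Each group gets `factor` times less than the group after it,
    so the last (newly added) group trains at `top_lr`.
    """
    if n_groups < 1 or factor <= 0:
        raise ValueError("need at least one group and a positive factor")
    return [top_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

print(differential_lrs(1e-2, factor=10))  # roughly [1e-4, 1e-3, 1e-2]
print(differential_lrs(1e-2, factor=3))   # [lr/9, lr/3, lr] with lr = 0.01
```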

My question is: why is it useful to train from the precomputed activations in step 3 before moving to step 4? What is the intuition for this step?

I know Jeremy drops this step in the shorter version of the 8-step approach, but I couldn’t find an explanation for why we do it in the first place.

Many thanks!

emilmelnikov · March 8, 2018, 7:41am

A forward pass with precomputed activations (step 3) is much faster than a forward pass without them (step 4). In step 3, it’s just one vector-matrix multiplication and one softmax: logistic regression. In step 4, it’s 10–50 convolutions, and it gets even worse with deeper nets.

So the idea is to get the last layer away from its random weights as quickly as possible (step 3), and then improve it with “new” (augmented) data (step 4).
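The speed argument can be sketched with a call counter. `backbone` and `head` below are hypothetical stand-ins for the expensive frozen convolutional layers and the cheap trainable last layer. Precomputing runs the backbone once per image and reuses the cached activations every epoch. Training without precomputation re-runs it on every image each epoch, which is also the only point where fresh augmentations could be applied.

```python
from collections import Counter

calls = Counter()

def backbone(image):
    """Hypothetical stand-in for the frozen convolutional layers (the expensive part)."""
    calls["backbone"] += 1
    return [float(len(image))]  # fake activation vector

def head(activations):
    """Hypothetical stand-in for the small trainable last layer (the cheap part)."""
    calls["head"] += 1
    return sum(activations)

images = ["img_a", "img_bb", "img_ccc"]
epochs = 3

# Step 3 style: run the backbone once, cache activations, train the head on the cache.
cache = [backbone(img) for img in images]
for _ in range(epochs):
    for act in cache:
        head(act)
precomputed_calls = calls["backbone"]

# Step 4 style: no cache, so every epoch pushes every image through the backbone again.
calls.clear()
for _ in range(epochs):
    for img in images:
        head(backbone(img))
full_calls = calls["backbone"]

print(precomputed_calls, full_calls)  # 3 9
```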

RogerS49 · March 8, 2018, 5:40pm

Some thoughts on Lesson 2 Dog Breeds

At about 1:37 in the Lesson 2 video, in the final run where the validation loss is 0.199…, the model is overfitting slightly. Is that correct? It seems like a small amount: the training loss is only 0.01 smaller than the validation loss.

When I run similar notebook code, I can’t replicate Jeremy’s results exactly. Is this due to the random creation of the validation set indices?
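One way to rule the split in or out: draw the validation indices from a seeded random generator and compare them across runs. The function below is a hypothetical sketch using Python's `random` module, not fastai's own index helper. A fixed split does not by itself make results identical. Random initialisation of the new layer, shuffling, augmentation and non-deterministic GPU kernels can still change the numbers from run to run.

```python
import random

def val_indices(n: int, val_pct: float = 0.2, seed: int = 42) -> list:
    """Pick a random validation subset of range(n); the same seed gives the same subset."""
    rng = random.Random(seed)  # private generator, unaffected by other code's random calls
    n_val = int(n * val_pct)
    return sorted(rng.sample(range(n), n_val))

a = val_indices(1000, seed=42)
b = val_indices(1000, seed=42)
print(len(a), a == b)  # 200 True
```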

How can I look at the confusion matrix to find which of the 120 classes are performing badly? Does anyone have any pointers?
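A sketch of one approach, assuming you already have the predicted and true class indices for the validation set (for example the argmax of the model's predictions and the validation labels). Build the confusion matrix, read each class's accuracy (recall) off its row as the diagonal count divided by the row total, and sort. The helper names and the toy data are hypothetical; with real data you would pass the 120 breed names as `classes`.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts examples of true class i that were predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def worst_classes(cm, classes, k=5):
    """Return the k classes with the lowest per-class accuracy (recall), worst first."""
    scores = []
    for i, row in enumerate(cm):
        total = sum(row)
        if total:  # skip classes absent from the validation set
            scores.append((row[i] / total, classes[i]))
    return sorted(scores)[:k]

# Hypothetical toy data with 3 classes instead of 120.
classes = ["beagle", "pug", "husky"]
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 1, 2, 0, 0, 2]
cm = confusion_matrix(y_true, y_pred, len(classes))
print(worst_classes(cm, classes, k=2))
```

The largest off-diagonal entries of `cm` then show which pairs of breeds are most often confused with each other.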

RogerS49 · March 8, 2018, 5:52pm

I think that with precompute=True the data augmentation has no effect. If you change to precompute=False and run some epochs, you will notice the difference in the time it takes, everything else being equal; that is because now all your augmentations are being processed.

That’s my understanding.

RogerS49 · March 8, 2018, 6:00pm

Are you using the resnext101_64 architecture? Your last readout (line 6, for differential learning rates) is overfitting. What are your parameters for each fit call?

pspenano · March 8, 2018, 8:41pm

I see, thank you for the answer. So there isn’t anything conceptually wrong with skipping step 3 and jumping straight to step 4, other than the slower training?

emilmelnikov · March 8, 2018, 9:13pm

Yes, somewhere in lesson 1 or 2 Jeremy showed how to skip some of these 8 steps. I suppose you can skip even more steps and start directly from differential learning rates (together with running lr_find() before that, obviously).

pspenano · March 9, 2018, 7:11am

Okay, this makes sense. Thanks!

pspenano · March 9, 2018, 7:31am

Just a couple of questions about steps 7 and 8:

What is the intuition behind finding the learning rate again in step 7 of the 8-step process? At this point (for the Cats vs Dogs challenge) we have an unfrozen model trained using differential learning rates with cycle_len=1 and cycle_mult=2. Is this just to find the learning rates for step 8, where we train the full network until it overfits? If so, do we simply call learn.lr_find() on the current model, which is unfrozen and has precompute=False?
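On the mechanics: re-running the finder after unfreezing measures the loss with the current weights, which are no longer the ones the first search saw. The sketch below shows the kind of schedule a learning-rate range test uses: the rate grows geometrically from a small start to a large end across the mini-batches of one epoch while the loss is recorded. The start and end values are assumptions for illustration; check ??learn.lr_find for the actual defaults in your fastai version.

```python
def lr_range_schedule(start_lr: float, end_lr: float, n_iters: int) -> list:
    """Geometrically spaced learning rates from start_lr to end_lr (inclusive).

    A range test trains one mini-batch at each rate and records the loss;
    you then pick a rate where the loss is still clearly decreasing.
    """
    if n_iters < 2:
        return [start_lr]
    ratio = (end_lr / start_lr) ** (1 / (n_iters - 1))
    return [start_lr * ratio ** i for i in range(n_iters)]

# Hypothetical values: 7 mini-batches sweeping from 1e-5 to 10.
for lr in lr_range_schedule(1e-5, 10, 7):
    print(f"{lr:.0e}")
```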

As for step 8, do we just run learn.fit(new_lr, 20, cycle_len=1, cycle_mult=2) to train until it overfits? (I picked 20 to make sure it overfits.) Would having used lr_find() in the previous step (if this is indeed how step 7 is done) cause any problems here?
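One thing worth checking before picking 20: in fastai 0.7's fit, the second argument counts SGDR cycles, not epochs, and with cycle_mult=2 each cycle is twice as long as the previous one, so a call like fit(lr, 3, cycle_len=1, cycle_mult=2) runs 1 + 2 + 4 = 7 epochs. The helper below is a hypothetical sketch of that arithmetic.

```python
def sgdr_epochs(n_cycles: int, cycle_len: int = 1, cycle_mult: int = 1) -> int:
    """Total epochs for n_cycles SGDR cycles, each cycle_mult times longer than the last."""
    return sum(cycle_len * cycle_mult ** i for i in range(n_cycles))

print(sgdr_epochs(3, cycle_len=1, cycle_mult=2))   # 7  (1 + 2 + 4)
print(sgdr_epochs(20, cycle_len=1, cycle_mult=2))  # 1048575
```

So 20 cycles with cycle_mult=2 would mean over a million epochs; a few cycles at a time, while watching the validation loss, is a more practical way to find where overfitting starts. On the lr_find question: in the fastai 0.7 source, lr_find appears to save the weights to a temporary file before its sweep and reload them afterwards (check with ??learn.lr_find), so running it in step 7 should not change the model you then train in step 8.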

Appreciate your help!