Lesson 2 In-Class Discussion

@yinterian
I just used the official AWS AMI to run the Lesson 1 notebook, and the final TTA accuracy I got was 0.99299999999999999 https://gyazo.com/1a780541eefe368f1602ba3f83752909, which is slightly worse than @Jeremy’s.

  1. is the difference relevant?
  2. is there any way to get reproducible results using PyTorch and the fastai lib?
1 Like
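For question 2, a hedged sketch of the usual seeding calls (an approach, not a guarantee: some CUDA operations remain non-deterministic regardless of seeding):

```python
import random
import numpy as np
import torch

# Seed every RNG the training loop touches; even then some CUDA kernels
# are non-deterministic, so results may still vary slightly between runs.
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)
```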

A question re: learn.predict()

Why does the FastAI library’s predict function return probabilities as logs rather than pure probabilities?
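As a sketch (assuming the 0.7-era fastai behaviour where predict() returns log probabilities), exponentiating recovers the actual probabilities:

```python
import numpy as np

# Toy log-probability output for 2 images x 2 classes (illustrative values)
log_preds = np.array([[-0.02, -3.90],
                      [-1.20, -0.35]])
probs = np.exp(log_preds)  # back to (pseudo-)probabilities in [0, 1]
```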

You can use ONNX, which translates models from PyTorch to Caffe2 so they can run on mobile.
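A minimal sketch of the PyTorch-side export (the ResNet34 model and input size here are just illustrative assumptions); the resulting .onnx file can then be loaded by Caffe2:

```python
import torch
import torchvision

# Export a pretrained ResNet34 to ONNX by tracing it with a dummy input
model = torchvision.models.resnet34(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
torch.onnx.export(model, dummy_input, "resnet34.onnx")
```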

2 Likes

Test Time Augmentation, I believe, happens whenever you ask the model to do prediction on test (or production) data, so that it gets a better chance to predict correctly.

So it’s not something you do when you move your model and weights to production, but rather after you have already done that, when a request comes in for image classification.
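A library-free sketch of the idea (the names here are made up for illustration): average the model’s probabilities over several augmented copies of the incoming image.

```python
import numpy as np

def tta_predict(predict_fn, image, augment_fns):
    """Average predictions over the original image and its augmented copies."""
    variants = [image] + [aug(image) for aug in augment_fns]
    probs = np.stack([predict_fn(v) for v in variants])
    return probs.mean(axis=0)

# Toy usage with stand-in functions
fake_model = lambda img: np.array([0.7, 0.3])     # pretend classifier
flip = lambda img: img[:, ::-1]                   # horizontal flip
tta_predict(fake_model, np.ones((4, 4)), [flip])  # -> array([0.7, 0.3])
```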

2 Likes

I think it’s not very easy to apply image-style data augmentation to time-series data, and adding noise will defeat the purpose of data augmentation. However, if the data is seasonal, we can do something like take the average of the current value and the values at t - 52, t + 52, and so on for a weekly dataset. This is just an idea, as I still have to evaluate it.
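A rough sketch of that seasonal-averaging idea (the function name and the 52-week period are assumptions, not tested code):

```python
import numpy as np

def seasonal_augment(series, period=52):
    """Blend each point with its neighbours one season away, where they exist."""
    out = series.astype(float).copy()
    for t in range(len(series)):
        neighbours = [series[t]]
        if t - period >= 0:
            neighbours.append(series[t - period])
        if t + period < len(series):
            neighbours.append(series[t + period])
        out[t] = np.mean(neighbours)
    return out

seasonal_augment(np.arange(120), period=52)  # 120 weeks of toy data
```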

2 Likes

For multi-class classification, we use the softmax function to convert the raw scores into pseudo-probabilities (so that all class probabilities add up to one for an image).

So the operative word there is pseudo: softmax does not give you pure probabilities as taught in probability theory.
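For concreteness, a minimal softmax sketch:

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exps / exps.sum()

softmax(np.array([2.0, 1.0, 0.1]))  # pseudo-probabilities summing to 1
```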

Was that your question though?

1 Like

QQ - why do we get 7 epochs from

learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

1 Like

The validation loss was lower than the training loss. How? Is there dropout?

We have cycle_mult=2, which means we have 3 cycles and the number of epochs in each cycle doubles. That’s why we have 7 epochs: 1 + 2 + 4 = 7.
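A one-liner to check the arithmetic:

```python
cycle_len, cycle_mult, n_cycles = 1, 2, 3
epochs_per_cycle = [cycle_len * cycle_mult ** i for i in range(n_cycles)]
print(epochs_per_cycle, sum(epochs_per_cycle))  # [1, 2, 4] 7
```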

5 Likes

If I got this right, the smallest unit at which the learning rate is adjusted is a mini-batch, definitely not an epoch.

Hint: look at the output of learn.sched.plot_lr() in the notebook.
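A sketch of the shape that plot shows, assuming SGDR-style cosine annealing (my reconstruction, not the library’s code): the learning rate changes at every mini-batch and restarts at each cycle boundary.

```python
import numpy as np

def sgdr_schedule(lr_max, batches_per_epoch, cycle_epochs):
    """Cosine-anneal the LR within each cycle, one step per mini-batch."""
    lrs = []
    for n_epochs in cycle_epochs:  # e.g. [1, 2, 4]
        t = np.arange(n_epochs * batches_per_epoch)
        lrs.append(lr_max / 2 * (1 + np.cos(np.pi * t / len(t))))
    return np.concatenate(lrs)

schedule = sgdr_schedule(0.01, batches_per_epoch=100, cycle_epochs=[1, 2, 4])
```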

2 Likes

Right now it returns 0 for a very confident prediction and increasingly negative numbers for less confident ones.

Why not just return a number between 0 and 1 indicating the probability, instead of its log?

My understanding is that it doesn’t preserve all the activations, only those of the last layer, including for the augmented images. The neural network model then uses those activations directly. This eliminates recomputation for the same data, since the frozen layers will not produce different activations; it just speeds up computation in the multi-epoch situation.
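A toy sketch of that caching idea (the names are illustrative, not the fastai internals):

```python
import numpy as np

def precompute_activations(frozen_fn, images):
    """Run the frozen layers once and cache their outputs."""
    return np.stack([frozen_fn(img) for img in images])

# Toy usage: a stand-in "frozen body" and ten fake images
frozen_fn = lambda img: img.mean(axis=0)  # pretend feature extractor
images = [np.random.rand(3, 8, 8) for _ in range(10)]
cached = precompute_activations(frozen_fn, images)
# Every subsequent epoch trains the final layer on `cached` directly,
# instead of re-running the frozen layers on the same data.
```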

1 Like

I have seen that too: validation loss lower than training loss. My hypothesis was that the data in the validation set (in this case) was learned very well by the model, hence it could do better than on the larger training set. If we shuffled the train/validation split differently, we might see different results.

Where is the info on the groups (teams)? Edit: found the link in lesson 1, but the spreadsheet won’t let me access it. Will find out why.

How do you pick the learning rate and other hyper-parameters when the dataset is very large? For example, in the Kaggle Cdiscount image classification challenge the training set is 12 million images, and one epoch takes 22 hours on my moderately sized GPU.
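One common workaround (my suggestion, not from the lesson): run the learning rate finder on a random subsample, where an epoch is cheap, and reuse the chosen rate on the full dataset.

```python
import random

# Sketch: pick ~0.5% of the 12M images for hyper-parameter search
all_image_ids = list(range(12_000_000))  # stand-in for the real file list
subsample = random.sample(all_image_ids, 60_000)

# With fastai 0.7, you would then build a learner on just `subsample`
# and run learn.lr_find(); learn.sched.plot() to pick the learning rate,
# before training on the full 12M images with that rate.
```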

1 Like

It typically happens due to data augmentation: the training data contains more complex situations than the test/validation images.

3 Likes

Good point!

The validation set shouldn’t be trained on, though! It’s just used to tune hyper-parameters.

1 Like

Can’t access the team sheet on Google Drive. Can you please check?

1 Like

Maybe a beginner question: I see @jeremy’s notebook nicely organized with bullet points and headings. How do I get the same? Is there a setting?