Lesson 3 In-Class Discussion

That's interesting. If I were to bring a human analogy here: all of us may be biased in some sense, but merging all of our experience can produce a super-human who is better than the average human! :stuck_out_tongue:

I had by mistake typed 3 cycles instead of 1. Since each cycle takes 30 minutes, I'm wondering if I can interrupt the kernel and expect the learner to retain the knowledge from the 1st epoch.

Yes, you are right. It's similar to an ensemble effect, much like the effect each tree in a random forest brings to the overall model. You could build multiple NN models, each of which would start with different weights (due to random initialization), and when you average all of them, the averaged model kind of smooths out the small errors made by each individual model, and most of the time you see improved losses / better accuracy.

1 Like

You can interrupt the kernel anytime by clicking this icon on the menu bar of the notebook.

Re-initialize the learner to be on the safer side and you should be good to go!
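If it helps, re-creating the learner in the fastai 0.7-era library used in the course looks roughly like this (a sketch; 'arch' and 'data' are whatever architecture and ImageClassifierData object you built in your earlier cells):

# Sketch: rebuild the learner from scratch before fitting again
# (fastai 0.7-era API; 'arch' and 'data' are assumed from earlier cells).
learn = ConvLearner.pretrained(arch, data, precompute=False)
learn.fit(1e-2, 1, cycle_len=1)   # this time: a single cycle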


Double checking… we average the log_preds from various models?

I usually convert the logs to probabilities before taking the average.
probs = np.exp(log_preds)

Then if you had a few model probabilities:
avg = (prob1 + prob2 + prob3) / 3
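For completeness, a minimal self-contained sketch of that averaging step (the three "models" and their log_preds here are random stand-ins, not actual course output):

import numpy as np

# Stand-ins for per-model log-probabilities on the test set; in practice
# these would come from each trained model's predictions.
rng = np.random.default_rng(0)
def fake_log_preds(n_images=4, n_classes=3):
    p = rng.random((n_images, n_classes))
    p /= p.sum(axis=1, keepdims=True)   # rows sum to 1
    return np.log(p)

log_preds_list = [fake_log_preds() for _ in range(3)]   # three "models"

# Convert each model's log-probabilities back to probabilities, then average.
probs_per_model = [np.exp(lp) for lp in log_preds_list]
avg_probs = np.mean(probs_per_model, axis=0)            # shape (n_images, n_classes)
print(avg_probs.sum(axis=1))                            # each row still sums to ~1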

2 Likes

Great! I found 41 variations between resnet50 and resnext50, and a ~0.0011 improvement in log loss! Thank you! :slight_smile:


1 Like

Am I interpreting this correctly?

We train different models on the same dataset, predict on the test set with each of them, then average those predictions and update sub.csv accordingly?

What I was doing previously was combining the Excel files manually…

Yep, that's correct! You replace the previous single-model predictions with the average of all of the models' predictions, and that becomes your new submission to Kaggle.
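A sketch of that submission step with pandas (the file name and column layout below follow a dog-breed-style sample submission and are assumptions; 'avg_probs' is the averaged array from the snippet above, with rows in the same order as the test set):

import pandas as pd

sub = pd.read_csv('sample_submission.csv')   # placeholder file name
class_cols = sub.columns[1:]                 # everything except the 'id' column

sub[class_cols] = avg_probs                  # averaged probabilities from above
sub.to_csv('sub.csv', index=False)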

2 Likes

@jamesrequa, while I'm hopeless at reading Python code, I can say with 100% certainty that any transformation (not related to image augmentation) that you apply to the training set must also be applied, in the same way, to the test set for prediction.

Also, I noticed that if you resize the data beforehand to speed things up, as the planet notebook does, the temporary folder includes the resized test set as well, which I take as confirmation of this.
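For reference, the resize-for-speed pattern from the planet notebook looks roughly like this (a sketch against the fastai 0.7-era API used in the course; paths and names are placeholders):

from fastai.conv_learner import *

PATH = 'data/planet/'                        # placeholder
label_csv = f'{PATH}train_v2.csv'            # placeholder
n = len(list(open(label_csv))) - 1
val_idxs = get_cv_idxs(n)

def get_data(sz):
    tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_top_down, max_zoom=1.05)
    data = ImageClassifierData.from_csv(PATH, 'train-jpg', label_csv, tfms=tfms,
                                        suffix='.jpg', val_idxs=val_idxs,
                                        test_name='test-jpg')
    # resize() writes smaller copies of the train *and* test images into a
    # 'tmp' folder, which is why the resized test set shows up there.
    return data if sz > 300 else data.resize(int(sz * 1.3), 'tmp')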

2 Likes

… unless one of those humans is Einstein and the topic is Physics :grinning:

1 Like

Most likely, you can expect the learner to have the weights from the first epoch, but it cannot be guaranteed that it's in a stable state. In the example below, I have a dict with two keys, a and b. Both should have the same value at the end of the for loop, but because I interrupted the cell, the values of a and b are inconsistent; the dict d is still updated, though. Hope this is useful.
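The screenshot isn't reproduced here, but a minimal sketch of the example described would look like this (the sleep call is only there to make the interrupt easy to trigger):

import time

d = {'a': 0, 'b': 0}
for i in range(100):
    d['a'] = i
    time.sleep(0.5)      # interrupt the cell somewhere in here
    d['b'] = i

# After the KeyboardInterrupt, d still holds partially updated state,
# e.g. {'a': 42, 'b': 41} -- 'a' and 'b' are now inconsistent.
print(d)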

1 Like

I generally prefer the from_csv method of ImageClassifierData so that I don't have to move the files between Train and Validation. You can set val_idxs to a single value, say [0] (just the first image in the csv dataset). But if you are using from_paths, you may need to move or copy the files. Please note that you will be overfitting to Train (which now includes the Validation data), so metrics like Val. Loss and Val. Accuracy are not meaningful anymore if you copy validation images into Train.
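For example, a minimal sketch in the fastai 0.7-era API used in the course (the path, csv name and architecture are placeholders):

from fastai.conv_learner import *

PATH = 'data/dogbreeds/'                     # placeholder
tfms = tfms_from_model(resnet34, 224)
data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv',
                                    tfms=tfms, val_idxs=[0],   # only the first row is "validation"
                                    suffix='.jpg', test_name='test')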

I don't know of any scenario where this will be useful. So, unless you have so few images that you can't afford to have a validation set, I would avoid merging validation and training data. What I really care about is performance on TEST data (unseen data), and having validation data that is similar to the test data is the only way to ensure you get your expected performance on the test set.

The idea is that once you have trained the model with the train and validation sets and optimized it, and you have the best model, you rerun the same steps after moving the validation data into the training data. That's like restarting the whole process.

Actually, the validation data is there for us to evaluate the model, or for the model to learn some of its parameters.

I have seen that technique used in traditional methods like trees, logistic regression, etc. My main concern is that if you add the validation data to the training process and adjust the weights by running a few epochs, the model might start to slightly overfit the validation set, and we no longer know how it will perform on out-of-sample (test) data. Maybe that's what we have to do when we don't have lots of data to train on.

I don’t think I am clarifying anything for you other than stating that I would also like to hear what others think about this process.

I heard Jeremy say something like this (though I'm not sure what I inferred is what he actually said), so I thought I'd try it. Thanks, Ramesh.

Yes I did. We had some earlier discussions on the forum about this - hopefully one of you can find and link to them… :slight_smile:

1 Like

Changing the size 'sz' during the training phases seems quite remarkable to me.
Is that possible because of the structure of ResNet, which is kind of size-agnostic up to some layers?

Anyway, it is still unclear to me. My understanding is that after training at sz=64 we can feed the same learner double-sized images. The first layers' filters have constant dimensions and hence produce bigger activation maps, but at some point the final layer must still have the correct number of labels.
Is the network structure changing somewhere (thanks to the fastai software), or is ResNet really that flexible?
Thanks
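For reference, the size-bumping pattern from the planet notebook looks roughly like this (a sketch against the fastai 0.7-era API, reusing the hypothetical get_data helper sketched earlier in the thread). The adaptive pooling in the custom head that fastai's ConvLearner adds is what keeps the classifier's shape independent of the input size, so the same weights can accept the larger activation maps:

learn = ConvLearner.pretrained(resnet34, get_data(64))
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)

# Swap in larger images and keep training the *same* weights.
learn.set_data(get_data(128))
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)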

Video timelines for Lesson 3

  • 00:00:05 Cool guides & posts made by Fast.ai classmates
    • tmux, summary of lesson 2, learning rate finder, guide to PyTorch, learning rate vs batch size,
    • decoding ResNet architecture, beginner’s forum
  • 00:05:45 Where we go from here
  • 00:08:20 How to complete last week's assignment “Dog breeds detection”
  • 00:08:55 How to download data from Kaggle (Kaggle CLI) or anywhere else
  • 00:12:05 Cool tip to download only the files you need: using CurlWget
  • 00:13:35 Dogs vs Cats example
  • 00:17:15 What “precompute=True” and “learn.bn_freeze” mean
  • 00:20:10 Intro & comparison to Keras with TensorFlow
  • 00:30:10 Porting PyTorch fast.ai library to Keras+TensorFlow project
  • 00:32:30 Create a submission to Kaggle
  • 00:39:30 Making an individual prediction on a single file
  • 00:42:15 The theory behind Convolutional Networks, and Otavio Good demo (Word Lens)
  • 00:49:45 ConvNet demo with Excel,
    • filter, Hidden layer, Maxpool, Dense weights, Fully-Connected layer
  • Pause
  • 01:08:30 ConvNet demo with Excel (continued)
    • output, probabilities adding to 1, activation function, Softmax
  • 01:15:30 The mathematics you really need to understand for Deep Learning
    • Exponentiation & Logarithm
  • 01:20:30 Multi-label classification with Amazon Satellite competition
  • 01:33:35 Example of improving a “washed-out” image
  • 01:37:30 Setting different learning rates for different layers
  • 01:38:45 ‘data.resize()’ for speed-up, and ‘metrics=[f2]’ or ‘fbeta_score’ metric
  • 01:45:10 ‘sigmoid’ activation for multi-label
  • 01:47:30 Question on “Training only the last layers, not the initial freeze/frozen ones from ImageNet models”
    • ‘learn.unfreeze()’ advanced discussion
  • 01:56:30 Visualize your model with ‘learn.summary()’, shows ‘OrderedDict()’
  • 01:59:45 Working with Structured Data “Corporacion Favorita Grocery Sales Forecasting”
    • Based on the Rossmann Stores competition
  • 02:05:30 Book: Python for Data Analysis, by Wes McKinney
  • 02:13:30 Split Rossmann columns into two types: categorical vs continuous
21 Likes

Thanks @EricPB, this is wonderfully helpful! FYI I've cleaned up your links and formatting, and also pasted it directly into the wiki post (which is editable by all).

Are you planning to do more of these? They are really great :slight_smile:

3 Likes