Lesson 3 In-Class Discussion


(Vikrant Behal) #235

The idea is that once you train the model with the train and validation sets, you optimize it. Once you have the best model, you rerun the same steps after folding the validation data into the training data. That’s like restarting the whole process.

In effect, is the validation data there for us to evaluate the model, or for the model to learn some of its parameters?


(Ramesh Sampath) #236

I have seen that technique used in traditional methods like trees, logistic regression, etc. My main concern is that if you add the validation data to the training process and adjust weights by running for a few epochs, the model might start to slightly overfit the validation set, and we no longer know how it will perform on out-of-sample data (the test set). Maybe that’s what we have to do when we don’t have lots of data to train on.

I don’t think I am clarifying anything for you other than stating that I would also like to hear what others think about this process.


(Vikrant Behal) #237

I heard Jeremy say something like this (though I’m not sure what I inferred is what he said), so I thought I’d try it. Thanks, Ramesh.


(Jeremy Howard (Admin)) #238

Yes I did. We had some earlier discussions on the forum about this - hopefully one of you can find and link to them… :slight_smile:


(Walter Vanzella) #239

Changing the image size ‘sz’ between training phases seems quite remarkable to me.
Is that possible because of the structure of ResNet, which is size-agnostic up to some layers?

In any case it is still unclear to me. My understanding is that after training at sz=64 we can feed the same learner double-sized images. The first layers’ filters have fixed dimensions, hence produce bigger activation maps, but at some point the final layer must output the correct number of labels.
Is the network structure changing somewhere (thanks to the fastai software), or is ResNet that flexible?
Thanks
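One common mechanism that makes a classifier head independent of input size (an assumption here; the thread doesn’t confirm exactly how fastai handles it) is a global or adaptive average-pool placed before the final layer. A minimal NumPy sketch of why this works:

```python
import numpy as np

def global_avg_pool(feature_map):
    """Collapse spatial dimensions: (channels, H, W) -> (channels,).
    The output size depends only on the channel count, not on H or W."""
    return feature_map.mean(axis=(1, 2))

# Feature maps from a 64px input and a 128px input have different H and W...
small = np.ones((512, 2, 2))
large = np.ones((512, 4, 4))

# ...but the pooled vectors feeding the final layer have identical shape,
# so the same fully-connected layer (with the right number of labels) fits both.
print(global_avg_pool(small).shape, global_avg_pool(large).shape)
```

So the convolutional body can genuinely be size-agnostic, and only the pooling step has to adapt to the activation map size.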


(Eric Perbos-Brinck) #240

Video timelines for Lesson 3

  • 00:00:05 Cool guides & posts made by Fast.ai classmates
    • tmux, summary of lesson 2, learning rate finder, guide to Pytorch, learning rate vs batch size,
    • decoding ResNet architecture, beginner’s forum
  • 00:05:45 Where we go from here
  • 00:08:20 How to complete last week’s assignment “Dog breeds detection”
  • 00:08:55 How to download data from Kaggle (Kaggle CLI) or anywhere else
  • 00:12:05 Cool tip to download only the files you need: using CurlWget
  • 00:13:35 Dogs vs Cats example
  • 00:17:15 What “precompute=True” and “learn.bn_freeze” mean
  • 00:20:10 Intro & comparison to Keras with TensorFlow
  • 00:30:10 Porting PyTorch fast.ai library to Keras+TensorFlow project
  • 00:32:30 Create a submission to Kaggle
  • 00:39:30 Making an individual prediction on a single file
  • 00:42:15 The theory behind Convolutional Networks, and Otavio Good demo (Word Lens)
  • 00:49:45 ConvNet demo with Excel,
    • filter, Hidden layer, Maxpool, Dense weights, Fully-Connected layer
  • Pause
  • 01:08:30 ConvNet demo with Excel (continued)
    • output, probabilities adding to 1, activation function, Softmax
  • 01:15:30 The mathematics you really need to understand for Deep Learning
    • Exponentiation & Logarithm
  • 01:20:30 Multi-label classification with Amazon Satellite competition
  • 01:33:35 Example of improving a “washed-out” image
  • 01:37:30 Setting different learning rates for different layers
  • 01:38:45 ‘data.resize()’ for speed-up, and ‘metrics=[f2]’ or ‘fbeta_score’ metric
  • 01:45:10 ‘sigmoid’ activation for multi-label
  • 01:47:30 Question on “Training only the last layers, not the initial freeze/frozen ones from ImageNet models”
    • ‘learn.unfreeze()’ advanced discussion
  • 01:56:30 Visualize your model with ‘learn.summary()’, shows ‘OrderedDict()’
  • 01:59:45 Working with Structured Data “Corporacion Favorita Grocery Sales Forecasting”
    • Based on the Rossmann Stores competition
  • 02:05:30 Book: Python for Data Analysis, by Wes McKinney
  • 02:13:30 Split Rossman columns in two types: categorical vs continuous

Wiki: Lesson 3
Deep Learning Brasilia - Lição 3
(Jeremy Howard (Admin)) #241

Thanks @EricPB this is wonderfully helpful! FYI I’ve cleaned up your links and formatting, and also pasted it directly into the wiki post (which is editable by all).

Are you planning to do more of these? They are really great :slight_smile:


(Eric Perbos-Brinck) #242

Thanks !
I try to create them during my second pass watching the video, because fast-forwarding through several 2-hour videos for those 5 minutes dedicated to ‘data.resize()’ or ‘knowledge distillation’ drove me nuts in Part 1 v1 :smiley:


(Chris Palmer) #243

Yes, about the overfitting. I have tried this and my model went from being nicely balanced to horribly overfitted. I don’t know what to make of the information though, as it’s only ever validating against one image in this scenario. See this post: Dog Breed Identification challenge

Whereas I had achieved around 0.200 loss prior to this, after it I had 0.230 loss, but again I am not sure how to interpret it :slight_smile:

When I submitted this “enhanced” model to Kaggle it scored a 0.2275 loss, so it performed about as the validation loss predicted, which was not very good to say the least! Perhaps @jeremy could let us know if there is a best approach to doing this, as I struggled to piece one together from reading numerous posts, and likely stuffed something up!


(James Requa) #244

@Chris_Palmer training on all of the data is a bit tricky because you have to follow all the same steps you did when you had set aside the validation set, any variance could cause overfitting. To be honest with you, before this course I never even considered training on all of the data for those reasons and instead my approach was always to use K-Fold cross-validation.

In case you aren’t aware of how it works, the idea behind CV is that your model does get to see all the data but it sees it in folds (each time the validation set is a different split of the data), at the very end you take the average of all the folds so your predictions are a reflection of all of the data. With CV you get the luxury of verifying that each fold is not overfitting and then feel comfortable taking the average of all folds for the final prediction.

@sermakarevich has provided a lot of good examples of how to use cross-validation with fastai, so you should check some of his posts on that if you want to try it out. Otherwise, I think Jeremy is also planning to cover this in more detail since he created a notebook for it, but it’s not quite complete yet, so it’s probably best to wait before exploring it if you are new to this concept.
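The fold mechanics described above can be sketched framework-agnostically. In this sketch, `fit_predict` is a hypothetical stand-in (not a fastai function) for training a fresh model on one fold’s training slice and predicting on the test set:

```python
import numpy as np

def kfold_test_predictions(X, y, X_test, fit_predict, k=5, seed=42):
    """Train k models, each on a different (k-1)/k slice of the data,
    and average their test-set predictions."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    all_preds = []
    for i in range(k):
        # Fold i is held out for validation; the rest is training data.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # fit_predict should build a brand-new model each call,
        # so no weights leak from one fold to the next.
        all_preds.append(fit_predict(X[train_idx], y[train_idx], X_test))
    return np.mean(all_preds, axis=0)
```

Because every sample lands in a validation fold exactly once, you can check each fold’s validation score for overfitting before trusting the averaged test predictions.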


(Chris Palmer) #245

Hi @jamesrequa

Yes, I have used k-fold cross-validation in standard ML. But I am confused about its use with deep learning. I have observed that with each training cycle I go through, my model improves but then trends towards overfitting. So it seems like a very fine line between a good model and one that is no longer so good; more is not necessarily better. Although you are varying the data the model learns from with CV, are you not also training the model many more times, and is there not a risk of overtraining it?


(James Requa) #246

Actually no, because the model starts fresh for each fold. It doesn’t continue training across folds with the saved weights; those weights are reset each time.

I also recommend reading this post from fastai blog courtesy of @rachel, it should help you build up more of an intuition about this stuff
http://www.fast.ai/2017/11/13/validation-sets/


(Chris Palmer) #247

Thanks @jamesrequa

How are we making use of these models if we reset the data and weights for each fold? The only model we have “in hand” is the current one, yet we want to combine these various CV models to good effect for prediction.

Likewise if we try to create an ensemble of different architectures…

For each fold (and for different models from other architectures), are you processing the model all the way through to predicting on the test data, and preserving those predictions, before cycling through to another fold?


(unknown) #248

Is there a lesson 4 wiki yet? I can’t find it.


(James Requa) #249

We typically would generate predictions on the test set after each fold with those trained weights. So the predictions are then what we would average for the 5 folds at the end. Similarly, you can repeat this same process with multiple different models and then take an average of those different model’s predictions.

You can also save the weights of your model at any point by using the function learn.save('model'). So alternatively you could save the weights and then after you are done training, you can load the weights back in with learn.load('model') and just run predictions directly from those weights.


(Ramesh Sampath) #250

If you pass in the cycle_save_name parameter into the learn.fit like learn.fit(lr, 3, cycle_len=1, cycle_mult=1, cycle_save_name="my_model_"), it will save the weights from each Cycle. Then you can load them and run prediction on them and ensemble these Cycle predictions.

Currently you have to load them independently into separate models and then predict with each of them. It could be useful to have a method called load_cycles to load all these models and return an ensemble, like Scikit-learn’s VotingClassifier. :slight_smile:
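Pending such a helper, a hand-rolled combiner over per-cycle (or per-fold, or per-architecture) prediction arrays might look like this. This is a sketch, not a fastai API:

```python
import numpy as np

def ensemble_predictions(pred_list):
    """Average class-probability arrays from several models or cycles.
    Each element of pred_list has shape (n_samples, n_classes)."""
    return np.mean(np.stack(pred_list), axis=0)
```

You would run prediction once per saved cycle’s weights, collect the resulting arrays, and pass them all in; a simple mean is the usual starting point before trying fancier combiners.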


(Ramesh Sampath) #251

Ah…didn’t realize you guys were talking about cross-validation and not cycle prediction. So, ignore my comment above. Hope it’s still useful for someone.


(James Requa) #252

Yep, I think it’s still very useful information. What you are describing is Snapshot Ensembles, which is also a really great feature! I think Jeremy actually did create some functions in the planet-cv notebook for something similar to what you were suggesting, so maybe check that out!


(Chris Palmer) #253

Actually, would this be useful for picking a better cut-off point for the model? I have seen my model deteriorate (overfit) and wished I had been able to stop training earlier, so maybe I could use this approach to recover that better point in time?


(Chris Palmer) #254

Thanks for this advice @jamesrequa. What functions are you using for saving your predictions - are they the bcolz ones referred to earlier in this wiki? Lesson 3 In-Class Discussion

import bcolz

def save_array(fname, arr):
    # write arr to a compressed on-disk bcolz carray
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

# Example
save_array('preds.bc', preds)

What about reloading, do you have a function for that?

I have not had much success reliably getting back a model I can reload using learn.save. That is, I have performed a learn.save, then come back later (after stopping and restarting my AWS instance), and have not been able to learn.load it…