For a bit of background: this past year at my university I ran my own “study group” where I led my own rendition of Practical Deep Learning for Coders. Each lecture focused on one data type and went in depth on it. I knew that I eventually wanted to redo these notebooks for 2.0 when I kicked the group off again in the spring, but since I designed the whole thing to be intro-friendly, I’ve decided to do it now and port everything over to 2.0. In these notebooks I will go over the high-level API differences, so that those who may stay at this level for now don’t get too overwhelmed by all the new information that 2.0 brings.
The first few notebooks will be very similar to the original course, since it’s a great introduction, and then branch off from there. The first one is available here, where I go over PETs! Over the next week or two I’ll slowly be bringing in more notebooks and converting them over.
Thanks @amritv! I just uploaded a tabular example. I’m working on a tabular RAPIDS example next, then Bayesian optimization, k-fold cross-validation, and regression (Rossmann).
<ipython-input-3-352c3a9f46af> in <module>()
      1 from fastai2.basics import *
----> 2 from fastai2.callback.all import *
      3 from fastai2.vision.all import *
Thanks @Jeremy! It means a lot hearing that from you. I’ll try not to step on @sgugger’s toes too much, then. These will focus more on a direct comparison of what’s new, plus some nice implementations (feature importance, Ranger, etc.).
I’ve now updated notebooks 1 and 2 with ImageDataBunch examples as well as the full Pipeline (also, from_folder now works how we’re used to! Thanks Sylvain!)
So I was hoping for a RAPIDS notebook, but Colab doesn’t seem to be giving me a T4 instance… otherwise, I have a multi-label classification example based on @sgugger’s.
Hi @muellerzr, I’m trying to understand why the importances in your notebook look unfamiliar to me (if I read them correctly, the top feature is responsible for only 3% of accuracy, whereas in my football case the top feature is about 30%). I noticed that you calculate FI on a separate set. Have I understood that correctly?
If so, it raises a serious question: should we calculate FI on the whole set (including the data we trained our model on) or on a separate one (to try to be unbiased)?
I had thought about this a bit beforehand, and for now my thinking is the following. Since in this type of FI calculation we are, strictly speaking, analysing the model itself more than the data, we can use the whole dataset. That lets us a) use more data (which can be important when there is a scarcity of it) and b) maybe get clearer results, since the error on the test set (never seen by the model) is large and often of the same order of magnitude as the permutation-induced error. I mean that (base_error - value) can be more chaotic when base_error is as big as value.
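For concreteness, here is a minimal, model-agnostic sketch of the quantity being discussed (base accuracy minus the accuracy after shuffling one column). The scikit-learn toy model and dataset are stand-ins of my own, not from the notebooks; the same loop applies to a fastai tabular Learner’s predictions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in for a trained model.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

def permutation_fi(model, X, y, n_repeats=5, seed=0):
    "Mean drop in accuracy (base accuracy minus permuted accuracy) per column."
    rng = np.random.default_rng(seed)
    base_acc = accuracy_score(y, model.predict(X))
    drops = []
    for col in range(X.shape[1]):
        col_drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            col_drops.append(base_acc - accuracy_score(y, model.predict(X_perm)))
        drops.append(np.mean(col_drops))
    return np.array(drops)

print(permutation_fi(model, X_test, y_test))    # FI on held-out data
print(permutation_fi(model, X_train, y_train))  # FI on the training data
```

Running the last two lines side by side makes the train-vs-held-out question concrete.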
By the way, I don’t really remember whether Jeremy said anything on this topic in the video that covers the feature importance concept.
Hi @Pak! To answer your question, I chose (and have been using) an entirely separate test set for a few reasons. I wanted FI to focus on how my model behaves on unseen data, to see what it will do in the real world. This is two-fold: it eliminates any training-time biases in my features, and it also allows for a better understanding of the model’s behavior once it’s in production.
Jeremy went over it briefly in the Intro to ML course, though I can’t recall whether he used permutation importance (it was random forests then).
Let me know if you have any thoughts or questions!
Also, here is a note from the scikit-learn documentation:
Using a held-out set makes it possible to highlight which features contribute the most to the generalization power of the inspected model. Features that are important on the training set but not on the held-out set might cause the model to overfit.
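That passage refers to scikit-learn’s built-in permutation_importance (available from scikit-learn 0.22 onward). Here is a short example of using it on a held-out split, again with a toy model of my own choosing rather than anything from the notebooks:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance  # scikit-learn >= 0.22
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Importances measured on the held-out split, as the quoted passage recommends.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```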
I think I do understand what FI shows when it is computed on the training set. It determines which columns the model binds itself to the most; in some sense it shows which columns contain the most usable information connected to the dependent variable. But I don’t quite have an intuition for what this method of FI will ‘mean’ on unseen data.
And I’m afraid it can be hard to catch the ‘useful signal’ of importance in the chaos of low accuracy on unseen data. A similar problem (a low signal-to-noise ratio) turned me away from calculating FI by retraining.
This may be part of the answer. I don’t know what test-set FI means on its own, but the difference in FIs between the training and test sets is a good source of information about which features the model may tend to bind to too much.
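To make that comparison concrete, here is a sketch of the train-vs-test gap, reusing the model, the data splits, and permutation_importance from the scikit-learn example above (again my own toy setup, not the notebooks’):

```python
# Reuses model, X_train/X_test, y_train/y_test and permutation_importance
# from the scikit-learn example above.
train_fi = permutation_importance(model, X_train, y_train, n_repeats=10, random_state=0)
test_fi  = permutation_importance(model, X_test,  y_test,  n_repeats=10, random_state=0)

# Large positive gaps flag features the model leans on during training
# but which do not generalize to the held-out split.
gap = train_fi.importances_mean - test_fi.importances_mean
for idx in gap.argsort()[::-1][:5]:
    print(f"feature {idx}: train {train_fi.importances_mean[idx]:.3f} "
          f"vs test {test_fi.importances_mean[idx]:.3f}")
```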
I should check that out, thanks!
I’ve added a notebook detailing how to use the new Ranger optimizer and a new fit function, as well as the Mish activation function. I’m still waiting on a PyTorch update before I can continue on to any of the other vision-based tasks. There are a few implementation issues I’m working on sorting out, so you won’t get quite the accuracy you’d expect (working on that).
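For reference, Mish itself is only a couple of lines in PyTorch. This is a generic sketch of the activation (x * tanh(softplus(x))), not the notebook’s exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    "Mish activation (Misra, 2019): x * tanh(softplus(x))."
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

# Drop-in replacement for ReLU, e.g. in a small classification head:
head = nn.Sequential(nn.Linear(512, 256), Mish(), nn.Linear(256, 10))
```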