Thanks @Jeremy! It means a lot hearing that from you. I’ll try not to step on @sgugger’s toes too much then. These will focus more on a direct comparison of what’s new and some nice implementations (feature importance, Ranger, etc.)
Don’t worry about his toes - I was only pointing it out in case there were some useful ideas you could steal.
Yes, it’s never a problem to have too many resources
Thanks Jeremy! There certainly were!
Sounds good Sylvain, I’ll do my best!
I’ve updated notebooks 1 and 2 now with `ImageDataBunch` examples as well as the full PipeLine (also, from folder now works how we’re used to! Thanks Sylvain!)
I’ve added a notebook detailing permutation importance based on Pak’s work in 1.0 here
I was hoping to do a RAPIDS notebook, but Colab doesn’t seem to let me use a T4 instance… Otherwise, I have a multi-label classification notebook based on sgugger’s example
@muellerzr, I’m trying to understand why the importances in your notebook look unfamiliar to me (if I read them correctly, the top feature is responsible for only 3% of accuracy). Comparing with my football case, where the top feature is 30% important, I noticed that you calculate FI on a separate set. Have I understood that correctly?
If so, it raises a serious question: should we calculate FI on the whole set (including the data we trained our model on) or on a separate one (to try to be unbiased)?
I thought about this a bit beforehand, and for now my view is the following. Since this type of FI calculation, strictly speaking, analyses the model itself more than the data, we can use the whole dataset. That allows us to a) use more data (which can be important when it is scarce) and b) maybe get clearer results, because the inaccuracy on the test set (never seen by the model) is too big and often has the same order of magnitude as the permutation inaccuracy. I mean that `(base_error - value)` can be more chaotic, as `base_error` is as big as
By the way, I don’t really remember whether Jeremy said anything on this topic in the video that covers the Feature Importance concept
Hi @Pak! To answer your question, I chose (and have been using) an entirely separate test set for a few reasons. I wanted FI to be focused on how my model behaves on unseen data, to see what my model will do in the real world. The benefit is two-fold: it eliminates any biases from training in my features, and it allows for a better understanding of the model’s behavior post-production.
Jeremy went over it briefly in the Intro to ML course, and I cannot recall correctly if he did permutation importance (it was random forests then)
Let me know if you have any thoughts or questions!
Also, here is a note from the scikit-learn documentation:
Using a held-out set makes it possible to highlight which features contribute the most to the generalization power of the inspected model. Features that are important on the training set but not on the held-out set might cause the model to overfit.
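To make the train-versus-held-out question concrete, here is a minimal sketch of permutation importance run on both splits. It is not the notebook’s implementation: the toy data, the `model` function standing in for a trained model, and the helper names are all made up for illustration, and it measures the drop in accuracy rather than the `(base_error - value)` form discussed above.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column is shuffled on its own."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other column untouched.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(base - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return base, importances

# Hypothetical toy data: column 0 carries the signal, column 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(400)]
labels = [int(r[0] > 0.5) for r in rows]

def model(row):
    """Stand-in for a trained model: it thresholds column 0 only."""
    return int(row[0] > 0.5)

train_rows, train_labels = rows[:300], labels[:300]
valid_rows, valid_labels = rows[300:], labels[300:]

base_train, fi_train = permutation_importance(model, train_rows, train_labels)
base_valid, fi_valid = permutation_importance(model, valid_rows, valid_labels)
print("train:", base_train, fi_train)
print("valid:", base_valid, fi_valid)
```

With a stand-in model this clean, both splits agree. With a real, possibly overfit model, a feature that scores high on the training split but near zero on the held-out split is the case the scikit-learn note warns about.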
I think I do understand what FI shows when using the training set. It determines which columns the model binds itself to the most; in some sense, it shows which columns contain the most usable information about the dependent variable. But I don’t quite have an intuition for what this method of FI will ‘mean’ on unseen data.
And I’m afraid it can be hard to catch the ‘useful signal’ of importance in the chaos of low accuracy on unseen data. A similar problem (low signal-to-noise ratio) turned me away from calculating FI by retraining
This may be part of the answer. I don’t know what test-set FI means, but the difference between FI over the training and test sets is a good source of information about which features the model may tend to bind to too much.
I should check that out, thanx
I’ve added a notebook detailing how to use the new optimization function Ranger and a new fit function, as well as the Mish activation function. I’m still waiting on an update to pytorch before I can continue on to any of the other Vision-based tasks. There are a few implementation issues that I’m working on sorting out, so you won’t get quite the accuracy you expect (working on that)
@muellerzr FYI there’s a `Lookahead` wrapper for optimizers, instead of a `Ranger` optimizer. Use it like so:
def optfunc(p, lr=defaults.lr): return Lookahead(RAdam(p, lr=lr))
Exactly what I did, thanks
def opt_func(ps, lr=defaults.lr): return Lookahead(RAdam(ps, wd=1e-2, mom=0.95, eps=1e-6, lr=lr))
(I lined up our hyperparameters)
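For readers who haven’t met Lookahead before, here is a rough sketch of the wrapping idea in plain Python. It is not fastai’s `Lookahead` code: `SGD` and `LookaheadSketch` are hypothetical stand-ins, and the values of `k` and `alpha` are just illustrative defaults. The inner optimizer takes `k` fast steps, then the slow weights move a fraction `alpha` of the way toward the fast weights and the fast weights are reset to them.

```python
class SGD:
    """Plain gradient descent on a list of floats (stand-in inner optimizer)."""
    def __init__(self, params, lr):
        self.params, self.lr = params, lr

    def step(self, grads):
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g

class LookaheadSketch:
    """Wraps any optimizer exposing `params` and `step(grads)`."""
    def __init__(self, inner, k=5, alpha=0.5):
        self.inner, self.k, self.alpha = inner, k, alpha
        self.slow = list(inner.params)  # copy of the slow weights
        self.count = 0

    def step(self, grads):
        self.inner.step(grads)  # fast step with the wrapped optimizer
        self.count += 1
        if self.count % self.k == 0:
            fast = self.inner.params
            for i in range(len(fast)):
                # Pull the slow weights toward the fast ones, then sync.
                self.slow[i] += self.alpha * (fast[i] - self.slow[i])
                fast[i] = self.slow[i]

# Toy problem: minimise f(w) = w**2, whose gradient is 2*w.
params = [1.0]
opt = LookaheadSketch(SGD(params, lr=0.1), k=5, alpha=0.5)
for _ in range(50):
    opt.step([2 * params[0]])
print(params[0])
```

Because the wrapper only needs a `step` from the inner optimizer, the same pattern works around RAdam, which is what makes a separate `Ranger` class unnecessary.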
Cool. Note that sometimes `eps` is inside the `sqrt()`, and sometimes outside, depending on the implementation. You should make sure that your `1e-6` is consistent with the location of the `eps` you’ve used elsewhere.
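A small numeric sketch of why the placement matters. The two functions below are simplified Adam-style step sizes (no bias correction), and the function names and the tiny gradient statistics are made up for illustration. When the squared-gradient average is very small, the same `eps=1e-6` produces very different steps depending on where it sits, and an `eps` inside the square root behaves roughly like its square root outside.

```python
import math

def step_eps_outside(grad_avg, sqr_avg, lr, eps):
    """Step size with eps added after the square root."""
    return lr * grad_avg / (math.sqrt(sqr_avg) + eps)

def step_eps_inside(grad_avg, sqr_avg, lr, eps):
    """Step size with eps added under the square root."""
    return lr * grad_avg / math.sqrt(sqr_avg + eps)

lr, eps = 1e-3, 1e-6
grad_avg, sqr_avg = 1e-8, 1e-16  # hypothetical tiny-gradient regime

outside = step_eps_outside(grad_avg, sqr_avg, lr, eps)
inside = step_eps_inside(grad_avg, sqr_avg, lr, eps)
# Squaring eps when moving it inside the sqrt roughly recovers the same step.
matched_inside = step_eps_inside(grad_avg, sqr_avg, lr, eps ** 2)

print(outside, inside, matched_inside)
```

So when porting hyperparameters between implementations, an `eps` of `1e-6` outside the square root corresponds to something closer to `1e-12` inside it.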
Thanks for the hint! I’ll go check on that and see if it helps later today!
It looks like the issue with Colab has been fixed (they pushed 1.3.1). Double checking now. It’s good! I’ll post a headpose notebook shortly
Looks like that wasn’t it. All of ours have it outside (verified by @morgan)
it had left it outside the sqrt in both Ranger and RangerQH. RAdam in fastai v2 also has it outside and my port of QHAdam also leaves it outside. - morgan
Thanks so much for this guide!
It’s really useful to switch between fast AI v1 & v2, because I can get more insights into the fundamentals.
I got stuck at 02_Custom_Image_Classification when I create the cnn learner: `learn = cnn_learner(dbunch, resnet34, pretrained=True, metrics=error_rate)`
The Learner object that is returned has no attribute ‘fit_one_cycle’. I used the help function on the learn variable and the fit_one_cycle method doesn’t show up.
I’m using Google Colab. I restarted the runtime and got the same error. I’ve also rewritten the code and pasted approximately everything you wrote in the guide, and the same error shows up.
N.B.: my Learner object also doesn’t have a lr_find() method.
@andreihrs Try doing:
from fastai2.callback.all import *
And thank you very much for the kind words! I’ll update the notebook with that fix later today.
I’m working on NLP next. I plan on trying to migrate Multi-FiT over once I finish with an IMDB sample and IMDB (so we can see `from_csv` and the original from folders). If anyone has particular topics they want me to do for notebooks, please let me know. Otherwise, here is the outline of what is left, in no particular order:
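For anyone wondering why an import can fix a missing method: my understanding is that methods such as `fit_one_cycle` are attached to `Learner` by the callback modules when they are imported, rather than being defined on the class itself. The sketch below shows that general pattern in plain Python. The `Learner` class and the `attach_to` helper here are hypothetical stand-ins, not the library’s code.

```python
class Learner:
    """Hypothetical stand-in for a core class that only knows `fit`."""
    def fit(self, n_epoch, lr):
        return f"fit(n_epoch={n_epoch}, lr={lr})"

def attach_to(cls):
    """Decorator that attaches the decorated function to `cls` as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator

learn = Learner()
before = hasattr(learn, "fit_one_cycle")  # the method does not exist yet

# In a real library this definition would live in a separate module,
# so the method only appears once that module has been imported.
@attach_to(Learner)
def fit_one_cycle(self, n_epoch, lr_max=1e-3):
    return self.fit(n_epoch, lr_max)

after = hasattr(learn, "fit_one_cycle")  # existing instances see it too
print(before, after, learn.fit_one_cycle(1))
```

If that reading is right, `lr_find()` being missing has the same cause, and the same import brings it in.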
08_Rossmann - Will wait until tabular issues are fixed
(Possibly GAN and/or
Multiple points (pose detection)
thanks much @muellerzr, an implementation of DeViSe in 2.0 will be helpful