Check this out https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559
I’m quite fond of this approach to debugging in Jupyter Notebook. The fact that you can call `%debug` after you get an error and are transported to where it occurred in the stack still boggles my mind.
You can use `%debug` in a notebook to perform whatever debugging you might need: https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559
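For anyone who hasn’t tried it, here’s a minimal sketch of that workflow (the failing function is made up for illustration): run a cell that raises, then call `%debug` in the next cell, and IPython drops you into an ipdb prompt at the frame where the exception was raised.

```python
# Cell 1: something that fails deep in a call stack
def inner(x):
    return 1 / x                # ZeroDivisionError when x == 0

def outer(xs):
    return [inner(x) for x in xs]

outer([2, 1, 0])
```

```python
# Cell 2: post-mortem debugging at the point of failure
%debug
# ipdb> p x        # inspect locals in the failing frame
# ipdb> up / down  # walk the stack
# ipdb> q          # quit the debugger
```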
Q: Regarding `lr_find`, are there any recommendations about suggest functions? Which one to use in what cases? Is it always `slide` and `valley` as references, picking something in the middle?
Is always choosing `valley` from `learning_rate_finder` a good idea?
Jeremy mentioned choosing some value between `valley` and `slide`.
Not sure this answers your question, but here is a list of the available methods:
They use different algorithms, but `valley` is the one that seems to work best most of the time.
There is some discussion here:
Yes. There is an extension called Variable Inspector among the standard Jupyter extensions. It’s great.
Yeah, exactly, I’ve seen that there are many of them, so I was wondering why we have all of them. Like, I usually go with `steep` or `valley`.
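For reference, a small sketch of how you can compare the suggestion methods side by side (assuming a recent fastai with the `suggest_funcs` argument; the dataset and model here are just the standard pets example):

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS) / "images"
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=lambda f: f[0].isupper(), item_tfms=Resize(224))
learn = vision_learner(dls, resnet18, metrics=error_rate)

# Ask lr_find for all four built-in suggestion methods at once;
# it returns one suggested learning rate per method.
lrs = learn.lr_find(suggest_funcs=(minimum, steep, valley, slide))
print(lrs.valley, lrs.slide)    # e.g. pick something between these two
```

The plot it draws marks each suggestion, which makes it easy to sanity-check them against the loss curve before committing to one.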
This is my favorite guide to ensembling, if anyone might be interested. I suspect quite a few people who did very well on Kaggle might have started with that blog post, or encountered it early in their Kaggle journey.
Q: Why did Jeremy say in Industry it’s very easy to screw up logistic regression?
The criticism is about the amount of pre-processing required to get a reliable result, in contrast with the Random Forest algorithm, which tends to be quite robust even to ‘ugly’ data where there may be outliers and missing values.
Not speaking for Jeremy, but from my direct experience it is because all the details that go into preparing data, transforming it, and ensuring that all your assumptions match the use of the Logistic Regression model are very easy to get wrong. You can make silly-but-consequential errors that completely mess up your results, without it being obvious that you’ve made an error, or easy to understand when one does occur.
In other words, while logistic regression is “straightforward”, the implementation is nuanced, sensitive, and fragile. Random forests by nature and design are more robust to these nuances. They are also easier to interpret and debug.
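To make that concrete, here is a small scikit-learn sketch (synthetic data, purely illustrative) of one such silent failure mode: rescaling a single column changes how the default L2 penalty in `LogisticRegression` bears on each coefficient, while a random forest is invariant to monotonic rescaling of a feature.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; blow up one feature's scale to mimic a column in odd units
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X[:, 0] *= 1e4

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Regularised logistic regression is sensitive to feature scale: the penalty
# now bears unevenly on the coefficients, and the optimiser can struggle too.
raw = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(),
                       LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print("logreg, raw features:   ", raw.score(X_te, y_te))
print("logreg, scaled features:", scaled.score(X_te, y_te))

# A random forest splits on thresholds, so rescaling a feature changes nothing.
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("random forest:          ", rf.score(X_te, y_te))
```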
I would say that it definitely gets especially difficult for generalizations of regression like GAMs or GLMs, and for complex terms that capture trends, for example. It is quite easy for things to go wrong because of this extra complexity.
Great to see that JupyterLab has a visual debugger.
The variable inspector looks useful too.
That’s a good idea Shivam, perhaps it could be combined with sympy to compare different loss functions
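In case it helps, a tiny sketch of what that could look like with sympy (just the symbolic side; everything here is illustrative): define a couple of losses and let sympy differentiate them with respect to the prediction, so you can compare their gradients directly.

```python
import sympy as sp

y, p = sp.symbols("y p", real=True)   # target and prediction

losses = {
    "MSE": (y - p) ** 2,
    "MAE": sp.Abs(y - p),
    "BCE": -(y * sp.log(p) + (1 - y) * sp.log(1 - p)),
}

for name, loss in losses.items():
    grad = sp.simplify(sp.diff(loss, p))   # gradient w.r.t. the prediction
    print(f"{name}: d/dp = {grad}")
```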
Ensembling of artificial neural nets would seem to have parallels with biological neural nets…
Kaggle just launched a tabular data competition to practice the random forest skills from the last lesson.
I’m also a bit confused about the centering at 0. I understand how this would be useful for the relu activation function, but in the final layer, for binary classification in the example, the sigmoid function ranges from 0 to 1, right? Do you want a function in that layer that centres at 0 too?
i.e. how does an initialisation centred at 0 impact the gradient descent efficiency if the final layer is sigmoid?
The output does, yes, but we want the input to it centered at zero, so the output of the sigmoid is centered at 0.5.
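A quick numeric illustration of that point (PyTorch, values rounded): the sigmoid’s gradient peaks at 0.25 when its input is zero and shrinks fast as the input drifts away, so keeping the pre-activations centred at zero keeps the final layer in the regime where gradient descent can actually make progress.

```python
import torch

z = torch.tensor([-6.0, -3.0, 0.0, 3.0, 6.0], requires_grad=True)
out = torch.sigmoid(z)
out.sum().backward()

print(out)     # ~[0.0025, 0.0474, 0.5000, 0.9526, 0.9975]
print(z.grad)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); max 0.25 at z == 0
```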