Lesson 5 official topic

Check this out https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559

5 Likes

I’m quite fond of this approach to debugging in Jupyter Notebook :slight_smile:

The fact that you can call %debug after you get an error and are transported to where it occurred in the stack still boggles my mind :slight_smile:

13 Likes

You can use %debug in a notebook to perform whatever debugging you might need:

https://blog.jupyter.org/a-visual-debugger-for-jupyter-914e61716559?gi=361cf6d1bf19
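
For anyone who hasn't tried it, here's a minimal sketch of the post-mortem workflow (each snippet goes in its own notebook cell; the `divide` function is just a made-up example):

```python
# Cell 1 -- something raises an exception
def divide(a, b):
    return a / b

divide(1, 0)  # raises ZeroDivisionError

# Cell 2 -- drop into the post-mortem debugger at the point of failure
%debug
# Inside pdb: `p a` / `p b` print the local variables, `u` / `d` move up and
# down the stack, and `q` quits.
```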

5 Likes

Q: Regarding lr_find, are there any recommendations about the suggest functions? Which one should be used in which cases? Is it always a matter of taking slide and valley as references and picking something in between?

1 Like

Is always choosing valley from learning_rate_finder a good idea?

Jeremy mentioned choosing some value between valley and slide.

4 Likes

Not sure this answers your question, but here is a list of the available methods

1 Like

They use different algorithms but valley is the one that seems to work best most of the time.
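
For example, here's a rough sketch of asking lr_find for several suggestions at once so you can compare them (assuming fastai v2's API and an already-built Learner called `learn`):

```python
from fastai.vision.all import *  # brings in valley, slide, steep, minimum

# `learn` is assumed to exist already; lr_find returns one suggestion per function
suggestions = learn.lr_find(suggest_funcs=(valley, slide, steep, minimum))
print(suggestions.valley, suggestions.slide, suggestions.steep, suggestions.minimum)
```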

There is some discussion here:

10 Likes

Yes. There’s an extension called Variable Inspector among the standard Jupyter extensions. It’s great.

4 Likes

Yeah, exactly, I’ve seen that there are many of them, so was wondering why we have all of them. Like, I usually go with steep or valley.

This is my favorite guide to ensembling if anyone might be interested :slight_smile: I suspect quite a few people who did very well on Kaggle might have started with that blog post, or have encountered it early in their Kaggle journey :slight_smile:

20 Likes

Q: Why did Jeremy say in Industry it’s very easy to screw up logistic regression?

3 Likes

The criticism is about the amount of pre-processing required to get a reliable result, in contrast with the Random Forest algorithm, which tends to be quite robust even to ‘ugly’ data with outliers and missing values.

3 Likes

Not speaking for Jeremy – but from my direct experience it is because all the details that go into preparing data, transforming it, and ensuring that all your assumptions match the use of the logistic regression model are very easy to get wrong. You can make silly-but-consequential errors that completely mess up your results, without it being obvious that you’ve made an error or what it is when it does occur.

In other words, while logistic regression is “straightforward”, the implementation is nuanced, sensitive and fragile. Random forests, by nature and design, are more robust to the nuances. They are also easier to interpret and debug.
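
To make that concrete, here's a toy scikit-learn sketch (the pipelines and parameters are my own illustration, not anything from the lesson) of the extra care a logistic regression pipeline typically needs compared with a random forest:

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Logistic regression: missing values must be imputed and features scaled,
# otherwise the optimiser can converge badly or the coefficients become misleading.
log_reg = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Random forest: still needs finite values, but no scaling, and it shrugs off
# outliers and skewed feature distributions far more gracefully.
rand_forest = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("model", RandomForestClassifier(n_estimators=100)),
])
```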

8 Likes

I would say it definitely gets especially difficult for regression generalisations like GAMs or GLMs, and for complex terms that capture trends, for example. It is quite easy for things to go wrong because of this extra complexity.

1 Like

Great to see that JupyterLab has a visual debugger.
The variable inspector looks useful too.

1 Like

That’s a good idea, Shivam. Perhaps it could be combined with sympy to compare different loss functions.
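
Something along these lines, perhaps (just a sketch; the loss definitions are the standard ones, not anything from the lesson notebook):

```python
import sympy as sp

y, p = sp.symbols("y p")
mse = (y - p) ** 2
bce = -(y * sp.log(p) + (1 - y) * sp.log(1 - p))  # binary cross-entropy

# Compare gradients with respect to the prediction p:
# d(mse)/dp simplifies to 2*(p - y); d(bce)/dp simplifies to (p - y)/(p*(1 - p))
print(sp.simplify(sp.diff(mse, p)))
print(sp.simplify(sp.diff(bce, p)))
```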

Ensembling of artificial neural nets would seem to have parallels with biological neural nets…

4 Likes

Kaggle just launched a tabular data competition to practice the random forest skills from the last lesson :slight_smile:

9 Likes

I’m also a bit confused about the centering at 0. I understand how this would be useful for the relu activation function, but in the final layer - for binary classification in the example - the sigmoid function ranges from 0 to 1, right? Do you want a function in that layer that centres at 0 too?

i.e. how does parameter initialisation centred at 0 impact gradient descent efficiency if the final layer is a sigmoid?

1 Like

The output does, yes, but we want the input to it centered at zero, so the output of the sigmoid is centered at 0.5.
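
A quick sanity check of that (PyTorch assumed):

```python
import torch

x = torch.linspace(-3, 3, steps=7)      # inputs centred at zero
print(torch.sigmoid(x))                 # outputs are symmetric around 0.5
print(torch.sigmoid(torch.tensor(0.)))  # tensor(0.5000)
```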

3 Likes