Lesson 5 official topic

I’m quite fond of this approach to debugging in Jupyter Notebook 🙂

The fact that you can call %debug after you get an error and be transported to where it occurred in the stack still boggles my mind 🙂
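Here is a rough sketch of why this works. When an exception is raised, Python keeps the traceback, and each traceback entry holds a reference to its stack frame, including that frame's local variables. IPython's `%debug` opens a pdb-style debugger in post-mortem mode on the most recent traceback; for an exception you have caught yourself, the standard-library equivalent is `pdb.post_mortem(tb)`. The function names below (`divide`, `run`, `innermost_frame`) are made up for illustration, and the example only inspects the failing frame instead of starting an interactive session.

```python
import sys

def divide(a, b):
    ratio = a / b  # raises ZeroDivisionError when b == 0
    return ratio

def run():
    total = 10
    return divide(total, 0)

def innermost_frame(tb):
    """Follow the traceback chain down to the frame that raised the error.

    This is the frame a post-mortem debugger such as pdb.post_mortem(tb)
    would drop you into.
    """
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame

try:
    run()
except ZeroDivisionError:
    frame = innermost_frame(sys.exc_info()[2])
    # The failing function and its local variables are still available here.
    failing_function = frame.f_code.co_name
    local_vars = dict(frame.f_locals)
    print(failing_function, local_vars)
```

In a notebook you would simply run `%debug` in the next cell after the error. Inside the debugger, `u` and `d` move up and down the stack, and `p <name>` prints a variable.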


You can use %debug in a notebook to perform whatever debugging you might need.



Q: Regarding lr_find, are there any recommendations about the suggestion functions? Which one should we use in which cases? Is it always slide and valley as references, picking something in the middle?


Is always choosing valley from learning_rate_finder a good idea?

Jeremy mentioned choosing some value between valley and slide.
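Because the learning rate finder plots the learning rate on a log scale, "somewhere in between" is most naturally read as a point between the two suggestions in log space. This is a minimal sketch, assuming you already have the two numbers (in fastai they come back from something like `learn.lr_find(suggest_funcs=(valley, slide))`). `lr_between` is a hypothetical helper, and the 50/50 geometric mean is just one reasonable default, not a rule from the lesson.

```python
import math

def lr_between(lr_a: float, lr_b: float, weight: float = 0.5) -> float:
    """Pick a learning rate between two suggestions, interpolating in log10 space.

    weight=0.0 returns lr_a, weight=1.0 returns lr_b, and weight=0.5
    gives the geometric mean of the two.
    """
    if lr_a <= 0 or lr_b <= 0:
        raise ValueError("learning rates must be positive")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    log_lr = (1.0 - weight) * math.log10(lr_a) + weight * math.log10(lr_b)
    return 10.0 ** log_lr

# Hypothetical suggestions, e.g. valley = 1e-3 and slide = 1e-2
print(f"{lr_between(1e-3, 1e-2):.2e}")  # 3.16e-03
```

The arithmetic mean of 1e-3 and 1e-2 would be 5.5e-3, which sits much closer to the larger value on the log-scale plot. Interpolating in log space avoids that bias.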


Not sure this answers your question, but here is a list of the available methods
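The suggestion functions all read the same loss-versus-learning-rate curve but apply different rules to it. As a rough illustration, here are simplified pure-Python versions of two of the easier rules. They are written in the spirit of fastai's `minimum` (the learning rate at the lowest loss, divided by 10) and `steep` (the point where the loss falls fastest per unit of log learning rate). These are sketches, not fastai's source: details such as loss smoothing are left out, and `valley` and `slide` use more involved algorithms that are omitted here. The example curve is made up.

```python
import math
from typing import Sequence

def suggest_minimum(lrs: Sequence[float], losses: Sequence[float]) -> float:
    """Learning rate at the lowest recorded loss, backed off by a factor of 10."""
    best = min(range(len(losses)), key=lambda i: losses[i])
    return lrs[best] / 10

def suggest_steepest(lrs: Sequence[float], losses: Sequence[float]) -> float:
    """Learning rate where the loss drops fastest with respect to log(lr)."""
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    steepest = min(range(len(slopes)), key=lambda i: slopes[i])
    return lrs[steepest]

# Made-up curve: the loss falls, bottoms out, then blows up at high learning rates
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.3, 2.2, 1.6, 0.9, 0.8, 3.0]
print(suggest_minimum(lrs, losses), suggest_steepest(lrs, losses))
```

On this curve the two rules disagree by a factor of 10, which illustrates why the suggestions are easier to use as a range to choose from than as a single answer.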


They use different algorithms but valley is the one that seems to work best most of the time.

There is some discussion here:


Yes. There is an extension among the Jupyter standard extensions called Variable Inspector. It’s great.


Yeah, exactly, I’ve seen that there are many of them, so I was wondering why we have all of them. I usually go with steep or valley.

This is my favorite guide to ensembling, if anyone is interested 🙂 I suspect quite a few people who did very well on Kaggle started with that blog post, or encountered it early in their Kaggle journey 🙂


Q: Why did Jeremy say that in industry it’s very easy to screw up logistic regression?


The criticism is about the amount of pre-processing required to get a reliable result. This contrasts with the random forest algorithm, which tends to be quite robust even to ‘ugly’ data with outliers and missing values.


Not speaking for Jeremy, but from my direct experience it is because all the details that go into preparing the data, transforming it, and ensuring that all your assumptions match the use of the logistic regression model are very easy to get wrong. You can make silly-but-consequential errors that completely mess up your results, without it being obvious that you’ve made an error, or without understanding the error when it does occur.

In other words, while logistic regression is “straightforward”, the implementation is nuanced, sensitive, and fragile. Random forests, by nature and design, are more robust to these nuances. They are also easier to interpret and debug.
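As one concrete example of the kind of pre-processing detail described above (an illustration, not something from the lesson): a linear model such as logistic regression usually needs missing values filled in and benefits from scaled features. The statistics for both must be learned from the training set only and then reused unchanged on validation and test data. Recomputing them on each split, or on all the data at once, is an easy mistake that raises no error. This is a minimal pure-Python sketch; `ColumnPrep` is a hypothetical name.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import List, Optional

@dataclass
class ColumnPrep:
    """Median imputation plus standardization for one numeric column."""
    fill: float = 0.0
    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, train: List[Optional[float]]) -> "ColumnPrep":
        # Learn every statistic from the training rows only.
        observed = [x for x in train if x is not None]
        self.fill = median(observed)
        filled = [self.fill if x is None else x for x in train]
        self.mu = mean(filled)
        self.sigma = pstdev(filled) or 1.0  # avoid dividing by zero on a constant column
        return self

    def transform(self, xs: List[Optional[float]]) -> List[float]:
        # Apply the stored training statistics; never refit on new data.
        return [((self.fill if x is None else x) - self.mu) / self.sigma for x in xs]

prep = ColumnPrep().fit([1.0, 2.0, None, 3.0])
print(prep.transform([2.0, None, 3.0]))  # validation rows use the training stats
```

A random forest would not need the scaling step at all, since its splits depend only on the ordering of values within a column.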


I would say it gets especially difficult for generalizations of regression such as GAMs or GLMs, and for complex terms that capture trends, for example. It is quite easy for things to go wrong because of this extra complexity.


Great to see that JupyterLab has a visual debugger.
The variable inspector looks useful too.


That’s a good idea, Shivam. Perhaps it could be combined with sympy to compare different loss functions.

Ensembling of artificial neural nets would seem to have parallels with biological neural nets…


Kaggle just launched a tabular data competition to practice the random forest skills from the last lesson 🙂


I’m also a bit confused about the centering at 0. I understand how this would be useful for the ReLU activation function, but in the final layer (for binary classification in the example), the sigmoid function ranges from 0 to 1, right? Do we want a function in that layer that is centered at 0 too?

i.e., how does initializing the parameters centered at 0 affect the efficiency of gradient descent if the final layer is a sigmoid?


The output does, yes, but we want the input to the sigmoid centered at zero, so that its output is centered at 0.5.
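To put numbers on the gradient descent question: the sigmoid outputs 0.5 at an input of 0, and its slope is largest there (0.25), shrinking quickly as the input moves away from 0. By the chain rule, the gradient reaching the sigmoid's input is the loss gradient with respect to the output multiplied by this slope. So with a loss computed directly on the sigmoid output (such as an absolute-error loss, an assumption about the setup), inputs that start near 0 get the largest possible gradients, while inputs that start far from 0 saturate the sigmoid and get very small ones. Here is a small pure-Python check:

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable form: avoids overflow in math.exp for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_slope(x: float) -> float:
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"input {x:+.0f}: output {sigmoid(x):.4f}, slope {sigmoid_slope(x):.4f}")
```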


I think I finally understand this. We are using the sigmoid without the +0.5 offset because we want the model’s raw outputs to be centered around 0, which makes the model easier to train. In the first example (before we pass the output through the sigmoid function), the outputs are centered around 0.5, with some results less than 0 and some greater than 1. It is actually easier for the model to find good weights if we don’t add that extra shift and instead let the sigmoid handle it. If my intuition is correct, this also means the model can feed a wider range of values into the sigmoid, which makes it easier for the model to push predictions that should be close to 0% far into the negatives; the sigmoid then converts those to values close to 0%.
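The range point can be checked directly. With a plain linear output shifted by 0.5, raw scores outside [-0.5, 0.5] give "probabilities" below 0 or above 1. Passing the same scores through a sigmoid keeps every prediction strictly between 0 and 1, and a large negative score becomes a prediction very close to 0. The scores below are made up.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

raw_scores = [-8.0, -1.0, 0.0, 1.0, 8.0]          # hypothetical model outputs
offset_preds = [0.5 + s for s in raw_scores]      # linear output shifted by +0.5
sigmoid_preds = [sigmoid(s) for s in raw_scores]  # squashed into (0, 1)

for s, lin, sig in zip(raw_scores, offset_preds, sigmoid_preds):
    print(f"score {s:+.0f}: offset {lin:+.2f}, sigmoid {sig:.4f}")
```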