Lesson 5 official topic

Seeing the 0 values being multiplied by the coefficients makes me wonder whether we could get some extra information by adding a {column_name}_is_zero column, since those 0 values contribute nothing no matter how the {column_name} coefficient changes. Is that intuition correct, or is there something happening that changes the 0 values?

edit: Thinking more about this, we shouldn’t need to do this for one-hot encoded variables, but it might still be useful for other columns?
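Something like this is what I have in mind (the column name here is just a placeholder):

```python
import pandas as pd

df = pd.DataFrame({'Fare': [7.25, 0.0, 71.28, 0.0]})

# Indicator column that is 1 wherever the original value is 0, so the model
# can learn a separate effect for "this value was zero".
df['Fare_is_zero'] = (df['Fare'] == 0).astype(int)
print(df)
```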

2 Likes

thanks :slight_smile:

One concern I always have when using get_dummies is what happens with test data: if it has a new category (let’s say male / female / other), it will produce an extra column that’s missing from the training data. How do you take care of that? Could I bucket all extra categories into one column just using get_dummies, or should I save the state of get_dummies so that I can reuse it when calling it on the test data?
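For example, would something along these lines be a reasonable way to handle it (just a sketch with made-up column names)?

```python
import pandas as pd

train = pd.DataFrame({'sex': ['male', 'female', 'male']})
test  = pd.DataFrame({'sex': ['female', 'other']})   # 'other' was never seen in training

train_dummies = pd.get_dummies(train, columns=['sex'])

# Align the test dummies to the training columns: unseen categories are dropped,
# and any column missing from the test set is filled with 0.
test_dummies = (pd.get_dummies(test, columns=['sex'])
                  .reindex(columns=train_dummies.columns, fill_value=0))
print(test_dummies)
```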

8 Likes

On the topic of sensible gradients: by “sensible”, in the book (or maybe it was earlier in this course), you say the function is relatively smooth with no huge peaks or valleys. Is there an easy way to know/picture/plot what the gradient looks like, so we can tell what’s sensible vs. not sensible, or even come up with sensible alternatives if it made sense to do that?

4 Likes

… and if you would like to read up on logarithms, Math Better Explained is quite an awesome resource :slight_smile:

It is probably not super useful to learn all these details (you can probably get more out of writing and reading code), but it is a fun rabbit hole to dive into :slight_smile:

For machine learning this can be quite useful though. But it deals with why you might want to take a log of a target variable, so it is not the scenario we encountered in class today (we took a log of an input variable to make the life of our linear model easier :wink: ).
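To illustrate the in-class scenario with a made-up long-tailed input column (not the actual lesson data):

```python
import numpy as np
import pandas as pd

# A hypothetical skewed input feature, e.g. something like a fare or price column.
df = pd.DataFrame({'fare': [7.0, 8.5, 26.0, 512.0, 0.0]})

# log1p computes log(1 + x), which also handles zeros gracefully; the transformed
# column is much easier for a linear model to use than the raw long-tailed one.
df['log_fare'] = np.log1p(df['fare'])
print(df)
```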

15 Likes

Don’t we have to add 0.5 to the x of the sigmoid? The predictions are currently generally already centered around 0.5, whereas the sigmoid is centered at 0.

1 Like

What is the best way to deal with a class imbalance problem?

2 Likes

Also, it could be a good idea to train for this “other” category. For instance, you may want to randomly change some percentage (maybe 5% or so) of your training values into “other”. Then your model can learn an “average” effect of this feature and start with some meaningful coefficient when it encounters a new category during testing. Possibly fastai implements something like this already for categorical data.
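A rough sketch of what I mean (not fastai’s actual behaviour):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'sex': np.random.choice(['male', 'female'], size=1000)})

# Replace ~5% of the training values with a catch-all 'other' category, so the
# model learns a coefficient to fall back on for unseen categories at test time.
mask = np.random.rand(len(df)) < 0.05
df.loc[mask, 'sex'] = 'other'

print(df['sex'].value_counts())
```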

I think you may need to write a function to track that. There was a function introduced called plot_function that could help: How does a neural net really work? | Kaggle.
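If you don’t have that notebook handy, a plain PyTorch/matplotlib version of the same idea might look like this (just a sketch, not the notebook’s exact plot_function):

```python
import torch
import matplotlib.pyplot as plt

def f(x): return 3*x**2 + 2*x + 1   # any function you want to inspect

xs = torch.linspace(-3, 3, 100, requires_grad=True)
ys = f(xs)

# Summing gives one scalar, so backward() fills xs.grad with df/dx at each point.
ys.sum().backward()

plt.plot(xs.detach(), ys.detach(), label='f(x)')
plt.plot(xs.detach(), xs.grad, label="f'(x)")
plt.legend()
plt.show()
```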

There are several ways to deal with that. Some sklearn ML algorithms, e.g. logistic regression, have a class_weight argument which you can adjust; however, looking at the source code to see its effect will help even more.

The simple solution, for a class imbalance that is high enough to cause a problem, is to oversample (duplicate) your minority class, or undersample your majority class.

If this isn’t good enough, you can generate artificial data that is similar to the minority class. SMOTE (Synthetic Minority Oversampling Technique) is one example - discussed here.
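For instance, with sklearn you could try either approach like this (a minimal sketch on synthetic data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic imbalanced dataset: ~95% class 0, ~5% class 1.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# Option 1: let the model re-weight the classes itself.
clf = LogisticRegression(class_weight='balanced').fit(X, y)

# Option 2: oversample the minority class by duplicating its rows.
X_min, y_min = X[y == 1], y[y == 1]
X_min_up, y_min_up = resample(X_min, y_min, n_samples=(y == 0).sum(), random_state=0)
X_bal = np.vstack([X[y == 0], X_min_up])
y_bal = np.concatenate([y[y == 0], y_min_up])
clf_balanced = LogisticRegression().fit(X_bal, y_bal)
```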

4 Likes

Javier, based on my limited knowledge, you can always look at the gradient values in PyTorch using your_variable.grad. You could do it after each layer (in the case of deep learning) to see how it is changing. I have used it to look at the problem of vanishing gradients while working on a model in my MS class. I haven’t seen it as a graph, only as values in a table.
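A tiny example of what inspecting .grad looks like:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.tensor([0.5, 0.0, -1.0])

loss = ((w * x).sum() - 1.0) ** 2
loss.backward()

# After backward(), each parameter holds the gradient of the loss w.r.t. itself.
print(w.grad)
```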

At [72], normally I hear “hidden” in reference to “hidden layers”, but there are two layers.
So n_hidden=20 is 20 cells per layer?

2 Likes

How could we figure out that we need to subtract 0.3 from the layer 2 coefficients in init_coeffs (cell 78) to get the model to train?
If we changed the number of hidden layers, would we need to change it?

1 Like

It’s true that the input domain of the sigmoid is centered on zero; however, the output range is [0, 1]. The sigmoid function is applied to the classifier’s output, producing predictions that are in that 0-1 range.

edit: I wrongly switched the terms ‘range’ and ‘domain’ in my initial reply
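A quick check of what that looks like in practice:

```python
import torch

# Raw model outputs can be any real numbers, centered wherever the model puts them...
logits = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])

# ...but the sigmoid squashes them into (0, 1), so no extra 0.5 shift is needed.
print(torch.sigmoid(logits))
# tensor([0.0474, 0.3775, 0.5000, 0.6225, 0.9526])
```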

2 Likes

Yes - 20 neurons.

3 Likes

Jeremy’s comment on that 0.3 magic value was that he basically fiddled with it until it was trainable.

5 Likes

Yes, you would. But this is just a demonstration of the ‘hand crafted’ approach, and Jeremy will in time show more effective ways of managing this.
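For reference, the kind of init being discussed looks roughly like this (reconstructed from memory, so it may not match cell 78 exactly):

```python
import torch

def init_coeffs(n_coeff, n_hidden=20):
    # First layer: n_coeff inputs -> n_hidden activations, scaled down by n_hidden.
    layer1 = (torch.rand(n_coeff, n_hidden) - 0.5) / n_hidden
    # Second layer: the -0.3 shift was found by fiddling until the model trained.
    layer2 = torch.rand(n_hidden, 1) - 0.3
    const = torch.rand(1)[0]
    return layer1.requires_grad_(), layer2.requires_grad_(), const.requires_grad_()
```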

2 Likes

Question: At the beginning, we took the log of a few variables. Now, after model building, do we interpret their coefficients differently from the other features? If yes, how do we read these coefficients? Does that method change for the deep neural network, like the 20-layered one?

Are there any good debugging features in Jupyter (or extensions for it) that let you break execution and explore variables and data like in an IDE? PyCharm, for instance, has some nice tools to show dataframes as tables during a debug session.
In Jupyter that would be helpful for stepping through for loops and seeing how variables change in each iteration.
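For context, the built-in option I’m aware of is the standard pdb route, e.g.:

```python
# In a notebook cell, drop into the debugger with the standard pdb hook:
def train_loop(xs):
    total = 0
    for i, x in enumerate(xs):
        breakpoint()   # pauses here; inspect i, x, total, then 'c' to continue
        total += x
    return total

train_loop([1, 2, 3])

# Or, after an exception, run the IPython magic  %debug  in the next cell
# to open a post-mortem debugger at the point of failure.
```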