Lesson 7 - Official topic

Hi @SMEissa! If we focus on the second case (wd=0.1), by epoch 5 both your training and validation losses are still improving… so try training a bit longer, until your training loss keeps getting better but your validation loss starts getting worse. That is the time to stop.

Weight decay has a regularization effect that prevents overfitting (which is a good thing), but it also means that it can take longer for your model to learn. That's why in your second case more than 5 epochs may be required.

I think you have it right. You can choose any filename you want for the model, but then when you use load_model you have to pass it that filename. So in your example, you can retrieve the saved model with
my_model_objects = load_model('my_model.pth')
Then you can inspect what you got back: you should see the model and the optimizer that you saved.
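If it helps, here is a minimal sketch of the same round trip in plain PyTorch (using torch.save / torch.load rather than fastai's load_model, and a hypothetical nn.Linear model just for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and optimizer, just for illustration
model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Save model and optimizer state together under any filename you like
torch.save({'model': model.state_dict(), 'opt': opt.state_dict()},
           'my_model.pth')

# Load it back and inspect what was saved
my_model_objects = torch.load('my_model.pth')
print(my_model_objects.keys())  # dict_keys(['model', 'opt'])
```

Whatever you put in the saved dictionary is exactly what you get back, so you can confirm both the model and the optimizer made it to disk.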


You could also use the SaveModelCallback, which takes a filename parameter it will save to (I believe you can also have it simply save every iteration). Then do a learn.load (or load_model) to bring it back in :slight_smile:


Thanks a lot for the clarification!

In the video, and also in the relevant fastbook notebook here, for weight decay it says that:

loss_with_wd = loss + wd * (parameters**2).sum()

which, in derivative, is equivalent to (note: ‘parameters’ above has been swapped for ‘weight’ below):

weight.grad += wd * 2 * weight

Shouldn’t it be loss.grad instead? i.e. (I’ll use the original naming of ‘parameters’ here):

loss.grad += wd * 2 * parameters

Or have I misunderstood something…?



Great question, Yijin! In principle you are correct. But PyTorch uses a slick notation trick:

weight.grad implicitly holds the derivative of the loss function with respect to weight.
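You can verify this equivalence numerically. The sketch below (a toy squared-error loss on random data, purely for illustration) shows that letting autograd differentiate the wd * (weight**2).sum() penalty gives the same weight.grad as adding wd * 2 * weight by hand:

```python
import torch

torch.manual_seed(0)
weight = torch.randn(5, requires_grad=True)
x, y = torch.randn(5), torch.tensor(1.0)
wd = 0.1

# Version 1: add the penalty to the loss and let autograd differentiate it
loss_with_wd = (weight @ x - y)**2 + wd * (weight**2).sum()
loss_with_wd.backward()
grad1 = weight.grad.clone()

# Version 2: compute the plain loss, then add the penalty's gradient by hand
weight.grad = None
loss = (weight @ x - y)**2
loss.backward()
weight.grad += wd * 2 * weight.detach()
grad2 = weight.grad.clone()

print(torch.allclose(grad1, grad2))  # True
```

In both cases the extra term in weight.grad is the derivative of the loss (not a "loss.grad"): gradients always live on the parameters they are taken with respect to.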


Ah right. Thanks for your clarification : )


thanks for this post. pip install dtreeviz resolved my issue.

In the TabularPandas and TabularProc section of 09_tabular.ipynb
we are splitting the training and validation sets at October 2011 (saleMonth<10 keeps January through September): training before the cutoff, validation after.

cond = (df.saleYear<2011) | (df.saleMonth<10)
train_idx = np.where( cond)[0]
valid_idx = np.where(~cond)[0]

splits = (list(train_idx),list(valid_idx))

Should the logic be as follows?

cond = (df.saleYear<2011) | ( (df.saleMonth<10) & (df.saleYear==2011) )
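The two conditions do behave differently. A quick check on toy data (hypothetical sale dates, just to illustrate) shows that the original OR lets rows from after the cutoff leak into the training indices whenever their saleMonth happens to be less than 10:

```python
import pandas as pd
import numpy as np

# Toy sale dates spanning the 2011 cutoff (hypothetical data)
df = pd.DataFrame({'saleYear':  [2009, 2010, 2011, 2011, 2012],
                   'saleMonth': [  11,    3,    5,   11,    2]})

# Original condition: any row with saleMonth < 10 is kept, even from 2012
cond_orig = (df.saleYear < 2011) | (df.saleMonth < 10)
# Proposed condition: strictly "before October 2011"
cond_fix = (df.saleYear < 2011) | ((df.saleMonth < 10) & (df.saleYear == 2011))

print(np.where(cond_orig)[0])  # [0 1 2 4] — the Feb 2012 row leaks in
print(np.where(cond_fix)[0])   # [0 1 2] — nothing from Oct 2011 onward
```

Of course, in the Bulldozers dataset there happen to be no sales after the cutoff in the training data, so the two conditions select the same rows there; but the proposed one expresses the intent correctly.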

In Chapter 8 Collab, the embedding can be created with

def create_params(size):
    return nn.Parameter(torch.zeros(*size).normal_(0, .01))

Is the std equal to 0.01 because n_factors is 50, and so 1/50 ≈ 0.01?

On a related note (and apologies if I’ve misread the text), in this paragraph

To calculate the result for a particular movie and user combination, we have to look up the index of the movie in our movie latent factors matrix, and the index of the user in our user latent factors matrix, and then we can do our dot product between the two latent factor vectors. But look up in an index is not an operation which our deep learning models know how to do.

it seems to say that we cannot simply index into an embedding, but doesn’t this implementation later on in the chapter

class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = create_params([n_users, n_factors])
        self.user_bias = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias = create_params([n_movies])
        self.y_range = y_range
    def forward(self, x):
        users = self.user_factors[x[:,0]]
        movies = self.movie_factors[x[:,1]]
        res = (users*movies).sum(dim=1)
        res += self.user_bias[x[:,0]] + self.movie_bias[x[:,1]]
        return sigmoid_range(res, *self.y_range)

show that yes we can? Aren’t the square brackets on user_factors and movie_factors the same as ‘look up in an index’?
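For what it's worth, the book's point is that an index lookup is equivalent to a matrix multiply with a one-hot vector, which is what an embedding layer represents conceptually; the square brackets do work, and give the same result. A small sketch (random factors, hypothetical indices):

```python
import torch

torch.manual_seed(0)
n_users, n_factors = 6, 4
user_factors = torch.randn(n_users, n_factors)

idx = torch.tensor([2, 0, 5])

# Direct indexing, as in DotProductBias.forward
by_index = user_factors[idx]

# Equivalent formulation: matrix multiply with one-hot vectors,
# which is what an embedding "layer" stands for conceptually
one_hot = torch.nn.functional.one_hot(idx, n_users).float()
by_matmul = one_hot @ user_factors

print(torch.allclose(by_index, by_matmul))  # True
```

So indexing is fine for the forward pass; the embedding abstraction just makes that lookup a first-class, efficient layer rather than materializing one-hot matrices.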

I’m working in Paperspace and am getting the same errors. I had already imported utils from fastbook. Not sure why.

NameError: name ‘draw_tree’ is not defined

Hi @Grace1 Try this: make sure that you
cd into the fastbook folder
before you import utils.

Here is what I did:

%cd '/content/drive/My Drive/fastbook/'
from utils import *
%cd ..

For reference, please see my Colab notebook.


Yes, I also think that it should be the AND operator, &.


200_000 == 200000. Nice.


If you don’t use small NNs, the combined size of your models gets large very fast. The question then is whether you shouldn’t just use a single large NN instead.

What’s “entity encoding”? I don’t see the term in the linked article.

Here in the documentation you have learn.save and learn.load, you can use the same for your need.

The Cold-start problem for collaborative filtering.

Thought that was a typo and corrected it!