Lesson 5 In-Class Discussion

This is very good https://www.youtube.com/watch?v=kvnBw_D0gfs

The video is 30 minutes shorter; it starts at 7 pm.
Do you also have the first part of it?

Does PyTorch autograd calculate Jacobian products in the background?
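
As far as I understand, yes: autograd never materializes the full Jacobian. Calling backward(v) on a non-scalar output walks the graph computing vector-Jacobian products v^T J directly. A minimal sketch:

import torch

# y = 2x is elementwise, so its Jacobian J is diag(2, 2, 2);
# backward(v) computes the vector-Jacobian product v^T J, never J itself.
x = torch.randn(3, requires_grad=True)
y = x * 2
v = torch.tensor([1.0, 0.5, 0.25])
y.backward(v)   # accumulates v^T J = 2 * v into x.grad
print(x.grad)   # tensor([2.0000, 1.0000, 0.5000])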

Correct. :frowning: This is weird. Usually, live-streams remain available as full videos.

Is that because of his 8 years in management consulting? @jeremy could answer this, especially about the patience we need with the Excel approach!

It’s back!

How can I create a DataLoader in PyTorch with multiple inputs like cats and conts? Thanks.

I couldn’t find what cls(…) does.
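
If this is the cls inside a @classmethod such as ColumnarModelData.from_data_frame, then cls is bound to the class the method was called on, and cls(...) is simply a constructor call. A generic sketch (the class and method names here are illustrative, not from the library):

class ModelData:
    def __init__(self, trn, val):
        self.trn, self.val = trn, val
    @classmethod
    def from_lists(cls, trn, val):
        # cls is the class itself (ModelData or a subclass), so
        # cls(trn, val) builds an instance and respects subclassing.
        return cls(trn, val)

md = ModelData.from_lists([1, 2], [3])  # same as ModelData([1, 2], [3])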

I found the ConcatDataset() class in PyTorch…

It gives a similar data structure to ColumnarModelData.from_data_frame.

It still doesn’t work; the batch only has cats…
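
That’s expected, I think: ConcatDataset concatenates datasets along the sample dimension (it gives you more rows), so it can’t pair up cats and conts for the same sample. A custom Dataset that returns a (cats, conts, y) tuple per sample does what’s needed; a minimal sketch with made-up dummy data:

import torch
from torch.utils.data import Dataset, DataLoader

class ColumnarDataset(Dataset):
    def __init__(self, cats, conts, y):
        self.cats  = torch.as_tensor(cats,  dtype=torch.long)
        self.conts = torch.as_tensor(conts, dtype=torch.float)
        self.y     = torch.as_tensor(y,     dtype=torch.float)
    def __len__(self):
        return len(self.y)
    def __getitem__(self, i):
        # A tuple per sample; the default collate_fn batches each
        # element separately, so a batch is (cats, conts, y) too.
        return self.cats[i], self.conts[i], self.y[i]

ds = ColumnarDataset([[0, 1], [2, 0]], [[0.5, 1.2], [0.1, 0.9]], [3.0, 4.5])
dl = DataLoader(ds, batch_size=2)
cats, conts, y = next(iter(dl))  # each comes back with a batch dimension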

Just added the lesson video to the wiki post.

We could, but it would be better to define a regular function dot_product(), since we’re not actually using any nn.Module features in DotProduct - I was just showing it as an example of a really simple module.
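
For reference, the module in question looked something like this (reconstructed, not verbatim from the notebook), alongside the plain-function version:

import torch.nn as nn

# A module that computes a row-wise dot product of two
# (batch, n_factors) tensors; no parameters, buffers, or
# train/eval state, so nothing from nn.Module is really used.
class DotProduct(nn.Module):
    def forward(self, u, m):
        return (u * m).sum(1)

# The same thing as a regular function:
def dot_product(u, m):
    return (u * m).sum(1)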

This is SO COOL! :slight_smile:

Heh I had no idea… :slight_smile:

Oh gosh that’s important! Thanks. This is why we should be using nn.Dropout really. Remind me to cover this next week if I forget.
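
If the gotcha being referred to is the usual one, it’s that the functional F.dropout defaults to training=True, so it keeps dropping activations even after model.eval() unless you pass training=self.training, whereas the nn.Dropout module tracks the train/eval flag automatically. A sketch, assuming that’s the issue:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)  # respects model.eval() on its own
    def forward(self, x):
        x = self.drop(x)  # disabled in eval mode automatically
        # The functional form needs the flag threaded through by hand;
        # without it, dropout would stay active even in eval mode:
        x = F.dropout(x, p=0.5, training=self.training)
        return x

m = Net().eval()
print(m(torch.ones(1, 4)))  # unchanged: both dropouts are off in eval mode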

Can you just use the one we showed in class (both this lesson, and last week’s)? Or are you asking how to implement it from scratch?

Yes, sure. I was just curious about how it’s done in PyTorch; for practical purposes I will use fastai. Thanks!

I wanted to better understand the standard deviation input to the Kaiming He initialization (0.05). Should the “number of things” be n_users or the product of n_users and n_factors? In other words, should it be the number of users or the number of elements in the embedding matrix?

i.e.:

import math

n_movies = 9066
n_users = 671
n_factors = 50

users_x_factors_stddev = math.sqrt(2 / (n_users * n_factors))  # ≈ 0.0077

or

users_stddev = math.sqrt(2 / n_users)  # ≈ 0.0546

Intuitively I thought it should be the number of elements in the embedding matrix, since I assumed the distribution would be over the number of weights, but the calculation that matches the 0.05 from the lesson is the one based on the number of users.

From my understanding, it should just be n_factors; that’s the number of nodes in the previous layer. If my understanding is wrong, please correct me.
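
For reference, the He et al. (2015) scheme uses the fan-in of the layer: std = sqrt(2 / fan_in), where fan_in is the number of inputs feeding each output unit. Treating the dot product as a layer whose inputs are the n_factors embedding values gives:

import math

n_factors = 50
# Kaiming He initialization: std = sqrt(2 / fan_in); here the fan-in
# is n_factors, the number of inputs summed into each prediction.
std = math.sqrt(2 / n_factors)
print(std)  # 0.2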

Right. I believe that was what was said in the lesson, but that calculation seemed to work out to be 0.2 using math.sqrt(2/50). Am I doing something wrong?

No, that’s correct. As long as the stdev is small, the initialization works out fine. There’s another method that just suggests 1/n_factors; as long as the initial values are small, the network learns the weights.

Thanks. I guess it is just a guideline, since in a practical sense the weights get updated during learning anyway. I was just curious to see if I could make sense of how the number is calculated. Conceptually, I believe Jeremy was making the point that the initial values should not be really large numbers, like in the millions.

Is the reason the number of nodes is n_factors because the element-wise product of the values for a user and the movie would be of size n_factors prior to summation?
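
That matches my reading: each predicted rating sums n_factors element-wise products, so each output has n_factors inputs feeding it. A quick shape check (dummy sizes):

import torch

batch, n_factors = 4, 50
u = torch.randn(batch, n_factors)  # a batch of user embedding rows
m = torch.randn(batch, n_factors)  # the matching movie embedding rows
prod = u * m        # shape (batch, n_factors): n_factors terms per rating
pred = prod.sum(1)  # shape (batch,): each output sums n_factors inputs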