Lesson 5 In-Class Discussion

This is very good https://www.youtube.com/watch?v=kvnBw_D0gfs

The video is 30 minutes shorter; it starts at 7 pm.
Do you also have the first part of it?

Does PyTorch autograd calculate Jacobian products in the background?
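
As far as I understand, yes: autograd never materializes the full Jacobian. Calling backward(v) on a non-scalar output walks the graph computing vector-Jacobian products v^T J directly. A minimal sketch:

import torch

# y = 2x is elementwise, so its Jacobian J is diag(2, 2, 2);
# backward(v) computes the vector-Jacobian product v^T J, never J itself.
x = torch.randn(3, requires_grad=True)
y = x * 2
v = torch.tensor([1.0, 0.5, 0.25])
y.backward(v)   # accumulates v^T J = 2 * v into x.grad
print(x.grad)   # tensor([2.0000, 1.0000, 0.5000])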

Correct. :frowning: This is weird. Usually, live-streams remain available as full videos.

Is that because of his 8 years in management consulting? @jeremy could answer this, especially about the patience we need with the Excel approach!

It’s back!

How can I create a DataLoader in PyTorch with multiple inputs like cats and conts? Thanks.

I couldn’t find what cls(…) does.
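
If this is the cls inside a @classmethod such as ColumnarModelData.from_data_frame, then cls is bound to the class the method was called on, and cls(...) is simply a constructor call. A generic sketch (the class and method names here are illustrative, not from the library):

class ModelData:
    def __init__(self, trn, val):
        self.trn, self.val = trn, val
    @classmethod
    def from_lists(cls, trn, val):
        # cls is the class itself (ModelData or a subclass), so
        # cls(trn, val) builds an instance and respects subclassing.
        return cls(trn, val)

md = ModelData.from_lists([1, 2], [3])  # same as ModelData([1, 2], [3])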

I found the ConcatDataset() class in PyTorch…

It gives a similar data structure to ColumnarModelData.from_data_frame.

It still doesn’t work; the batch only has cats…
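
That’s expected, I think: ConcatDataset concatenates datasets along the sample dimension (it gives you more rows), so it can’t pair up cats and conts for the same sample. A custom Dataset that returns a (cats, conts, y) tuple per sample does what’s needed; a minimal sketch with made-up dummy data:

import torch
from torch.utils.data import Dataset, DataLoader

class ColumnarDataset(Dataset):
    def __init__(self, cats, conts, y):
        self.cats  = torch.as_tensor(cats,  dtype=torch.long)
        self.conts = torch.as_tensor(conts, dtype=torch.float)
        self.y     = torch.as_tensor(y,     dtype=torch.float)
    def __len__(self):
        return len(self.y)
    def __getitem__(self, i):
        # A tuple per sample; the default collate_fn batches each
        # element separately, so a batch is (cats, conts, y) too.
        return self.cats[i], self.conts[i], self.y[i]

ds = ColumnarDataset([[0, 1], [2, 0]], [[0.5, 1.2], [0.1, 0.9]], [3.0, 4.5])
dl = DataLoader(ds, batch_size=2)
cats, conts, y = next(iter(dl))  # each comes back with a batch dimension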

Just added the lesson video to the wiki post.

We could, but it would be better to define a regular function dot_product(), since we’re not actually using any nn.Module features in DotProduct - I was just showing it as an example of a really simple module.
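
For reference, the module in question looked something like this (reconstructed, not verbatim from the notebook), alongside the plain-function version:

import torch.nn as nn

# A module that computes a row-wise dot product of two
# (batch, n_factors) tensors; no parameters, buffers, or
# train/eval state, so nothing from nn.Module is really used.
class DotProduct(nn.Module):
    def forward(self, u, m):
        return (u * m).sum(1)

# The same thing as a regular function:
def dot_product(u, m):
    return (u * m).sum(1)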

This is SO COOL! :slight_smile:

Heh I had no idea… :slight_smile:

Oh gosh that’s important! Thanks. This is why we should be using nn.Dropout really. Remind me to cover this next week if I forget.
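
If the gotcha being referred to is the usual one, it’s that the functional F.dropout defaults to training=True, so it keeps dropping activations even after model.eval() unless you pass training=self.training, whereas the nn.Dropout module tracks the train/eval flag automatically. A sketch, assuming that’s the issue:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)  # respects model.eval() on its own
    def forward(self, x):
        x = self.drop(x)  # disabled in eval mode automatically
        # The functional form needs the flag threaded through by hand;
        # without it, dropout would stay active even in eval mode:
        x = F.dropout(x, p=0.5, training=self.training)
        return x

m = Net().eval()
print(m(torch.ones(1, 4)))  # unchanged: both dropouts are off in eval mode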

Can you just use the one we showed in class (both this lesson, and last week’s)? Or are you asking how to implement it from scratch?

Yes, sure. I was just curious about how it’s done in PyTorch; for practical purposes I will use fastai. Thanks!

I wanted to better understand the standard deviation input to the Kaiming He initialization (0.05). Should the “number of things” be n_users or the product of n_users and n_factors? In other words, should it be the number of users or the number of elements in the embedding matrix?

i.e.:

import math

n_movies = 9066
n_users = 671
n_factors = 50

users_x_factors_stddev = math.sqrt(2 / (n_users * n_factors))  # ≈ 0.0077

or

users_stddev = math.sqrt(2 / n_users)  # ≈ 0.0546

Intuitively I thought it should be the number of elements in the embedding matrix, since I assumed the distribution would be over the number of weights, but the calculation that matches the 0.05 from the lesson is the one based on the number of users.

From my understanding, it should just be n_factors; that’s the number of nodes in the previous layer. If my understanding is wrong, please correct me.
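
For reference, the He et al. (2015) scheme uses the fan-in of the layer: std = sqrt(2 / fan_in), where fan_in is the number of inputs feeding each output unit. Treating the dot product as a layer whose inputs are the n_factors embedding values gives:

import math

n_factors = 50
# Kaiming He initialization: std = sqrt(2 / fan_in); here the fan-in
# is n_factors, the number of inputs summed into each prediction.
std = math.sqrt(2 / n_factors)
print(std)  # 0.2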

Right. I believe that was what was said in the lesson, but that calculation seemed to work out to be 0.2 using math.sqrt(2/50). Am I doing something wrong?

No, that’s correct. As long as the stdev is small, the initialization works out fine. There’s another method that just suggests 1/n_factors; as long as the initial values are small, the network learns the weights.

Thanks. I guess it is just a guideline, since in a practical sense the weights get updated during learning anyway. I was just curious to see if I could make sense of how the number is calculated. Conceptually, I believe Jeremy was making the point that the initial values should not be really large numbers, like in the millions.

Is the reason the number of nodes is n_factors because the element-wise product of the values for a user and the movie would be of size n_factors prior to summation?
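
That matches my reading: each predicted rating sums n_factors element-wise products, so each output has n_factors inputs feeding it. A quick shape check (dummy sizes):

import torch

batch, n_factors = 4, 50
u = torch.randn(batch, n_factors)  # a batch of user embedding rows
m = torch.randn(batch, n_factors)  # the matching movie embedding rows
prod = u * m        # shape (batch, n_factors): n_factors terms per rating
pred = prod.sum(1)  # shape (batch,): each output sums n_factors inputs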