This is very good https://www.youtube.com/watch?v=kvnBw_D0gfs
The video is 30 minutes shorter; it starts at 7 pm.
Do you also have the first part of it?
Does PyTorch autograd calculate Jacobian products in the background?
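For what it's worth, a minimal sketch of what this looks like in practice: reverse-mode autograd computes vector-Jacobian products rather than building the full Jacobian, and passing a vector to backward() makes that visible. The tensors x and v below are made-up values for illustration only.

```python
import torch

# Made-up input; y = x**2 has Jacobian diag(2*x)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2

# The vector v in the vector-Jacobian product v^T J
v = torch.tensor([1.0, 0.1, 0.01])

# backward(v) accumulates v^T J into x.grad without materialising J
y.backward(v)

# By hand: v * 2x = [1*2, 0.1*4, 0.01*6]
expected = v * 2 * x.detach()
print(x.grad, expected)
```

When y is a scalar loss, backward() with no argument is the same thing with v = 1.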
Correct. This is weird. Usually, live streams remain available as full videos.
Is that because of his 8 years in management consulting? @jeremy could answer this, especially about the patience we need with the Excel approach!
How can I create a dataloader in PyTorch with multiple inputs like cats, conts? Thanks.
I couldn’t find what cls(…) does.
I found the ConcatDataset() class in PyTorch…
It gives a data structure similar to ColumnarModelData.from_data_frame
It still doesn’t work; the batch only has cats…
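In case it helps, here is a minimal sketch of one way to do this in plain PyTorch. ConcatDataset chains datasets one after another (more rows), which would explain a batch that only contains cats; to get several inputs per row, the dataset's __getitem__ can return a tuple instead. The class name CatContDataset and the random data are hypothetical, not something from the lesson or the fastai library.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CatContDataset(Dataset):
    """Hypothetical dataset yielding (cats, conts, y) for each row."""
    def __init__(self, cats, conts, y):
        self.cats = torch.as_tensor(cats, dtype=torch.long)       # category codes
        self.conts = torch.as_tensor(conts, dtype=torch.float32)  # continuous columns
        self.y = torch.as_tensor(y, dtype=torch.float32)          # target

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        # Returning a tuple makes the default collate function
        # stack each element into its own batch tensor
        return self.cats[idx], self.conts[idx], self.y[idx]

# Made-up data: 10 rows, 2 categorical and 3 continuous columns
torch.manual_seed(0)
cats = torch.randint(0, 5, (10, 2))
conts = torch.randn(10, 3)
y = torch.randn(10)

dl = DataLoader(CatContDataset(cats, conts, y), batch_size=4, shuffle=True)
cat_batch, cont_batch, y_batch = next(iter(dl))
print(cat_batch.shape, cont_batch.shape, y_batch.shape)
```

Each batch then unpacks into three tensors, so a model's forward(cats, conts) can take them directly.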
Just added the lesson video to the wiki post.
We could, but it would be better to define a regular function dot_product(), since we’re not actually using any nn.Module features in DotProduct. I was just showing it as an example of a really simple module.
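A rough sketch of the two options side by side, assuming DotProduct computes a row-wise dot product of two batches of embeddings (the example tensors are made up):

```python
import torch
import torch.nn as nn

def dot_product(u, m):
    # Element-wise product, then sum over the factor dimension
    return (u * m).sum(1)

class DotProduct(nn.Module):
    # Same computation wrapped in a module; it has no parameters,
    # so nothing here actually needs nn.Module
    def forward(self, u, m):
        return (u * m).sum(1)

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[2.0, 2.0], [10.0, 10.0]])

print(dot_product(a, b))   # tensor([ 6., 70.])
print(DotProduct()(a, b))  # tensor([ 6., 70.])
```

The module form starts to pay off once there is state to track, such as embedding weights.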
This is SO COOL!
Heh I had no idea…
Oh gosh that’s important! Thanks. This is why we should be using nn.Dropout really. Remind me to cover this next week if I forget.
Can you just use the one we showed in class (both this lesson, and last week’s)? Or are you asking how to implement it from scratch?
Yes, sure. I was just curious about how it’s done in PyTorch. For practical purposes I will use fastai. Thanks.
I wanted to understand better the standard deviation input to the Kaiming He initialization (0.05). Should the “number of things” be n_users or the product of n_users and n_factors? Or in other words, should it be the number of users or the number of elements in the embedding matrix?
n_movies = 9066
n_users = 671
n_factors = 50
users_x_factors_stddev = math.sqrt(2/(n_users*n_factors))
or
users_stddev = math.sqrt(2/(n_users))
users_x_factors_stddev is 0.0077, users_stddev is 0.0546
Intuitively I was thinking that it should be the number of elements in the embedding matrix, since I assumed the distribution would be over the number of weights, but the calculation works out to be the number of users.
From my understanding, it should just be n_factors. That’s the number of nodes in the previous layer. If my understanding is wrong, please correct me.
Right. I believe that was what was said in the lesson, but that calculation seemed to work out to be 0.2 using math.sqrt(2/50). Am I doing something wrong?
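To put the candidates from this discussion next to each other, here is a quick check using only the numbers quoted above (the dictionary keys are just labels for this sketch):

```python
import math

n_users = 671
n_factors = 50

# sqrt(2 / n) for each candidate "number of things" n
candidates = {
    "n_users": n_users,
    "n_users * n_factors": n_users * n_factors,
    "n_factors": n_factors,
}
stddevs = {name: math.sqrt(2 / n) for name, n in candidates.items()}

for name, s in stddevs.items():
    print(f"{name:>20}: {s:.4f}")
```

Only the n_users version lands near the 0.05 used in the lesson; n_factors gives 0.2 and the full matrix size gives about 0.0077.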
No, that’s correct. As long as the stdev is small, the initializations work out fine. Another method simply suggests using 1/n_factors, so as long as the initial values are small, the network learns the weights.
Thanks. I guess it is just a guideline, since in a practical sense the weights get updated during learning anyway. I was just curious to see if I could make sense of how to calculate the number. Conceptually, I believe Jeremy was making the point that the initialization should not use really large numbers, like in the millions.
Is the reason the number of nodes is n_factors because the element-wise product of the values for a user and the movie would be of size n_factors prior to summation?