# Lesson 5 In-Class Discussion

**alessa**(Aless Bandrabur) #126

The video is 30 minutes shorter; it starts at 7 pm.

Do you also have the first part of it?

**kcturgutlu**(Kerem Turgutlu) #127

Does PyTorch autograd calculate Jacobian products in the background?

**vikbehal**(Vikrant Behal) #129

Is that because of his 8 years in management consulting? @jeremy could answer this! Especially the patience that we need with the Excel approach!

**kcturgutlu**(Kerem Turgutlu) #132

How can I create a DataLoader in PyTorch with multiple inputs like cats and conts? Thanks.

I couldn’t find what cls(…) does.

I found the ConcatDataset class in PyTorch, which gives a similar data structure to ColumnarModelData.from_data_frame, but it still doesn’t work: the batch only has cats.
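One way to get both kinds of inputs into a batch is a custom `Dataset` whose `__getitem__` returns a tuple; the default collate function then stacks each element separately. A minimal sketch (the class name and field layout here are illustrative, not fastai's actual implementation):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CatContDataset(Dataset):
    """Illustrative dataset yielding (categorical, continuous, target)."""
    def __init__(self, cats, conts, y):
        self.cats = torch.as_tensor(cats, dtype=torch.long)
        self.conts = torch.as_tensor(conts, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        # Returning a tuple is what makes each batch carry cats AND conts:
        # the default collate stacks each tuple element into its own tensor.
        return self.cats[i], self.conts[i], self.y[i]

cats = [[0, 1], [1, 0], [2, 3]]   # toy categorical codes
conts = [[0.5], [1.5], [2.5]]     # toy continuous values
y = [1.0, 0.0, 1.0]
dl = DataLoader(CatContDataset(cats, conts, y), batch_size=2)
xb_cat, xb_cont, yb = next(iter(dl))
```

Each batch then unpacks into separate cats and conts tensors, which is the shape of thing `ColumnarModelData.from_data_frame` produces.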

**jeremy**(Jeremy Howard (Admin)) #134

We could, but it would be better to define a regular function `dot_product()`, since we’re not actually using any `nn.Module` features in `DotProduct` - I was just showing it as an example of a really simple module.
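To make the comparison concrete, here is a sketch of both versions; since the module holds no parameters or submodules, the plain function does the same job:

```python
import torch
import torch.nn as nn

# Plain function: sufficient when no nn.Module features
# (parameters, submodules, train/eval state) are needed.
def dot_product(a, b):
    return (a * b).sum(1)

# Module version, as shown in the lesson as a minimal example.
class DotProduct(nn.Module):
    def forward(self, a, b):
        return (a * b).sum(1)

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])
# Both produce identical row-wise dot products.
```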

**jeremy**(Jeremy Howard (Admin)) #138

Oh gosh that’s important! Thanks. This is why we should really be using `nn.Dropout`. Remind me to cover this next week if I forget.
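The thread doesn’t spell out exactly what was caught here, but one key property of `nn.Dropout` (versus applying dropout by hand) is that it switches itself off automatically in eval mode:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)  # ~half the activations zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()
y_eval = drop(x)   # identity: dropout is disabled at inference time
```

Because the module tracks training state, `model.eval()` disables dropout everywhere without any extra bookkeeping.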

**jeremy**(Jeremy Howard (Admin)) #139

Can you just use the one we showed in class (both this lesson, and last week’s)? Or are you asking how to implement it from scratch?

**kcturgutlu**(Kerem Turgutlu) #140

Yes, sure. I was just curious about how it’s done in PyTorch. For practical purposes I will use fastai. Thanks!

**kmatsuda**(Ken) #141

I wanted to understand better the standard deviation input to the Kaiming He initialization (0.05). Should the “number of things” be n_users or the product of n_users and n_factors? Or in other words, should it be the number of users or the number of elements in the embedding matrix?

i.e.:

```
import math

n_movies = 9066
n_users = 671
n_factors = 50

# Option 1: number of elements in the embedding matrix
users_x_factors_stddev = math.sqrt(2 / (n_users * n_factors))
# Option 2: number of users
users_stddev = math.sqrt(2 / n_users)
```

`users_x_factors_stddev` is `0.0077`; `users_stddev` is `0.0545`.

Intuitively I was thinking that it should be the number of elements in the embedding matrix, since I assumed the distribution would be over the number of weights, but the calculation works out to be the number of users.

**ramesh**(Ramesh Sampath) #142

From my understanding, it should just be `n_factors`. That’s the number of nodes in the previous layer. If my understanding is wrong, please correct me.

**kmatsuda**(Ken) #143

Right. I believe that was what was said in the lesson, but that calculation seems to work out to 0.2 using `math.sqrt(2/50)`. Am I doing something wrong?

**ramesh**(Ramesh Sampath) #144

No, that’s correct. As long as the stdev is small, the initializations work out fine. There’s another method that just suggests 1/n_factors; as long as the initializations are small, the network learns the weights.
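Putting the two posts together, a sketch of what this init looks like applied to the embedding, assuming fan-in = `n_factors` as discussed above:

```python
import math
import torch

n_users, n_factors = 671, 50

# Kaiming-style init with fan-in = n_factors:
# stddev = sqrt(2 / n_factors) = sqrt(2/50) = 0.2
emb = torch.nn.Embedding(n_users, n_factors)
emb.weight.data.normal_(0, math.sqrt(2 / n_factors))
```

With 671 × 50 weights drawn this way, the empirical standard deviation lands very close to 0.2.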

**kmatsuda**(Ken) #145

Thanks. I guess it is just a guideline, since in a practical sense the weights get updated during learning anyway. I was just curious to see if I could make sense of how to calculate the number. Conceptually, I believe Jeremy was making the point that the initialization shouldn’t be really large numbers, like in the millions.

**kmatsuda**(Ken) #146

Is the reason the number of nodes is n_factors because the element-wise product of the values for a user and the movie would be of size n_factors prior to summation?
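That intuition matches the shapes involved: for one user and one movie, the element-wise product has `n_factors` elements, and those are what get summed into the prediction. A tiny sketch:

```python
import torch

n_factors = 50
u = torch.randn(n_factors)  # one user's factor vector
m = torch.randn(n_factors)  # one movie's factor vector

prod = u * m         # element-wise product: still size n_factors
rating = prod.sum()  # summing the 50 terms gives a single prediction
```

So `n_factors` terms feed each output, which is why it plays the role of fan-in in the initialization formula.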