I also really like matrixmultiplication.xyz for simulating it
Mostly because it is easier to do matrix multiplication with objects of the shape (N_samples, N_features). With our simple model, none of the pixels in the image know whether they are adjacent to one another, so it is okay to flatten the matrix into a vector: you are not losing any spatial relationships the model would have used.
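A minimal sketch of that shape convention (the 28x28 image size is just an assumption, MNIST-style): each image becomes one row of N_features independent values.

```python
import torch

images = torch.rand(64, 28, 28)      # hypothetical batch of 64 grayscale images
flat = images.view(64, 28 * 28)      # shape (N_samples, N_features) = (64, 784)
print(flat.shape)                    # torch.Size([64, 784])
```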
Is the example of matrix multiplication being done on the GPU? Or do we have to indicate that?
In this case the model has its input layer as a dense layer. If your model expects a 2D matrix, say using a convolution layer, you do not need to flatten it into a vector.
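A rough sketch contrasting the two cases (layer sizes here are made up, not from the lecture): a Linear first layer wants flat (N, 784) inputs, while a Conv2d first layer keeps the (N, 1, 28, 28) image shape, so no flattening is needed.

```python
import torch
from torch import nn

x = torch.rand(64, 1, 28, 28)

dense_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
conv_model = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1))

print(dense_model(x).shape)   # torch.Size([64, 10])
print(conv_model(x).shape)    # torch.Size([64, 8, 28, 28])
```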
To remember how matrix multiplication goes, Rachel's math course introduced me to the song to the tune of "Oh, My Darling": https://youtu.be/BGbiHdKHG7o
I have an M.S. in math, and I still use this song to remember the order!
You need to check the .device of the tensors being multiplied. If it shows something like cuda:0, the operations happen on the GPU.
No, it's on the CPU. For the GPU you have to specify it.
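A minimal sketch of that point: tensors start on the CPU, and you have to move them (or the model) to the GPU explicitly.

```python
import torch

a = torch.rand(3, 4)
b = torch.rand(4, 5)
print(a.device)                 # cpu -> the matmul below runs on the CPU
c = a @ b

if torch.cuda.is_available():
    a, b = a.to("cuda"), b.to("cuda")
    print(a.device)             # cuda:0 -> now the matmul runs on the GPU
    c = a @ b
```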
If anyone is facing an audio cut-in/lag, refreshing the video stream/YouTube page fixed it for me.
Is that sometimes referred to as gradient loss?
@rachel A lot of people on the YouTube channel are asking for the matrix multiplication song, just saying.
Highly recommend this for matrix multiplication
Here is the documentation:
Is there a reason the mean of the loss is calculated instead of, say, the median? The median is less prone to being influenced by outliers.
In the example Jeremy gave, if the third point, which was wrongly predicted, is an outlier, then the derivative would push the function away while doing SGD… in this case using a median could be better…
The median is not going to be differentiable; that's why we take the mean. Also, you want the points that are really wrongly predicted to give big gradients, so that your model gets better. Conversely, samples that are correctly predicted won't contribute a lot to the gradients, which is also what we want.
The idea is that even if you have one wrongly predicted sample, it's good that it drags your loss up, and therefore gives your model a chance to get more accurate.
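A small check of that argument (the numbers are made up): with a mean squared error loss, each prediction's gradient is proportional to its own error, so the badly predicted outlier gets the biggest push.

```python
import torch

preds = torch.tensor([1.0, 2.0, 10.0], requires_grad=True)
targets = torch.tensor([1.1, 1.9, 3.0])     # the third point is far off

loss = ((preds - targets) ** 2).mean()
loss.backward()
print(preds.grad)                            # tensor([-0.0667,  0.0667,  4.6667])
```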
I agree that the median is less prone to outlier influence. However, it's better to use the mean when you have more values; there will be a loss of information when you use the median. Try an experiment with random values and you'll see. I'd recommend the median when you have a small number of values and the mean when you have a lot of values.
Is there an upper limit to batch size? Is there a rule of thumb for how to select batch size based on your dataset size?
Fabulous, this is my kind of math.
Mrfabulous1
Usually, whatever you manage to fit in your GPU's memory is good.
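One rough way to act on that rule of thumb (the model and sizes here are hypothetical): run one batch, check how much GPU memory it used, and increase the batch size until you get close to the card's limit or hit an out-of-memory error.

```python
import torch
from torch import nn

if torch.cuda.is_available():
    model = nn.Linear(784, 10).cuda()
    batch = torch.rand(256, 784, device="cuda")   # try batch_size=256
    out = model(batch)
    print(torch.cuda.memory_allocated() / 1e6, "MB allocated")
```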
Thank you!