Lesson 4 - Official Topic

I also really like matrixmultiplication.xyz for visualizing it

2 Likes

Mostly because it is easier to do matrix multiplication with objects of shape (N_samples, N_features). With our simple model, none of the pixels in the image know whether they are adjacent to another, so it is okay to flatten each image matrix into a vector: the model was not using any of those spatial relationships anyway, so nothing it needs is lost.
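For example, a quick sketch of what that flattening looks like in PyTorch (the shapes are made up for illustration):

```python
import torch

# Hypothetical batch of 64 grayscale 28x28 images: (N_samples, height, width)
imgs = torch.randn(64, 28, 28)

# Flatten each image into a 784-long vector -> (N_samples, N_features)
flat = imgs.view(64, -1)
print(flat.shape)  # torch.Size([64, 784])

# A dense layer is then just a matrix multiplication with a weight matrix
weights = torch.randn(784, 10)
out = flat @ weights
print(out.shape)   # torch.Size([64, 10])
```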

3 Likes

Is the example of matrix multiplication being done on the GPU? Or do we have to indicate that?

This was shared last time; it summarizes matrix multiplication

from @rachel

7 Likes

In this case the model's input layer is a dense layer. If your model expects a 2D matrix, say when using a convolutional layer, you need not flatten it to a vector.
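For instance, a minimal sketch (layer sizes are arbitrary) of a convolutional layer consuming the images unflattened:

```python
import torch
import torch.nn as nn

# Conv layers expect (batch, channels, height, width); no flattening needed
imgs = torch.randn(64, 1, 28, 28)
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
print(conv(imgs).shape)  # torch.Size([64, 8, 26, 26])
```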

To remember how matrix multiplication goes, Rachel’s math course introduced me to a song set to the tune of “Oh, My Darling”: https://youtu.be/BGbiHdKHG7o

I have an M.S. in math, and I still use this song to remember the order!

2 Likes

You need to check the .device of the tensors being multiplied. If it says something like cuda:0, the operations happen on the GPU.
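For example (a quick sketch):

```python
import torch

a, b = torch.randn(3, 4), torch.randn(4, 5)
print(a.device)        # cpu -- tensors live on the CPU by default

if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()
    c = a @ b          # this matmul now runs on the GPU
    print(c.device)    # cuda:0
```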

7 Likes

No, it’s on the CPU. For the GPU, you have to specify it.

1 Like

If anyone is facing audio cutting in and out or lag, refreshing the video stream/YouTube page fixed it for me.

8 Likes

Is that sometimes referred to as gradient loss?

@rachel A lot of people on the YouTube channel are asking for the matrix multiplication song, just saying :wink:

7 Likes

Highly recommend this for matrix multiplication

9 Likes

Here is the documentation:

torch.where documentation: https://pytorch.org/docs/stable/generated/torch.where.html
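A small sketch of how it works, with made-up values, in the same spirit as the loss from the lesson:

```python
import torch

preds   = torch.tensor([0.9, 0.4, 0.2])
targets = torch.tensor([1, 1, 0])

# torch.where(cond, a, b) picks elementwise from a where cond is True,
# and from b where it is False
per_sample = torch.where(targets == 1, 1 - preds, preds)
print(per_sample)         # tensor([0.1000, 0.6000, 0.2000])
print(per_sample.mean())  # tensor(0.3000)
```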

Is there a reason the loss is averaged with a mean rather than, say, a median? The median is less prone to being influenced by outliers.

In the example Jeremy gave, if the third point, the one that was wrongly predicted, is an outlier, then its derivative would push the function away during SGD; in that case using a median could be better.


2 Likes

The median is not differentiable, which is why we take the mean. Also, you want the points that are badly predicted to give big gradients, so that your model gets better. Conversely, samples that are correctly predicted won’t contribute much to the gradients, which is also what we want.

The idea is that even if you have just one wrongly predicted sample, it’s good that it drags your loss up, and therefore gives your model a chance to get more accurate.
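To make that concrete, a tiny sketch (the values are made up) showing that with a mean loss the worst-predicted sample gets the biggest gradient:

```python
import torch

preds   = torch.tensor([0.9, 0.8, 0.1], requires_grad=True)
targets = torch.tensor([1.0, 1.0, 1.0])

# Mean squared error: every sample contributes, proportionally to its error
loss = ((preds - targets) ** 2).mean()
loss.backward()
print(preds.grad)  # tensor([-0.0667, -0.1333, -0.6000]) -- the outlier dominates
```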

17 Likes

I agree that the median is less prone to outlier influence. However, it’s better to use the mean when you have more values: there will be a loss of information when you use the median. Try an experiment with random values and you’ll see. I’d recommend the median when you have a small number of values and the mean when you have a lot of them.
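Here’s one such quick experiment (a sketch): nudging the largest value moves the mean but cannot move the median, i.e. the median discards information about that sample.

```python
import torch

torch.manual_seed(0)
vals = torch.randn(101)
print(vals.mean().item(), vals.median().item())

# Perturb the maximum: the mean shifts, the median stays exactly the same
vals[vals.argmax()] += 5.0
print(vals.mean().item(), vals.median().item())
```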

2 Likes

Is there an upper limit to batch size? Is there a rule of thumb for how to select a batch size based on your dataset size?

3 Likes

Fabulous, this is my kind of math.

Mrfabulous1 :grinning: :smiley: :grinning:

1 Like

Usually, whatever you manage to fit in your GPU’s memory is good.
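If you want to find that limit programmatically, one common approach (sketched below; `largest_fitting_bs` and `make_batch` are hypothetical names) is to start big and halve the batch size until a forward/backward pass stops running out of memory. On PyTorch versions before 1.13 you’d catch `RuntimeError` instead of `torch.cuda.OutOfMemoryError`.

```python
import torch

def largest_fitting_bs(model, make_batch, start_bs=1024):
    """Halve bs until one forward/backward pass fits on the GPU.
    make_batch(bs) is a hypothetical helper returning an input batch."""
    model = model.cuda()
    bs = start_bs
    while bs > 1:
        try:
            model(make_batch(bs).cuda()).sum().backward()
            return bs
        except torch.cuda.OutOfMemoryError:
            model.zero_grad(set_to_none=True)
            torch.cuda.empty_cache()
            bs //= 2
    return 1
```

You’d then pass the result as the `bs` argument when building your DataLoaders.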

4 Likes

Thank you!