I also really like matrixmultiplication.xyz for simulating it
Mostly because it is easier to do matrix multiplication with objects of the shape (N_samples, N_features). With our simple model, none of the pixels in the image know whether they are adjacent to one another, so it is okay to flatten the matrix into a vector: you are not losing any spatial relationships the model would have used.
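A minimal sketch of that shape convention (the 28x28 image size is just an assumption, MNIST-style): each image becomes one row of N_features independent values.

```python
import torch

images = torch.rand(64, 28, 28)      # hypothetical batch of 64 grayscale images
flat = images.view(64, 28 * 28)      # shape (N_samples, N_features) = (64, 784)
print(flat.shape)                    # torch.Size([64, 784])
```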
Is the example of matrix multiplication being done on the GPU? Or do we have to indicate that?
In this case the model has its input layer as a dense layer. If your model expects a 2D matrix, say using a convolution layer, you do not need to flatten it into a vector.
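A rough sketch contrasting the two cases (layer sizes here are made up, not from the lecture): a Linear first layer wants flat (N, 784) inputs, while a Conv2d first layer keeps the (N, 1, 28, 28) image shape, so no flattening is needed.

```python
import torch
from torch import nn

x = torch.rand(64, 1, 28, 28)

dense_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
conv_model = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1))

print(dense_model(x).shape)   # torch.Size([64, 10])
print(conv_model(x).shape)    # torch.Size([64, 8, 28, 28])
```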
To remember how matrix multiplication goes, Rachel's math course introduced me to the song to the tune of "Oh, My Darling": https://youtu.be/BGbiHdKHG7o
I have an M.S. in math, and I still use this song to remember the order!
You need to check the .device of the tensors being multiplied. If it shows something like cuda:0, the operations happen on the GPU.
No, it's on the CPU. For the GPU you have to specify it.
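A minimal sketch of that point: tensors start on the CPU, and you have to move them (or the model) to the GPU explicitly.

```python
import torch

a = torch.rand(3, 4)
b = torch.rand(4, 5)
print(a.device)                 # cpu -> the matmul below runs on the CPU
c = a @ b

if torch.cuda.is_available():
    a, b = a.to("cuda"), b.to("cuda")
    print(a.device)             # cuda:0 -> now the matmul runs on the GPU
    c = a @ b
```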
If anyone is facing an audio cut-in/lag, refreshing the video stream/YouTube page fixed it for me.
Is that sometimes referred to as gradient loss?
@rachel A lot of people on the YouTube channel are asking for the matrix multiplication song, just saying.
Highly recommend this for matrix multiplication
Here is the documentation:
Is there a reason the mean of the loss is calculated instead of, say, the median? The median is less prone to being influenced by outliers.
In the example Jeremy gave, if the third point, which was wrongly predicted, is an outlier, then the derivative would push the function away while doing SGD… in this case using a median could be better…
The median is not going to be differentiable; that's why we take the mean. Also, you want the points that are really wrongly predicted to give big gradients, so that your model gets better. Conversely, samples that are correctly predicted won't contribute a lot to the gradients, which is also what we want.
The idea is that even if you have one wrongly predicted sample, it's good that it drags your loss up, and therefore gives your model a chance to get more accurate.
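A small check of that argument (the numbers are made up): with a mean squared error loss, each prediction's gradient is proportional to its own error, so the badly predicted outlier gets the biggest push.

```python
import torch

preds = torch.tensor([1.0, 2.0, 10.0], requires_grad=True)
targets = torch.tensor([1.1, 1.9, 3.0])     # the third point is far off

loss = ((preds - targets) ** 2).mean()
loss.backward()
print(preds.grad)                            # tensor([-0.0667,  0.0667,  4.6667])
```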
I agree that the median is less prone to outlier influence. However, it's better to use the mean when you have more values; there will be a loss of information when you use the median. Try an experiment with random values and you'll see. I'd recommend the median when you have a small number of values and the mean when you have a lot of values.
Is there an upper limit to batch size? Is there a rule of thumb for how to select batch size based on your dataset size?
Fabulous, this is my kind of math.
Mrfabulous1
Usually, whatever you manage to fit in your GPU's memory is good.
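One rough way to act on that rule of thumb (the model and sizes here are hypothetical): run one batch, check how much GPU memory it used, and increase the batch size until you get close to the card's limit or hit an out-of-memory error.

```python
import torch
from torch import nn

if torch.cuda.is_available():
    model = nn.Linear(784, 10).cuda()
    batch = torch.rand(256, 784, device="cuda")   # try batch_size=256
    out = model(batch)
    print(torch.cuda.memory_allocated() / 1e6, "MB allocated")
```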
Thank you!