Lesson 4 - Official Topic

This is lesson 4, which is week 5 since last week was out of order.

Note: This is a wiki post - feel free to edit to add links from the lesson or other useful info.


Links from lesson

Other useful links

Notes by @Lankinen


Questionnaire entry 13 may be slightly misleading (Why does SGD use mini batches?).

Perhaps you could say a few words about the differences between (1) Stochastic Gradient Descent, (2) Mini Batch Gradient Descent, and (3) Batch Gradient Descent.

Scratch that. I did not notice your text in ‘SGD and mini-batches’.

You do explain it, it is just the question’s phrasing that is a little confusing.

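A minor thing: in Lesson 4 of the book, the pandas background gradient is not scaled over the whole dataframe. By default, background_gradient uses axis=0, which scales each column separately. For example, df.iloc[10,16], which is 22, comes out dark.

Seaborn, by contrast, considers all the numbers in the dataframe when applying the gradient, so the same 22 is now light.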
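To make the difference concrete, here is a minimal sketch that uses plain min-max scaling as a stand-in for how a colormap maps values to shades. It is not pandas' or seaborn's internal code, and the small grid of values is made up for illustration. In pandas, passing axis=None to background_gradient should give the whole-dataframe behaviour.

```python
import numpy as np

# Hypothetical 3x3 grid of pixel-like values; not the image from the book.
values = np.array([
    [0.0, 22.0, 200.0],
    [0.0, 10.0, 255.0],
    [0.0,  5.0, 100.0],
])

def min_max(a, axis=None):
    """Scale values to [0, 1]; axis=None uses the global min and max."""
    lo = a.min(axis=axis, keepdims=True)
    hi = a.max(axis=axis, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero for constant slices
    return (a - lo) / span

per_column = min_max(values, axis=0)  # each column scaled on its own
whole_frame = min_max(values)         # one scale shared by every cell

print(per_column[0, 1])   # 22 is the largest value in its column, so it maps to the dark end
print(whole_frame[0, 1])  # 22 is small next to 255, so it maps to the light end
```

The same number lands at opposite ends of the scale depending only on which values it is compared against.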


What’s the difference between Gradient Descent and Stochastic Gradient Descent? Is there anything in particular that I should remember?

Gradient descent is when you use all your data to compute the gradients, then update your weights. Stochastic gradient descent is when you use mini-batches with random samples of your training set to compute the gradients, then update your weights.


To add, I remember it this way: the “stochastic-ness” comes from the random batches.


Gradient Descent ==> the gradient is calculated using the whole dataset
Stochastic Gradient Descent ==> the gradient is calculated using one sample of data
Mini-Batch Gradient Descent ==> the gradient is calculated using a batch of data (its size is generally given by the batch size)

Conventionally, we refer to mini-batch gradient descent as SGD.
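The three variants differ only in how many samples feed each gradient computation, so one loop with a batch size parameter can cover all of them. The sketch below uses NumPy and a made-up linear regression problem. The function name train, the learning rate, and the data are all assumptions for illustration, not code from the lesson.

```python
import numpy as np

def train(x, y, batch_size, epochs=200, lr=0.1, seed=0):
    """Fit y ~ w * x + b by gradient descent on mean squared error.

    Returns the fitted w, b and the number of weight updates performed.
    """
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(x)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)  # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]
            # Gradient of mean((w*x + b - y)**2) over the current batch only
            grad_w = 2 * np.mean(err * x[idx])
            grad_b = 2 * np.mean(err)
            w -= lr * grad_w
            b -= lr * grad_b
            updates += 1
    return w, b, updates

# Hypothetical noiseless data: y = 3x + 1
x = np.linspace(-1, 1, 64)
y = 3 * x + 1

full = train(x, y, batch_size=len(x))  # gradient descent: 1 update per epoch
mini = train(x, y, batch_size=16)      # mini-batch: 4 updates per epoch
single = train(x, y, batch_size=1)     # one sample at a time: 64 updates per epoch
```

All three should recover roughly w = 3 and b = 1 on this toy problem. What changes is how many updates each pass over the data produces and how noisy each individual gradient is.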


I don’t really like this terminology, which is pretty old (and absolutely no one does true stochastic gradient descent anyway). When we say stochastic gradient descent, we mean mini-batch gradient descent; when people want to refer to stochastic gradient descent in the one-sample sense, they usually say true stochastic gradient descent.


Why did we reshape the images from a matrix into a vector?

To be able to do a matrix multiplication with the set of those images by our weights: you can’t multiply a tensor of size N x 28 x 28 by some weights, but you can multiply a tensor of size N x 784 by some weights.
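Here is a small shape check of that point, using NumPy arrays as a stand-in for the PyTorch tensors from the lesson. The number of images and the 784 x 1 weight shape (one weight per pixel) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                               # hypothetical number of images
images = rng.random((n, 28, 28))    # stand-in for a stack of 28x28 images
weights = rng.random((28 * 28, 1))  # one weight per pixel (assumed shape)

flat = images.reshape(n, 28 * 28)   # N x 784: one row per image
preds = flat @ weights              # (N x 784) @ (784 x 1) -> N x 1

try:
    images @ weights                # (28 x 28) @ (784 x 1): inner dimensions differ
    unflattened_ok = True
except ValueError:
    unflattened_ok = False

print(preds.shape, unflattened_ok)
```

Flattening does not change any pixel values. It only lines them up so that each image becomes a single row that can be dotted with the weight column.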