# Lesson 4 - Official Topic

This is lesson 4, which is week 5 since last week was out of order.

Note: This is a wiki post - feel free to edit to add links from the lesson or other useful info.

## Other useful links

19 Likes

Questionnaire entry 13 may be slightly misleading (Why does SGD use mini batches?).

Perhaps you could say a few words about the differences between (1) Stochastic Gradient Descent, (2) Mini Batch Gradient Descent, and (3) Batch Gradient Descent.

1 Like

Scratch that. I did not notice your text in "SGD and mini-batches".

You do explain it, it is just the question's phrasing that is a little confusing.

1 Like

A minor thing. In Lesson 4 from the book, the pandas background gradient is applied row-wise. For example, df.iloc[10,16], which is 22, is dark.

Seaborn considers all the numbers in the dataframe when applying the gradient. The same number 22 is light now.
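To illustrate why the same value can look dark in one plot and light in another, here is a minimal sketch using plain min-max scaling in NumPy. It is not the actual pandas or seaborn code; the `shade` helper and the tiny `pixels` array are made up for this example. The only thing it assumes is that a colour gradient maps the smallest value in the chosen scope to the lightest shade and the largest to the darkest.

```python
import numpy as np

def shade(values, axis=None):
    """Min-max scale to [0, 1] (0 = lightest, 1 = darkest).

    Hypothetical helper: axis=1 scales each row on its own,
    axis=None scales against the whole array at once.
    """
    lo = values.min(axis=axis, keepdims=True)
    hi = values.max(axis=axis, keepdims=True)
    return (values - lo) / (hi - lo)

# Made-up pixel values; the 22 sits in a row of otherwise small numbers.
pixels = np.array([[0.0, 22.0, 10.0],
                   [0.0, 200.0, 255.0],
                   [5.0, 180.0, 90.0]])

per_row = shade(pixels, axis=1)     # each row has its own min and max
whole = shade(pixels, axis=None)    # one min and max for the entire array

print(per_row[0, 1])  # 22 is the largest value in its row, so it gets the darkest shade
print(whole[0, 1])    # compared with 255 it is close to the lightest shade
```

If you want the pandas output to match the seaborn one, `background_gradient` takes an `axis` argument, and `axis=None` scales across the whole dataframe.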

3 Likes

Is this post going to be a wiki for the community to add resources?

1 Like

Iâve wiki-fied it

3 Likes

Is there a plan to work in smaller groups?

@SMEissa there are a few smaller groups that are active. Please check the study groups section: we have a book reading group and a Mid-Level API group, both active.

There are also a few open collaborations created by Radek. If you find something interesting and want to start a group, please do so!

2 Likes

@rachel, Jeremy's mic isn't working

1 Like

Yes, Text is good

3 Likes

Are you also meeting this weekend? It would be great.

I'm not sure yet, but I'll update the respective wikis.

What's the difference between Gradient Descent and Stochastic Gradient Descent? Is there something that I should particularly remember?

1 Like

Gradient descent is when you use all your data to compute the gradients, then update your weights. Stochastic gradient descent is when you use mini-batches with random samples of your training set to compute the gradients, then update your weights.

8 Likes

To add: I remember it as: the "stochastic-ness" comes from the batches

2 Likes

Technically:
Gradient Descent ==> the gradient is calculated using the whole dataset
Stochastic Gradient Descent ==> the gradient is calculated using one sample of data
Mini Batch Gradient Descent ==> the gradient is calculated using a batch of data (its size is generally given by batch-size)

Traditionally, we simply call mini-batch gradient descent SGD.
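Here is a rough sketch of the three variants on a toy linear regression, in case it helps to see them side by side. It uses plain NumPy rather than fastai, and the `train` function, the learning rate and the toy data are all made up for illustration. The only thing that changes between the variants is the batch size.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the mean squared error of the linear model X @ w."""
    return 2 * X.T @ (X @ w - y) / len(X)

def train(X, y, batch_size, epochs=200, lr=0.05, seed=0):
    """Hypothetical training loop; batch_size selects the variant."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle the samples every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])  # one weight update per batch
            updates += 1
    return w, updates

# Toy, noise-free data: y = 3*x0 - 2*x1
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(64, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w

w_full, n_full = train(X, y, batch_size=len(X))  # gradient descent: whole dataset per update
w_mini, n_mini = train(X, y, batch_size=16)      # mini-batch gradient descent (usually just "SGD")
w_one, n_one = train(X, y, batch_size=1)         # "true" SGD: one sample per update

print(n_full, n_mini, n_one)  # number of weight updates for each variant
print(w_full, w_mini, w_one)  # fitted weights for each variant
```

With 64 samples and 200 epochs, the full-batch run makes 1 update per epoch, the batch-size-16 run makes 4, and the single-sample run makes 64. Same data and same loop; only the amount of data behind each gradient differs.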

12 Likes

I donât really like this terminology that is pretty old (and absolutely no one does true stochastic gradient descent anyway). When we say stochastic gradient descent, itâs the mini-batch gradient descent (and usually, when people that want to refer to the stochastic gradient descent of this definition, they say true stochastic gradient descent).

8 Likes

Why did we reshape the images from a matrix into a vector?

To be able to do a matrix multiplication of the whole set of images by our weights: you can't multiply a tensor of size N x 28 x 28 by some weights, but you can multiply a tensor of size N x 784 by some weights.
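A small sketch of the shapes involved, using NumPy arrays as a stand-in for the PyTorch tensors from the lesson. The batch size `N = 5`, the random images and the single-column weight matrix are assumptions made just for this example.

```python
import numpy as np

N = 5  # hypothetical number of images in the batch
images = np.random.default_rng(0).random((N, 28, 28))     # stand-in for a stack of 28x28 images
weights = np.random.default_rng(1).random((28 * 28, 1))   # one weight per pixel

# Multiplying the unflattened stack fails: the last dimension (28)
# does not match the number of weight rows (784).
try:
    images @ weights
    failed = False
except ValueError:
    failed = True

flat = images.reshape(N, -1)  # each image becomes one row of 784 pixels
preds = flat @ weights        # (N, 784) @ (784, 1) gives one number per image

print(failed)
print(flat.shape, preds.shape)
```

In PyTorch the same flattening is typically done with `view` or `reshape` on the tensor.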

6 Likes