Lesson 4 - Official Topic

jeremy · April 14, 2020, 7:39pm

This is lesson 4, which is week 5 since last week was out of order.

Note: This is a wiki post - feel free to edit to add links from the lesson or other useful info.

Resources

Links from lesson

Face Masks Against COVID-19: An Evidence Review

Other useful links

Notes by @Lankinen

ren · April 15, 2020, 12:33am

Questionnaire entry 13 may be slightly misleading (Why does SGD use mini batches?).

Perhaps you could say a few words about the differences between (1) Stochastic Gradient Descent, (2) Mini Batch Gradient Descent, and (3) Batch Gradient Descent.

ren · April 15, 2020, 12:51am

Scratch that. I did not notice your text in ‘SGD and mini-batches’.

You do explain it, it is just the question’s phrasing that is a little confusing.

arunslb123 · April 15, 2020, 1:05am

A minor thing. In Lesson 4 from the book, pandas background gradient applied in row-wise. For example, df.iloc[10,16] which is 22 dark.

Seaborn considers all the numbers in the dataframe for applying gradient. The same number 22 is light now.

ilovescience · April 15, 2020, 1:18am

Is this post going to be a wiki for the community to add resources?

init_27 · April 15, 2020, 1:19am

I’ve wiki-fied it

SMEissa · April 15, 2020, 1:29am

Is there a plan to work in a smaller groups?

init_27 · April 15, 2020, 1:30am

@SMEissa there are a few smaller groups that are active-please check the study groups section, we have a book reading group and a Mid-Level API ones that are active.

There are few open collaborations created by Radek. If you find something interesting and want to start a group-please do so!

Blowoffvalve · April 15, 2020, 1:31am

@rachel, Jeremy’s mic isn’t working

gamino · April 15, 2020, 1:32am

Yes, Text is good

Albertotono · April 15, 2020, 1:34am

Are you also meeting this weekend? it would be great

init_27 · April 15, 2020, 1:35am

I’m not sure yet-but I’ll update the respective wikis

vijaysai · April 15, 2020, 1:36am

Whats the difference between Gradient Descent and Stochastic Gradient Descent ? Is there something that I should particularly remember ?

sgugger · April 15, 2020, 1:37am

Gradient descent is when you use all your data to compute the gradients, then update your weights. Stochastic gradient descent is when you use mini-batches with random samples of your training set to compute the gradients, then update your weights.

init_27 · April 15, 2020, 1:39am

To add: I remember it as: The “stochastic-ness” comes from the batches

ram_cse · April 15, 2020, 1:43am

Technically:
Gradient Descent ==> Gradient is calculated using Whole data
Stochastic Gradient Descent ==> Gradient is calculated using one sample of data
Mini Batch Gradient Descent ==> Gradient is calculated using batch(generally given by batch-size) of data

Traditionally, we call mini-batch gradient descent as SGD.

sgugger · April 15, 2020, 1:46am

I don’t really like this terminology that is pretty old (and absolutely no one does true stochastic gradient descent anyway). When we say stochastic gradient descent, it’s the mini-batch gradient descent (and usually, when people that want to refer to the stochastic gradient descent of this definition, they say true stochastic gradient descent).

miko · April 15, 2020, 1:48am

Why did we reshape the images from a matrix into a vector?

sgugger · April 15, 2020, 1:49am

To be able to do a matrix multiplication with the set of those images by our weights: you can’t multiply a tensor of size N x 28 x 28 by some weights, but you can multiply a tensor of size N x 784 by some weights.