Wiki / Lesson Thread: Lesson 9

melissa.fabros · November 28, 2017, 9:46pm

Lesson Resources

Notes: (Under Construction)

Review of PyTorch components by writing logistic linear regression

Softmax vs. Sigmoid activation functions

Introduction to Gradient Descent

Introduction to Learning Rates

Introduction to Broadcasting

groverpr · November 29, 2017, 6:18am

I have a few questions from the class –

net = nn.Sequential( nn.Linear(28*28, 10), nn.LogSoftmax() )

In last non-linear layer, why did we use logsoftmax, not softmax? Weren’t we exponentiating outputs from 2nd last layer so as to make them all +ve ? Why back to log after doing [exp]/[sum of exp].

n.Parameter(torch.randn(*dims)/dims[0])

What is the reason of dividing by dims[0]. I tried and it doesn’t work if we don’t divide by dims[0]. By it doesn’t work, I mean fit() gives loss = nans and very bad accuracy.

Thanks

jeremy · November 29, 2017, 6:06pm

I just posted the video.

jeremy · November 29, 2017, 6:09pm

The loss functions in pytorch generally assume you have LogSoftmax, for computational efficiency reasons: https://discuss.pytorch.org/t/does-nllloss-handle-log-softmax-and-softmax-in-the-same-way/8835

This is He initialization (http://www.jefkine.com/deep/2016/08/08/initialization-of-deep-networks-case-of-rectifiers/) . Although I may have forgotten a sqrt there…

Without careful initialization you’ll get gradient explosion. We discuss this in the DL course.

groverpr · November 29, 2017, 7:30pm

Helpful links. Thanks

grimknight · February 8, 2019, 11:23am

Blogpost on Broadcasting in Pytorch

Broadcasting-With-Pytorch

rajeshtamada · March 2, 2019, 12:59pm

hi there,
I have a question pertaining to optimizer.zero_grad() . I have gone couple of times over the section where it is explained why do we have to call this function.
I still don’t understand it .
From pytorch forum , i understand unless for the special cases where one wants to simulate bigger batches by accumulating the gradients , one has to invoke optimizer.zero_grad() to clear the grandients for the next batch.
Would like to understand Jeremy explanation though.

rgarcia · October 11, 2019, 11:20am

The title changed “with” for “in” and the link got broken.

New one.
https://jidindinesh.github.io/2019-02-08-Broadcasting-In-PyTorch

rgarcia · October 11, 2019, 5:15pm

On the next video (lesson 10) around minute 25-26 he explains again why we need to clear the weights and bias.

Hope it helps.

hamzautd7 · April 25, 2020, 5:39pm

can anyone help me with this

geek4ray · May 1, 2020, 12:54pm

facing the same issue as @hamzautd7