Wiki / Lesson Thread: Lesson 9

(melissa.fabros) #1

Lesson Resources

Notes: (Under Construction)

Review of PyTorch components by writing logistic regression

Softmax vs. Sigmoid activation functions

Introduction to Gradient Descent

Introduction to Learning Rates

Introduction to Broadcasting

(Prince Grover) #2

I have a few questions from the class –

  1. net = nn.Sequential( nn.Linear(28*28, 10), nn.LogSoftmax() )

In the last non-linear layer, why did we use LogSoftmax rather than Softmax? Weren’t we exponentiating the outputs of the second-to-last layer to make them all positive? Why go back to a log after computing [exp]/[sum of exp]?

  2. nn.Parameter(torch.randn(*dims)/dims[0])

What is the reason for dividing by dims[0]? I tried without it and it doesn’t work: fit() gives a loss of NaN and very bad accuracy.

Thanks :slight_smile:

(Jeremy Howard (Admin)) #4

I just posted the video.

(Jeremy Howard (Admin)) #5

The loss functions in PyTorch generally assume you have LogSoftmax, for computational efficiency reasons.
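To make the LogSoftmax point concrete, here is a minimal sketch in plain Python (standard library only, so it is not the PyTorch implementation). The helper names `naive_log_softmax`, `stable_log_softmax`, and `nll` are made up for illustration. It shows two things: once the model outputs log-probabilities, the negative log-likelihood loss is just a lookup and a sign flip, and fusing the log into the softmax allows the log-sum-exp trick, which a separate exp-then-log step cannot use.

```python
import math

def naive_log_softmax(logits):
    # Literal version of the question: exp, divide by the sum of exps, then log.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def stable_log_softmax(logits):
    # Log-sum-exp trick: log_softmax(z)_i = z_i - (m + log(sum_j exp(z_j - m))),
    # where m = max(z), so no exponent is ever larger than 0.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def nll(log_probs, target):
    # Negative log-likelihood of the target class, given log-probabilities.
    return -log_probs[target]

small = [2.0, 1.0, 0.1]
large = [1000.0, 10.0, -1000.0]

# On moderate logits the two versions agree.
print(naive_log_softmax(small))
print(stable_log_softmax(small))
print("loss for target class 0:", nll(stable_log_softmax(small), 0))

# On large logits the naive version overflows in exp(); the fused one does not.
try:
    naive_log_softmax(large)
    naive_failed = False
except OverflowError:
    naive_failed = True
print("naive version overflowed:", naive_failed)
print(stable_log_softmax(large))
```

So the exponentiation still happens conceptually; the log is applied on top because the loss wants log-probabilities anyway, and doing both in one step is cheaper and safer than doing them separately.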

Dividing by dims[0] is He initialization, although I may have forgotten a sqrt there…

Without careful initialization you’ll get gradient explosion. We discuss this in the DL course.
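As a rough illustration of why the scaling matters, the sketch below compares three ways of scaling standard-normal weights for a 784-to-10 linear layer: no scaling, dividing by `dims[0]` as in the lesson code, and multiplying by `sqrt(2 / dims[0])`, which is the usual statement of He initialization. It is plain Python rather than PyTorch, the helper names `init_weights` and `output_std` are hypothetical, and the inputs are assumed to be unit-variance noise rather than real MNIST pixels.

```python
import math
import random

def init_weights(n_in, n_out, scale, rng):
    # n_in x n_out matrix of standard-normal draws, multiplied by `scale`.
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_out)]
            for _ in range(n_in)]

def output_std(weights, n_samples, rng):
    # Feed random unit-variance inputs through x @ W and measure
    # the standard deviation of the resulting outputs.
    n_in, n_out = len(weights), len(weights[0])
    outs = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        for j in range(n_out):
            outs.append(sum(x[i] * weights[i][j] for i in range(n_in)))
    mean = sum(outs) / len(outs)
    return math.sqrt(sum((o - mean) ** 2 for o in outs) / len(outs))

rng = random.Random(0)  # fixed seed so runs are repeatable
n_in, n_out = 28 * 28, 10

scales = {
    "randn (no scaling)": 1.0,
    "randn / dims[0]": 1.0 / n_in,
    "randn * sqrt(2 / dims[0])": math.sqrt(2.0 / n_in),
}

results = {}
for name, scale in scales.items():
    w = init_weights(n_in, n_out, scale, rng)
    results[name] = output_std(w, 50, rng)
    print(f"{name:28s} output std ~ {results[name]:.3f}")
```

In theory the unscaled outputs have a standard deviation of about sqrt(784) = 28, the `/ dims[0]` version about 1/28, and the sqrt version about sqrt(2). Logits with a spread near 28 push the softmax into saturation right away, which fits the NaN losses reported above, while dividing by `dims[0]` is more conservative than He initialization but keeps things small enough to train.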

(Prince Grover) #6

Helpful links. Thanks :slight_smile: