It needs to run after cuda.
Wait for notebook 10
Did Jeremy say if we went from 27 numbers to 32 numbers, we are losing information?
Could someone re-explain how to choose the kernel size for the model, like 7×7 or 5×5, and the part about losing information…
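My understanding of the counting argument from the lecture (a sketch, not a quote): each conv filter only sees a k×k patch across all input channels, so the question is how many input numbers feed each output number.

```python
# Each output value of a conv layer is computed from a k*k patch
# across all input channels. For the first layer on an RGB image
# with a 3x3 kernel, that's 3*3*3 = 27 input numbers per position.
def conv_inputs_per_position(kernel_size, in_channels):
    return kernel_size * kernel_size * in_channels

patch = conv_inputs_per_position(3, 3)  # 27 numbers seen by each filter
out_channels = 32                       # 32 filters -> 32 output numbers

print(patch, out_channels)  # 27 32

# The worry: mapping 27 inputs to 32 outputs can't create information.
# Larger kernels widen the input side, keeping the mapping compressive
# even with more output channels:
print(conv_inputs_per_position(5, 3))  # 75
print(conv_inputs_per_position(7, 3))  # 147
```

So with a 3×3 kernel and 32 filters you map 27 numbers to 32, which is why the "losing information" question comes up; a 5×5 or 7×7 first layer maps 75 or 147 numbers to 32 instead.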
So only weights get pushed to the GPU? Somehow I was under the impression that activations are stored on the GPU too, and if GPU memory is low, gradient checkpointing is used to recalculate them.
No, weights and inputs are on the GPU, so activations and gradients are too.
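A minimal PyTorch sketch of that point (the tiny model is hypothetical, just to illustrate placement): once the weights and inputs are on a device, everything computed from them stays there.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(4, 2).to(device)    # weights moved to the device
x = torch.randn(8, 4, device=device)  # inputs created on the device

out = model(x)              # activations inherit the device
loss = out.pow(2).mean()
loss.backward()             # gradients live there too

print(out.device, model.weight.grad.device)
```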
I still wonder where the kernel values come from. Are they generated? Defined somewhere?
(I guess I should dig into PyTorch’s Conv2d source code)
They’re part of the weights of the model and will get updated through gradient descent, just like the weights of a fully connected layer.
They are actually weights; they are initialized randomly and then learned during the training process.
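A NumPy sketch of what "initialized randomly, then learned" means for a kernel (the Kaiming-style scale and the gradient here are my illustration, not PyTorch's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3x3 conv kernel is just a weight tensor: initialized randomly
# (here with a Kaiming-style scale), then nudged by gradient descent.
fan_in = 3 * 3
kernel = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(3, 3))

# One gradient step. In a real framework the gradient comes from
# backprop; we fake one just to show the update rule.
fake_grad = np.ones_like(kernel)
lr = 0.1
kernel_after = kernel - lr * fake_grad

print(np.allclose(kernel - kernel_after, lr * fake_grad))  # True
```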
In the current discussion where we compute the mean and standard deviation, why don’t we instead compute the mean absolute deviation to see how the activations behave? I wonder if outliers in the activations affect the result.
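A quick NumPy check of the outlier concern (synthetic activations, my own illustration): the standard deviation squares deviations, so one huge activation dominates it, while the mean absolute deviation barely moves.

```python
import numpy as np

acts = np.random.default_rng(1).normal(0, 1, 10_000)
acts_outlier = np.append(acts, 500.0)  # one huge activation

std_clean, std_out = acts.std(), acts_outlier.std()

# Mean absolute deviation: average distance from the mean.
mad_clean = np.abs(acts - acts.mean()).mean()
mad_out = np.abs(acts_outlier - acts_outlier.mean()).mean()

print(std_clean, std_out)  # std blows up (~1 -> ~5)
print(mad_clean, mad_out)  # MAD barely moves
```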
Maybe a dumb question, but… why are there cyclic spikes in the plot of activations as training progresses? Are these batches?
Thanks for clarifying. Was confused a bit.
So I guess then only the loss is calculated on the CPU, or is that also on the GPU?
To be clear, randomly but still with a set distribution (see all the discussion about initializations).
No, it is computed on the GPU, then stored on the CPU.
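A small PyTorch sketch of that flow (the model and data are made up): the loss tensor lives on the device, and `.item()` copies the scalar back to CPU memory for logging.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 1).to(device)
x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)

loss = nn.functional.mse_loss(model(x), y)  # computed on the device
history = [loss.item()]  # .item() copies the scalar to a Python float on the CPU

print(loss.device, type(history[0]))
```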
If you are able to fully explain why, you will probably get the next Turing Award.
I wonder if Backprop could be causing the weights to be constantly shifting back and forth initially and that could be causing this pattern.
Looking at the graphs of the mean and std deviation, I’m wondering how useful standard methods of signal analysis / time series analysis would be to gain insight from them.
Do you know of any research on that?
Otherwise, “Try blah” I guess
Can someone explain how to read those colorful histograms?
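My understanding of those "colorful dimension" plots (a sketch with fake data, not the course's actual code): take a histogram of the activations at every training step, stack the histograms side by side as columns, and show the counts as colors. A widening bright band means the activations are spreading out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_bins = 50, 40

# Fake activations for each step, growing in scale to mimic a layer
# whose activations drift during training (illustrative only).
hists = []
for step in range(n_steps):
    acts = rng.normal(0, 0.5 + 0.05 * step, 1000)
    h, _ = np.histogram(acts.clip(-10, 10), bins=n_bins, range=(-10, 10))
    hists.append(h)

img = np.stack(hists, axis=1)  # shape (n_bins, n_steps)
# Each column is one step's histogram; color/brightness encodes the
# (often log-scaled) count in each bin. Plot with e.g. plt.imshow(img).
print(img.shape)  # (40, 50)
```

So the x-axis is training step, the y-axis is activation value (binned), and the color tells you how many activations fell in that bin at that step.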
When should we use kaiming_uniform…?
When using a uniform init, as opposed to a normal one.
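To expand on that: both variants target the same variance, `2 / fan_in` for ReLU; they just draw from different distributions. A NumPy sketch of the scaling (my own check, not PyTorch internals):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 100
n = 200_000

# Kaiming normal: std = gain / sqrt(fan_in), with gain = sqrt(2) for ReLU.
w_normal = rng.normal(0, np.sqrt(2 / fan_in), n)

# Kaiming uniform: bound b = sqrt(6 / fan_in), chosen so the variance
# matches, since Var(U(-b, b)) = b**2 / 3 = 2 / fan_in.
b = np.sqrt(6 / fan_in)
w_uniform = rng.uniform(-b, b, n)

print(w_normal.std(), w_uniform.std())  # both ~ sqrt(2/100) ~ 0.141
```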