Lesson 10 Discussion & Wiki (2019)

It needs to run after cuda.

1 Like

Wait for notebook 10 :wink:

1 Like

Did Jeremy say that if we go from 27 numbers to 32 numbers, we are losing information?

7 Likes

Could someone re-explain how to choose the kernel size for the model, like 7×7 or 5×5, and the losing-information part…

4 Likes

So only the weights get pushed to the GPU? I was somehow under the impression that even the activations are stored on the GPU, and that if GPU memory is low, gradient checkpointing is used to recalculate them.

No, the weights and inputs are on the GPU, so the activations and gradients are too.

2 Likes

I still wonder where the kernel values come from. Are they generated? Defined somewhere?
(I guess I should dig into PyTorch’s Conv2d source code :slight_smile: )

It’s part of the weights of the model; they get updated through gradient descent, just like the weights of a fully connected layer.

1 Like

They are actually weights; they are initialized randomly and then learned during the training process.

2 Likes
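To make the “kernels are just weights” point concrete, here is a hedged, toy sketch (plain Python, not fastai/PyTorch code): a length-2 conv kernel starts random and is recovered by gradient descent on a squared-error loss, exactly like any other weight. The target kernel and data here are made up for illustration.

```python
import random

random.seed(0)
target_k = [1.0, -1.0]                      # "true" kernel we want to recover
xs = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(64)]

def conv1d(k, x):
    # valid 1-D convolution (really cross-correlation, as in deep learning)
    return [k[0] * x[i] + k[1] * x[i + 1] for i in range(len(x) - 1)]

ys = [conv1d(target_k, x) for x in xs]      # training targets

k = [random.uniform(-1, 1), random.uniform(-1, 1)]  # random init
lr = 0.1
for _ in range(200):
    g0 = g1 = 0.0
    n = 0
    for x, y in zip(xs, ys):
        for i, t in enumerate(y):
            err = (k[0] * x[i] + k[1] * x[i + 1]) - t
            g0 += 2 * err * x[i]            # dL/dk0 for squared error
            g1 += 2 * err * x[i + 1]        # dL/dk1
            n += 1
    k[0] -= lr * g0 / n                     # plain gradient descent step
    k[1] -= lr * g1 / n

print(k)  # should end up close to [1.0, -1.0]
```

In a real network the gradients come from autograd rather than being written by hand, but the kernel values are updated the same way.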

In the current discussion where we compute the mean and standard deviation, why don’t we instead compute the mean absolute deviation to see how the activations behave? I wonder if outliers in the activations affect the result.
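A quick stdlib sketch of why that matters: the standard deviation squares deviations, so a single outlier inflates it much more than it inflates the mean absolute deviation. The activation values below are made up for illustration.

```python
import statistics

acts = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.1, -0.05]
with_outlier = acts + [10.0]                 # one extreme activation

def mean_abs_dev(xs):
    # mean absolute deviation from the mean
    m = statistics.fmean(xs)
    return statistics.fmean(abs(x - m) for x in xs)

print(statistics.pstdev(acts), mean_abs_dev(acts))
print(statistics.pstdev(with_outlier), mean_abs_dev(with_outlier))
# the std/MAD ratio grows once the outlier is added
```

So tracking both (or the mean absolute value, as discussed in the lesson) gives a picture that is less dominated by a few extreme activations.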

Maybe a dumb question, but why are there cyclic spikes in the plot of activations as training progresses? Are those batches?

1 Like

Thanks for clarifying. Was confused a bit.

So I guess only the loss is calculated on the CPU then, or is that also on the GPU?

To be clear: randomly, but still from a set distribution (see all the discussion about initializations).

1 Like

No, it is computed on the GPU, then stored on the CPU.

2 Likes

If you are able to fully explain why, you will probably get the next Turing award :slight_smile:

4 Likes

I wonder if backprop could be causing the weights to constantly shift back and forth early in training, and whether that could be causing this pattern.

Looking at the graphs of the mean and std deviation, I’m wondering how useful standard methods of signal analysis / time series analysis would be to gain insight from them.

Do you know of any research on that?

Otherwise, “Try blah” I guess :wink:

1 Like

Can someone explain how to read those colorful histograms?

5 Likes

When should we use kaiming_uniform…?

1 Like

When using a uniform init, as opposed to a normal one.
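To expand on that with a hedged sketch (these mirror my understanding of the `torch.nn.init` formulas for ReLU gain √2, not the actual source): `kaiming_normal_` draws from N(0, std) with std = √(2/fan_in), while `kaiming_uniform_` draws from U(−b, b) with b = √3 · √(2/fan_in). Both give the same variance, 2/fan_in; only the shape of the distribution differs.

```python
import math
import random

random.seed(0)
fan_in = 512
std = math.sqrt(2.0 / fan_in)                # target std for both inits
bound = math.sqrt(3.0) * std                 # uniform bound matching that std

normal_w = [random.gauss(0.0, std) for _ in range(100_000)]
uniform_w = [random.uniform(-bound, bound) for _ in range(100_000)]

def emp_std(ws):
    m = sum(ws) / len(ws)
    return math.sqrt(sum((w - m) ** 2 for w in ws) / len(ws))

print(emp_std(normal_w), emp_std(uniform_w), std)  # all roughly equal
```

So the choice is about matching the distribution family of your init scheme, not about getting a different variance.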