It needs to run after cuda.
Wait for notebook 10
Did Jeremy say if we went from 27 numbers to 32 numbers, we are losing information?
Could someone re-explain how to choose the kernel size for the model, like 7×7 or 5×5, and the part about losing information…
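My understanding of the counting argument from the lecture (a sketch, not a quote): each conv filter only sees a k×k patch across all input channels, so the question is how many input numbers feed each output number.

```python
# Each output value of a conv layer is computed from a k*k patch
# across all input channels. For the first layer on an RGB image
# with a 3x3 kernel, that's 3*3*3 = 27 input numbers per position.
def conv_inputs_per_position(kernel_size, in_channels):
    return kernel_size * kernel_size * in_channels

patch = conv_inputs_per_position(3, 3)  # 27 numbers seen by each filter
out_channels = 32                       # 32 filters -> 32 output numbers

print(patch, out_channels)  # 27 32

# The worry: mapping 27 inputs to 32 outputs can't create information.
# Larger kernels widen the input side, keeping the mapping compressive
# even with more output channels:
print(conv_inputs_per_position(5, 3))  # 75
print(conv_inputs_per_position(7, 3))  # 147
```

So with a 3×3 kernel and 32 filters you map 27 numbers to 32, which is why the "losing information" question comes up; a 5×5 or 7×7 first layer maps 75 or 147 numbers to 32 instead.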
So only weights get pushed to the GPU? Somehow I was under the impression that activations are stored on the GPU too, and if GPU memory is low, gradient checkpointing is used to recalculate them.
No, weights and inputs are on the GPU, so activations and gradients are too.
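A minimal PyTorch sketch of that point (the tiny model is hypothetical, just to illustrate placement): once the weights and inputs are on a device, everything computed from them stays there.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(4, 2).to(device)    # weights moved to the device
x = torch.randn(8, 4, device=device)  # inputs created on the device

out = model(x)              # activations inherit the device
loss = out.pow(2).mean()
loss.backward()             # gradients live there too

print(out.device, model.weight.grad.device)
```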
I still wonder where the kernel values come from. Are they generated? Defined somewhere?
(I guess I should dig into PyTorch’s Conv2d source code)
They’re part of the weights of the model and will get updated through gradient descent, just like the weights of a fully connected layer.
They are actually weights; they are initialized randomly and then learned during the training process.
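A NumPy sketch of what "initialized randomly, then learned" means for a kernel (the Kaiming-style scale and the gradient here are my illustration, not PyTorch's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3x3 conv kernel is just a weight tensor: initialized randomly
# (here with a Kaiming-style scale), then nudged by gradient descent.
fan_in = 3 * 3
kernel = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(3, 3))

# One gradient step. In a real framework the gradient comes from
# backprop; we fake one just to show the update rule.
fake_grad = np.ones_like(kernel)
lr = 0.1
kernel_after = kernel - lr * fake_grad

print(np.allclose(kernel - kernel_after, lr * fake_grad))  # True
```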
In the current discussion where we compute the mean and standard deviation, why don’t we instead compute the mean absolute deviation to see how the activations behave? I wonder if outliers in the activations affect the result.
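A quick NumPy check of the outlier concern (synthetic activations, my own illustration): the standard deviation squares deviations, so one huge activation dominates it, while the mean absolute deviation barely moves.

```python
import numpy as np

acts = np.random.default_rng(1).normal(0, 1, 10_000)
acts_outlier = np.append(acts, 500.0)  # one huge activation

std_clean, std_out = acts.std(), acts_outlier.std()

# Mean absolute deviation: average distance from the mean.
mad_clean = np.abs(acts - acts.mean()).mean()
mad_out = np.abs(acts_outlier - acts_outlier.mean()).mean()

print(std_clean, std_out)  # std blows up (~1 -> ~5)
print(mad_clean, mad_out)  # MAD barely moves
```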
Maybe a dumb question, but… why are there cyclic spikes in the plot of activations as training progresses? Are these batches?
Thanks for clarifying. Was confused a bit.
So I guess then only the loss is calculated on the CPU, or is that also on the GPU?
To be clear, randomly but still with a set distribution (see all the discussion about initializations).
No, it is computed on the GPU, then stored on the CPU.
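A small PyTorch sketch of that flow (the model and data are made up): the loss tensor lives on the device, and `.item()` copies the scalar back to CPU memory for logging.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 1).to(device)
x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)

loss = nn.functional.mse_loss(model(x), y)  # computed on the device
history = [loss.item()]  # .item() copies the scalar to a Python float on the CPU

print(loss.device, type(history[0]))
```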
If you are able to fully explain why, you will probably get the next Turing Award.
I wonder if Backprop could be causing the weights to be constantly shifting back and forth initially and that could be causing this pattern.
Looking at the graphs of the mean and std deviation, I’m wondering how useful standard methods of signal analysis / time series analysis would be to gain insight from them.
Do you know of any research on that?
Otherwise, “Try blah” I guess
Can someone explain how to read those colorful histograms?
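My understanding of those "colorful dimension" plots (a sketch with fake data, not the course's actual code): take a histogram of the activations at every training step, stack the histograms side by side as columns, and show the counts as colors. A widening bright band means the activations are spreading out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_bins = 50, 40

# Fake activations for each step, growing in scale to mimic a layer
# whose activations drift during training (illustrative only).
hists = []
for step in range(n_steps):
    acts = rng.normal(0, 0.5 + 0.05 * step, 1000)
    h, _ = np.histogram(acts.clip(-10, 10), bins=n_bins, range=(-10, 10))
    hists.append(h)

img = np.stack(hists, axis=1)  # shape (n_bins, n_steps)
# Each column is one step's histogram; color/brightness encodes the
# (often log-scaled) count in each bin. Plot with e.g. plt.imshow(img).
print(img.shape)  # (40, 50)
```

So the x-axis is training step, the y-axis is activation value (binned), and the color tells you how many activations fell in that bin at that step.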
When should we use kaiming_uniform…?
When using a uniform init, as opposed to a normal one.
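To expand on that: both variants target the same variance, `2 / fan_in` for ReLU; they just draw from different distributions. A NumPy sketch of the scaling (my own check, not PyTorch internals):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 100
n = 200_000

# Kaiming normal: std = gain / sqrt(fan_in), with gain = sqrt(2) for ReLU.
w_normal = rng.normal(0, np.sqrt(2 / fan_in), n)

# Kaiming uniform: bound b = sqrt(6 / fan_in), chosen so the variance
# matches, since Var(U(-b, b)) = b**2 / 3 = 2 / fan_in.
b = np.sqrt(6 / fan_in)
w_uniform = rng.uniform(-b, b, n)

print(w_normal.std(), w_uniform.std())  # both ~ sqrt(2/100) ~ 0.141
```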