I have to leave now, but I wanted to post another reminder not to share the new research publicly until the MOOC is released.
Could we learn the mom parameter as well? Even use (learn) a separate mom for each measure? Might be hard to keep reasonable/comparable, I guess.
Could this running batch norm be modified so it can also properly implement batchnorm with the accumulate gradients callback?
The stuff shown now looks complex, for me at least, and is hard to follow…
Any suggestions for getting a grasp of it offline?
FYI
The latest 07_batchnorm on GitHub has an issue: ScriptModule does not get imported.
Also, there is a TypeError when running with Hooks:
v = x.var((0,2,3), keepdim=True)
TypeError: var(): argument 'dim' (position 1) must be int, not tuple
That may be an issue with pytorch-nightly. I did a git pull, ran conda update fastai, and installed the nightly.
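In case it is useful while you are stuck on that build, here is a minimal workaround sketch, assuming the only problem is that Tensor.var in that torch version does not accept a tuple of dims (the variable names are mine, not the notebook's):

import torch

x = torch.randn(64, 32, 28, 28)  # (batch, channels, height, width)

# Newer torch accepts a tuple of dims directly:
# v = x.var((0, 2, 3), keepdim=True)

# Older torch only accepts a single int dim, so collapse the reduction dims first:
v = (x.permute(1, 0, 2, 3)        # (channels, batch, h, w)
       .reshape(x.shape[1], -1)   # (channels, batch*h*w)
       .var(1)                    # per-channel variance
       .view(1, -1, 1, 1))        # broadcastable, like keepdim=True would give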
Thanks for the great lessons! Lots of things to mull over once again
No, variance is defined for any distribution.
The version of nightly is 1.0.0.dev20190403 py3.6_cuda10.0.130_cudnn7.4.2.0 pytorch
I had a situation where the pytorch release version was also installed; it happened when I did
conda update -c fastai fastai
So I removed it, but now 07 or 06 won't import torch.nn.functional with fastai=1.0.51=1.
I removed the 1.0.51 build 1 and replaced it with fastai=1.0.50.post1.
That clears the with Hooks TypeError (var(): argument 'dim' (position 1) must be int, not tuple, on v = x.var((0,2,3), keepdim=True)) and the import torch.nn.functional issue, but not the missing ScriptModule.
There is a cell:
from torch.jit import ScriptModule, script_method, script
from typing import *
which was missing from the version I pulled at 18:30 PDT.
Now that I have changed back to the previous version of fastai and added the cell to 07_batchnorm.ipynb, the notebook is running, but it has paused at cell 19, get_learn_run.
6.33am: I am back to bed. Thanks for your help.
RE ScriptModule: replace it with nn.Module, as per the later post from jph00.
Weirdly there are actually distributions that have an undefined variance (and mean for that matter) https://en.wikipedia.org/wiki/Cauchy_distribution#Explanation_of_undefined_moments
No, the purpose of mom (and eps) is to make training more stable, rather than to decrease the loss for a particular batch. So their gradients don’t help with the task they’re there for!
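Roughly what that means in code (an illustrative sketch of the running-stats part only, not the exact notebook code): mom only controls how fast the running buffers move, so it never appears in the loss for the current batch.

import torch
from torch import nn

class ToyRunningStats(nn.Module):
    # Sketch: just the running-statistics half of a batchnorm-style layer
    def __init__(self, nf, mom=0.1, eps=1e-5):
        super().__init__()
        self.mom, self.eps = mom, eps
        # Buffers, not Parameters: no gradients are ever computed for them
        self.register_buffer('means', torch.zeros(1, nf, 1, 1))
        self.register_buffer('vars',  torch.ones (1, nf, 1, 1))

    def update_stats(self, x):
        m = x.mean((0, 2, 3), keepdim=True)
        v = x.var ((0, 2, 3), keepdim=True)
        # mom sets how quickly the running stats track the batch stats,
        # i.e. it is about stability, not about this batch's loss
        self.means.lerp_(m, self.mom)
        self.vars .lerp_(v, self.mom)

    def forward(self, x):
        if self.training:
            with torch.no_grad(): self.update_stats(x)
        return (x - self.means) / (self.vars + self.eps).sqrt()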
It’s still just the filter dimension. Remember that all layers of a neural net have a number of “channels” or “filters” - it doesn’t matter what type of data was in the input.
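In other words, whatever the input data was, the stats get reduced over everything except the filter dimension. A tiny sketch with made-up shapes:

import torch

# Activations from any layer: (batch, filters, height, width)
acts = torch.randn(64, 32, 28, 28)

# One statistic per filter: reduce over batch and spatial dims, keep dim 1
mean_per_filter = acts.mean((0, 2, 3))   # shape (32,)
var_per_filter  = acts.var ((0, 2, 3))   # shape (32,)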
Apologies - it should have said nn.Module. I've fixed it in the repo now.
No, because ‘mults’ scales the overall activations - which was the purpose of the init scaling. So we init ‘mults’ to 1.0.
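Concretely, something like this (a sketch following the lesson's mults/adds naming; the exact shapes here are an assumption on my part):

import torch
from torch import nn

nf = 32  # number of filters in the layer
# Init mults to 1 and adds to 0 so the layer starts out leaving the
# (already well-scaled, thanks to the init) activations unchanged
mults = nn.Parameter(torch.ones (nf, 1, 1))
adds  = nn.Parameter(torch.zeros(nf, 1, 1))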
You should probably use our shifted init (i.e. the GeneralRelu defaults shown in the lesson), or else the very similar ELU.
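For reference, a sketch of the kind of shifted ReLU I mean (the defaults are left as None here; use the values shown in the lesson notebook):

import torch.nn.functional as F
from torch import nn

class GeneralRelu(nn.Module):
    # Leaky ReLU shifted down by `sub` so the post-activation mean stays
    # closer to 0, optionally clamped at `maxv`
    def __init__(self, leak=None, sub=None, maxv=None):
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub  is not None: x = x - self.sub
        if self.maxv is not None: x = x.clamp_max(self.maxv)
        return x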
Ah you caught me! No we didn’t.
But… we did show that conv is just a matrix multiply, with some tied weights and zeros, and we’ve already done that from scratch; so I figured we don’t gain much doing conv from scratch too. And it would be soooooo slooooow.
But for folks still feeling a little unsure about what a conv does - you absolutely should write it yourself!
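If you do want to try it, here is one hedged way to see the "conv is just a matmul" point for yourself, using unfold to build the patch matrix (slow and memory-hungry, which is exactly why the real thing is implemented differently):

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)      # (batch, in_channels, h, w)
w = torch.randn(16, 3, 3, 3)     # (out_channels, in_channels, kh, kw)

# Every 3x3 patch becomes a column: (batch, in_channels*3*3, n_patches)
patches = F.unfold(x, kernel_size=3)

# The convolution is then just a matrix multiply with the flattened filters
out = w.view(16, -1) @ patches   # (batch, out_channels, n_patches)
out = out.view(2, 16, 6, 6)      # fold the patch positions back into a grid

# Sanity check against the built-in conv (no padding, stride 1, no bias)
print(torch.allclose(out, F.conv2d(x, w), atol=1e-5))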
It’s fine to have a negative class for a binary problem (NLP, vision, or anything else) since it’s simply sigmoid activation and we don’t have this same issue.
But we don’t have a negative class for multi-class NLP problems IIRC…
Yes, if you know you have one and exactly one class represented in each data item, then softmax is best, since you’re helping the model by giving it one less thing to learn.
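To make that concrete (a toy sketch, not from the notebooks): sigmoid treats each class as an independent yes/no, while softmax makes the classes compete because the probabilities must sum to 1.

import torch

logits = torch.tensor([2.0, -1.0, 0.5])   # raw scores for 3 classes

# Per-class sigmoid: independent probabilities, can all be high or all be low
print(torch.sigmoid(logits))              # ≈ tensor([0.88, 0.27, 0.62])

# Softmax: exactly-one-class assumption, probabilities sum to 1
print(torch.softmax(logits, dim=0))       # ≈ tensor([0.79, 0.04, 0.18])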
nano doesn't really do enough to be useful. I wouldn't suggest spending time learning it. Use vim or emacs. Emacs is a little easier to get started with, although vim is better for manipulating datasets (though there are emacs extensions to help there).
Yes that’s what I was using. It’s pretty basic but it’s ok.