What is your seq2seq_reg doing?
def seq2seq_reg(output, xtra, loss, alpha=0, beta=0):
    hs,dropped_hs = xtra
    if alpha:  # Activation Regularization
        loss = loss + sum(alpha * dropped_hs[-1].pow(2).mean())
    if beta:   # Temporal Activation Regularization (slowness)
        h = hs[-1]
        if len(h)>1: loss = loss + sum(beta * (h[1:] - h[:-1]).pow(2).mean())
    return loss
The alpha part is relatively easy to see: it’s L2 regularization of the last hidden layer’s activations, right? (I’m not sure about the “dropped” part.)
The beta part is beyond me.
I read the paper and am still confused.
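For what it’s worth, here is a dependency-free sketch of what the two terms in the snippet above appear to compute, going only by the quoted code. The alpha term penalises large activations in the last layer’s output after dropout (dropped_hs[-1]); the beta term penalises large changes between consecutive time steps in the last layer’s output before dropout (hs[-1]). The helper names below are made up for illustration, and the activations are simplified to a (seq_len, hidden) list of lists with no batch dimension.

```python
def mean_sq(rows):
    """Mean of the squared entries of a 2D list."""
    vals = [v * v for row in rows for v in row]
    return sum(vals) / len(vals)

def activation_reg(dropped_h, alpha):
    """Alpha term: alpha * mean(h^2), an L2 penalty on the activations themselves."""
    return alpha * mean_sq(dropped_h)

def temporal_activation_reg(h, beta):
    """Beta term: beta * mean((h[t+1] - h[t])^2), a penalty on step-to-step change."""
    if len(h) < 2:  # nothing to compare with a single time step
        return 0.0
    diffs = [[b - a for a, b in zip(prev, nxt)]
             for prev, nxt in zip(h[:-1], h[1:])]
    return beta * mean_sq(diffs)

# Toy hidden states over 3 time steps, hidden size 2 (hypothetical values).
steady = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # never changes over time
jumpy = [[0.0, 0.0], [2.0, 2.0], [0.0, 0.0]]   # swings at every step

steady_penalty = temporal_activation_reg(steady, beta=1.0)
jumpy_penalty = temporal_activation_reg(jumpy, beta=1.0)
```

So the “slowness” comment seems to mean exactly that: a hidden state that drifts slowly from one token to the next costs nothing extra, while one that swings at every step is penalised.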
Maybe a dumb question, but why do you need a ReLU at all? Could you just have two back-to-back convs there? The ReLU is also changing things, isn’t it?
Re: BatchNorm: parallel processing on multiple GPUs. Any tips for doing this with the current fastai codebase?
Could you do gradient clipping or lower learning rates at the beginning? And why is res scaling different than reducing the learning rate? Just curious if he tried other more normal tricks before going to this strange res_scaling thing.
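One way to think about the difference, as a toy sketch rather than an answer from the lecture: the learning rate only changes how far the weights move per update, while a residual scale changes the size of the activations in the forward pass for any given weights. The function below is hypothetical and uses a scalar “block” f(x) = w * x in place of a real conv block.

```python
def forward_magnitude(n_blocks, w, res_scale):
    """Pass x = 1.0 through n_blocks residual blocks of the form
    x = x + res_scale * f(x), with a toy block f(x) = w * x."""
    x = 1.0
    for _ in range(n_blocks):
        x = x + res_scale * (w * x)
    return x

# Same weights, same depth; only the residual scale differs.
unscaled = forward_magnitude(n_blocks=32, w=1.0, res_scale=1.0)
scaled = forward_magnitude(n_blocks=32, w=1.0, res_scale=0.1)
```

With identical weights the unscaled stack grows like 2**32 while the scaled one grows like 1.1**32, and no learning rate setting changes that forward pass. Gradient clipping and warm-up act on the updates instead, so they address a related but different problem.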
If you want to try LARS, it’s very easy to implement as an optimizer in pytorch (did it in this gist).
Isn’t that what the NVIDIA demo is doing?
Are we using VGG16 in the model? SrResnet seems to build a model from scratch?
What is a “learnable convolution” and what is an example of a convolution that isn’t learnable?
Curious about your context: isn’t that what the NVIDIA thing is doing?
Why are we using these little 3x3 squares of every color, instead of using noise in the new pixels?
I understand why we don’t just leave them blank, and maybe why we don’t copy the nearest-neighbors. But why not noise?
Does this mean I can replace
m = nn.DataParallel(m, [0,2])
with something, to get rid of the error below?
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/THCTensorCopy.cu:204
Because then the sequential layers would functionally just be one layer, I think.
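A quick way to convince yourself of this, sketched in plain Python with 1D signals and made-up kernels: two convolutions applied back to back give the same result as a single convolution with the combined kernel, and putting a ReLU between them breaks that equivalence. Deep learning “conv” layers are technically cross-correlations with biases, but the same linearity argument should carry over.

```python
def conv_full(x, k):
    """Full 1D convolution of signal x with kernel k (no padding tricks)."""
    out = [0] * (len(x) + len(k) - 1)
    for i, xi in enumerate(x):
        for j, kj in enumerate(k):
            out[i + j] += xi * kj
    return out

def relu(v):
    return [max(0, a) for a in v]

x = [1, -2, 3, 0, -1]   # toy input signal
k1 = [1, -1]            # first "layer" kernel
k2 = [2, 1, -1]         # second "layer" kernel

# Two convs in a row...
two_convs = conv_full(conv_full(x, k1), k2)
# ...equal one conv with the pre-combined kernel.
one_conv = conv_full(x, conv_full(k1, k2))
# With a ReLU in between, no single kernel reproduces the result in general.
with_relu = conv_full(relu(conv_full(x, k1)), k2)
```

Here two_convs and one_conv come out identical, while with_relu does not, which is the sense in which the nonlinearity is what makes the second layer add anything.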
Yeah, you probably want to change the [0,2] to only contain numbers that actually correspond to GPUs on your computer. Like, maybe
Can he explain progressive resizing again? I don’t understand how to use it.
Thanks … but yeah, I had tried [0,0] and it didn’t help; [0,1] didn’t either.
I wonder how to find out what the correct values would be!!
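One way to check, as a minimal sketch: torch.cuda.device_count() reports how many GPUs PyTorch can see, and the valid ordinals are 0 up to that count minus one. The two helper functions below are hypothetical names for illustration, and the import is guarded so the snippet still runs on a machine without PyTorch.

```python
def available_gpu_ids():
    """Return the GPU ordinals PyTorch can see, or [] if torch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return []
    return list(range(torch.cuda.device_count()))

def pick_device_ids(requested, available):
    """Keep only the requested ordinals that actually exist on this machine."""
    return [i for i in requested if i in available]

visible = available_gpu_ids()
usable = pick_device_ids([0, 2], visible)
print("visible GPUs:", visible, "| usable from [0, 2]:", usable)
```

If the usable list has fewer than two entries, wrapping the model in nn.DataParallel probably won’t buy anything on that machine. Note that the CUDA_VISIBLE_DEVICES environment variable also changes which ordinals PyTorch sees.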
Huh… I wonder if using load_state_dict(strict=False) would work as a quick way to load weights from a pretrained model. Say: a pretrained Keras/TensorFlow RetinaNet, if you more or less match the architecture in PyTorch.
Also, can we use progressive resizing to match the idea of backbone + head?
Is that a checkerboard pattern on the bluejay?