def seq2seq_reg(output, xtra, loss, alpha=0, beta=0):
    hs, dropped_hs = xtra
    if alpha:  # Activation Regularization
        loss = loss + sum(alpha * dropped_hs[-1].pow(2).mean())
    if beta:   # Temporal Activation Regularization (slowness)
        h = hs[-1]
        if len(h) > 1: loss = loss + sum(beta * (h[1:] - h[:-1]).pow(2).mean())
    return loss
The alpha part is relatively easy to see: it’s L2 regularization of the last hidden layer, right? (Not sure about the dropped part.)
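The way I read the code, the alpha term is an L2 penalty on the post-dropout activations of the last RNN layer (dropped_hs[-1]), while the beta term penalizes how much the raw activations (hs[-1]) change between consecutive timesteps. A minimal sketch of the two terms, assuming hidden states shaped (seq_len, batch, n_hidden); the names raw_h and dropped_h are mine:

import torch

seq_len, bs, nh = 10, 4, 8
raw_h = torch.randn(seq_len, bs, nh)                        # stands in for hs[-1]
dropped_h = raw_h * (torch.rand_like(raw_h) > 0.5).float()  # stands in for dropped_hs[-1]

alpha, beta = 2.0, 1.0
ar  = alpha * dropped_h.pow(2).mean()                # Activation Regularization: L2 on the dropped activations
tar = beta * (raw_h[1:] - raw_h[:-1]).pow(2).mean()  # Temporal AR: penalize change across timesteps
print(ar.item(), tar.item())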
Maybe a dumb question, but why do you need a ReLU at all? Could you just have two back-to-back convs there, since the ReLU is also changing things, isn’t it?
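Not a dumb question; my understanding is that the ReLU is exactly what stops the two convs from collapsing into a single linear op. A quick sketch with linear layers (convolutions are linear too, so the same argument applies):

import torch
import torch.nn as nn

torch.manual_seed(0)
a = nn.Linear(8, 8, bias=False)
b = nn.Linear(8, 8, bias=False)
x = torch.randn(3, 8)

stacked = b(a(x))                        # two back-to-back layers, no ReLU
merged  = x @ (b.weight @ a.weight).t()  # a single equivalent linear layer
print(torch.allclose(stacked, merged, atol=1e-6))    # True: the second layer adds no capacity
with_relu = b(torch.relu(a(x)))          # the ReLU breaks the equivalence
print(torch.allclose(with_relu, merged, atol=1e-6))  # False in general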
Could you do gradient clipping or lower learning rates at the beginning? And why is res scaling different from reducing the learning rate? Just curious whether he tried other, more standard tricks before going to this strange res_scaling thing.
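For what it’s worth, the difference as I understand it: lowering the learning rate or clipping gradients shrinks every update globally, whereas residual scaling only damps the output of each residual branch in the forward pass, so the identity path still carries the signal at full strength. A rough sketch of the two knobs (ScaledResidual and res_scale are my names, not from the paper):

import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    def __init__(self, block, res_scale=0.1):
        super().__init__()
        self.block, self.res_scale = block, res_scale

    def forward(self, x):
        # only the branch output is damped; the identity path is untouched
        return x + self.res_scale * self.block(x)

block = ScaledResidual(nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 16, 3, padding=1)))
x = torch.randn(2, 16, 8, 8)
block(x).pow(2).mean().backward()

# gradient clipping, by contrast, rescales the whole gradient vector after backward
torch.nn.utils.clip_grad_norm_(block.parameters(), max_norm=1.0)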
Huh… I wonder if using load_state_dict(strict=False) would work as a quick way to load weights from a pretrained model. Say a pretrained Keras/TensorFlow RetinaNet, if you more or less match the architecture in PyTorch.
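It should work as long as the parameter names and shapes line up: strict=False just skips missing/unexpected keys (as far as I know it still errors on shape mismatches), and the returned object tells you what was skipped. You would also have to convert the Keras/TensorFlow checkpoint into a PyTorch state_dict first, and watch the NHWC vs NCHW weight layouts. A toy sketch with two made-up models:

import torch
import torch.nn as nn

class Pretrained(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)
        self.old_head = nn.Linear(16, 10)

class Mine(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)  # name + shape match -> gets loaded
        self.new_head = nn.Linear(16, 5)                # no match -> stays at init

src, dst = Pretrained(), Mine()
result = dst.load_state_dict(src.state_dict(), strict=False)
print("missing:", result.missing_keys)        # ['new_head.weight', 'new_head.bias']
print("unexpected:", result.unexpected_keys)  # ['old_head.weight', 'old_head.bias']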