Lesson 18 official topic

In case this helps anyone, here’s the code I wrote while figuring out how ResNet works. It’s more verbose, but hopefully there’s a bit more info in case anyone is stuck.

I recommend reading the forward method first, then going back to check __init__.


from torch import nn

# conv and _conv_block below are the helpers defined earlier in the course notebook.

def noop(x, *args, **kwargs):
    return x

class ResBlock(nn.Module):
    def __init__(self, ni, nf, ks=3, stride=2, act=nn.ReLU, norm=None, bias=None):
        super().__init__()
        # Residual path
        self.convs = _conv_block(ni, nf, ks=ks, stride=stride, act=act, norm=norm, bias=bias)
        
        # Skip path / shortcut path
        # Here we just decide which functions need to be applied to the input
        # so that the shapes work out and it can be added to the
        # output of self.convs / the residual path
        if ni == nf:
            # If the number of input channels equals the number of output channels,
            # no need to conv the input to match the output of the residual path
            self.idconv = noop
        else:
            # Otherwise use the simplest conv (1x1) that can match the input shape
            # to the output of the residual path
            self.idconv = conv(ni, nf, ks=1, stride=1, act=None)


        # If the residual path does not change the height and width of the image,
        # there is no need to change the height and width of the input either
        if stride == 1:
            self.pool = noop
        else:
            self.pool = nn.AvgPool2d(2, ceil_mode=True)  # ceil_mode=True: odd sizes round up, matching the stride-2 conv output

        self.act = act()

    def forward(self, inp):
        # Calculate residual path
        res = self.convs(inp)

        # Fix shape of skip path
        skip = self.idconv(self.pool(inp))  # no change if ni == nf and stride == 1. Does the order matter, i.e. pool first then idconv? (shape check below)

        # Apply activation function
        out = self.act(res + skip)  # This is the step that needs the idconv and pool ops in case of a shape mismatch
        return out
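
To check the question in the comment above about the order of pool and idconv, here is a rough, self-contained shape check. The conv and _conv_block stand-ins below are my assumptions about what the course helpers do, not the notebook code itself:

import torch
from torch import nn

# Rough stand-ins for the course helpers (assumed behaviour, not the notebook code):
# conv = Conv2d [+ norm] [+ activation]; _conv_block = two convs, stride on the second.
def conv(ni, nf, ks=3, stride=1, act=nn.ReLU, norm=None, bias=None):
    if bias is None: bias = norm is None              # skip the conv bias when a norm layer follows
    layers = [nn.Conv2d(ni, nf, ks, stride=stride, padding=ks//2, bias=bias)]
    if norm: layers.append(norm(nf))
    if act is not None: layers.append(act())
    return nn.Sequential(*layers)

def _conv_block(ni, nf, ks=3, stride=2, act=nn.ReLU, norm=None, bias=None):
    return nn.Sequential(conv(ni, nf, ks=ks, stride=1, act=act, norm=norm, bias=bias),
                         conv(nf, nf, ks=ks, stride=stride, act=None, norm=norm, bias=bias))

x = torch.randn(8, 16, 28, 28)
blk = ResBlock(16, 32, stride=2)
print(blk(x).shape)                                   # torch.Size([8, 32, 14, 14])

# The order of pool and idconv does not change the output shape: the pool only
# touches height/width, while the 1x1 idconv only touches channels. Pooling first
# is slightly cheaper, since the 1x1 conv then runs on the smaller feature map.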

Hi all,

I wrote a blog about optimizers (SGD, RMSprop, and Adam).

I wanted to graph gradients the way we did with weights, so I used backward hooks. I tried implementing classes like we did in the course, but they did not work very well for me.

Adam does have more stable gradients than SGD and RMSprop, so it was interesting to look at that. I originally wanted to track other parameters like beta1 and beta2 as well, but I could not figure out how to do that easily. I will probably do it later.
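
For anyone curious, here is a minimal sketch of the kind of backward hook I mean (assuming a plain nn.Sequential model; this is not the exact code from the blog):

import torch
from torch import nn

# Record (mean absolute value, std) of the gradient flowing into each layer's output.
def make_hook(stats):
    def hook(module, grad_input, grad_output):
        g = grad_output[0]
        stats.append((g.abs().mean().item(), g.std().item()))
    return hook

model = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
grad_stats = [[] for _ in model]
for layer, stats in zip(model, grad_stats):
    layer.register_full_backward_hook(make_hook(stats))

loss = model(torch.randn(16, 10)).pow(2).mean()
loss.backward()     # each list in grad_stats now holds one (mean, std) pair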


Hi all,

I don’t understand why the last layer of the ResNet model is an nn.BatchNorm1d(10) (in the 13_resnet.ipynb notebook). Why did it change? Why are we not using softmax here as we used to?

Hi,

Jeremy said in the lesson that ending with batchnorm works well. And we haven’t been using softmax in these models for a while now.

Thank you for your answer :blush:. My bad, I’m following the lecture in another language and got confused. My understanding is that a softmax activation (or similar) is required for the cross-entropy loss, and I guess the PyTorch one applies the (log-)softmax itself before computing the loss, so there is no need to have this activation in the network.
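
For reference, PyTorch’s F.cross_entropy combines log_softmax and nll_loss, so the model only needs to output raw logits; a quick check:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                 # raw model outputs, no softmax applied
targets = torch.randint(0, 10, (4,))

manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(F.cross_entropy(logits, targets), manual))   # True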

Hi, I wrote about ResNet on my blog. I went over the paper and wrote up a code version.
In the second part, I trained the model using different kinds of convolutional blocks. I also tried nn.ReLU instead of the GeneralReLU we’ve been using. I found that they both reach the same accuracy, but nn.ReLU trained faster; I guess the accuracy comes out the same because of the batch norm.
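
For reference, a GeneralReLU-style activation is roughly a leaky ReLU with an optional downward shift and an optional maximum value; this is just a sketch of the idea, the exact course version may differ:

import torch.nn.functional as F
from torch import nn

class GeneralReLU(nn.Module):
    def __init__(self, leak=None, sub=None, maxv=None):
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x = x - self.sub        # shift the output towards zero mean
        if self.maxv is not None: x = x.clamp_max(self.maxv)
        return x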

I hope this is useful.


I made a pixel-swap data augmentation. It uses the exact same pixels as the original, so the pixel statistics of the images should be preserved (mean and stdev). It is pretty slow since there is an inner loop in Python, so I made it swap small blocks (4x4 in the code below) to speed it up a bit.

import random
from torch import nn

def pixel_swap(xb, nswaps=4):
    # Pick nswaps random (destination, source) corner pairs, shared across the batch
    nrows = xb[0].shape[-2] - 4
    ncols = xb[0].shape[-1] - 4
    idxs = [(int(random.random()*nrows), int(random.random()*ncols),
             int(random.random()*nrows), int(random.random()*ncols))
            for _ in range(nswaps)]
    for x in xb:
        for (dtx, dty, stx, sty) in idxs:
            # clone() so we keep a copy of the destination block; a plain slice is a
            # view and would be overwritten by the first assignment below
            tmp = x[:, dtx:dtx+4, dty:dty+4].clone()
            x[:, dtx:dtx+4, dty:dty+4] = x[:, stx:stx+4, sty:sty+4]
            x[:, stx:stx+4, sty:sty+4] = tmp
    return xb

class PixelSwap(nn.Module):
    def __init__(self, nswaps=4):
        super().__init__()
        self.nswaps = nswaps

    def forward(self, x): return pixel_swap(x, self.nswaps)
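
A quick usage sketch (note that pixel_swap modifies the batch in place, so clone it first if you need the original):

import torch
xb = torch.rand(64, 1, 28, 28)              # a fake batch of 28x28 images
xb_aug = PixelSwap(nswaps=4)(xb.clone())    # clone() because pixel_swap works in place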

Training for 20 epochs got:

0.941 0.172 19 eval

Which is better than the RandErase example.


Not sure if it’s only me, but my Google Colab keeps crashing as the system RAM fills up, even though GPU RAM stays well below its limit. I have tried lowering the batch size and clearing CUDA memory often, yet it still crashes.

Any solutions?