In case this helps anyone, here’s the code I wrote while figuring out how ResNet works. It’s more verbose, but hopefully there’s a bit more info in case anyone is stuck.
I recommend reading the forward method first, then going back to check __init__.
from torch import nn

def noop(x, *args, **kwargs):
    return x

# _conv_block and conv are the helpers defined earlier in the notebook
class ResBlock(nn.Module):
    def __init__(self, ni, nf, ks=3, stride=2, act=nn.ReLU, norm=None, bias=None):
        super().__init__()
        # Residual path
        self.convs = _conv_block(ni, nf, ks=ks, stride=stride, act=act, norm=norm, bias=bias)
        # Skip path / shortcut path.
        # Here we just decide what functions need to be applied to the input
        # so that its shape matches the output of self.convs (the residual path)
        # and the two can be added together.
        if ni == nf:
            # If the number of input channels equals the number of output channels,
            # no need to conv the input to match the output of the residual path
            self.idconv = noop
        else:
            # If not, use the simplest conv that can match the input shape to the
            # output of the residual path
            self.idconv = conv(ni, nf, ks=1, stride=1, act=None)
        if stride == 1:
            # If the residual path does not change the height and width of the image,
            # no need to change the height and width of the input before adding
            self.pool = noop
        else:
            self.pool = nn.AvgPool2d(2, ceil_mode=True)  # not sure why ceil_mode
        self.act = act()

    def forward(self, inp):
        # Calculate residual path
        res = self.convs(inp)
        # Fix shape of skip path
        skip = self.idconv(self.pool(inp))  # no change if ni==nf and stride==1. I wonder - does the order matter, i.e. pool first then idconv? Need to check shapes
        # Apply activation function
        out = self.act(res + skip)  # This is the step that needs the idconv and pool ops in case of shape mismatch
        return out
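As a quick check of the "does the order matter" question above, here is a minimal sketch (plain nn.Conv2d and nn.AvgPool2d standing in for the course's conv helper): since idconv is a 1x1, stride-1 conv and the pool only halves the height and width, both orders give the same output shape.

import torch
from torch import nn

x = torch.randn(8, 16, 28, 28)                       # dummy batch: bs=8, ni=16, 28x28
idconv = nn.Conv2d(16, 32, kernel_size=1, stride=1)  # stand-in for conv(ni, nf, ks=1, stride=1)
pool = nn.AvgPool2d(2, ceil_mode=True)

print(idconv(pool(x)).shape)  # torch.Size([8, 32, 14, 14])
print(pool(idconv(x)).shape)  # torch.Size([8, 32, 14, 14])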
I wrote a blog about optimizers (SGD, RMSprop, and Adam).
I wanted to graph the gradients the way we did with the weights, so I used backward hooks. I tried implementing classes for this like we did in the course, but they did not work very well for me.
Adam does have more stable gradients than SGD and RMSprop, so it was interesting to look at that. I originally wanted to track other parameters like beta1 and beta2 as well, but I could not figure out how to do that easily. I will probably do it later.
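For anyone curious, here is a minimal sketch of the kind of gradient tracking I mean (not the exact code from the blog): hooks on the parameters record a statistic of each gradient on every backward pass, which can then be plotted.

import torch
from torch import nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
grad_stats = {name: [] for name, _ in model.named_parameters()}

def make_hook(name):
    # record the mean absolute gradient each time backward reaches this parameter
    def hook(grad): grad_stats[name].append(grad.abs().mean().item())
    return hook

for name, p in model.named_parameters():
    p.register_hook(make_hook(name))

# one dummy step to populate the stats
xb, yb = torch.randn(64, 784), torch.randint(0, 10, (64,))
F.cross_entropy(model(xb), yb).backward()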
I don’t understand why the last layer of the ResNet model is an nn.BatchNorm1d(10) (in the 13_resnet.ipynb notebook). Why did it change? Why are we not using softmax here as we used to?
Thank you for your answer. My bad, I’m following the lecture using another language and got confused. My understanding is that a softmax activation (or similar) is required for the cross-entropy loss function, and I guess the PyTorch one applies the softmax itself before computing the loss, so there is no need to have this activation in the network.
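A quick way to convince yourself of this (a minimal check, not course code): PyTorch’s F.cross_entropy is log-softmax followed by negative log-likelihood, so the model can output raw logits.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw model outputs, no softmax applied
targets = torch.tensor([3, 1, 0, 7])

a = F.cross_entropy(logits, targets)
b = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(a, b))            # True: cross_entropy applies log-softmax internally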
Hi, I wrote about ResNet on my blog. I went over the paper and wrote a code version of it.
In the second part, I trained the model using different kinds of convolutional blocks. I also tried using nn.ReLU instead of the GeneralReLU we’ve been using. I found that they both reach the same accuracy, but nn.ReLU trained faster. I guess they reach the same accuracy because of the batch norm.
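For context, GeneralReLU is (roughly, from memory; details may differ from the notebook) a ReLU with an optional leak, a subtracted constant, and a max clamp, so those extras are the only difference from nn.ReLU:

import torch
from torch import nn
import torch.nn.functional as F

class GeneralRelu(nn.Module):
    def __init__(self, leak=None, sub=None, maxv=None):
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x -= self.sub          # shift activations toward zero mean
        if self.maxv is not None: x.clamp_max_(self.maxv)
        return x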
I made a pixel-swap data augmentation. It uses the exact same pixels as the original, so the pixel statistics of the images should be preserved (mean and stdev). It is pretty slow since there is an inner loop in Python, so I made it swap small blocks (4x4 in the code below) instead of single pixels to speed it up a bit.
import random

def pixel_swap(xb, nswaps=4):
    # xb is a batch tensor of shape (bs, channels, h, w); modified in place.
    # Pick nswaps random (destination, source) top-left corners, leaving room for a 4x4 block.
    nrows = xb[0].shape[-2] - 4
    ncols = xb[0].shape[-1] - 4
    idxs = [(int(random.random()*nrows), int(random.random()*ncols),
             int(random.random()*nrows), int(random.random()*ncols))
            for _ in range(nswaps)]
    for x in xb:
        for (dtx, dty, stx, sty) in idxs:
            # clone() is needed: a plain slice is a view, so it would be overwritten by the next line
            tmp = x[:, dtx:dtx+4, dty:dty+4].clone()
            x[:, dtx:dtx+4, dty:dty+4] = x[:, stx:stx+4, sty:sty+4]
            x[:, stx:stx+4, sty:sty+4] = tmp
    return xb
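A quick sanity check (a sketch with a dummy batch, assuming xb is a tensor of shape (bs, channels, h, w)): since pixels are only moved around within each image, the batch mean and std should come out essentially unchanged. Note that pixel_swap modifies its input in place.

import torch

xb = torch.rand(16, 1, 28, 28)                    # dummy batch of 28x28 images
print(xb.mean().item(), xb.std().item())
xb_aug = pixel_swap(xb.clone())                   # clone so the original batch is untouched
print(xb_aug.mean().item(), xb_aug.std().item())  # essentially the same values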
Not sure if it’s only me, but my Google Colab keeps crashing as system RAM gets full, although GPU RAM stays well below the limit. I have tried lowering the batch size and clearing CUDA memory often, yet it still crashes.