In case this helps anyone, here’s the code I wrote while figuring out how ResNet works. It’s more verbose, but hopefully there’s a bit more info in case anyone is stuck.
I recommend reading the forward method first, then going back to check __init__.
from torch import nn

def noop(x, *args, **kwargs):
    return x

# _conv_block and conv are the helpers defined earlier in the notebook
class ResBlock(nn.Module):
    def __init__(self, ni, nf, ks=3, stride=2, act=nn.ReLU, norm=None, bias=None):
        super().__init__()
        # Residual path
        self.convs = _conv_block(ni, nf, ks=ks, stride=stride, act=act, norm=norm, bias=bias)
        # Skip path / shortcut path.
        # Here we just decide what functions need to be applied to the input
        # so that its shape matches the output of self.convs (the residual path)
        # and the two can be added together.
        if ni == nf:
            # If the number of input channels equals the number of output channels,
            # no need to conv the input to match the output of the residual path
            self.idconv = noop
        else:
            # If not, use the simplest conv that can match the input shape to the
            # output of the residual path
            self.idconv = conv(ni, nf, ks=1, stride=1, act=None)
        if stride == 1:
            # If the residual path does not change the height and width of the image,
            # no need to change the height and width of the input before adding
            self.pool = noop
        else:
            self.pool = nn.AvgPool2d(2, ceil_mode=True)  # not sure why ceil_mode
        self.act = act()

    def forward(self, inp):
        # Calculate residual path
        res = self.convs(inp)
        # Fix shape of skip path
        skip = self.idconv(self.pool(inp))  # no change if ni==nf and stride==1. I wonder - does the order matter, i.e. pool first then idconv? Need to check shapes
        # Apply activation function
        out = self.act(res + skip)  # This is the step that needs the idconv and pool ops in case of shape mismatch
        return out
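As a quick check of the "does the order matter" question above, here is a minimal sketch (plain nn.Conv2d and nn.AvgPool2d standing in for the course's conv helper): since idconv is a 1x1, stride-1 conv and the pool only halves the height and width, both orders give the same output shape.

import torch
from torch import nn

x = torch.randn(8, 16, 28, 28)                       # dummy batch: bs=8, ni=16, 28x28
idconv = nn.Conv2d(16, 32, kernel_size=1, stride=1)  # stand-in for conv(ni, nf, ks=1, stride=1)
pool = nn.AvgPool2d(2, ceil_mode=True)

print(idconv(pool(x)).shape)  # torch.Size([8, 32, 14, 14])
print(pool(idconv(x)).shape)  # torch.Size([8, 32, 14, 14])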
I wrote a blog about optimizers (SGD, RMSprop, and Adam).
I wanted to graph the gradients the way we did with the weights, so I used backward hooks. I tried implementing classes for this like we did in the course, but they did not work very well for me.
Adam does have more stable gradients than SGD and RMSprop, so it was interesting to look at that. I originally wanted to track other parameters like beta1 and beta2 as well, but I could not figure out how to do that easily. I will probably do it later.
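For anyone curious, here is a minimal sketch of the kind of gradient tracking I mean (not the exact code from the blog): hooks on the parameters record a statistic of each gradient on every backward pass, which can then be plotted.

import torch
from torch import nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
grad_stats = {name: [] for name, _ in model.named_parameters()}

def make_hook(name):
    # record the mean absolute gradient each time backward reaches this parameter
    def hook(grad): grad_stats[name].append(grad.abs().mean().item())
    return hook

for name, p in model.named_parameters():
    p.register_hook(make_hook(name))

# one dummy step to populate the stats
xb, yb = torch.randn(64, 784), torch.randint(0, 10, (64,))
F.cross_entropy(model(xb), yb).backward()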
I don’t understand why the last layer of the ResNet model is an nn.BatchNorm1d(10) (in the 13_resnet.ipynb notebook). Why did it change? Why are we not using softmax here as we used to?
Thank you for your answer. My bad, I’m following the lecture using another language and got confused. My understanding is that a softmax activation (or similar) is required for the cross-entropy loss function, and I guess the PyTorch one applies the softmax itself before computing the loss, so there is no need to have this activation in the network.
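A quick way to convince yourself of this (a minimal check, not course code): PyTorch’s F.cross_entropy is log-softmax followed by negative log-likelihood, so the model can output raw logits.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw model outputs, no softmax applied
targets = torch.tensor([3, 1, 0, 7])

a = F.cross_entropy(logits, targets)
b = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(a, b))            # True: cross_entropy applies log-softmax internally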
Hi, I wrote about ResNet on my blog. I went over the paper and wrote a code version of it.
In the second part, I trained the model using different kinds of convolutional blocks. I also tried using nn.ReLU instead of the GeneralReLU we’ve been using. I found that they both reach the same accuracy, but nn.ReLU trained faster. I guess they reach the same accuracy because of the batch norm.
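For context, GeneralReLU is (roughly, from memory; details may differ from the notebook) a ReLU with an optional leak, a subtracted constant, and a max clamp, so those extras are the only difference from nn.ReLU:

import torch
from torch import nn
import torch.nn.functional as F

class GeneralRelu(nn.Module):
    def __init__(self, leak=None, sub=None, maxv=None):
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv

    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x -= self.sub          # shift activations toward zero mean
        if self.maxv is not None: x.clamp_max_(self.maxv)
        return x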
I made a pixel-swap data augmentation. It uses the exact same pixels as the original, so the pixel statistics of the images should be preserved (mean and stdev). It is pretty slow since there is an inner loop in Python, so I made it swap small blocks (4x4 in the code below) instead of single pixels to speed it up a bit.
import random

def pixel_swap(xb, nswaps=4):
    # xb is a batch tensor of shape (bs, channels, h, w); modified in place.
    # Pick nswaps random (destination, source) top-left corners, leaving room for a 4x4 block.
    nrows = xb[0].shape[-2] - 4
    ncols = xb[0].shape[-1] - 4
    idxs = [(int(random.random()*nrows), int(random.random()*ncols),
             int(random.random()*nrows), int(random.random()*ncols))
            for _ in range(nswaps)]
    for x in xb:
        for (dtx, dty, stx, sty) in idxs:
            # clone() is needed: a plain slice is a view, so it would be overwritten by the next line
            tmp = x[:, dtx:dtx+4, dty:dty+4].clone()
            x[:, dtx:dtx+4, dty:dty+4] = x[:, stx:stx+4, sty:sty+4]
            x[:, stx:stx+4, sty:sty+4] = tmp
    return xb
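A quick sanity check (a sketch with a dummy batch, assuming xb is a tensor of shape (bs, channels, h, w)): since pixels are only moved around within each image, the batch mean and std should come out essentially unchanged. Note that pixel_swap modifies its input in place.

import torch

xb = torch.rand(16, 1, 28, 28)                    # dummy batch of 28x28 images
print(xb.mean().item(), xb.std().item())
xb_aug = pixel_swap(xb.clone())                   # clone so the original batch is untouched
print(xb_aug.mean().item(), xb_aug.std().item())  # essentially the same values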
Not sure if it’s only me, but my Google Colab keeps crashing as system RAM gets full, although GPU RAM stays well below the limit. I have tried lowering the batch size and clearing CUDA memory often, yet it still crashes.