Lesson 3 official topic

Hi, I’m trying to train a model on the whole MNIST set

import torch
from torch import tensor

class BasicOptimizer:
    def __init__(self, model, lr):
        self.model = model
        self.lr = lr
    def step(self):
        self.model.w1.data = self.model.w1.data - self.lr * self.model.w1.grad.data
        self.model.b1.data = self.model.b1.data - self.lr * self.model.b1.grad.data
        self.model.w2.data = self.model.w2.data - self.lr * self.model.w2.grad.data
        self.model.b2.data = self.model.b2.data - self.lr * self.model.b2.grad.data
    def zero_grad(self):
        self.model.w1.grad = None
        self.model.b1.grad = None
        self.model.w2.grad = None
        self.model.b2.grad = None
class Model():
    def __init__(self):
        self.w1 = self.init_parameters((28*28,30))
        self.b1 = self.init_parameters(30)
        self.w2 = self.init_parameters((30,10))
        self.b2 = self.init_parameters(10)
    def init_parameters(self, size):
        return torch.randn(size).requires_grad_()

    def forward(self, x):
        res = x@self.w1 + self.b1
        res = res.max(tensor(0.0))
        res = res@self.w2 + self.b2
        res = torch.nn.Softmax( dim=1 )(res)
        return res
class SimpleNetwork:
    def __init__(self, dl, val_dl, model, optimizer):
        self.dl = dl
        self.val_dl = val_dl
        self.model = model
        self.optimizer = optimizer
    def prediction(self, x):
        return self.model.forward(x)
    def loss(self, pred, target):
        return torch.nn.functional.cross_entropy(pred, target)
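    def step(self, x, y):
        pred = self.prediction(x)
        loss = self.loss(pred, y)
        loss.backward()             # compute gradients for every parameter
        self.optimizer.step()       # update the parameters
        self.optimizer.zero_grad()  # reset gradients before the next batch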
    def validate(self, x, y):
        with torch.no_grad():
            pred = self.prediction(x)
            val = torch.eq(
                torch.argmax( pred, dim=1 ),
                torch.argmax( y, dim=1 )
            ).float()  # bool -> float so that .mean() works later
        return val
    def learn(self, epochs):
        for epoch in range(epochs):
            print( "epoch " + str(epoch) )
            for xb, yb in self.dl:
                self.step(xb, yb)
            accs = [ self.validate(val_xb, val_yb) for val_xb, val_yb in self.val_dl ]
            acc = round( torch.stack( accs ).mean().item(), 4 )
            print("Accuracy " + str(acc))

My accuracy is going up, which is good, but it is very low. What am I doing wrong?

epoch 0
Accuracy 0.127
epoch 1
Accuracy 0.1287
epoch 2
Accuracy 0.1299
epoch 3
Accuracy 0.1316
epoch 4
Accuracy 0.1327
epoch 5
Accuracy 0.1335
epoch 6
Accuracy 0.1352
epoch 7
Accuracy 0.1358
epoch 8
Accuracy 0.1367
epoch 9
Accuracy 0.1377

Found the issue: my learning rate was too small and the model didn’t have enough nodes and layers.
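One more thing that may be worth checking in the code above: `forward` ends with `Softmax`, and the loss is `torch.nn.functional.cross_entropy`, which already applies log-softmax internally and expects raw scores (logits). Feeding it probabilities applies softmax twice, which flattens the predictions and puts a floor under the loss. Here is a minimal plain-Python sketch of that effect; the `softmax` and `cross_entropy` helpers are illustrative stand-ins written for this post, not the PyTorch functions themselves.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # log-softmax followed by negative log-likelihood,
    # the same convention F.cross_entropy uses for class-index targets
    return -math.log(softmax(logits)[target])

# A confident, correct 10-class prediction for class 0
logits = [8.0] + [0.0] * 9

loss_from_logits = cross_entropy(logits, 0)
loss_from_probs = cross_entropy(softmax(logits), 0)  # softmax applied twice

# Best case for the double-softmax version: probabilities exactly one-hot
floor = cross_entropy([1.0] + [0.0] * 9, 0)

print(loss_from_logits, loss_from_probs, floor)
```

With ten classes the double-softmax loss cannot drop below roughly 1.46 however good the predictions are. If that applies here, the usual pattern is to return the raw output of the last linear layer from `forward` and let `cross_entropy` handle the softmax.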

When I try to list the timm models, I only receive a subset that does not include those in the video (especially _base.in22k).

!pip install timm
import timm



which includes the basic base but not 384 or in22k… Any advice? Even in his video it does not include the _base.clip* options that are displayed here:

Perhaps more perplexing, it says timm is still not defined after displaying the architectures. I also tried with simply ‘convnext_base’, since that is what was returned from the list of convnext architectures.

!pip install timm
import timm

print(timm.list_models('convnext*'))

learn=vision_learner(dls,'convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384', metrics=error_rate).to_fp16()


Requirement already satisfied: mpmath>=0.19 in /home/ubuntu/.local/lib/python3.8/site-packages (from sympy->torch>=1.7->timm) (1.3.0)
['convnext_atto', 'convnext_atto_ols', 'convnext_base', 'convnext_femto', 'convnext_femto_ols', 'convnext_large', 'convnext_large_mlp', 'convnext_nano', 'convnext_nano_ols', 'convnext_pico', 'convnext_pico_ols', 'convnext_small', 'convnext_tiny', 'convnext_tiny_hnf', 'convnext_xlarge', 'convnext_xxlarge', 'convnextv2_atto', 'convnextv2_base', 'convnextv2_femto', 'convnextv2_huge', 'convnextv2_large', 'convnextv2_nano', 'convnextv2_pico', 'convnextv2_small', 'convnextv2_tiny']
NameError                                 Traceback (most recent call last)
Cell In[26], line 6
      2 import timm
      4 print(timm.list_models('convnext*'))
----> 6 learn=vision_learner(dls,'convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384', metrics=error_rate).to_fp16()

File ~/.local/lib/python3.8/site-packages/fastai/vision/learner.py:224, in vision_learner(dls, arch, normalize, n_out, pretrained, loss_func, opt_func, lr, splitter, cbs, metrics, path, model_dir, wd, wd_bn_bias, train_bn, moms, cut, init, custom_head, concat_pool, pool, lin_ftrs, ps, first_bn, bn_final, lin_first, y_range, **kwargs)
    222 n_in = kwargs['n_in'] if 'n_in' in kwargs else 3
    223 if isinstance(arch, str):
--> 224     model,cfg = create_timm_model(arch, n_out, default_split, pretrained, **model_args)
    225     if normalize: _timm_norm(dls, cfg, pretrained, n_in)
    226 else:

File ~/.local/lib/python3.8/site-packages/fastai/vision/learner.py:183, in create_timm_model(arch, n_out, cut, pretrained, n_in, init, custom_head, concat_pool, pool, lin_ftrs, ps, first_bn, bn_final, lin_first, y_range, **kwargs)
    180 def create_timm_model(arch, n_out, cut=None, pretrained=True, n_in=3, init=nn.init.kaiming_normal_, custom_head=None,
    181                      concat_pool=True, pool=True, lin_ftrs=None, ps=0.5, first_bn=True, bn_final=False, lin_first=False, y_range=None, **kwargs):
    182     "Create custom architecture using `arch`, `n_in` and `n_out` from the `timm` library"
--> 183     model = timm.create_model(arch, pretrained=pretrained, num_classes=0, in_chans=n_in, **kwargs)
    184     body = TimmBody(model, pretrained, None, n_in)
    185     nf = body.model.num_features

NameError: name 'timm' is not defined
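A possible explanation, offered as an assumption rather than a confirmed diagnosis: the NameError is raised inside fastai/vision/learner.py, not in the notebook cell, so the `import timm` in the cell does not help. If fastai was imported before timm was installed, fastai's own module may never have bound the name `timm`; restarting the kernel after `pip install timm` and importing fastai again would then be the thing to try. The small check below uses only the standard library; `diagnose` is a hypothetical helper written for this post.

```python
import importlib.util
import sys

def diagnose(package, user_module):
    """Report whether `package` is installed and whether `user_module`
    (if it has already been imported) has it bound as a global name."""
    installed = importlib.util.find_spec(package) is not None
    mod = sys.modules.get(user_module)
    return {
        "installed": installed,
        "user_module_imported": mod is not None,
        "name_bound_in_user_module": mod is not None and hasattr(mod, package),
    }

# In the notebook from the post this would be run after the failing cell
print(diagnose("timm", "fastai.vision.learner"))
```

If this reports the package as installed and the module as imported, but the name as not bound, that would be consistent with the import-order explanation above.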

Hi there, your loss seems to be oscillating, converging and again overshooting every time. It is probably a case of high learning rate. Have you tried reducing the learning rate? I believe that might fix this. Do let me know how it goes. I’ll be trying that soon too
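To illustrate the overshooting described above, here is a toy sketch of gradient descent on f(x) = x**2. It is not the model from the post; the function name `descend` and the learning rates are made up for the example.

```python
def descend(lr, steps=6, x=1.0):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    path = [x]
    for _ in range(steps):
        x = x - lr * 2 * x
        path.append(x)
    return path

small = descend(lr=0.1)      # shrinks steadily towards the minimum at 0
large = descend(lr=0.9)      # jumps past 0 on every step, but still converges
too_large = descend(lr=1.1)  # jumps past 0 by more each time and diverges

for name, path in [("small", small), ("large", large), ("too_large", too_large)]:
    print(name, [round(v, 3) for v in path])
```

The sign flips in the `large` run are the "converging and again overshooting" pattern; lowering the learning rate removes them.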

I have a question regarding ResizeMethod.Pad, pad_mode='zeros'. I am using it on my images, which are all the same size (1088x1920 pixels). Resizing to 224 and padding with zeros puts the padding not in the center but in uneven locations. For example, compare the padding in the first 2 images with the padding in the rest of the images. What can I do to avoid it?

Isn’t Pad supposed to be used in cases where the images have different sizes?

If every image has the same size, what is the benefit of using Pad?

Hello everyone!

Short version: Why does Jeremy use a normal distribution for the add_noise function to update the array of points? Why not just simple random?

At Lesson 3: Practical Deep Learning for Coders 2022 - YouTube at minute 27:57

Jeremy creates an array of dots with a polynomial function. He also uses an add_noise() function, which adds randomness to the points used to train the model, but he uses a normal distribution for the randomness. Why not just simple randomness? Would it need more points? Do you know why he does it?


To make non-square images square.
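On the uneven padding asked about above: as far as I recall from the fastai source (please treat this as an assumption and check your installed version), `Resize` picks a random crop/pad position for training-set items and uses the center position (0.5, 0.5) for validation-set items, which would explain padding that moves around from image to image. The arithmetic can be sketched in plain Python. `pad_offsets` and its `pcts` argument are hypothetical names for illustration, not fastai API, and I am assuming the images are 1920 wide by 1088 high.

```python
def pad_offsets(width, height, target, pcts=(0.5, 0.5)):
    """Scale (width, height) to fit inside a target x target square, then
    return the resized size and the left/top padding in pixels.
    pcts says where the image sits inside the spare space (0.5 = centered)."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_left = round((target - new_w) * pcts[0])
    pad_top = round((target - new_h) * pcts[1])
    return (new_w, new_h), (pad_left, pad_top)

size, centered = pad_offsets(1920, 1088, 224)
_, shifted = pad_offsets(1920, 1088, 224, pcts=(0.5, 0.1))

print(size, centered, shifted)
```

The image itself is resized identically in both calls; only the split of the leftover rows between top and bottom changes.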

I suppose he uses it either arbitrarily, or for clarity’s sake.

If you look at the image above, by adding Gaussian noise to the quadratic, points around the vertex are moved around much more than the trailing points.

The reason is that standard Gaussian noise has mean 0 and standard deviation 1. When you multiply the noise by a small factor like 0.1 or 0.2, you are only controlling how far your points deviate once the noise is added. Because the noise has zero mean, adding it does not shift the mean of the underlying data.

So, if you add Gaussian noise to points derived from 3x^2 + 2x + 1 and then fit a model ax^2 + bx + c to those ‘noisy’ points, you will get a = 2.99999 (or 3, or very close to 3), and similarly b = 1.9999 (or 2) and c = 0.9999 (or 1). If the noise did not have zero mean, the distribution of the points would shift, and so would your coefficients.
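A small experiment along these lines, as a sketch under my own assumptions: the noise scales, the number of points and the least-squares helper `fit_quadratic` are made up for illustration and are not what the lesson notebook uses. It fits ax^2 + bx + c to noisy samples of 3x^2 + 2x + 1 under three kinds of noise.

```python
import random

def fit_quadratic(xs, ys):
    """Least-squares fit of a*x^2 + b*x + c via the normal equations."""
    s = [sum(x**k for x in xs) for k in range(5)]                  # sums of x^k
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    # Augmented matrix for the unknowns (a, b, c)
    A = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for i in range(3):  # Gaussian elimination with partial pivoting
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    sol = [0.0] * 3
    for i in reversed(range(3)):  # back substitution
        sol[i] = (A[i][3] - sum(A[i][c] * sol[c] for c in range(i + 1, 3))) / A[i][i]
    return sol

def noisy_fit(noise, n=400):
    xs = [-2 + 4 * i / (n - 1) for i in range(n)]
    ys = [3 * x**2 + 2 * x + 1 + noise() for x in xs]
    return fit_quadratic(xs, ys)

random.seed(0)
gauss_fit = noisy_fit(lambda: random.gauss(0, 0.3))         # zero-mean Gaussian
centered_fit = noisy_fit(lambda: random.uniform(-0.5, 0.5)) # zero-mean uniform
shifted_fit = noisy_fit(lambda: random.random())            # uniform on [0, 1), mean 0.5

for name, fit in [("gauss", gauss_fit), ("centered", centered_fit), ("shifted", shifted_fit)]:
    print(name, [round(v, 2) for v in fit])
```

Both zero-mean versions land close to (3, 2, 1). The plain [0, 1) uniform noise still gets a and b roughly right but pushes c up by about its mean of 0.5, so what matters for the fit is the zero mean more than the exact shape of the distribution.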


I am getting a “not defined” error when trying to use plot_function, and I can’t find any recent solution to this. I have managed to work around it, but I am wondering where the function lives or what is supposed to replace it.

Here is the code I have adapted from some comments:

import torch
import matplotlib.pyplot as plt

def plot_function(f, tx=None, ty=None, title=None, min=-2, max=2, figsize=(6,4)):
    x = torch.linspace(min,max, steps=100)
    fig,ax = plt.subplots(figsize=figsize)
    ax.plot(x, f(x))  # draw the curve itself
    if tx is not None: ax.set_xlabel(tx)
    if ty is not None: ax.set_ylabel(ty)
    if title is not None: ax.set_title(title)

I checked and this function is working for me. Maybe you could share what is not working for you.
The only thing I can guess right now is that there might be something wrong with your implementation of f. If you are using partial like in Jeremy’s class, re-check it. Otherwise, post the complete code here.

As I said, this is an adaptation I took. It works, but if I am correct, such a function is supposed to be in fastai, and that one wasn’t working.

At some point in the lecture Jeremy mentioned that he’d rather start with one of the smaller models and tweak data augmentations etc., so I was wondering how invariant the effectiveness of these augmentations and tweaks is to changing models in general.

Let’s say we find that doing a certain data augmentation “a” works wonders with a smaller resnet model, while another augmentation “b” does almost nothing.

  1. Can we assume that “a” would work better than “b” on bigger resnet models?
  2. Can we assume that “a” would work better than “b” on other types of models?

Would you have some thoughts on this?

Question about the value that decides if an image is a 3 or a 7 - #2 by ihavequestions

I simply don’t understand the intuition of why >0 equals a good prediction.

Let us start from the beginning. Randomly initializing weights gives values like so
We know that train_x has values between 0 and 1. So, train_x@weights (with weights initialized randomly) gives outputs like so

This confirms that our linear model outputs both negative and positive values around 0. Convincing yourself of this and then reading @benkarr’s comment should help you understand why > 0.0 equals a good prediction.
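If the outputs above are not visible, the same point can be checked with a rough plain-Python stand-in. Everything here is an assumption for illustration: the fake images (mostly zero background with some pixels between 0 and 1) only imitate MNIST digits, and `fake_image` and `linear` are made-up helper names.

```python
import random

random.seed(0)
n_pixels = 28 * 28

# Randomly initialized weights, roughly what torch.randn would give
weights = [random.gauss(0, 1) for _ in range(n_pixels)]

def fake_image():
    # Stand-in for an MNIST digit: mostly 0 (background), some pixels between 0 and 1
    return [random.random() if random.random() < 0.2 else 0.0 for _ in range(n_pixels)]

def linear(image):
    # Same computation as image @ weights for a single flattened image
    return sum(p * w for p, w in zip(image, weights))

outputs = [linear(fake_image()) for _ in range(100)]
n_pos = sum(o > 0 for o in outputs)

print(n_pos, "positive and", len(outputs) - n_pos, "negative outputs")
```

Because the weights are as likely to be negative as positive, the weighted sums fall on both sides of 0, which is why 0 is a natural threshold before any training has happened.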


I have a very general question that doesn’t need a specific answer but rather some advice.

I’m finding it very difficult to wrap my head around data formatting. I did all the practice questions from chapter 4 of the book and I feel like I have a really good grasp on the concepts.

However I struggled a lot with the final question for building an MNIST training model from scratch for the full MNIST set. I didn’t struggle with the SGD portion or actually training the model, but rather formatting the data correctly. I ended up finding a tutorial on how to do the problem, but I still just don’t really understand how to format tensors correctly to run models. PyTorch and Fastai seem to do a lot of the formatting for you, but I really want to understand what’s going on under the hood.

Does anyone have advice on how to practice this or additional reads / courses to go through to understand how to format tensors to solve specific problems?
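One way to practice is to write the shape of every tensor next to the line that creates it and check it as you go. Here is a small sketch with made-up data standing in for MNIST; the random images and the five labels are placeholders, not the real dataset.

```python
import torch

# Stand-in for real data: 5 fake grayscale "images" with pixel values 0..255
# (with MNIST these would come from opening the image files)
images = [torch.randint(0, 256, (28, 28), dtype=torch.uint8) for _ in range(5)]
labels = torch.tensor([3, 7, 0, 9, 3])

# 1. Stack the list into one rank-3 tensor: (n_images, height, width)
stacked = torch.stack(images)        # shape (5, 28, 28)
# 2. Convert to float and scale the pixels to [0, 1]
stacked = stacked.float() / 255
# 3. Flatten each image into a row so it can feed a linear layer
x = stacked.view(-1, 28 * 28)        # shape (5, 784)
# 4. Class-index targets are a rank-1 integer tensor, one entry per image
y = labels                           # shape (5,)

# A linear layer is then just a matrix multiply whose shapes must line up
w = torch.randn(28 * 28, 10)
b = torch.zeros(10)
logits = x @ w + b                   # (5, 784) @ (784, 10) -> (5, 10)

print(x.shape, y.shape, logits.shape)
```

Most formatting errors show up as a shape mismatch in the final matrix multiply, so printing `.shape` after each step is usually the quickest way to find where things went wrong.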

I was wondering if anyone else’s dfs have no rows?

I traced the error back to the get_data function. My df has one row, up until the final line, return df[df.family.str.contains('^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg')], which gives it zero rows, hence no data in the output.

Unsure what they were trying to do, hence it’s quite difficult/impossible to debug. Any ideas?
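For what it is worth, that last line keeps only the rows whose `family` matches the regular expression, and pandas `str.contains` does a regex search by default. The sketch below runs the same pattern through Python's `re` module; the family names in the list are hypothetical examples, not values taken from the actual dataframe.

```python
import re

pattern = re.compile(r'^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg')

# Hypothetical family names, for illustration only
families = ['resnet', 'resnetd', 'regnetx', 'convnext', 'efficientnet',
            'vit', 'mobilevit', 'swin', 'wide_resnet']

kept = [f for f in families if pattern.search(f)]
dropped = [f for f in families if not pattern.search(f)]

print("kept:", kept)
print("dropped:", dropped)
```

The `^` only anchors the first alternative, so resnet, regnet and resnetd must start the name, while the other families can match anywhere in it. If the single remaining row has a family outside this list, the filter will leave zero rows, in which case the thing to investigate is why df had only one row before this line.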

Hi! How did you get the full MNIST dataset? I might be able to help you if I understand that.