Lesson 3 official topic

I have a question: why isn’t there a Pclass_3 column in the Excel table at 1:09:40 of the video?

There was a similar (or possibly the same) question asked during the lecture.
pclass can take three values, i.e., 1, 2, and 3.
pclass_1’s value is either 0 or 1. 0 indicates that pclass’s value is not 1 and 1 indicates otherwise.
pclass_2’s value is, again, either 0 or 1. Again, 0 indicates that pclass’s value is not 2 and 1 indicates otherwise.
For example, say, pclass has a value of 3. This would mean that pclass_1 = 0 and pclass_2 = 0 which implies that a variable pclass_3, should we choose to define it, would have a value pclass_3 = 1. But as you might have noticed in this example, pclass_1’s and pclass_2’s values are enough to tell that the value in pclass is 3. A new variable, pclass_3, would be redundant for the task at hand and the model does just fine without it.
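To make the redundancy concrete, here is a minimal sketch in plain Python. The `pclass` values are made up for illustration and are not rows from the Titanic data; the point is only that a `pclass_3` column can be rebuilt from the other two.

```python
# Hypothetical passenger classes, invented for illustration.
pclass = [1, 3, 2, 3, 1]

# The two indicator columns that appear in the spreadsheet.
pclass_1 = [1 if p == 1 else 0 for p in pclass]
pclass_2 = [1 if p == 2 else 0 for p in pclass]

# A third indicator column, built directly from pclass.
pclass_3 = [1 if p == 3 else 0 for p in pclass]

# The same column recovered from the first two alone.
pclass_3_recovered = [1 - a - b for a, b in zip(pclass_1, pclass_2)]

print(pclass_3 == pclass_3_recovered)
```

If you build the columns with pandas instead, `pd.get_dummies(..., drop_first=True)` does the same kind of thing, although it drops the first level rather than the last.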


You can get the full set like this:

image_path = untar_data(URLs.MNIST)

I ended up figuring this out. Honestly, I think this question in chapter 4 of the book is premature and not very helpful, as chapter 5 basically answers it and shows a more efficient way of formatting this data. I ended up taking a small PyTorch tutorial to figure out what I was doing wrong!


Do you mind sharing how you formatted your data (a Colab or GitHub link would be fine)? I found it to be pretty straightforward because I first visualized the input shapes and output shapes for the matrix multiplication operations (different layers).
Here’s how I did it: AsquirousSpeaks - Classifying handwritten digits (THE MNIST!)

Also, please share your implementation for all 10 digits, as I’d like to see how SGD is implemented for the full dataset and whether it differs from what’s in chapter 4 (I personally didn’t do the whole SGD thing and used fastai methods).

Got it, thank you.


The “Universal Approximation Theorem” states that a neural network with 1 hidden layer can approximate any continuous function to arbitrary accuracy, given enough hidden units.

If we use ReLU, are we not using only functions with positive slopes?

I am probably misunderstanding some concept, but I couldn’t figure it out by myself.


There was a similar question asked in the Discord server here. I hope that answers it.
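For what it’s worth, here is a small sketch of the idea (my own illustration, not a quote from the linked answer). ReLU by itself never slopes downward, but the weights before and after it can be negative, so a layer of ReLU units can produce pieces with negative slope. The function names are made up for this example.

```python
def relu(x):
    return max(0.0, x)

def down_then_flat(x):
    # Weight of -1 before the ReLU: slope -1 for x < 0, flat for x > 0.
    return relu(-x)

def flat_then_down(x):
    # Weight of -1 after the ReLU: flat for x < 0, slope -1 for x > 0.
    return -relu(x)

def abs_from_relus(x):
    # |x| built from two ReLU units: one piece slopes down, one slopes up.
    return relu(x) + relu(-x)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print([down_then_flat(x) for x in xs])
print([flat_then_down(x) for x in xs])
print([abs_from_relus(x) for x in xs])
```

Adding up many such pieces, each with its own weight and bias, is what lets a single hidden layer trace out curves that go both up and down.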


Thank you

Hi everyone! I just finished training a model on the full MNIST dataset, adapting the content from chapter 4 of the book to work for multiple digits. Here’s my blog post about it: Fast.ai Chapter 4: Full MNIST Challenge | by Jack Driscoll | Nov, 2023 | Medium

Feedback is welcome!

Hey gang!
I’ve been doing a recap+quiz blogpost for the lessons.
Here’s lesson3: Giant Morons 🧠 - FastAI Lesson 3

It features a tenacious animal, brought to you by dall-e, which I generated to inspire me. Read on to find out which animal!

My plan is to feature a new tenacious animal for every lesson going forward, so that’ll be your enticement for reading future posts (if my prose doesn’t do it 😜)

Hi, I would like some help with clarity. To follow the book better, I was converting the implicit use of parameters into explicit arguments. It stopped working when I started using the PyTorch Linear model, as I assume it relies on certain variables implicitly. What would be the best resource for finding out which variables are expected implicitly, e.g. params, weights, bias, dl, lr and so on?

In my case linear_model, through Linear.forward(), implicitly takes the parameters and the training data to perform the equivalent of the following, but it is not clear to me how it does so.

def linear1(xb): return xb@weights + bias


def linear1_explicit(xb, weights=weights, bias=bias): return xb@weights + bias

Thank you in advance!
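As far as I understand it, `nn.Linear` does not read any global variables. It is an object that stores its own `weight` and `bias` as attributes, hands them out through `parameters()`, and receives the batch explicitly when you call `model(xb)`. The sketch below is a hypothetical pure-Python stand-in (`TinyLinear` is a made-up name, not a PyTorch class, and this is not PyTorch’s implementation) that only shows the shape of that idea.

```python
import random

class TinyLinear:
    """Hypothetical stand-in for a linear layer; not PyTorch's implementation."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        # The layer owns its parameters instead of reading globals.
        self.weight = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        self.bias = [0.0] * n_out

    def parameters(self):
        # What an optimizer would iterate over.
        return [self.weight, self.bias]

    def forward(self, xb):
        # Same computation as linear1 (inputs times weights, plus bias), row by row.
        return [
            [sum(x * w for x, w in zip(row, w_row)) + b
             for w_row, b in zip(self.weight, self.bias)]
            for row in xb
        ]

    def __call__(self, xb):
        # Calling the object runs forward with the batch passed explicitly.
        return self.forward(xb)

model = TinyLinear(n_in=3, n_out=1)
xb = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
preds = model(xb)
print(len(preds), len(preds[0]))
```

With the real thing, `model = nn.Linear(28*28, 1)` gives you `model.weight` and `model.bias`, and `list(model.parameters())` returns both, so nothing named `weights` or `bias` has to exist in your notebook.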

Hi everyone! I loved lesson 3 and in particular the idea of using a simple spreadsheet to demonstrate that the core technique behind deep learning (gradient descent) is not “rocket science”… even if using a potentially complex solver as a black box to optimize the params seems to be a bit of a cheat ;).

As an exercise, I rewrote the gradient descent solution for Titanic survival predictions as a Kaggle notebook. (It’s also my first Kaggle notebook; thanks to the fastai course I’m discovering a lot of cool tech for the first time 💫.) Feedback welcome!

Thank you for this amazing course to Jeremy and to everyone in the community, you rock!

Thank you! Signed up here just to post about the same thing. I was re-creating the quadratic example from scratch, and found that when I graphed the loss function it plotted out what looked like a sine wave. Asked ChatGPT, gave it my code, and it said that I appeared to be missing the code to zero out the gradient within the loop.
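In case it helps anyone else, here is a rough pure-Python imitation of what goes wrong. PyTorch adds each new gradient onto `.grad` rather than replacing it, so without zeroing, every step uses the running sum of all past gradients. The variable `grad` below plays the role of `.grad`; the one-parameter loss `x**2` is a simplification I chose, not the quadratic-fitting example from the lesson.

```python
def run(zero_grad, steps=50, lr=0.1):
    x = 3.0       # parameter to optimise; the loss is x**2
    grad = 0.0    # plays the role of x.grad
    losses = []
    for _ in range(steps):
        losses.append(x ** 2)
        if zero_grad:
            grad = 0.0      # reset before the new gradient is added
        grad += 2 * x       # accumulate d(x**2)/dx, as backward() would
        x -= lr * grad
    return losses

with_reset = run(zero_grad=True)
without_reset = run(zero_grad=False)

# With zeroing, the loss shrinks steadily towards 0.
print(with_reset[-1] < 1e-6)
# Without zeroing, the loss goes back up at some point instead of settling.
print(any(b > a for a, b in zip(without_reset, without_reset[1:])))
```

Without the reset, the parameter overshoots the minimum, swings back, and keeps going, which is why the plotted loss looks like a wave.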

In the Titanic example, Lin2 and ReLU2 seem to be using the raw data as input against the weights. However, earlier in the course, it is mentioned that, for each layer, we’re

using the outputs of the previous layer as the inputs to the next layer

Shouldn’t Lin1 or ReLU1 be passed as input to ReLU2? Or where does this “input to the next layer” process happen?



Or perhaps it’s a “one layer” neural network?
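One reading that fits the description above (my interpretation, so treat it as an assumption): Lin1/ReLU1 and Lin2/ReLU2 are two units of the same hidden layer, which is why both read the raw data. The “next layer” is then the step that combines the two ReLU outputs into the prediction; in this sketch I assume that step is a plain sum. The inputs and weights are invented for illustration.

```python
def relu(v):
    return max(0.0, v)

def dot(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))

# Hypothetical inputs and weights, invented for illustration.
row = [1.0, 0.5, 0.0]     # one passenger's (normalised) features
w1 = [0.2, 0.4, 0.1]      # weights for Lin1
w2 = [-0.9, 0.8, 0.5]     # weights for Lin2

# Hidden layer: both units read the same raw row.
relu1 = relu(dot(row, w1))
relu2 = relu(dot(row, w2))

# Output layer: consumes the hidden layer's outputs, not the raw row.
pred = relu1 + relu2
print(relu1, relu2, pred)
```

Under that reading it is a network with one hidden layer that is two units wide, rather than two layers stacked on top of each other.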

Has anyone had success actually using the model to predict whether an image is a 3 or a 7? I tried learn.predict, as with the bear classifier, but I am running into errors. Here is my simple test:

x,y = first(dataloader)
print("X SHAPE",x.shape)

Here is the output:

Sequential (Input shape: 256 x 784)
Layer (type)    Output Shape    Param #    Trainable
                256 x 30
Linear                          23550      True
                256 x 1
Linear                          31         True

X SHAPE torch.Size([256, 784])

And then I’m getting the error:
‘list’ object has no attribute ‘decode_batch’

My x and the model’s input are the same shape. Does anyone know what is causing this?

I am having trouble understanding, from 05_pet_breeds in the book, how you arrived at this learning rate:

We can see on this plot that in the range 1e-6 to 1e-3, nothing really happens and the model doesn’t train. Then the loss starts to decrease until it reaches a minimum, and then increases again. We don’t want a learning rate greater than 1e-1 as it will give a training that diverges like the one before (you can try for yourself), but 1e-1 is already too high: at this stage we’ve left the period where the loss was decreasing steadily.

In this learning rate plot it appears that a learning rate around 3e-3 would be appropriate, so let’s choose that:

I’m having trouble understanding the 3e-3 part.
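As I recall, the same chapter gives two rules of thumb for reading the plot: take one order of magnitude less than the learning rate where the loss is lowest, or take the last point where the loss is clearly still decreasing. Here is a rough sketch of both on invented numbers shaped like the plot described above. These are not the book’s values, and the “steepest drop” below is my stand-in for the second rule.

```python
import math

# Invented (learning rate, loss) pairs shaped like the plot described above.
lrs    = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.50, 2.50, 2.50, 2.40, 1.60, 1.20, 3.00]

# Heuristic 1: one order of magnitude below the learning rate with the lowest loss.
lr_at_min = lrs[losses.index(min(losses))]
lr_min_over_10 = lr_at_min / 10

# Heuristic 2: the interval where the loss falls fastest.
drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
i = drops.index(max(drops))
steepest_interval = (lrs[i], lrs[i + 1])

print(lr_min_over_10, steepest_interval)
```

On these made-up numbers both rules point at the stretch between 1e-3 and 1e-2, and 3e-3 sits roughly in the middle of that stretch on a log scale. So the 3e-3 is an eyeballed pick from the steadily decreasing part of the curve, not an exact computed value.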


Hi, did anyone try to replicate the Excel neural net in Google Sheets? I tried the add-on for Google Sheets called “open solver”. It looks very similar to the Solver in Excel, but the solving methods it offers are not the same, and they are for linear models, which ours doesn’t seem to be. I’m a bit confused here.


Hi, I ran into this as well. It appears to be an incompatibility with the latest versions of timm. I took a few guesses based on when the fastai version was released and found that version 0.6.13 of timm works for me so try:
! pip install timm==0.6.13

When I run 03-which-image-models-are-best.ipynb I get empty plots. I figured something has changed in the huggingface/pytorch-image-models GitHub repository since the notebook was put together.

I can see plots if I go back to a specific commit of pytorch-image-models, e.g.

! git clone https://github.com/rwightman/pytorch-image-models.git
%cd pytorch-image-models
! git reset --hard 02b806e
%cd results