Lesson 3 official topic

The reason is that gaussian noise has mean 0 and std 1. When you multiply the noise by a small factor like 0.1 or 0.2, you are only controlling how far your points deviate once you add the noise to them. Because the noise has mean 0, adding it does not shift the mean of the underlying data, only its spread. So if you add gaussian noise to points derived from 3x^2 + 2x + 1 and then fit a model ax^2 + bx + c to those ‘noisy’ points, you will get a ≈ 2.9999 (i.e. 3, or very close to it), b ≈ 1.9999 (i.e. 2) and c ≈ 0.9999 (i.e. 1). If the noise were not zero-mean random noise, the distribution would shift, and so would your fitted coefficients.
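To make this concrete, here is a minimal sketch (a toy example of my own, assuming torch and a noise factor of 0.1, not code from the lesson): points are generated from 3x^2 + 2x + 1, zero-mean gaussian noise is added, and a quadratic is fitted with plain gradient descent; the recovered coefficients come out very close to 3, 2 and 1.

import torch

torch.manual_seed(42)
x = torch.linspace(-2, 2, 200)
y = 3*x**2 + 2*x + 1 + 0.1*torch.randn(200)   # zero-mean noise, scaled down by 0.1

# fit ax^2 + bx + c by gradient descent on mean squared error
params = torch.zeros(3, requires_grad=True)
for _ in range(5000):
    a, b, c = params
    loss = ((a*x**2 + b*x + c - y)**2).mean()
    loss.backward()
    with torch.no_grad():
        params -= 0.01 * params.grad
        params.grad.zero_()
print(params)   # expect something very close to tensor([3., 2., 1.])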

2 Likes

I am getting an undefined-name error when trying to use plot_function, and I can’t find any recent solution to this. I have managed to work around it, but I am wondering where the function lives now, or what is supposed to replace it.

Here is the code I have adapted from some comments:

import torch
import matplotlib.pyplot as plt

def plot_function(f, tx=None, ty=None, title=None, min=-2, max=2, figsize=(6,4)):
    "Plot `f(x)` over `[min, max]`, with optional axis labels and title."
    x = torch.linspace(min, max, steps=100)
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(x, f(x))
    if tx is not None: ax.set_xlabel(tx)
    if ty is not None: ax.set_ylabel(ty)
    if title is not None: ax.set_title(title)
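A quick way to sanity-check it (using the imports above):

plot_function(torch.sin, title='sin(x)', tx='x', ty='sin(x)')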

I checked and this function is working for me. Maybe you want to share what exactly is not working for you.
The only thing I can guess right now is that there might be something wrong with your implementation of f. If you are using partial like in Jeremy’s class, re-check it. Otherwise, post your complete code here.

As I said, this is an adaptation I took, and it works. But if I am correct, such a function is supposed to be in fastai, and that one wasn’t working.

At some point in the lecture Jeremy mentioned that he’d rather start with one of the smaller models and tweak data augmentations etc., so I was wondering how invariant the effectiveness of these augmentations and tweaks is to changing models in general.

Let’s say we find that a certain data augmentation “a” works wonders with a smaller resnet model while another augmentation “b” does almost nothing.

  1. Can we assume that “a” would work better than “b” on bigger resnet models?
  2. Can we assume that “a” would work better than “b” on other types of models?

Would you have some thoughts on this?

Question about the value that decides if an image is a 3 or a 7 - #2 by ihavequestions
I simply don’t understand the intuition of why > 0 equals a good prediction.

1 Like

Let us start from the beginning. Randomly initializing weights gives values like so:
[image: sample of randomly initialized weights]
We know that train_x has values between 0 and 1. So, train_x@weights (with weights initialized randomly) gives outputs like so:
[image: sample of train_x@weights outputs]
This confirms that our linear model outputs both negative and positive values around 0. Convincing yourself of this and then reading @benkarr’s comment should help you understand why > 0.0 equals a good prediction.
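Here is a small sketch of the same point (hypothetical shapes, not the notebook’s actual variables): with inputs between 0 and 1 and standard-normal weights, the matrix product spreads around 0, so thresholding at > 0.0 splits the predictions into the two classes.

import torch

torch.manual_seed(0)
train_x = torch.rand(100, 28*28)        # pixel values between 0 and 1
weights = torch.randn(28*28, 1)         # random init: mean 0, std 1 (bias omitted)
preds = train_x @ weights
print(preds.mean(), preds.min(), preds.max())   # both negative and positive values around 0
preds_class = (preds > 0.0).float()     # threshold at 0 to pick one of the two classes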

3 Likes

I have a very general question that doesn’t need a specific answer but rather some advice.

I’m finding it very difficult to wrap my head around data formatting. I did all the practice questions from chapter 4 of the book and I feel like I have a really good grasp on the concepts.

However, I struggled a lot with the final question, building an MNIST training model from scratch for the full MNIST set. I didn’t struggle with the SGD portion or with actually training the model, but rather with formatting the data correctly. I ended up finding a tutorial on how to do the problem, but I still don’t really understand how to format tensors correctly to run models. PyTorch and fastai seem to do a lot of the formatting for you, but I really want to understand what’s going on under the hood.

Does anyone have advice on how to practice this or additional reads / courses to go through to understand how to format tensors to solve specific problems?

1 Like

I was wondering if anyone else’s dfs are coming out with no rows?

I traced the error back to the get_data function. My df has one row, up until the final line, return df[df.family.str.contains('^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg')], which gives it zero rows, hence no data in the output.

I’m unsure what that line is trying to do, which makes it quite difficult to debug. Any ideas?
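For what it’s worth, here is a toy illustration (made-up family names, not the real timm results) of what that last line does: str.contains with that regex keeps only rows whose family matches one of the listed architecture prefixes, so if none of your rows’ family values match the pattern, the result is an empty DataFrame.

import pandas as pd

df = pd.DataFrame({'family': ['resnet', 'regnety', 'swin', 'convnext', 'beit'],
                   'score':  [1, 2, 3, 4, 5]})
pattern = '^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg'
print(df[df.family.str.contains(pattern)])
# keeps resnet, regnety, convnext, beit; drops swin because nothing in the pattern matches it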

Hi! How did you get the full MNIST dataset? I might be able to help you if I understand that.

I have a question: why don’t we have a Pclass_3 column in the Excel table at 1:09:40 of the video?

There was a similar (or perhaps the same) question asked during the lecture.
pclass can take three values, i.e., 1, 2, and 3.
pclass_1’s value is either 0 or 1. 0 indicates that pclass’s value is not 1 and 1 indicates otherwise.
pclass_2’s value is, again, either 0 or 1. Again, 0 indicates that pclass’s value is not 2 and 1 indicates otherwise.
For example, say, pclass has a value of 3. This would mean that pclass_1 = 0 and pclass_2 = 0 which implies that a variable pclass_3, should we choose to define it, would have a value pclass_3 = 1. But as you might have noticed in this example, pclass_1’s and pclass_2’s values are enough to tell that the value in pclass is 3. A new variable, pclass_3, would be redundant for the task at hand and the model does just fine without it.
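Here is a small pandas sketch of that idea (toy data, not the lecture’s spreadsheet): any one of the three dummy columns can be dropped without losing information, since the dropped level is implied whenever the remaining columns are all 0. pandas’ drop_first=True drops the first level, which makes the same point as dropping Pclass_3 in the Excel table.

import pandas as pd

df = pd.DataFrame({'pclass': [1, 2, 3, 3, 1]})
dummies = pd.get_dummies(df.pclass, prefix='pclass', drop_first=True, dtype=int)
print(dummies)
#    pclass_2  pclass_3
# 0         0         0   <- both 0, so pclass must be 1
# 1         1         0
# 2         0         1
# 3         0         1
# 4         0         0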

1 Like

You can get the full set like this:

from fastai.data.external import untar_data, URLs
image_path = untar_data(URLs.MNIST)
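In case it helps, here’s a quick way to peek at what that gives you (this assumes the layout fastai ships for the full MNIST download, i.e. training and testing folders with one sub-folder per digit, and fastai’s Path.ls() helper):

print(image_path.ls())                  # e.g. [.../training, .../testing]
print((image_path/'training').ls())     # one folder per digit, 0 through 9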

I ended up figuring this out - honestly I think this question for chapter 4 of the book is premature and not very helpful, as chapter 5 basically answers it and shows a more efficient way of formatting this data. I ended up taking a small pytorch tutorial to figure out what I was doing wrong!

1 Like

Do you mind sharing how you formatted your data (a Colab or GitHub link would be fine)? I found it to be pretty straightforward because I first visualized the input shapes and output shapes for the matrix multiplication operations (different layers).
Here’s how I did it: AsquirousSpeaks - Classifying handwritten digits (THE MNIST!)

Also, please share your implementation for all 10 digits, as I’d like to see an implementation of SGD for the full dataset and whether it differs from what’s in chapter 4 (I personally didn’t do the whole SGD thing and used fastai methods).

Got it, thank you!

Question:

The “Universal Approximation Theorem” states that a neural network with 1 hidden layer can approximate any function.

If we use ReLU, are we not using only functions with positive slopes?

Probably I am misunderstanding some concepts, but I couldn’t figure it out by myself.

Thanks

There was a similar question asked in the Discord server here. I hope that answers it.
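In short (my own toy example, not from the linked thread): the weight applied after a ReLU can be negative, so the composed function can slope downward, and sums of shifted ReLUs can bend up or down wherever needed.

import torch

relu = torch.nn.functional.relu
x = torch.linspace(-2, 2, 5)

f = -1.5 * relu(x)                    # negative output weight -> slope of -1.5 for x > 0
g = 2 * relu(x) - 3 * relu(x - 1)     # piecewise: slope +2 on (0, 1), slope -1 after x = 1
print(f)   # zero for x <= 0, then sloping down
print(g)   # rises on (0, 1), then slopes down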

2 Likes

Thank you

Hi everyone! I just completed training a model for the full MNIST dataset, adapting the content from chapter 4 in the book to work for multiple digits. Here’s my blog post about it: Fast.ai Chapter 4: Full MNIST Challenge | by Jack Driscoll | Nov, 2023 | Medium

Feedback is welcome!

Hey gang!
I’ve been doing a recap + quiz blog post for the lessons.
Here’s lesson 3: Giant Morons 🧠 - FastAI Lesson 3

It features a tenacious animal, brought to you by dall-e, which I generated to inspire me. Read on to find out which animal!

My plan is to feature a new tenacious animal for every lesson going forward, so that’ll be your enticement for reading future posts (if my prose doesn’t do it 😜)