I was also confused, so I went back to the workbook to look at the actual python definitions:
def mae(preds, acts): return (torch.abs(preds-acts)).mean()

def quad_mae(params):
    f = mk_quad(*params)
    return mae(f(x), y)

# earlier we defined y as a list of noisy measurements:
# y = add_noise(f(x), 0.15, 1.5)
I think confusion might arise (as it did for me) because quad_mae uses y (which is all of the noisy measurements) without it needing to be passed in as a parameter.
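To see this concretely, here's a self-contained sketch of the workbook setup. The exact `mk_quad` and `add_noise` bodies and the choice of `x` are my reconstruction from the fastbook notebook, not verbatim, but they show how `quad_mae` picks up `y` from the enclosing scope rather than taking it as a parameter:

```python
import torch

def mk_quad(a, b, c):
    # returns a quadratic function with fixed coefficients
    return lambda x: a*x**2 + b*x + c

def add_noise(t, mult, add):
    # multiplicative + additive gaussian noise
    return t * (1 + torch.randn_like(t)*mult) + torch.randn_like(t)*add

x = torch.linspace(-2, 2, steps=20)
y = add_noise(mk_quad(3, 2, 1)(x), 0.15, 1.5)  # module-level: the noisy measurements

def mae(preds, acts):
    return (torch.abs(preds - acts)).mean()

def quad_mae(params):
    f = mk_quad(*params)
    return mae(f(x), y)   # y is looked up in the enclosing scope, not passed in

loss = quad_mae([3.0, 2.0, 1.0])
print(loss)
```

Because `y` (and `x`) are globals, `quad_mae` only needs the candidate parameters; that's why its signature looks so bare.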
Hope this helps!
Hi. As an exercise for chapter 4 of the book I’ve tried a simple model to classify the full MNIST dataset. I’m getting about 92% with logistic regression and above 95% with a two-layer model. Feedback is more than welcome: Fastbook ch4 MNIST complete | Kaggle
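For anyone curious what "a two-layer model" means in chapter-4 terms, here is a minimal sketch. The hidden size (30) and the random stand-in batch are my assumptions for illustration, not the poster's actual notebook; real MNIST inputs are 784-dimensional flattened 28x28 images:

```python
import torch
import torch.nn.functional as F

n_in, n_hidden, n_out = 784, 30, 10

# two linear layers with a ReLU in between
w1 = (torch.randn(n_in, n_hidden) * 0.01).requires_grad_()
b1 = torch.zeros(n_hidden, requires_grad=True)
w2 = (torch.randn(n_hidden, n_out) * 0.01).requires_grad_()
b2 = torch.zeros(n_out, requires_grad=True)

def model(xb):
    return F.relu(xb @ w1 + b1) @ w2 + b2

# random stand-in batch, just to check the shapes and the backward pass
xb = torch.randn(64, n_in)
yb = torch.randint(0, 10, (64,))

loss = F.cross_entropy(model(xb), yb)
loss.backward()
print(loss.item())
```

Swap the random batch for real MNIST tensors and add an update loop and you have essentially the chapter-4 training setup.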
These concepts are no less relevant in the era of vibe coding. I am looking to make this learning experience more social and evenly-paced, if anybody would like to join me. I live in a UTC-4 timezone.
If anyone is looking for a refresher on calculus before doing the book chapter or even this lecture:
3b1b
has a good playlist; the first 2 chapters are enough. Watching it all would give a lot of intuition and visual representations.
Hi all,
In the Microsoft Excel exercise, Jeremy shows gradient descent on two sets of parameters. If I want to visualize it as a neural network architecture, would it be a network with one layer and two neurons, since there are two sets of parameters and two outputs which we sum up?
Above is Jeremy's code. I guess either he made a mistake, or I'm misunderstanding something.
Why is he taking the average only from rows 662-715? Shouldn't he take the full loss column?
It took 3 lessons for me to be compelled to write something in this forum.
Just a small contribution, since I am using AI assistance to learn about ML.
For those struggling to understand some concepts, I found this simple analogy absolutely brilliant:
"Imagine you’re trying to tune an old analogue radio to find a station. You have one knob (the weight). You turn it, listen to how much static there is (the loss), and decide which direction to turn it next. You keep adjusting until the music is clear. Now imagine a radio with a million knobs, all interacting with each other. That’s a neural network. The process of tuning is gradient descent."
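To make the analogy concrete, here is the one-knob radio in a few lines of PyTorch. The toy "static" loss and the target value 3.0 are made up for illustration:

```python
import torch

# one knob (weight), starting in the wrong position
knob = torch.tensor(0.0, requires_grad=True)

def static_level(w):
    # "how much static": squared distance from the station at 3.0
    return (w - 3.0) ** 2

for _ in range(100):
    loss = static_level(knob)
    loss.backward()              # which way should we turn the knob?
    with torch.no_grad():
        knob -= 0.1 * knob.grad  # turn it a little in that direction
        knob.grad.zero_()        # forget the old reading before the next turn

print(knob.item())  # converges to ~3.0
```

A neural network is the same loop with millions of knobs, all adjusted at once from the gradient of a single loss.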
I believe there’s a piece missing from the quadratic gradient descent example. After using the gradient to update the parameters, we must zero it, or else the gradient accumulates across iterations. If we don’t do this, the loss bounces up and down over many iterations no matter how small you make the learning rate.
with torch.no_grad(): abc -= abc.grad * alpha

should be

with torch.no_grad():
    abc -= abc.grad * alpha
    # --- THE FIX ---
    abc.grad.zero_()
    # ----------------
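Here is the fix in a complete runnable loop. The `mk_quad`/`mae` definitions and the noise-free `y` are simplified stand-ins for the notebook's versions, just to demonstrate the behavior:

```python
import torch

def mk_quad(a, b, c): return lambda x: a*x**2 + b*x + c
def mae(preds, acts): return (torch.abs(preds - acts)).mean()

x = torch.linspace(-2, 2, steps=50)
y = mk_quad(3, 2, 1)(x)          # target: 3x^2 + 2x + 1

abc = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)
alpha = 0.01

for _ in range(1000):
    loss = mae(mk_quad(*abc)(x), y)
    loss.backward()
    with torch.no_grad():
        abc -= abc.grad * alpha
        abc.grad.zero_()         # without this, every step's gradient piles up

print(loss.item())  # small; drop the zero_() line and it won't converge
```

PyTorch's `.backward()` accumulates into `.grad` by design (useful for summing gradients over sub-batches), so any hand-written training loop needs an explicit `zero_()` between steps.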