Lesson 13 official topic

This is a wiki post - feel free to edit to add links from the lesson or other useful info.

<<< Lesson 12 | Lesson 14 >>>

Lesson resources

  • Lesson Videos
    • Edited video
    • Stream (you can watch this during the session or any time after it is complete)

Links from the lesson

Other resources


Excellent series on neural networks by 3blue1brown. Especially the 3rd and 4th lessons for backprop intuition.


Is the reason why we use logs related to the fact that we don’t want values to explode in neural nets, i.e. get extremely large values from repeated multiplication?

Yes; also, addition takes fewer operations for a computer than multiplication.
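To make that concrete, here’s a quick sketch (plain Python, not course code) of why summing logs is friendlier than multiplying raw probabilities:

```python
import math

# Multiplying many small probabilities underflows to zero very quickly;
# summing their logs stays in a perfectly usable numeric range.
probs = [0.001] * 200

product = 1.0
for p in probs:
    product *= p
print(product)        # 0.0 -- the true value (1e-600) underflowed

log_sum = sum(math.log(p) for p in probs)
print(log_sum)        # about -1381.55, no underflow
```

The same idea is why losses are computed as sums of log-probabilities rather than products of probabilities.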


During the lesson today somebody mentioned that having a debugger was the one missing piece for Jupyter Notebooks. So just wanted to mention available alternatives that I’ve actually used :slight_smile:

The cheapest/easiest is Visual Studio Code (VSCode). It’s free, works well, and has most of the features that I wanted out of Jupyter Notebooks that I can’t get without installing extensions. Also, it runs Jupyter automatically for notebooks without you having to run the Jupyter instance via terminal. You do have to have Jupyter installed though …

More info can be found here:

The other option that I’ve personally tried (which provides easy debugging) is DataSpell from JetBrains:

This is a paid product but they do have a 30-day evaluation version. And they do provide a free open source license to qualifying developers. I found VSCode to be faster/easier for my personal use but the general impression I got was that DataSpell has a lot more features. But it feels a little slow and has a slightly higher learning curve.


Just wanted to share something I had noticed, not sure if it makes a difference to others.

For a classification task where the target is one-hot encoded, the cross-entropy loss reduces to -log(prediction). If you graph -log(x), it makes a really nice curve for gradient descent to move the prediction close to 1.
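A small sketch of that reduction (plain Python; the variable names are mine, not the notebook’s):

```python
import math

# For a one-hot target, cross entropy -sum(t_i * log(p_i)) keeps only the
# term for the true class, so it collapses to -log(p_true).
target = [0.0, 1.0, 0.0]   # one-hot: class 1 is the correct class
pred   = [0.1, 0.7, 0.2]   # model's softmax output

ce = -sum(t * math.log(p) for t, p in zip(target, pred))
print(ce)                  # same as -log(0.7), about 0.357
```

So a confident correct prediction (p_true near 1) gives a loss near 0, while p_true near 0 sends the loss toward infinity, which is exactly the steep part of the -log(x) curve.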


hey Fahim,

Thanks for sharing that. That is life changing and it comes with intellisense!!!

Yeah, the intellisense and markdown folding were the features I loved the most in VSCode :slight_smile: But there’s a bunch of small things which make the overall experience so much better under VSCode! The only trouble is that the features are spread across several different menus, so you have to hunt around a bit till you get used to it …

If you’re hungry for another run at derivatives, the chain rule, and backprop, I enjoyed working through Andrej Karpathy’s “micrograd” videos. It zooms in more on how single “neurons” interact and digs a bit deeper into building a Pythonic autograd class hierarchy. Really great stuff that complements JH’s material. The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube


pdb is a good debugger once you learn its features, IMO. Graphical debuggers have their place too.

There is a debugger built into JupyterLab (since version 3, I think) which works pretty well.


I agree. I have to do most code development on a remote host and have tried most of the better-known and a few lesser-known IDEs. When I need to debug in this environment, %debug and pdb are the tools I use to solve the majority of my problems, since they are so easy to use, and %debug in particular gets you straight to the problem.

The debate about which IDE is best is going to last forever. My experience is:

  • PyCharm: Excellent for writing code, with many advanced features, but needs the paid Pro version for remote hosts and has, in my opinion, a fundamental flaw: the code has to reside locally, and PyCharm then syncs it to the remote machine to run when required. If you also run notebooks on the remote host over ssh, this can often mess up the syncing and cause problems. For this reason I stopped using it. Debugging code worked very well with PyCharm, possibly the best of all of them, and very fast.

  • Wing Pro: Less well known, and again you need the Pro license to work on a remote host. Support for writing code is not as good as PyCharm or VS Code, but it is very good for remote debug sessions, probably the fastest and most robust in use.

  • VS Code: Great for writing code, as mentioned by @Fahim, but I find it very slow and flaky when I try to use it in debug mode over a remote host.

The one thing that messes all of them up is if you have to debug something that uses multiple worker processes, such as PyTorch DataLoaders, but usually this can be fixed by setting num_workers to 0 or something similar.

I’d be interested in the opinion of others on the course that probably have more experience than me.
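For the DataLoader case specifically, here’s a minimal sketch (assuming PyTorch; the dataset is made up for illustration). With num_workers=0, batches are loaded in the main process, which is what lets breakpoints and pdb work normally inside a Dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Worker subprocesses don't share the main process's debugger session,
# so breakpoints inside Dataset.__getitem__ never fire with num_workers > 0.
ds = TensorDataset(torch.arange(8.0).unsqueeze(1))   # toy dataset: 8 rows, 1 feature
dl = DataLoader(ds, batch_size=4, num_workers=0)     # single-process: debuggable

xb, = next(iter(dl))
print(xb.shape)   # torch.Size([4, 1])
```

Switching num_workers back up once the bug is found restores the parallel-loading speed.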



The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube

I found this to be excellent also. Implementing backprop “by hand” made it all click for me. Showing it with regular ole scalars helps focus the core concepts. It’s just one screen of code! After that, I felt like I could implement anything from scratch.


I just caught up on all the lessons! So pumped that we are doing everything from scratch. Looking forward to following along and creating notebooks to train on my own datasets!

If anybody, like me, was having trouble visualizing why the following expression is true

inp.g = out.g @ w.t()

I found this intuitive explanation super useful.
Adding a screenshot for reference


We saw the log rules for the quotient and multiplication:

just a reminder that ln(x) / ln(y) and ln(x) * ln(y), which look similar, have no simplification rules.
A tip that helps me remember this: x * y can lead to large numbers, and applying log converts the multiplication into a summation, which grows much more slowly than the multiplication.
Hope this helps someone :slight_smile:
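A quick numeric check of both points (plain Python; the values 12 and 3 are arbitrary):

```python
import math

x, y = 12.0, 3.0

# The real rules: log of a product/quotient splits into a sum/difference.
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))
assert math.isclose(math.log(x / y), math.log(x) - math.log(y))

# The look-alikes have no such rule -- they are just different numbers.
# (In fact ln(x)/ln(y) is the change-of-base formula: log base y of x.)
print(math.log(x) / math.log(y))   # about 2.26
print(math.log(x / y))             # about 1.39
```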


This was the first of the fastai lessons so far where I got REALLY lost, probably because I’ve never studied calculus, and things moved faster through that material.

I’m going through everything in this week’s notebook really slowly, and expanding the explanations to myself in notes wherever it feels like a step or progression is compressed. Same with some of the new terms/shorthands for things we’d been doing in part 1 but that were never referred to with those terms (like ‘back propagation’ etc.). Will get there in the end, I hope, and will try not to get dissuaded by the forward march of the lessons!


Adding to this: say we have a linear layer, and L is our loss.

linear layer:
output = input * w + b

We already know output.grad (i.e. dL/doutput) from the layer after it, and we need to find input.grad (i.e. dL/dinput).

Applying the chain rule: dL/dinput = (dL/doutput) * (doutput/dinput)
or, input.grad = output.grad * (doutput/dinput)

doutput/dinput = d(input * w + b) / dinput = w

Therefore, input.grad = output.grad * w (w.T to take care of the sizes).

The crux of backprop is combining each layer's local derivative with the global derivative flowing back from the loss,

as explained by Karpathy here (timestamp 1:06:51):
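The derivation above can be checked numerically against PyTorch’s autograd. This is a sketch with made-up shapes and a made-up loss, not the lesson notebook’s code:

```python
import torch

torch.manual_seed(0)
inp = torch.randn(4, 3, requires_grad=True)   # batch of 4, 3 features
w   = torch.randn(3, 2)                       # weights: 3 in, 2 out
b   = torch.randn(2)

out  = inp @ w + b                            # linear layer forward
loss = out.pow(2).mean()                      # any scalar loss will do
loss.backward()                               # autograd fills inp.grad

out_g  = 2 * out.detach() / out.numel()       # dL/dout for this particular loss
manual = out_g @ w.t()                        # the chain-rule formula from above

print(torch.allclose(manual, inp.grad))       # True
```

Note the shapes: out_g is (4, 2) and w.t() is (2, 3), so the product is (4, 3), matching inp, which is exactly why the transpose appears.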


I don’t think there is a substitute for debugging in the exact environment the code is running in and which it was developed for. Anything else is a compromise.