Lesson 8 (2019) discussion & wiki

Please use this thread to discuss lesson 8. Since this is Part 2, feel free to ask more advanced or slightly tangential questions - although if your question is not related to the lesson much at all, please use a different topic.

Thread for general chit chat (we won’t be monitoring this).

Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little edit icon at the bottom of this post.

Lesson resources


Sometimes, occasionally, shockingly, Jeremy makes mistakes. It is rumored that these mistakes were made in this lesson:

  1. Jeremy claimed that these are equivalent:
for i in range(ar):
    c[i] = (a[i].unsqueeze(-1)*b).sum(dim=0)
    c[i] = (a[i,None]*b).sum(dim=0)

But they’re not (noticed by @stas - thanks!). In the 2nd one, None inserts the new unit axis at the front, giving shape (1, ac), rather than at the end, shape (ac, 1), so it broadcasts against the wrong axis of b. It should be:

for i in range(ar):
    c[i] = (a[i,:,None]*b).sum(dim=0)
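To see why the axis position matters, here is a minimal NumPy sketch (NumPy's broadcasting rules behave the same as PyTorch's here); the shapes and values are made-up examples, not from the lesson notebook:

```python
import numpy as np

# Illustrative shapes (assumed): a is (2, 3), b is (3, 4)
a = np.arange(6.).reshape(2, 3)
b = np.arange(12.).reshape(3, 4)

c = np.zeros((2, 4))
for i in range(a.shape[0]):
    # a[i, :, None] has shape (3, 1); broadcasting it against b (3, 4)
    # scales row k of b by a[i, k], and summing over axis 0 yields
    # row i of the matrix product a @ b.
    c[i] = (a[i, :, None] * b).sum(axis=0)

assert np.allclose(c, a @ b)

# By contrast, a[i, None] has shape (1, 3): the unit axis lands at the
# front, so NumPy would try to align 3 with b's last axis (4) and raise
# a ValueError here instead of computing the product.
```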

Things mentioned in the lesson


Notes and other resources

Use this Jupyter notebook to start running the “Deep Learning From the Foundations” notebooks on Colab
Annotated notebooks for Lessons 8 - 12
Lesson 8 notes by @Lankinen
Lesson 8 notes by @wittmannf
Lesson 8 notes by @gietema
Lesson 8 notes by @Borz
Lesson 8 notes by @timlee

Blog posts and tutorials

“Assigned” Homework

  • Review these concepts from Course 1 (lessons 1 - 7): Affine functions & non-linearities; Parameters & activations; Random initialization & transfer learning; SGD, Momentum, Adam; Convolutions; Batch-norm; Dropout; Data augmentation; Weight decay; Res/dense blocks; Image classification and regression; Embeddings; Continuous & categorical variables; Collaborative filtering; Language models; NLP classification; Segmentation; U-net; GANs
  • Make sure you understand broadcasting
  • Read section 2.2 in Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  • Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself
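For the reading assignment: section 2.2 of the paper derives the initialization covered in the lesson - with a ReLU non-linearity, drawing weights from N(0, 2/fan_in) keeps the scale of the activations roughly steady from layer to layer. A minimal NumPy sketch of that property (the layer sizes and seed here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 512

# Inputs with unit second moment, E[x^2] = 1
x = rng.standard_normal((10000, fan_in))

# Kaiming/He init for a ReLU layer: std = sqrt(2 / fan_in)
w = rng.standard_normal((fan_in, fan_in)) * np.sqrt(2.0 / fan_in)

h = np.maximum(x @ w, 0)  # linear layer followed by ReLU

# ReLU zeroes half of the (symmetric) pre-activations, halving E[y^2];
# the factor 2 in the init compensates, so E[h^2] stays near 1.
assert abs((h ** 2).mean() - 1.0) < 0.1
```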

I think it’s lesson 1, not lesson 8.


I believe the lesson orders are all connected. Part 1 had the first seven lessons. Part 2 will have the next set of lessons starting at #8.


It is lesson 8 mod 7 :)



I don’t see the course ml2 link - will it be open soon?


Small question before the beginning of the lesson: how redundant is part 2 v3 with part 2 v2? Is it worth going through part 2 v2 after finishing part 2 v3?

I got them from github: https://github.com/fastai/fastai_docs/tree/master/dev_course/dl2
which the forum tells me @jeremy also posted here recently.

The notebooks for today’s class look great!


Gentle reminder to avoid @ mentioning unless necessary.


Noob tip:
I keep 2 setups:

  • Bleeding edge (pip-installed)
  • Conda-installed setup (for no bleeding)

What will the “suggested homework” look like for this new format of fastai part 2? In part 1 it was, for example, applying what we saw in Kaggle competitions; how do we do that with lessons focused on building fastai from the foundations?


#off topic
Finally Jeremy is kind of teaching Python as well, indirectly!
Thanks :slight_smile:


I’m so glad you asked the question! Homework is writing docs and new tests :slight_smile:
You will know how the library works, so you’ll have all the tools to do that!


What is the best way to study this part 2?

(For part 1 we learned the best way is to watch the videos 3 times, write blog posts, and so on.) Is the approach for part 2 the same?


That’s another way to get lots of contributors to the fastai library haha :wink:


For the distributed training, will we examine multi-node single/multi-GPU in addition to single-node multi-GPU?

Not sure. It will probably be single-node only.

I love the new direction of part 2 of the course! Getting into the fundamentals can eliminate those last little bits of hesitation and delay when modifying and customizing state-of-the-art training algorithms. I’m also very excited about the direction of performance optimization, distributed training, and the emphasis on engineering. I have already convinced our engineers to get into PyTorch, and I will have a year or two before I can get them into Swift.


I heard Julia language is built for numerical computation, why not Julia instead?