Please use this thread to discuss lesson 8. Since this is Part 2, feel free to ask more advanced or slightly tangential questions - although if your question is not related to the lesson much at all, please use a different topic.
Thread for general chit chat (we won’t be monitoring this).
Note that this is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little edit icon at the bottom of this post.
Lesson resources
- Edited lesson video
- Slides
- Course notebooks
- Excel spreadsheets (today’s is called `broadcasting.xlsx`). There’s also a Google Sheet version thanks to @Moody
- Ensure your fastai lib is up to date
- You’ll also need to: `conda install nbconvert`
- You’ll also need to: `conda install fire -c conda-forge`
- Notes thread
Errata
Sometimes, occasionally, shockingly, Jeremy makes mistakes. It is rumored that these mistakes were made in this lesson:
- Jeremy claimed that these are equivalent:

```python
for i in range(ar):
    c[i] = (a[i].unsqueeze(-1)*b).sum(dim=0)
    c[i] = (a[i,None]*b).sum(dim=0)
```

But they’re not (noticed by @stas - thanks!). The second one isn’t indexing the second axis: `a[i,None]` has shape `(1, ac)` rather than `(ac, 1)`, so it broadcasts across the wrong dimension. It should be:

```python
for i in range(ar):
    c[i] = (a[i,:,None]*b).sum(dim=0)
```
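If you want to verify the shapes yourself, here’s a minimal sketch (the variable names follow the lesson’s matmul notebook, but the sizes are just an example):

```python
import torch

# a plays the role of the (ar, ac) matrix and b the (ac, bc) matrix from the notebook
a, b = torch.randn(3, 4), torch.randn(4, 5)

print(a[0].unsqueeze(-1).shape)  # torch.Size([4, 1]) - a column, broadcasts across b's columns
print(a[0, None].shape)          # torch.Size([1, 4]) - a row, broadcasts the wrong way
print(a[0, :, None].shape)       # torch.Size([4, 1]) - same as unsqueeze(-1)

# the corrected broadcast version agrees with an ordinary vector-matrix product
print(torch.allclose((a[0, :, None] * b).sum(dim=0), a[0] @ b))  # True
```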
Things mentioned in the lesson
- Jeremy’s blog posts about Swift: “fast.ai Embracing Swift for Deep Learning” and “High Performance Numeric Programming with Swift: Explorations and Reflections”
- Rachel’s post on starting to blog: Why you (yes, you) should blog
- Numpy docs on broadcasting
- Numpy docs on einsum (has lots of great examples)
- The Matrix Calculus You Need For Deep Learning by Terence Parr and Jeremy
- Detexify (for finding math symbols) and Wikipedia list of symbols
- The matrix multiplication song
- Thread by @pierreguillou seeking feedback on best practices for study groups
Papers
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification - the 2015 paper that introduced Kaiming initialization (and PReLU), and first surpassed human-level performance on ImageNet
- Understanding the difficulty of training deep feedforward neural networks - the paper that introduced Xavier initialization
- Fixup Initialization: Residual Learning Without Normalization - paper highlighting the importance of initialisation: with a careful init it trains a 10,000-layer network without any normalisation
Notes and other resources
- Use this Jupyter notebook to start running the “Deep Learning From the Foundations” notebooks on Colab
- Annotated notebooks for Lessons 8 - 12
- Lesson 8 notes by @Lankinen
- Lesson 8 notes by @wittmannf
- Lesson 8 notes by @gietema
- Lesson 8 notes by @Borz
- Lesson 8 notes by @timlee
Blog posts and tutorials
- Jake VanderPlas’ explanation of broadcasting
- Mathpix - turns images into LaTeX
- Tutorial by @jeffhale on "How to use `if __name__=='__main__'`"
- Khan Academy lesson on the chain rule
- Basic PyTorch Tensor Tutorial (Includes Jupyter Notebook)
- Xavier initialisation (why we divide by sqrt(M)) - a blog post that explains it nicely
- Xavier and Kaiming initialisation - a blog post that explains the two papers, and in particular the math, in detail (there’s also a short illustrative sketch after this list)
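On the “why divide by sqrt(M)” question, here’s a minimal sketch of the idea (not from the lesson notebooks; the layer sizes are arbitrary): scaling a linear layer’s weights by 1/sqrt(fan_in) keeps the standard deviation of the outputs close to that of the inputs, whereas unit-variance weights blow it up by roughly sqrt(fan_in).

```python
import torch

nin, nout = 512, 512
x = torch.randn(10_000, nin)                  # inputs with mean 0, std 1

w_naive  = torch.randn(nin, nout)             # std 1: output std grows by ~sqrt(nin) ≈ 22.6
w_xavier = torch.randn(nin, nout) / nin**0.5  # Xavier-style scaling: output std stays ~1

print((x @ w_naive).std(), (x @ w_xavier).std())
```

Kaiming initialization (section 2.2 of the Rectifiers paper, assigned below) adds an extra factor of sqrt(2) to compensate for ReLU zeroing out roughly half the activations.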
“Assigned” Homework
- Review the concepts from Course 1 (Lessons 1 - 7): Affine functions & non-linearities; Parameters & activations; Random initialization & transfer learning; SGD, Momentum, Adam; Convolutions; Batch-norm; Dropout; Data augmentation; Weight decay; Res/dense blocks; Image classification and regression; Embeddings; Continuous & categorical variables; Collaborative filtering; Language models; NLP classification; Segmentation; U-net; GANs
- Make sure you understand broadcasting (a short self-check sketch follows this list)
- Read section 2.2 in Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself
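A quick self-check for the broadcasting item above (a minimal sketch; the tensors are arbitrary examples, not taken from the lesson notebooks):

```python
import torch

v = torch.tensor([10., 20., 30.])        # shape (3,)
m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])          # shape (3, 3)

print(m + v)           # v is treated as a (1, 3) row and broadcast down the rows
print(m + v[:, None])  # v[:, None] has shape (3, 1) and is broadcast across the columns
```

If you can predict both outputs before running it, the broadcast-based matmul refactorings in the notebook should feel natural.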