Did YOU do the homework? 😄

radek · March 20, 2020, 3:29pm

When I talk to people, they often mention that it is a little confusing what to work on, so here is a summary of Jeremy's suggestions on what to focus on before the 2nd lecture. I have done a so-so job on this so far myself, but there are still 4 more days (assuming the next class is on Tuesday), so there is hope!

Link to the part of the lecture where Jeremy makes the suggestions.

make sure you can spin up a GPU server, and that you can shut it down when you are finished
run the code shown in the lecture
use the documentation: use the doc function inside Jupyter Notebook
do some searching of the fast.ai docs
see if you can grab the fast.ai documentation notebooks and try running them
read a chapter of the fast.ai book
do the questionnaire at the end of the chapter (not everything has been covered yet, answer only the questions that you can)
try to get comfortable with running code

Which part of the book to read? Chapter #1

List of NB commands / tips & tricks worth knowing

doc(...) - show the documentation for a given function or object
append or prepend ?? to a function or object name, e.g. ??cnn_learner or cnn_learner??, to see its documentation along with its source code (a single ? shows just the docstring and signature)
Shift+Tab - show the parameter list
Hovering over the method name gives you the parameter list too.
With doc, if you are not seeing a link to the source code, you are probably missing the nbdev installation
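Those shortcuts are IPython/Jupyter features. If you ever want the same information from plain Python (say, in a script), the standard-library inspect module gives rough equivalents. A minimal sketch; the three helper names are made up for illustration, and textwrap.dedent is just a convenient pure-Python function to inspect:

```python
import inspect
import textwrap

# Hypothetical helpers: rough plain-Python equivalents of the notebook shortcuts.

def signature_of(obj):
    """Like Shift+Tab: the name plus the parameter list."""
    return f"{obj.__name__}{inspect.signature(obj)}"

def docstring_of(obj):
    """Like obj?: the cleaned-up docstring."""
    return inspect.getdoc(obj)

def source_of(obj):
    """Like obj??: the source code (only works for objects defined in .py files)."""
    return inspect.getsource(obj)

print(signature_of(textwrap.dedent))
print(docstring_of(textwrap.dedent).splitlines()[0])
print(source_of(textwrap.dedent).splitlines()[0])
```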

Lecture 2 homework (interpreted from the lecture):

Run & understand what’s happening in the #Click me cell of chapter 1
See if you can think of use cases in a domain that interests you where the problem is not defined as a computer vision problem, and of creative ways to turn it into one
Get comfortable with the terminology used (listed in the Jargon recap of chapter 1)
Go through the questionnaire & further research at the end of chapter 1
Read the paper on the effects of temperature and humidity on the transmission of COVID-19, and the interpretation of the results mentioned in it
Check out Jeremy's blog post on the data project checklist
Experiment time - the most important step according to alumni of the course
- Sign up for Bing & extract images
- Understand the data block API
- Build a model to perform classification
- Interpret the confusion matrix
- Analyse the results by plotting losses
- Export the model & run inference on any other image of your choice
Go through chapter 2 of the book
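For the confusion matrix item, it can help to see how such a matrix is counted. A minimal plain-Python sketch with made-up cat/dog labels; the confusion_matrix helper is hypothetical (not fastai's interpretation tooling), and it uses the common layout of rows = actual class, columns = predicted class:

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs; rows are actual classes, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

classes   = ["cat", "dog"]
actual    = ["cat", "cat", "dog", "dog", "dog"]   # made-up true labels
predicted = ["cat", "dog", "dog", "dog", "cat"]   # made-up model predictions

cm = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, cm):
    print(name, row)
```

The diagonal holds the correct predictions and everything off it is a mistake, so in this toy example one cat was called a dog and one dog was called a cat.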

barnacl · March 20, 2020, 3:38pm

Thanks radek for this.
Along with doc, ?? appended or prepended for documentation
Shift+Tab for the parameter list
Hovering over the method name gives you the parameter list too.
With doc, if you are not seeing a link to the source code, you are probably missing the nbdev installation

We should probably make a wiki for all the NB commands. You should add your Twitter link with the %debug tips too.

radek · March 20, 2020, 3:44pm

I have now done the reasonable thing I had not thought of before, and it seems the chapter of the book to read is given in the Lecture 1 thread!

Turns out it is chapter #1!

radek · March 20, 2020, 3:53pm

I think I have wikified the post now (hopefully). Everyone, let us work our magic!

init_27 · March 20, 2020, 4:41pm

Hi @radek, I was going to summarise the lecture + the things Jeremy says to do (and then convert that into a podcast) today.

You’ve already done 70% of it. Is it okay if I add the remainder to the OP? It might be slightly off theme, so I thought I'd ask before doing it.

JorgeBriones · March 20, 2020, 5:00pm

Thank you! Very useful to get started with this adventure.

radek · March 20, 2020, 6:02pm

Mhmm mhmm mhmm… I’m thinking it would be good to keep this specific to the homework, especially as for many it will be the first homework in the course and potentially their introduction to this way of learning.

That also means that if anyone has any homework-related questions, it would be great to discuss them in this thread.

If it is on the homework, then sure, add it here! If it is more of a summary of valuable thoughts on learning / doing machine learning, then it might be better to put it in the main Lesson 1 - Official topic under notes. But you have all the info, not me, so you decide.

BTW, what happened to the awesome Share your work here ✅ - #1792 by subhadityamukherjee thread from v3? It added a lot of value and was a nice place for people to casually post what they had been working on, even if it was just a short blog post about some tiny notebook on GitHub. People did a bunch of awesome things, and it seems we don’t have such a place in this version. Should we create it?

I have been thinking of doing a summary of the lecture, and I think it might go into such a thread. It would be a good place for all those wonderful little things, and that thread was so popular that people kept checking it religiously. Also, I believe that if you add your podcast to the wiki here, there will be no notification, so there is a good chance that many people who would be interested will miss it!

We have this very nice thread, but it's more people discussing what they will do in the future, and while such conversations can be interesting, we need a separate thread where people can come in and share work they have completed. It would be awesome, for instance, to see notebooks on GitHub from people running the lesson 1 code on datasets of their choice / creation.

Anyhow, great question, Sanyam! I am also starting to feel this course is yet again an order of magnitude larger than the previous iteration, with so many active new people (and I am sure even more will join the activity on the forums in the days to come), so it's definitely important for us to give some thought to where we put things. Kudos for bringing this up.

arora_aman · March 20, 2020, 10:17pm

Thanks @radek for starting this thread! Such a wonderful idea to start a homework thread.

Since you started this thread, I think it would be brilliant to keep it going week by week as a central repository for listing homework and discussing homework-related questions.

I am sure that thread can be used to share completed work as well; I did not mean to keep it “future work only”.

arora_aman · March 20, 2020, 10:22pm

I did end up updating the thread and its information a little (https://forums.fast.ai/t/share-your-v2-projects-here/65757), so we can use the same thread to share both completed and future work. IMHO it would be redundant to have two threads - one for future and one for current work.

ilovescience · March 21, 2020, 12:34am

If you are struggling or want to check your answers, check out my post here! (It's almost done; I will add the solutions to the remaining questions in a few hours.)

Also, there may be errors or alternative answers, so feel free to reply with your thoughts or edit the wiki!

Jess · March 21, 2020, 1:06am

@barnacl Radek took your great idea and wikified, so I added your tips.

lauwinggin · March 21, 2020, 5:50am

Thanks, I was worried I was behind when people posted questions beyond 01_intro. Do we have deliverables when we finish chapter 1, or are we just required to learn the material and be able to run the code?

radek · March 21, 2020, 6:58am

The homework is not checked or anything like that. Anything we do in this course is just for us, meaning you should do whatever you feel helps you learn.

Part of what seems to work very well (as Jeremy suggests in the lecture) is running code, seeing how things change when you change the inputs, checking out the docs, and playing around with Jupyter Notebook -> getting acquainted with the whole ecosystem.

This thread is about giving people a bit of a helping hand with what they can do after the first lecture to get going, but in general you can come up with anything you feel would help you learn: running the code on your own data or some other dataset (maybe even one built into fast.ai; this link points to fastai v1, though), writing a post on the forum explaining something, asking a question, writing a blog post, creating a NB on something that interests you, pushing it to GitHub and sharing it on the forums, etc.

I am not sure if it’s part of the top-down way of learning (I didn’t read the book), but the way the fast.ai courses play out, you control your destiny, or what you get out of the course! Sounds very similar to life in that regard.

arora_aman · March 21, 2020, 12:48pm

This is gold.

reshama · March 24, 2020, 6:32pm

Hi Radek,
Where can I find information on how to do this:

see if you can grab the fast.ai documentation notebooks and try running them

init_27 · March 24, 2020, 6:35pm

Hi @reshama,
I believe this is the link you’re looking for.

0tist · March 28, 2020, 6:17am

Hi @radek and fellow members, can someone please update the Did YOU do the homework? post with this week’s homework and with topics we can study on our own that suit the course?

init_27 · March 28, 2020, 7:11am

@0tist Please don’t hesitate to do it yourself. We’re all here to learn, even though our speeds vary because we started our “walks with fastai” at different points in time; the great thing is that we’re here.

Most of the time, someone on the forums starts something and many people follow. Radek might say this is similar to how it happens in life.

geetha.ai · March 29, 2020, 7:27pm

Hi @0tist, I just added the things from lecture 2 that, from my perspective, can be interpreted as homework. And thanks @radek for starting this thread; it has really helped me.

go_go_gadget · March 31, 2020, 11:24pm

Regarding the instruction to read and understand the #Click me cell of Chapter 1, these are my thoughts and questions. As will be obvious, I’m a novice programmer.

from fastai2.vision.all import *
Import everything (classes, libraries, etc.) from the fastai vision library

path = untar_data(URLs.PETS)/'images'
I had a misunderstanding about this one. I thought untar_data(URLs.PETS) was downloading the URLs of the pet images, possibly because I’m predisposed to think of downloading URLs for the classifier for Lesson 1 from v3, but also because it’s URLs plural, not URL. So I checked the docs, and it turns out there’s a URLs class we’re using, and PETS is one of its attributes: a constant holding the download URL of the pets dataset, which untar_data then downloads and extracts. There are similar URLs attributes for other datasets, but only for the ones fastai provides, so this shortcut doesn't generalize to other datasets (but we’ll be learning other approaches that do generalize!).
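To make the method-vs-attribute distinction concrete, here is a tiny hypothetical stand-in for a URLs-style class (the class name and the example.com address are made up, not fastai's real values). The dataset locations are plain strings stored on the class, so you use them without parentheses:

```python
class DatasetURLs:
    """Hypothetical stand-in: dataset download locations stored as class attributes."""
    BASE = "https://example.com/datasets/"
    PETS = BASE + "oxford-iiit-pet.tgz"   # a string constant, not a method

print(DatasetURLs.PETS)             # accessed without parentheses
print(callable(DatasetURLs.PETS))   # False: it is just a string
```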

So the dataset is downloaded and extracted, and untar_data returns the location of the extracted dataset, which we store in path. But what does the /'images' at the end do? I searched the forum and found the notes from Lesson 3 of v3, and if I’m extrapolating correctly, I think the pets dataset has a folder named ‘images’, and we’re telling the path to point specifically to that folder, rather than to the dataset folder as a whole. Is that right?
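The / works because untar_data returns a pathlib Path, and Path overloads / to join path components. A minimal sketch with a made-up location standing in for what untar_data returns (PurePosixPath is used only so the output looks the same on any OS):

```python
from pathlib import PurePosixPath

# Hypothetical folder standing in for the result of untar_data(URLs.PETS)
dataset_dir = PurePosixPath("/home/me/.fastai/data/oxford-iiit-pet")
images_dir = dataset_dir / "images"   # "/" joins the two path components

print(images_dir)                        # /home/me/.fastai/data/oxford-iiit-pet/images
print(images_dir.parent == dataset_dir)  # True
```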

def is_cat(x): return x[0].isupper()
Define a function is_cat to which we pass x, the filename of each pet image. A characteristic of this particular dataset is that the first character of the filename is uppercase if the file is an image of a cat, so is_cat returns True if the first character of x is uppercase, and False otherwise.
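A quick way to check this is to call is_cat on a few filenames written in the dataset's naming style (these particular names are just examples: capitalised breed names for cats, lowercase for dogs):

```python
def is_cat(x): return x[0].isupper()

# Example filenames in the pets dataset's naming style (chosen for illustration)
for name in ["Abyssinian_1.jpg", "Bengal_10.jpg", "beagle_3.jpg", "pug_42.jpg"]:
    print(name, is_cat(name))
```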

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

I have questions about this one, too. The book says:

"The fourth line tells fastai what kind of dataset we have, and how it is structured. There are various different classes for different kinds of deep learning dataset and problem–here we’re using ImageDataLoaders. The first part of the class name will generally be the type of data you have, such as image, or text. The second part will generally be the type of problem you are solving, such as classification, or regression."

What is “the second part of the class name” that is “the type of problem you are solving…”? We’re doing classification, but it’s not obvious to me where that’s declared in the class name.

Then we’re using the from_name_func method of the ImageDataLoaders class, which creates our DataLoaders (dls, as we’re calling them here), setting aside 20% of our data as the validation set, setting the optional seed value to 42, setting the labelling function to our is_cat function defined above, and selecting Resize(224) as the transformation applied to each image, resizing them all to 224x224 pixels for historical reasons.

But why is the seed set to 42? The book and the docs say it’s for reproducibility, and I understand that getting the same validation set every time is what gives us reproducible results, but what is a seed, and how does it achieve a reproducible validation set? I Googled “reproducibility seed” and found this post helpful:

“The “seed” is a starting point for the sequence and the guarantee is that if you start from the same seed you will get the same sequence of numbers.”

But if the elements of the validation set are chosen randomly, how does starting from the same point help? And why 42? Is there a practical consideration at work, or is it just Douglas Adams?
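On the seed question: a computer's "random" numbers come from a deterministic generator, so seeding it fixes the whole sequence of shuffles that follows, and with it which items land in the validation set. A minimal plain-Python sketch of the idea; random_split is a hypothetical helper, not fastai's own splitter (which does its own shuffling), and the value 42 itself is arbitrary: any fixed integer gives a repeatable split.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Hypothetical helper: shuffle a copy of items, then split off valid_pct as validation."""
    rng = random.Random(seed)          # same seed -> same sequence of "random" numbers
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]   # (train, valid)

files = [f"img_{i}.jpg" for i in range(10)]
train_a, valid_a = random_split(files, seed=42)
train_b, valid_b = random_split(files, seed=42)

print(valid_a == valid_b)  # True: same seed, same validation set
print(len(valid_a))        # 2
```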

learn = cnn_learner(dls, resnet34, metrics=error_rate)
Use the cnn (convolutional neural network) learner, telling it to use the dls we established above, the ResNet34 architecture, and the error rate as a metric. Pretty straightforward for me.
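For reference, the error rate is just the fraction of predictions that are wrong (1 minus accuracy). A minimal sketch with made-up predictions; error_rate_sketch is a hypothetical helper, not fastai's error_rate, which works on the model's output tensors rather than Python lists:

```python
def error_rate_sketch(predicted, actual):
    """Fraction of predictions that don't match the true label (= 1 - accuracy)."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

predicted = [True, False, True, True]    # made-up predictions, True = "cat"
actual    = [True, True,  True, False]   # made-up true labels

print(error_rate_sketch(predicted, actual))  # 0.5
```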

learn.fine_tune(1)
Since we’re using a pretrained model, we don’t want to throw away what it has already learned, so rather than just calling learn.fit, we fine-tune the model for our particular dataset. fine_tune(1) first trains only the new head of our model, which is unique to this dataset, for one epoch (a complete pass through the dataset) while the pretrained layers stay frozen, and then trains the whole model for the one epoch we asked for. The book says:

“After calling fit, the results after each epoch are printed, showing the epoch number, the training and validation set losses (the “measure of performance” used for training the model), and any metrics you’ve requested (error rate, in this case).”

But it must mean “After calling fine_tune.”

I did have another hiccup, trying to use ?? to see the docs for methods, e.g. ??cnn_learner; I keep getting an error “Object cnn_learner not found.” Other shortcuts such as b to create a new cell are working for me, so I’m not sure what I’m doing wrong with this one.

And that’s the lot! Thanks for reading all of this, and please let me know if you can answer any of my questions, or if I’ve mischaracterized anything.