Wiki: Lesson 1


(Ibrahim Khalil) #84

I understand the overall concept of the notebook now. When going line by line, the following line stood out:

Why is it that we use only the probability of dog throughout the notebook? Why don’t we use the probability of cat anywhere? I presume we could’ve used either one, as the probabilities along each row sum to one, or almost one.
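
A small sketch of why either column would carry the same information in the two-class case. The log-probability values below are made up for illustration, and the column order (cat first, dog second) is an assumption, not something taken from the notebook.

```python
import math

# Hypothetical log-probabilities for three images.
# Column order is assumed to be [cat, dog].
log_preds = [
    [-0.105, -2.303],
    [-1.609, -0.223],
    [-0.693, -0.693],
]

# Exponentiate the log-probabilities to recover probabilities.
probs = [[math.exp(v) for v in row] for row in log_preds]

for p_cat, p_dog in probs:
    # With two classes each row sums to (almost) one,
    # so p_cat is approximately 1 - p_dog.
    print(f"cat={p_cat:.3f} dog={p_dog:.3f} sum={p_cat + p_dog:.3f}")
```

Since each row sums to roughly one, reporting only the dog column loses nothing: the cat probability can be recovered as one minus the dog probability.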


#85

I think they have been moved to the fastai repo.


However, I don’t see the folded table of contents in the notebook from lesson 1. Am I missing something?


#86

Hi,
I have just started the fastai Part 1 v2 course and have finished watching the first video. How are the files divided into train/valid etc.? Where can I learn more about these terms and how to divide files accordingly?


#87

Interested to find out more about this too!

@alessa / @lukebyrne do you guys have any insights? I saw a post [Wiki: Lesson 1] which talked briefly about this, but I’m not sure if this was ever resolved.

Reason I’m asking is that the ImageClassifierData.from_paths method takes the following args:

            trn_name: a name of the folder that contains training images.
            val_name:  a name of the folder that contains validation images.
            test_name:  a name of the folder that contains test images.

Any insights into the train/valid/test split required to feed into this method will be really helpful.

Thanks!


(Alessa Bandrabur) #88

You can find more details on Stack Overflow.

Usually the dogs and cats examples have only train and valid datasets, where the training dataset has 12500 files per class and the validation dataset has 1000 files per class (~7% of the data).

What you need to pay attention to when you build your datasets is to cut sample files from the training dataset and move them into the validation dataset (instead of just copying and pasting them).

If you enter a Kaggle competition, for example, you will also have a test dataset (with no labels/classes). In this case, you can train your model using cross-validation. In the end, you can put all of your files in the training dataset (no more validation set), and the resulting weights will be your final ones.
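
The cut-and-move step could be sketched roughly as follows. This is a minimal sketch that assumes a layout of root/train/<class>/ and root/valid/<class>/; the helper name move_to_valid, the class folder names, and the file names are all hypothetical, and the demo runs in a temporary directory with empty placeholder files.

```python
import random
import shutil
import tempfile
from pathlib import Path


def move_to_valid(root, classes, n_per_class, seed=0):
    """Move n_per_class random files per class from root/train to root/valid."""
    root = Path(root)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    for cls in classes:
        src = root / "train" / cls
        dst = root / "valid" / cls
        dst.mkdir(parents=True, exist_ok=True)
        files = sorted(p for p in src.iterdir() if p.is_file())
        for f in rng.sample(files, n_per_class):
            # Move rather than copy, so no image ends up in both sets.
            shutil.move(str(f), str(dst / f.name))


# Demo on a throwaway directory with 10 empty placeholder files per class.
with tempfile.TemporaryDirectory() as tmp:
    for cls in ("cats", "dogs"):
        d = Path(tmp) / "train" / cls
        d.mkdir(parents=True)
        for i in range(10):
            (d / f"{cls}.{i}.jpg").touch()

    move_to_valid(tmp, ["cats", "dogs"], n_per_class=2)

    n_train = len(list((Path(tmp) / "train" / "cats").iterdir()))
    n_valid = len(list((Path(tmp) / "valid" / "cats").iterdir()))
    print(n_train, n_valid)  # 8 2
```

Moving the files keeps the two sets disjoint, which is the point of the warning about copy and paste above.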


(Alessa Bandrabur) #89

Here is a short video from Andrew Ng's Coursera course on how to split the data.

[60% training, 20% validation, 20% test]
You can change these params and check how it affects your final model performance.
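
As a rough illustration of the 60/20/20 idea, here is a minimal sketch that splits a list of items by shuffled position. The function name and the percentage parameters are hypothetical, and in practice the items would be file names rather than integers.

```python
import random


def split_train_valid_test(items, train_pct=60, valid_pct=20, seed=42):
    """Shuffle items and split them into train/valid/test by percentage."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = len(items) * train_pct // 100
    n_valid = len(items) * valid_pct // 100
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # whatever remains goes to test
    return train, valid, test


train, valid, test = split_train_valid_test(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

Changing train_pct and valid_pct is one way to try different proportions and compare the resulting model performance.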


#90

Thanks @alessa! This was helpful for a general train / test / split. I was wondering more specifically:
do you know of any specific setup requirements for the train / test / split for fastai’s method?

Thanks again.


(Matthew Kleinsmith) #91

When choosing a learning rate with the LR finder, you can plot a vertical line to ensure you choose the correct x-coordinate of the point you’re interested in. Otherwise it can be difficult to interpret values on the x-axis, since they’re in log scale.


import matplotlib.pyplot as plt
learn.sched.plot()
plt.axvline(x=1.6e-2, color="red");

(Aditya) #92

Also, adding %matplotlib notebook seems awesome (you keep editing the same plot until a new one is created).


(Navin Kumar) #93

The Kaggle Dogs vs. Cats Redux: Kernels Edition competition asks us to report the probability that each image is a dog, hence the interest in calculating the dog probability.


(Jeremy Howard) #94

Good idea


(Matthew Kleinsmith) #95

Thanks. I invite you to the LR finder plots thread.


(Jeremy Howard) #96

I thought someone had created a timeline for this lesson, but I can’t find it - am I imagining things? @hiromi @EricPB where did we get to with this for the new version of the video? Sorry for my poor memory!


(Hiromi Suenaga) #97

I believe there was one for the original lesson 1 video, but I don’t recall one for the re-taped version. I can certainly create one :slight_smile:


(Jeremy Howard) #98

That would be quite wonderful! I’ve nearly finished the new course web site and suddenly discovered we don’t have a timeline! :open_mouth:


#99

@jeremy, question about V2 vs. Machine Learning For Coders: I’ve had significant dev experience and want to complete one of these. Do you have a suggestion of which course to take? Is Machine Learning For Coders ready for the public? I’ve started the first video of each course. Thanks.


(Eric Perbos-Brinck) #100

Master @Jeremy,

You indeed found the nasty secret in the “Video Timelines for Part 1 V2”: there is none existing so far for Lesson 1, but Lessons 2 to 7 are covered with the help of your humble servants here (hiromi was super efficient/fast at fixing my mistakes on L7) :blush:

I will work on it tomorrow/this weekend.

Did anyone mention that they were looking for your notebook on the Favorita comp, including preds and submission, now that it’s over? :sunglasses:


(Jeremy Howard) #101

They’re both ready. Perhaps read the experiences of other students on the forum and see what you think. They’re both worth doing.


(Jeremy Howard) #102

You and @hiromi are both very kind :slight_smile:

I’m rather embarrassed that I never got around to creating a groceries model that I’m actually happy with. But I’ll endeavor to dig up my notebook after I get this course out…


(Hiromi Suenaga) #103

@EricPB, I’m at a hackathon tonight so I’ll see how far I can get during my breaks. You can make it prettier for me when you get a chance :slight_smile: