Can you please explain again what exactly the threshold of 0.2 means?
Potentially useful:

Currying is a function of 1 argument which takes a function f and returns a new function h. Partial application is a function of 2+ arguments which takes a function f and 1+ additional arguments to f and returns a new function g.
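A quick sketch of that distinction in plain Python (the function names here are made up for illustration, not from the lesson):

```python
from functools import partial

# A hypothetical two-argument function f
def add(a, b):
    return a + b

# Currying: a function of 1 argument that takes a function f
# and returns a new function h; h takes a and returns yet
# another function expecting b.
def curry(f):
    def h(a):
        return lambda b: f(a, b)
    return h

# Partial application: takes f and 1+ additional arguments to f,
# and returns a new function g of the remaining arguments.
g = partial(add, 3)

print(curry(add)(3)(4))  # 7
print(g(4))              # 7
```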
The graph label says validation loss… so does it compute the loss on the training set or the validation set at the end of each iteration?
The planet example once again shows train_loss > valid_loss… see
Please vote that post up!
It means that we’ll accept a label as true if we predicted its probability above 0.2, or 20%.
I would be very careful to inspect any human feedback. Anything from a “user” can be malicious. Horrible images and/or incorrect corrections are just the first things that come to mind.
Should we stop training if the validation loss is higher than the training loss? Or what is the stop rule?
The Data Block API dot something is functional programming style. Think of the operations like UNIX pipe.
```python
data = (ImageFileList.from_folder(planet)
        # Where to find the data? -> in planet and its subfolders
        .label_from_csv('labels.csv', sep=' ', folder='train', suffix='.jpg')
        # How to label? -> use the csv file labels.csv in path,
        # add .jpg to the names and take them in the folder train
        .random_split_by_pct()
        # How to split in train/valid? -> randomly with the default 20% in valid
        .datasets()
        # How to convert to datasets? -> use ImageMultiDataset
        .transform(planet_tfms, size=128)
        # Data augmentation? -> use tfms with a size of 128
        .databunch())
        # Finally? -> use the defaults for conversion to databunch
```
Oh graph_label is wrong then.
I think the data block API is similar to currying functions in JavaScript or function chaining in Python.
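A minimal sketch of what function chaining looks like (the class and method names here are invented for illustration, not fastai’s): each method returns `self`, so calls can be strung together just like the data block API.

```python
class Pipeline:
    """Toy chainable pipeline over a list of values."""

    def __init__(self, data):
        self.data = data

    def map(self, fn):
        self.data = [fn(x) for x in self.data]
        return self  # returning self is what enables chaining

    def filter(self, pred):
        self.data = [x for x in self.data if pred(x)]
        return self

result = (Pipeline([1, 2, 3, 4])
          .map(lambda x: x * 2)      # [2, 4, 6, 8]
          .filter(lambda x: x > 4)   # [6, 8]
          .data)
print(result)  # [6, 8]
```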
One question which comes to my mind: suppose we have a dataset of animal images, e.g. zebra, lion, elephant, with a separate folder for each class, and we train on those. What would happen if we now have images that contain several of these classes together (zebra, lion, monkey, etc.) in one single image? Would our classifier give probabilities for all of them?
The Data Block API has a particular JavaScript-ish style to it.
So ideally it should say training loss?
Let’s assume that you have a model that predicts three different labels. For each classified image, you will get three probabilities:
- label1: 0.4
- label2: 0.1
- label3: 0.3
A threshold of 0.2 would then return each of the labels that have a higher probability, e.g. label1 and label3 in this case.
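The example above can be sketched in plain Python (the label names are just the placeholders from the post):

```python
# Multi-label thresholding: keep every label whose predicted
# probability exceeds the threshold.
probs = {"label1": 0.4, "label2": 0.1, "label3": 0.3}
threshold = 0.2

predicted = [label for label, p in probs.items() if p > threshold]
print(predicted)  # ['label1', 'label3']
```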
Is it always assumed that labels will match corresponding data folders?
I would think that’s why you need a way to validate it: at a minimum, look it over, or check whether the same image gets flagged a set number of times, etc.
I don’t recall what it says, but if it says validation loss, it’s wrong.
It is also equivalent to Keras’s functional API.
I thought the same when I saw it! Functional style.
No, not necessarily: in the planet example for instance, you look at the labels in a csv file.