Lesson 3 In-Class Discussion ✅

Can you please explain again what exactly a threshold of 0.2 means?

Potentially useful:

Currying is a higher-order function of one argument: it takes a function f and returns a new function h that accepts f's arguments one at a time. Partial application is a higher-order function of two or more arguments: it takes a function f plus one or more of f's arguments and returns a new function g of the remaining arguments.
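
For instance, here is a minimal Python sketch of the distinction (the curry2 helper below is made up for illustration; functools.partial is the standard-library way to partially apply):

from functools import partial

# Currying: a function of one argument that takes a function f and returns
# a new function which accepts f's arguments one at a time.
def curry2(f):
    return lambda x: lambda y: f(x, y)

def add(x, y):
    return x + y

add_curried = curry2(add)   # curry2 receives only the function
print(add_curried(2)(3))    # 5

# Partial application: takes the function *and* some of its arguments up front.
add2 = partial(add, 2)
print(add2(3))              # 5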

The graph label says it is validation loss… so does it compute the loss on the training set or on the validation set at the end of each iteration?

The planet example once again shows train_loss > valid_loss… see

Please vote that post up!

It means that we’ll accept a label as true if its predicted probability is above 0.2, or 20%.

I would be very careful to inspect any human feedback. Anything from a “user” can be malicious. Horrible images and/or incorrect corrections are just the first things that come to mind.

Should we stop training if the validation loss is higher than the training loss? Or what is the stopping rule?

The Data Block API's .something() chaining is functional-programming style. Think of the operations like a UNIX pipe.

data = (ImageFileList.from_folder(planet)            
        #Where to find the data? -> in planet and its subfolders
        .label_from_csv('labels.csv', sep=' ', folder='train', suffix='.jpg')  
        #How to label? -> use the csv file labels.csv in path, 
        #add .jpg to the names and take them in the folder train
        .random_split_by_pct()                     
        #How to split in train/valid? -> randomly with the default 20% in valid
        .datasets()
        #How to convert to datasets? -> use ImageMultiDataset
        .transform(planet_tfms, size=128)             
        #Data augmentation? -> use tfms with a size of 128
        .databunch())                          
        #Finally? -> use the defaults for conversion to databunch

Oh, the graph label is wrong then.

I think the data block API is similar to currying functions in JavaScript or method chaining in Python; see the sketch below.
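
A minimal, hypothetical sketch of that chaining pattern in plain Python (the Pipeline class is made up for illustration, not part of fastai): each method does one step and returns the object, which is what lets the calls read like a pipeline.

class Pipeline:
    def __init__(self, data):
        self.data = list(data)

    def filter(self, pred):
        self.data = [x for x in self.data if pred(x)]
        return self  # returning self is what makes .filter(...).map(...) possible

    def map(self, fn):
        self.data = [fn(x) for x in self.data]
        return self

result = (Pipeline(range(10))
          .filter(lambda x: x % 2 == 0)   # keep the even numbers
          .map(lambda x: x * x)           # square them
          .data)
print(result)  # [0, 4, 16, 36, 64]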

One question that comes to mind: suppose we have a dataset of images of animals, e.g. zebra, lion, elephant, etc. We have separate folders for each of these classes and we train our model on those images. But what if we now have images that contain several of these classes together (zebra, lion, monkey, etc.) in a single image? What would happen then? Would our classifier give probabilities for all of them?

The Data Block API has a particular JavaScript-ish style to it.

So what should it ideally be? Training loss?

Let’s assume that you have a model that predicts three different labels. For each classified image, you will get three probabilities:

  • label1: 0.4
  • label2: 0.1
  • label3: 0.3

A threshold of 0.2 would then return every label whose probability is above it, i.e. label1 and label3 in this case.
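
As a minimal sketch (the label names and variables are illustrative, not fastai's API), applying that threshold to the probabilities above could look like this:

import numpy as np

labels = ['label1', 'label2', 'label3']
probs  = np.array([0.4, 0.1, 0.3])   # predicted probabilities from the example
thresh = 0.2

# Every label whose probability clears the threshold is predicted as present.
predicted = [l for l, p in zip(labels, probs) if p > thresh]
print(predicted)  # ['label1', 'label3']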

Is it always assumed that labels will match corresponding data folders?

I would think that’s why you have to have a way to validate it: at a minimum, look it over, or check whether the same image is flagged a set number of times, etc.

I don’t recall what it says, but if it says validation loss, it’s wrong.

It is also similar to Keras's functional API.

Thought the same when I saw it! Functional style.

No, not necessarily: in the planet example, for instance, the labels come from a CSV file.