Can you please explain again what exactly the threshold of 0.2 means?
Potentially useful:

Currying is a function of 1 argument which takes a function f and returns a new function h. Partial application is a function of 2+ arguments which takes a function f and 1+ additional arguments to f and returns a new function g.
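A quick sketch of that distinction in plain Python (the function names here are made up for illustration, not from the lesson):

```python
from functools import partial

# A hypothetical two-argument function f
def add(a, b):
    return a + b

# Currying: a function of 1 argument that takes a function f
# and returns a new function h; h takes a and returns yet
# another function expecting b.
def curry(f):
    def h(a):
        return lambda b: f(a, b)
    return h

# Partial application: takes f and 1+ additional arguments to f,
# and returns a new function g of the remaining arguments.
g = partial(add, 3)

print(curry(add)(3)(4))  # 7
print(g(4))              # 7
```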
The graph label says validation loss… so does it compute the loss on the training set or the validation set at the end of each iteration?
The planet example once again shows train_loss > valid_loss… see
Please vote that post up!
It means that we’ll accept a label as true if we predicted its probability above 0.2, or 20%.
I would be very careful to inspect any human feedback. Anything from a “user” can be malicious. Horrible images and/or incorrect corrections are just the first things that come to mind.
Should we stop training if the validation loss is higher than the training loss? Or what is the stop rule?
The Data Block API dot something is functional programming style. Think of the operations like UNIX pipe.
```python
data = (ImageFileList.from_folder(planet)
        # Where to find the data? -> in planet and its subfolders
        .label_from_csv('labels.csv', sep=' ', folder='train', suffix='.jpg')
        # How to label? -> use the csv file labels.csv in path,
        # add .jpg to the names and take them in the folder train
        .random_split_by_pct()
        # How to split in train/valid? -> randomly with the default 20% in valid
        .datasets()
        # How to convert to datasets? -> use ImageMultiDataset
        .transform(planet_tfms, size=128)
        # Data augmentation? -> use tfms with a size of 128
        .databunch())
        # Finally? -> use the defaults for conversion to databunch
```
Oh graph_label is wrong then.
I think the data block API is similar to currying functions in JavaScript or function chaining in Python.
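A minimal sketch of what function chaining looks like (the class and method names here are invented for illustration, not fastai’s): each method returns `self`, so calls can be strung together just like the data block API.

```python
class Pipeline:
    """Toy chainable pipeline over a list of values."""

    def __init__(self, data):
        self.data = data

    def map(self, fn):
        self.data = [fn(x) for x in self.data]
        return self  # returning self is what enables chaining

    def filter(self, pred):
        self.data = [x for x in self.data if pred(x)]
        return self

result = (Pipeline([1, 2, 3, 4])
          .map(lambda x: x * 2)      # [2, 4, 6, 8]
          .filter(lambda x: x > 4)   # [6, 8]
          .data)
print(result)  # [6, 8]
```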
One question which comes to my mind: suppose we have a dataset of animal images, e.g. zebra, lion, elephant, with a separate folder for each class, and we train on those. What would happen if we now have images that contain several of these classes together (zebra, lion, monkey, etc.) in one single image? Would our classifier give probabilities for all of them?
The Data Block API has a particular JavaScript-ish style to it.
So ideally it should say training loss?
Let’s assume that you have a model that predicts three different labels. For each classified image, you will get three probabilities:
- label1: 0.4
- label2: 0.1
- label3: 0.3
A threshold of 0.2 would then return each of the labels that have a higher probability, e.g. label1 and label3 in this case.
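The example above can be sketched in plain Python (the label names are just the placeholders from the post):

```python
# Multi-label thresholding: keep every label whose predicted
# probability exceeds the threshold.
probs = {"label1": 0.4, "label2": 0.1, "label3": 0.3}
threshold = 0.2

predicted = [label for label, p in probs.items() if p > threshold]
print(predicted)  # ['label1', 'label3']
```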
Is it always assumed that labels will match corresponding data folders?
I would think that’s why you need a way to validate it: at a minimum, look it over, or check whether the same image gets flagged a set number of times, etc.
I don’t recall what it says, but if it says validation loss, it’s wrong.
It is also equivalent to Keras’s functional API.
I thought the same when I saw it! Functional style.
No, not necessarily: in the planet example for instance, you look at the labels in a csv file.