Kaggle Comp: Plant Seedlings Classification

This one looks beginner-friendly, yet the images are sufficiently different from the ones in ImageNet to make it interesting.

OK folks, come and get me! :smiley:

@jeremy
Looks like this is the metric that I need to include in the evaluation.

from sklearn.metrics import f1_score
f1_score(data.val_y, preds, average='micro')

What is the way to add it to the model?

@gerardo please don't at-mention me unless you specifically need me to answer your question and no-one else. There's lots of folks here who can be helpful! :slight_smile:

Look at how 'metrics' is defined in the various lesson notebooks we've seen so far to see how we've done this - especially the Planet one.
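
For anyone following along: a metric in those notebooks is just a function of (predictions, targets) that fastai evaluates on the validation set each epoch, passed in via metrics=. A minimal sketch, assuming the fastai 0.7 course API, with arch and data coming from the usual notebook setup:

from fastai.conv_learner import *
from fastai.metrics import accuracy

# a metric is any function of (preds, targs); fastai computes it on the
# validation set and prints it next to the losses after every epoch
learn = ConvLearner.pretrained(arch, data, metrics=[accuracy])
learn.fit(0.01, 3)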

I'm sorry to bother you. :open_mouth:
I just feel this weird attachment to your work, and having you "cyber-close" gives me the urge to ask questions.

I'm pretty sure I'm not alone.

I will keep asking questions while I keep looking for the answers in the forums.
:+1:

Thanks for all the hard work.

No problem!

Hey, I know it's silly…
But how do you divide the train folder into valid (I mean getting everything perfectly into sub-folders)?

I did it manually: 200 images per category for training, the rest for validation.

@gerardo
Hmmm, I would love to know if there's a shortcut script someone already wrote…
otherwise I know what to do next :wink:
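
One way to script the split, as a sketch (not fastai-specific; it assumes the Kaggle layout of train/<species>/*.png and moves a fixed number of images per class into a parallel valid/ tree, where the 50 is just a placeholder):

import os, random, shutil
from glob import glob

random.seed(42)
n_valid_per_class = 50  # placeholder: how many images per species to hold out

for species_dir in glob('train/*/'):
    species = os.path.basename(os.path.dirname(species_dir))
    images = glob(os.path.join(species_dir, '*.png'))
    random.shuffle(images)
    valid_dir = os.path.join('valid', species)
    os.makedirs(valid_dir, exist_ok=True)
    # move the held-out images into valid/<species>/, keep the rest in train/
    for img in images[:n_valid_per_class]:
        shutil.move(img, os.path.join(valid_dir, os.path.basename(img)))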

I created a CSV using Python and then used from_csv.

I also wrote a script to create a labels.csv file with headers file,species.

from glob2 import glob
import pandas as pd


# train/ contains one sub-folder per species, each full of .png images;
# collect (filename, species) pairs and write them out as labels.csv
rows = []

for image in glob("train/**/*.png"):
    parts = image.split('/')
    file_, species = parts[-1], parts[-2]
    rows.append({"file": file_, "species": species})

df = pd.DataFrame(rows, columns=["file", "species"])
df.to_csv('labels.csv', index=False)

Then, you can use the from_csv method.

Once you are done creating labels.csv, don't forget to remove the species folders in train. Keep the images, remove the folders.
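
For completeness, a sketch of the from_csv call this sets up, assuming the fastai 0.7 course API (the path, image size, and batch size are placeholders; the filenames in labels.csv already include .png, so no suffix is needed):

from fastai.conv_learner import *

PATH = 'data/seedlings/'   # assumed layout: train/, test/, labels.csv
arch, sz, bs = resnet50, 224, 64

label_csv = f'{PATH}labels.csv'
n = len(list(open(label_csv))) - 1   # number of labelled images
val_idxs = get_cv_idxs(n)            # random 20% validation split

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms,
                                    val_idxs=val_idxs, test_name='test', bs=bs)
learn = ConvLearner.pretrained(arch, data)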

@jeremy ok got you… :slight_smile:

Wow you killed it! What architecture are you using, if I may ask?

Resnet50… how about you?

Also resnet50. Oddly enough I found Resnext50 and Nasnet both a little worse (Resnext50 was better on validation, but worse when submitted).

I tried resnext50 first because I also thought it would do better, but I kept getting CUDA out of memory errors on fine-tuning, so that's how I ended up resorting to resnet50… I'll test out a few others. I also still need to train rn50 with all the data or CV; this is just one fold for now.

Wow that's impressive - mine is trained with all data.

To train the bigger models I decreased the batch size. For Nasnet I need bs=12 on a 16GB P3!
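
In case it saves someone a search: the batch size is just the bs argument when the data object is built. A sketch, reusing the same from_csv setup as above with only the batch size changed:

# same call as before, just a smaller batch size so the bigger model fits in memory
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms,
                                    val_idxs=val_idxs, test_name='test', bs=12)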

Hey, how did you include the metrics?
This is what I tried to do.

from sklearn.metrics import f1_score
def meanfscore(y_pred, targ):
    return f1_score(targ, y_pred, average='micro')

learn.fit(0.01, 5, metrics=[meanfscore])

This should work but it's throwing a weird error.

ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

Personally, I didn't use f1_score as a metric during training. I just used regular accuracy. The LB is scored by f1 based on the predictions you submit.
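
If it helps, a sketch of generating that submission with the fastai 0.7 API (assuming the data object was built with test_name='test'; the file,species header should match the competition's sample submission, but double-check it):

import os
import numpy as np
import pandas as pd

# plain (non-TTA) predictions on the test set
log_preds_test = learn.predict(is_test=True)
preds_test = np.argmax(log_preds_test, axis=1)

# map class indices back to species names and pair them with the test filenames
fnames = [os.path.basename(f) for f in data.test_ds.fnames]
species = [data.classes[i] for i in preds_test]

pd.DataFrame({'file': fnames, 'species': species}).to_csv('submission.csv', index=False)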

log_preds, y = learn.TTA()
preds = np.argmax(log_preds, axis=1)  # exp() isn't needed: argmax of the log-probs gives the same class
metrics.f1_score(data.val_y, preds, average='micro')

I used this code to calculate the f1_score, but when I try to add it to learn.fit it does not seem to work.
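
A likely cause of the error above: during training fastai hands the metric the model's raw per-class scores (possibly as torch tensors), while f1_score expects hard class labels, so converting with argmax first should resolve the mismatch. A sketch (the to_np_ helper is a hypothetical convenience, not part of the library):

import numpy as np
from sklearn.metrics import f1_score

def to_np_(x):
    # hypothetical helper: the metric may receive torch tensors rather than arrays
    return x.cpu().numpy() if hasattr(x, 'cpu') else np.asarray(x)

def meanfscore(preds, targs):
    preds, targs = to_np_(preds), to_np_(targs)
    # preds holds raw per-class scores, one row per example; f1_score wants
    # the predicted class index, hence the argmax over the class dimension
    return f1_score(targs, np.argmax(preds, axis=1), average='micro')

learn.fit(0.01, 5, metrics=[meanfscore])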