Kaggle Comp: Plant Seedlings Classification


(Zarak) #1

This one looks beginner friendly, yet the images are sufficiently different from the ones in ImageNet to make it interesting.


Visualizing layers of Resnet50 model trained on Plant Seedlings data
(Jeremy Howard) #2

OK folks, come and get me! :smiley:


(Gerardo Garcia) #4

@jeremy
Looks like this is the metric that I need to include in the evaluation.

from sklearn.metrics import f1_score
f1_score(data.val_y, preds, average='micro')

What is the way to add it to the model??


(Jeremy Howard) #5

@gerardo please don’t at-mention me unless you specifically need me to answer your question and no-one else. There’s lots of folks here who can be helpful! :slight_smile:

Look at how ‘metrics’ is defined in the various lesson notebooks we’ve seen so far to see how we’ve done this - especially the Planet one.


(Gerardo Garcia) #6

I’m sorry to bother. :open_mouth:
I just feel this weird attachment to your work, and having you “cyber-close” gives me the urge to ask questions.

I’m pretty sure that I’m not alone.

I will keep asking questions while I keep looking for the answers in the forums.
:+1:

Thanks for all the hard work.


(Jeremy Howard) #7

No problem!


(Saurav Singh) #8

Hey, I know it’s silly…
But how do you split the train folder into a valid folder (I mean getting everything neatly into sub-folders)?


(Gerardo Garcia) #9

I did it manually
200 images per category for training
the rest for validation.
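For anyone who would rather not do it by hand, here is one possible script for that split. This is just a sketch: the folder names, the per-class count, and the helper name are my assumptions, so adjust them to your layout.

```python
import os
import random
import shutil

def split_train_valid(train_dir, valid_dir, n_valid=50, seed=42):
    """Move n_valid randomly chosen images per species from
    train_dir/<species>/ into valid_dir/<species>/."""
    random.seed(seed)
    for species in os.listdir(train_dir):
        src = os.path.join(train_dir, species)
        if not os.path.isdir(src):
            continue
        dst = os.path.join(valid_dir, species)
        os.makedirs(dst, exist_ok=True)
        images = sorted(os.listdir(src))
        for name in random.sample(images, min(n_valid, len(images))):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))
```

Seeding the shuffle keeps the split reproducible, which matters if you want to compare runs on the same validation set.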


(Saurav Singh) #10

@gerardo
hmmm I would love to know if there’s any shortcut script if someone wrote it…
else I know what to do next :wink:


(Jeremy Howard) #11

I created a CSV using python and then used from_csv.


(Shubham Singh Tomar) #12

I also wrote a script to create a labels.csv file with headers file,species.

from glob2 import glob
import pandas as pd

# Walk train/<species>/<image>.png and collect one (file, species) row per image
rows = []
for image in glob("train/**/*.png"):
    parts = image.split('/')
    rows.append({"file": parts[-1], "species": parts[-2]})

# Building the DataFrame once avoids appending row by row
df = pd.DataFrame(rows, columns=["file", "species"])
df.to_csv('labels.csv', index=False)

Then, you can use the from_csv method.

Once you are done creating labels.csv, don’t forget to remove the species folders in train. Keep the images, remove the folders.
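That last step (flattening the train folder once labels.csv exists) could be sketched like this. The helper below is hypothetical, not from the original post, and it assumes file names are unique across species folders:

```python
import os
import shutil

def flatten_train(train_dir):
    """Move every image out of train_dir/<species>/ up into train_dir,
    then delete the now-empty species folders."""
    for species in os.listdir(train_dir):
        sub = os.path.join(train_dir, species)
        if not os.path.isdir(sub):
            continue
        for name in os.listdir(sub):
            shutil.move(os.path.join(sub, name), os.path.join(train_dir, name))
        os.rmdir(sub)
```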


(James Requa) #13

@jeremy ok got you…:slight_smile:


(Jeremy Howard) #14

Wow you killed it! What architecture are you using, if I may ask?


(James Requa) #15

Resnet50…how about you?


(Jeremy Howard) #16

Also resnet50. Oddly enough I found Resnext50 and Nasnet both a little worse (Resnext50 was better on validation, but worse when submitted).


(James Requa) #17

I tried resnext50 first because I also thought it would do better, but I kept getting CUDA out-of-memory errors on fine-tuning, so that’s how I ended up resorting to resnet50. I’ll test out a few others. I also still need to train rn50 with all the data or CV; this is just one fold for now.


(Jeremy Howard) #18

Wow that’s impressive - mine is trained with all data.

To train the bigger models I decreased the batch size. For Nasnet I need bs=12 on a 16GB P3!


(Saurav Singh) #19

Hey, how did you include the metric?
This is what I tried:

from sklearn.metrics import f1_score
def meanfscore(y_pred, targ):
    return f1_score(targ, y_pred, average='micro')

learn.fit(0.01, 5, metrics=[meanfscore])

This should work but it’s throwing a weird error.

ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets
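A likely cause (my reading, not confirmed in the thread): the metric receives the model’s raw per-class outputs, while f1_score expects one hard class label per sample, so the predictions need an argmax first. A sketch of the adjusted metric:

```python
import numpy as np
from sklearn.metrics import f1_score

def meanfscore(y_pred, targ):
    # y_pred arrives as one score per class (e.g. log-probabilities);
    # f1_score wants a single predicted class label per sample.
    return f1_score(targ, np.argmax(y_pred, axis=1), average='micro')

# then, as before: learn.fit(0.01, 5, metrics=[meanfscore])
```

Note that with single-label multiclass targets, micro-averaged F1 is identical to plain accuracy.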


(James Requa) #20

Personally, I didn’t use f1_score as a metric during training. I just used regular accuracy. The LB is scored by f1 based on the predictions you submit.


(Gerardo Garcia) #21

log_preds, y = learn.TTA()
preds = np.argmax(log_preds, axis=1)
metrics.f1_score(data.val_y, preds, average='micro')

I used this code to calculate the f1_score, but I would also like to add it to learn.fit, and that does not seem to be working.