Kaggle Comp: Plant Seedlings Classification

Have a look at the messages from the top…
They will answer all…
Regarding the CSV…
After removing the space in between,
at the time of submission we need to undo that… using

data.classes…
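
For reference, a minimal sketch of that submission step in the fastai 0.7 API used in this thread (learn, data, and the output filename are assumptions, not the exact code from the post):

import numpy as np
import pandas as pd

# Test-time-augmented predictions on the test set (log-probabilities)
log_preds, _ = learn.TTA(is_test=True)
probs = np.mean(np.exp(log_preds), 0)

# Undo the label encoding: map predicted indices back to species names via data.classes
species = [data.classes[i] for i in np.argmax(probs, axis=1)]
fnames = [f.split('/')[-1] for f in data.test_ds.fnames]

pd.DataFrame({'file': fnames, 'species': species}).to_csv('submission.csv', index=False)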

The steps are:

  1. Download the data from Kaggle under “data/seedlings/”

  2. Unzip train.zip under “data/seedlings/”

  3. Run the script and generate labels.csv under “data/seedlings/” (you can then use this labels.csv to count and visualize the data)

Since we are going to use ImageClassifierData.from_csv, all the images need to sit under the “train” folder, so the species sub-folders become redundant (a sketch of these steps follows the list):

  4. mv train/**/*.png train/ to move the files from the species sub-folders into the “train” folder

  5. rm -r train/**/ to remove all the species sub-folders
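
A minimal sketch of what step 3 could look like, assuming the Kaggle layout data/seedlings/train/<species>/<image>.png and assumed column names (file, species); the exact script from @shubham24 is not reproduced here:

import glob2
import pandas as pd

# Build labels.csv from the species sub-folder names
rows = []
for path in glob2.glob('data/seedlings/train/**/*.png'):
    parts = path.replace('\\', '/').split('/')
    rows.append((parts[-1], parts[-2]))   # (file name, species = parent folder name)

pd.DataFrame(rows, columns=['file', 'species']).to_csv(
    'data/seedlings/labels.csv', index=False)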

Hope this helps. All credit to @shubham24 for providing the code.

After five experiments (~3 hours), I submitted the best one.
Kudos to the fast.ai library.
I hope that puts things in perspective for other people.

The thread on parsing the data and creating the final CSV file was super useful.
Thanks to everyone who shared their insights.

Maybe a late reply and you probably don’t need it anymore but still :smiley:

import numpy as np
import sklearn.metrics

def f1(preds, targs):
    # Convert predicted probabilities and one-hot targets to class indices
    preds = np.argmax(preds, 1)
    targs = np.argmax(targs, 1)
    return sklearn.metrics.f1_score(targs, preds, average='micro')

learn = ConvLearner.pretrained(f_model, data=data, ps=0.5, xtra_fc=[], metrics=[f1])

My targets are one-hot encoded for example [0, 0, 1, 0, 0, 0, 0, 0, 0, 0] -> class 3

This worked for me :slight_smile:

Thank you.

Shubham, I'm trying to create labels.csv… Unable to find glob2… What am I doing wrong?

https://drive.google.com/file/d/1WCQtBTWD2cpNZ76h1YgYae4SXqjai3lt/view?usp=sharing

Use this CSV file; it has been edited…

conda install glob2
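
If installing glob2 is not an option, the standard library can do the same recursive match (a sketch; the path is an assumption based on the earlier steps):

import glob

# recursive=True lets '**' match the nested species sub-folders (Python 3.5+)
paths = glob.glob('data/seedlings/train/**/*.png', recursive=True)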

Thanks

Thanks, Aditya.

Okay. Looks like I’m struggling with this competition, but it seems to be giving me an opportunity to play with many parameters.
With a bs of 64 or 32 I wasn’t getting a curve that flattens or a loss that starts to increase, so I tried a bs of 16. This is what my lr curve looks like. It suggests using 1 as the lr, which eventually gives no better than a 0.6 F1 score.
[learning rate finder plots]
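
(For context, the curves above come from the learning-rate finder; in the fastai 0.7 API used in this thread the workflow is roughly this sketch, with learn assumed from the earlier setup:)

learn.lr_find()     # short training sweep over increasing learning rates
learn.sched.plot()  # plot loss vs. learning rate; pick a value just before the loss blows up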

Any guidance?

I am getting the same error. I have deleted the sub-folders in the train folder, but I checked again to make sure. How did you fix this?

I would say you have some issue with data/labels/filenames/anything_not_related_to_training. You have an incredibly low loss (lower than even what @jamesrequa reported in this thread) and very low accuracy with high variance. In three epochs (one of them after unfreeze) you should get 0.9+ accuracy with a 4-5× higher loss.

Check whether you are passing the suffix argument or not…

In my case I was…

Oh yes! I was passing suffix. My bad. Thanks!

I redid everything and ignored the F1 metric for now, and things are looking promising.

Question: I reduced the batch size to 32 for this problem (even though the GPU could accommodate more) so that each epoch gets more gradient updates to learn from. Is that the right way to think about it, or is it simply because we have fewer images?

A 10-place improvement came from this change to the differential learning rates:
lrs=np.array([lr/18, lr/6, lr/2]) instead of the original lrs=np.array([lr/9, lr/3, lr])
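
These are per-layer-group (differential) learning rates, with earlier layers getting smaller values. A minimal sketch of how such an array is used in the fastai 0.7 API (the base lr and the cycle settings are assumptions, not the poster's exact call):

lr = 0.01                                      # assumed base learning rate
lrs = np.array([lr/18, lr/6, lr/2])            # earlier layer groups get smaller rates

learn.unfreeze()                               # train all layer groups, not just the head
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)   # SGDR cycles with differential learning rates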

Thank you @mmr and @ecdrid for the continuous guidance.

Now it’s solved?

Yes!:+1:

Jeremy would be proud.

Disregard this; I didn’t see that you’d fixed it already.