Kaggle Comp: Plant Seedlings Classification


(Aditya) #92

Have a look at the messages from the top…
It will answer all…
Regarding the CSV…
After removing the space in between,
at the time of submission we need to undo that … using

data.classes…
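A minimal sketch of that round trip, since the post elides the details. It assumes the labels were written to the CSV with spaces simply removed, that the species list below matches the competition's folder names, and that `class_names` stands in for `data.classes`; the helper names are hypothetical.

```python
# Species names as assumed from the competition's folder names.
SPECIES = [
    "Black-grass", "Charlock", "Cleavers", "Common Chickweed",
    "Common wheat", "Fat Hen", "Loose Silky-bent", "Maize",
    "Scentless Mayweed", "Shepherds Purse",
    "Small-flowered Cranesbill", "Sugar beet",
]


def strip_spaces(name):
    """Label form assumed to be written to labels.csv (spaces removed)."""
    return name.replace(" ", "")


# Map each space-free training label back to its original name.
RESTORE = {strip_spaces(name): name for name in SPECIES}


def restore_labels(class_names, predicted_indices):
    """Turn predicted class indices into original species names.

    class_names stands in for data.classes (space-free labels in the
    order the model uses); predicted_indices are argmax results.
    """
    return [RESTORE[class_names[i]] for i in predicted_indices]
```

The restored names are what would go into the species column of the submission file.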


(Sarada Lee) #93

The steps are:

  1. Download data from Kaggle under “data/seedlings/”

  2. unzip train.zip under “data/seedlings/”

  3. run the script and generate the labels.csv under “data/seedlings/” (then you can use this labels.csv to count and visualize the data)

Since we are going to use ImageClassifierData.from_csv, all the images need to sit directly under the “train” folder, which makes the sub-folders redundant.

  4. mv train/**/*.png train/ to move files from species sub-folders to “train” folder

  5. rm -r train/**/ to remove all species sub-folders

Hope this helps. All credit to @shubham24 for providing the code.
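For anyone who prefers to stay in Python, the labelling and flattening steps can be sketched roughly as below. This is not @shubham24's script: the function names, the CSV header, and the choice to strip spaces from labels are all assumptions for illustration.

```python
import csv
import shutil
from pathlib import Path


def write_labels(train_dir, csv_path):
    """Write one (file, species) row per PNG, labelled by its sub-folder name."""
    train_dir = Path(train_dir)
    rows = []
    for png in sorted(train_dir.glob("*/*.png")):
        # Assumption: spaces are removed from the species name.
        label = png.parent.name.replace(" ", "")
        rows.append((png.name, label))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "species"])  # header names are an assumption
        writer.writerows(rows)
    return rows


def flatten(train_dir):
    """Move every PNG up into train_dir, then delete all sub-folders."""
    train_dir = Path(train_dir)
    for png in list(train_dir.glob("*/*.png")):
        target = train_dir / png.name
        if target.exists():
            raise FileExistsError(f"duplicate file name: {png.name}")
        shutil.move(str(png), str(target))
    # Like rm -r, this removes each sub-folder and anything left inside it.
    for sub in [p for p in train_dir.iterdir() if p.is_dir()]:
        shutil.rmtree(sub)
```

Call `write_labels` first, because the labels come from the sub-folder names that `flatten` deletes.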


#94

After five experiments (~3 hours), I submitted the best one.
Kudos to the fast.ai library.
I hope that put things in perspective for other people.

The thread to parse the data and to create the final CSV file was super useful.
Thanks to everyone that shared their insights.


(Kerem Turgutlu) #95

Maybe a late reply and you probably don’t need it anymore but still :smiley:

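import numpy as np
from sklearn.metrics import f1_score

def f1(preds, targs):
    # preds: per-class scores, targs: one-hot targets; reduce both to class indices
    preds = np.argmax(preds, 1)
    targs = np.argmax(targs, 1)
    return f1_score(targs, preds, average='micro')

learn = ConvLearner.pretrained(f_model, data=data, ps=0.5, xtra_fc=[], metrics=[f1])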

My targets are one-hot encoded, for example [0, 0, 1, 0, 0, 0, 0, 0, 0, 0] -> class 3 (index 2 after argmax)

This worked for me :slight_smile:
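One thing worth knowing about the micro average: when every image has exactly one label, micro-averaged F1 works out to the same number as plain accuracy. A small pure-Python sketch (no sklearn; the helper names are made up for illustration) that counts true positives, false positives, and false negatives over all classes:

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def micro_f1(pred_scores, onehot_targets):
    """Micro-averaged F1 from per-class scores and one-hot targets."""
    preds = [argmax(r) for r in pred_scores]
    targs = [argmax(r) for r in onehot_targets]
    tp = fp = fn = 0
    # Pool the counts over every class before computing F1.
    for c in set(preds) | set(targs):
        for p, t in zip(preds, targs):
            tp += (p == c and t == c)
            fp += (p == c and t != c)
            fn += (p != c and t == c)
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(pred_scores, onehot_targets):
    """Fraction of rows whose argmax matches the target's argmax."""
    hits = [argmax(p) == argmax(t) for p, t in zip(pred_scores, onehot_targets)]
    return sum(hits) / len(hits)
```

Each wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), which is why the two values coincide.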


(K Sreelakshmi) #96

Thank you.


(Parthasarathy Mohan) #97

Shubham, I'm trying to create labels.csv … unable to find glob2 … what am I doing wrong?


(Aditya) #98

https://drive.google.com/file/d/1WCQtBTWD2cpNZ76h1YgYae4SXqjai3lt/view?usp=sharing

Use this csv file

It has been edited…


(Vikrant Behal) #99

conda install glob2


(Parthasarathy Mohan) #100

Thanks


(Parthasarathy Mohan) #101

thanks aditya


(Vikrant Behal) #102

Okay. Looks like I’m struggling with this competition, but it’s giving me an opportunity to play with many parameters.
With a bs of 64 or 32 I wasn’t getting a curve that flattens or a loss that starts to increase, so I tried a bs of 16. This is my lr curve. It suggests using 1 as the lr, which ends up giving no better than a 0.6 f1 score.
image

image

Any guidance?


(K Sreelakshmi) #103

I am getting the same error. I have deleted the sub-folders in the train folder, and I checked again to make sure. How did you fix this?


(sergii makarevych) #104

I would say you have some issues with data/labels/filenames/anything_not_related_to_training. You have an incredibly low loss (lower than even what @jamesrequa reported in this thread) and very low accuracy with high variance. In three epochs (one of them after unfreezing) you should get 0.9+ accuracy with a loss 4-5 times higher.


(Aditya) #105

Check whether you are passing suffix or not…

In my case I was…
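A quick way to check this before training, sketched under the assumption that the loader builds each path as folder + name + suffix (so a CSV that already lists "abc.png" combined with suffix='.png' would look for "abc.png.png"). The function name is hypothetical and nothing here calls fastai.

```python
import csv
from pathlib import Path


def missing_files(csv_path, train_dir, suffix=""):
    """Return CSV file names whose resolved path (name + suffix) does not exist."""
    train_dir = Path(train_dir)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        names = [row[0] for row in reader if row]
    return [n for n in names if not (train_dir / (n + suffix)).is_file()]
```

If the list is empty without a suffix but full with suffix='.png', the file names in the CSV already carry the extension and the suffix argument should be dropped.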


(K Sreelakshmi) #106

Oh yes! I was passing suffix. My bad. Thanks!


(Vikrant Behal) #107

I redid everything and ignored f1 metrics for now and things are looking promising.

Question: I reduced the batch size to 32 for this problem (even though the GPU could accommodate more). Is that so each epoch gets more gradient updates to learn from? Or is the right way to think about it that we have fewer images?
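One way to put numbers on the first reading of that question: with a fixed number of images, a smaller batch size means more weight updates per epoch. A tiny arithmetic sketch, where the image count is a made-up round number rather than the real training-set size:

```python
import math


def updates_per_epoch(n_images, batch_size):
    """Number of mini-batches (weight updates) in one pass over the data."""
    return math.ceil(n_images / batch_size)


n_images = 4000  # hypothetical count, for illustration only
per_epoch = {bs: updates_per_epoch(n_images, bs) for bs in (64, 32, 16)}
```

This only shows the bookkeeping; whether the extra updates actually help depends on the learning rate and the data.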

A 10-place improvement came from this change:
lrs=np.array([lr/18,lr/6,lr/2]) rather than the original lrs=np.array([lr/9,lr/3,lr])
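To make the two settings easier to compare, here is a small sketch using plain lists in place of np.array. The value of lr is made up and the helper name is hypothetical; the point is only that every layer group's rate is halved while the 9:3:1 spacing between groups stays the same.

```python
def layer_group_lrs(lr, divisors):
    """One learning rate per layer group: lr divided by each divisor."""
    return [lr / d for d in divisors]


lr = 0.01  # hypothetical base learning rate
original = layer_group_lrs(lr, (9, 3, 1))   # [lr/9, lr/3, lr]
revised = layer_group_lrs(lr, (18, 6, 2))   # [lr/18, lr/6, lr/2]

# Ratio of revised to original rate for each layer group.
ratios = [r / o for r, o in zip(revised, original)]
```

In other words, the same effect could presumably be had by keeping the original divisors and halving lr.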

Thank you @mmr, @ecdrid for continuous guidance.


(Aditya) #108

Now it’s solved?


(K Sreelakshmi) #109

Yes!:+1:


#110

Jeremy would be proud.


(Alexander Rass) #111

Disregard this, didn’t see that you’ve fixed it already