Kaggle Comp: Plant Seedlings Classification

I cannot believe it, 10th place! :medal_military::trophy:

I have been using Kaggle for 2 years, and this is the first time I have accomplished something like this.

Thanks @jeremy

14 Likes

Just there. Thanks @jeremy

7 Likes

Yup, can’t believe it either. Thanks @jeremy and @jamesrequa.
So the top 4 is officially fastai students and community.
And I didn’t even use the whole training set… I think I can still get a bump by using stratified k-fold CV
and my full training set.

4 Likes

Hi,

Can someone share the accuracy / top 1 error rate for their model?

Regards

Here’s mine:
best validation loss 0.08718
best accuracy 0.97656

5 Likes

Should we remove all the folders in the train/ folder, move their files together, and then use from_csv?

Keep the images, remove the folders.

@ecdrid: Yes, first move all the images to the parent folder, i.e. train:
user@computer:~/fastai/courses/dl1/data/plant-seedlings-classification$ mv train/**/*.png train

and then, remove all the species folders:
user@computer:~/fastai/courses/dl1/data/plant-seedlings-classification$ rm -r train/**/

5 Likes

@shubham24

Getting the following traceback

/io/opencv/modules/imgproc/src/color.cpp:10606: error: (-215) scn == 3 || scn == 4 in function cvtColor

The notebook is attached

https://nbviewer.jupyter.org/github/AdityaSoni19031997/Machine-Learning/blob/master/Untitled.ipynb

How do I fix this?

PS: Can someone confirm whether the following directory structure is correct?

~/data/planet/
├── test/        (files as downloaded)
├── train/       (files from all species folders, with the species folders removed)
└── labels.csv   (mapping of training image names to classes)
1 Like

/io/opencv/modules/imgproc/src/color.cpp:10606: error: (-215) scn == 3 || scn == 4 in function cvtColor

Are you sure you deleted the species folders after moving the images?
The error says that cvtColor with COLOR_BGR2RGB expects an image with 3 or 4 channels but got something different. Most likely, you haven’t deleted the folders.
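If you want to double-check, here’s a quick sanity check (a sketch: it assumes opencv-python is installed and PATH points at the dataset folder, as elsewhere in this thread):

import os
import cv2

# cv2.imread returns None for anything it can't decode, e.g. a leftover species folder
for f in os.listdir(f'{PATH}train'):
    if cv2.imread(f'{PATH}train/{f}') is None:
        print('not a readable image:', f)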

Also, when you create labels.csv, do replace the relevant part with the following:

# inside your loop over species folders and image files
df = df.append({
    "file": file_,
    "species": species.replace(" ", "_")
}, ignore_index=True)

There are spaces in the species name. Replace them with underscores.
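For context, the whole labels.csv step could look roughly like this (a sketch: run it before flattening train/, while the species folders still exist, with PATH as above):

import os
import pandas as pd

rows = []
for species in os.listdir(f'{PATH}train'):               # one sub-folder per species
    for file_ in os.listdir(f'{PATH}train/{species}'):
        rows.append({"file": file_, "species": species.replace(" ", "_")})

pd.DataFrame(rows, columns=["file", "species"]).to_csv(f'{PATH}labels.csv', index=False)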

And, when you finally generate predictions and submission file, do the reverse:

probs = np.exp(log_preds_test)               # convert log-probabilities before taking the argmax
preds = np.argmax(log_preds_test, axis=1)    # predicted class index per test image
preds_classes = [data.classes[i].replace("_", " ") for i in preds]  # put the spaces back

# file names as the index; assumes the prediction order matches os.listdir
submission = pd.DataFrame(preds_classes, os.listdir(f'{PATH}test'))
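Then presumably write out the two-column file Kaggle expects; a minimal sketch, assuming the submission frame above (predictions in its single column, file names as its index):

submission.to_csv(f'{PATH}submission.csv', index_label='file', header=['species'])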
4 Likes

According to the docs…

  suffix: suffix to add to image names in CSV file (sometimes CSV only contains the file name
          without file extension e.g. '.jpg' - in which case, you can set suffix as '.jpg')

I shouldn’t pass suffix then, since the labels file already includes the extensions.

This solves the problem…
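i.e. the call ends up looking something like this (a sketch, assuming fastai 0.7 as used in the course, with arch and sz defined as in the lesson notebooks):

tfms = tfms_from_model(arch, sz)
data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv',
                                    tfms=tfms, test_name='test')  # no suffix argument needed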

Hi,

I was looping through the dataset and I see that the image sizes vary a lot (49 to 3000+ pixels). I need to scale them. torchvision.transforms.Resize is not working. What is the strategy to handle this? Border padding or thumbnail transformation?

fast.ai does that for you:
resize(sz*1.3)
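In the lesson notebooks that call looks roughly like this (assuming the ImageClassifierData object from earlier in the thread):

data = data.resize(int(sz * 1.3), 'tmp')  # one-off resize pass; resized copies are cached under tmp/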

1 Like

Mine was:
best validation loss 0.12573
best accuracy 0.95814

1 Like

You don’t need to do that - that’s just a time saver. The best way is to use fastai lib transforms. We don’t use torchvision. See any of the lesson notebooks for lots of examples. We’ve been using sz for the size variable BTW.

What happens with the size?
The seedling images are 1380x1380 pixels.
The model from cats and dogs uses 224x224.

What’s the best way to tackle this issue?
When I set sz=1380, training never finished and I had to stop the whole notebook.

You can set sz to any size you want and all of the images will be resized to that same consistent image size. Personally, I tried both 224x224 and 300x300.
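In code that’s just the sz variable (names as in the lesson notebooks, which I’m assuming here):

sz = 224                          # or 300; every image is resized to sz x sz
tfms = tfms_from_model(arch, sz)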

@jeremy got you! =D

7 Likes

When I plot the confusion matrix

plot_confusion_matrix(cm, data.classes)

something like this shows up:

[[ 54 0 0 0 5 0 4 0 0 0 0 0]
[ 1 189 0 0 0 0 0 0 0 0 0 0]
[ 0 0 85 0 2 0 0 0 0 0 0 0]
[ 0 0 0 403 1 0 1 0 4 2 0 0]
[ 0 0 0 0 21 0 0 0 0 0 0 0]
[ 2 4 0 1 3 265 0 0 0 0 0 0]
[ 63 0 0 0 1 0 387 1 2 0 0 0]
[ 0 0 0 0 0 0 0 21 0 0 0 0]
[ 1 0 2 9 0 0 0 0 301 3 0 0]
[ 0 0 0 0 0 0 0 0 1 30 0 0]
[ 0 0 2 0 0 0 1 0 0 0 293 0]
[ 0 1 1 2 12 0 1 3 0 0 0 165]]

I would like to run the model only on those elements that are off the diagonal.


I’m not looking for answers; I’m looking for guidance on best practices to analyze those cases and try to fix them if possible.
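For reference, the off-diagonal cases can be pulled out like this (a sketch, assuming validation predictions obtained via learn.TTA() as in the lesson notebooks):

log_preds, y = learn.TTA()                                # validation log-probs and true labels
preds = np.argmax(np.mean(np.exp(log_preds), 0), axis=1)  # average over augmentations, then argmax
off_diagonal = np.where(preds != y)[0]                    # indices of the misclassified images

From there you can plot those images and see which pairs of species get confused.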

2 Likes

I hate you all! :stuck_out_tongue:

11 Likes

Not quite there yet, but in hot pursuit…

5 Likes