DataBlock-Splitter error (Image)

Hello, I am trying to train an image classification model (Bees vs. Wasps) using Kaggle’s data [https://www.kaggle.com/jerzydziewierz/bee-vs-wasp].

The dataset provides “labels.csv”, whose columns tell which items belong to the train, validation, or test set.

So, I first separated the train and test data sets.

Then I wrote a “splitter” function to split the train data set into training and validation sets,

and… it looked like it was working well when I created the “datablock”.

But here is the problem.

When I passed the “splitter” to DataBlock, I got some errors.


I guess this is an “index” problem, so my “splitter” function might be wrong.

Is there a smarter way to write the “splitter”?

“labels.csv” also contains the images’ paths, and I think there must be some way to use that information to create the DataBlock.

I really need your help, and thank you for your time!

There’s a built-in splitter you can use: splitter=ColSplitter('is_validation'). Try it out, but you might also need to change the column from 1/0 to True/False.
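
In case it helps, here is a minimal sketch of the 1/0 to True/False conversion in pandas. The tiny DataFrame is a made-up stand-in for “labels.csv”: the “path” column name and the file names are assumptions, and only “is_validation” comes from the suggestion above.

```python
import pandas as pd

# Hypothetical stand-in for labels.csv; the "path" column name and values are assumed.
df = pd.DataFrame({
    "path": ["bee1/a.jpg", "bee1/b.jpg", "wasp1/c.jpg", "wasp1/d.jpg"],
    "is_validation": [0, 1, 0, 1],
})

# Turn the 1/0 flags into real booleans before using the column for splitting.
df["is_validation"] = df["is_validation"].astype(bool)

print(df["is_validation"].dtype)     # bool
print(df["is_validation"].tolist())  # [False, True, False, True]
```

Whether the conversion is actually needed may depend on the fastai version, so treat it as something to try rather than a guaranteed fix.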

I think the problem is that the indices returned from the splitter are expected to be positions in the range [0, n) (like you’d use with iloc), and your index contains holes after you took out the final validation records.
So either reset the index after you create df_train, or enumerate in your split function to get the actual positions.
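
To illustrate both options, here is a rough sketch in plain pandas. It is not the original splitter from the question (that was only posted as a screenshot); the toy data, the “path” column, and the helper names split_by_label and split_by_position are all made up for the example. Only “is_validation”, “is_final_validation”, and df_train come from this thread and the dataset.

```python
import pandas as pd

# Hypothetical stand-in for labels.csv (6 rows, 2 of them reserved for the final test set).
df = pd.DataFrame({
    "path": [f"img_{i}.jpg" for i in range(6)],
    "is_validation":       [0, 1, 0, 0, 1, 0],
    "is_final_validation": [0, 0, 1, 0, 0, 1],
})

# Removing the test rows leaves holes in the index: the labels are now 0, 1, 3, 4.
df_train = df[df["is_final_validation"] == 0]

def split_by_label(frame):
    """Returns index labels; these are valid positions only if the index is 0..n-1."""
    train = frame[frame["is_validation"] == 0].index.tolist()
    valid = frame[frame["is_validation"] == 1].index.tolist()
    return train, valid

def split_by_position(frame):
    """Returns row positions in [0, n), whatever the index looks like."""
    train, valid = [], []
    for pos, flag in enumerate(frame["is_validation"]):
        (valid if flag else train).append(pos)
    return train, valid

print(split_by_label(df_train))     # ([0, 3], [1, 4])  4 is out of range for 4 rows
print(split_by_position(df_train))  # ([0, 2], [1, 3])

# Option 1: reset the index so that labels and positions agree again.
df_train_reset = df_train.reset_index(drop=True)
print(split_by_label(df_train_reset))  # ([0, 2], [1, 3])
```

Either option gives the same positions here; resetting the index is the smaller change if the splitter already exists.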

Thanks for your reply!
I already tried “ColSplitter” with both int and bool values, but it didn’t work…

Thanks for your answer!
As you suggested, I rebuilt the index after creating df_train, and it works very well!