Inconsistent results when checking # of classes in a dataset

I was checking how many classes were in the dataset, and sometimes it reported 17 classes and sometimes 16?!

After some investigation, I concluded that the default settings take 80% of the dataset for training and leave the rest for validation. Most likely, when the computer tells me there are 16 classes, the 17th class does not appear in the training set and only appears in the validation set.

What do y’all think of this explanation for this behaviour from the computer?

(This was using dataset “URLs.PLANET_SAMPLE”)

If I had to guess, you may be splitting it randomly and sometimes missing the one or two instances of a very rare class.
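
To illustrate the idea, here is a minimal sketch in plain Python (not the actual library code). It assumes a made-up single-label dataset with 16 common classes and one class that has a single example, then repeats a random 80/20 split with different seeds and counts how many classes land in the training part. The names `make_labels` and `classes_in_train` and all the sizes are hypothetical.

```python
import random

def make_labels(n_common=16, per_common=60, rare_count=1):
    """Build a toy label list: n_common frequent classes plus one rare class."""
    labels = []
    for c in range(n_common):
        labels += [f"class_{c}"] * per_common
    labels += ["rare_class"] * rare_count  # assumption: a single rare example
    return labels

def classes_in_train(labels, valid_pct=0.2, seed=None):
    """Randomly hold out valid_pct of the items and return the set of
    classes that remain in the training part."""
    rng = random.Random(seed)
    shuffled = labels[:]
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    train = shuffled[n_valid:]
    return set(train)

labels = make_labels()
# Repeat the split with 200 different seeds and tally the class counts.
counts = [len(classes_in_train(labels, seed=s)) for s in range(200)]
print({n: counts.count(n) for n in sorted(set(counts))})
```

With one rare example, the rare class should end up in the validation part in roughly one split out of five, so the tally should show a mix of 16 and 17. Fixing the seed makes the count stable from run to run.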

Agreed! I was able to confirm this by printing out the names of the classes and checking them manually.
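
The manual check could also be done with a set difference. This is only a sketch: `missing_classes` is a hypothetical helper, the label lists are made up for illustration, and it assumes one label per item (for space-separated multi-label tags you would split each entry first).

```python
def missing_classes(train_labels, valid_labels):
    """Report which classes appear in only one of the two splits."""
    train_classes = set(train_labels)
    valid_classes = set(valid_labels)
    return {
        "only_in_valid": sorted(valid_classes - train_classes),
        "only_in_train": sorted(train_classes - valid_classes),
    }

# Made-up labels, not the real contents of the dataset.
train_labels = ["clear", "haze", "primary", "clear", "water"]
valid_labels = ["clear", "primary", "blow_down"]

report = missing_classes(train_labels, valid_labels)
print(report)
```

Any class listed under `only_in_valid` is one the training set never sees, which would explain a class count that is one short.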

How are you splitting the data? Randomly?

Hey Zach, my initial question is solved!

The data was split randomly

That would do it :wink:
