[PR #1050] Add sorting in uniqueify


If we create an ImageDataBunch, the ordering of class names in the classes list depends on the order of the files in the data source. If we add or remove files, the order of the classes list might change even when the actual classes do not. This can cause issues when people modify their datasets, as shown in lecture #2.

Apologies for no tests - will be happy to add them. I probably will not be able to spend much time in front of my computer before Monday though.

I have also not verified that this works as intended. Also, it will be a breaking change for models trained with earlier versions, which is worth considering.

I’m unsure what bugs this fixes. Can you be more specific?

Create an ImageDataBunch from folder A. Train the model. Get more of same data and put it in folder B (same classes). Create ImageDataBunch from folder B. Load weights. Class ordering might differ - model will no longer work as intended.

This is the behavior when using regexps, but it should also affect the other methods for creating an ImageDataBunch.

Any modifications to data (adding, removing, renaming) might change the order in classes list passed to dataset constructors.
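
To make the scenario concrete, here is a minimal sketch in plain Python. It does not call fastai: `uniqueify_unsorted` is a hypothetical stand-in that assumes the current helper de-duplicates labels in first-seen order, and the label lists are made up for illustration.

```python
from collections import OrderedDict

def uniqueify_unsorted(labels):
    """Hypothetical stand-in: de-duplicate while keeping first-seen order."""
    return list(OrderedDict.fromkeys(labels))

def uniqueify_sorted(labels):
    """De-duplicate, then sort, so the result ignores file order."""
    return sorted(set(labels))

# Labels as they might be read from two folders holding the same classes
# (made-up data, only the order of appearance differs).
folder_a = ["cat", "dog", "cat", "bird"]
folder_b = ["dog", "bird", "cat", "dog"]

classes_a = uniqueify_unsorted(folder_a)
classes_b = uniqueify_unsorted(folder_b)

# Same set of classes, but index 0 means "cat" in one list and "dog" in the other.
print(classes_a, classes_b)

# With sorting, both folders produce the same class list.
print(uniqueify_sorted(folder_a), uniqueify_sorted(folder_b))
```

If the model's output index 0 was trained as "cat", loading its weights against the second class list would silently map that output to "dog".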

This is my understanding - maybe I am wrong. Will look into this further when I’m in front of a PC and write a test demonstrating this if it indeed is the case.

Ah, OK.
Note that if you want to use a model that was trained on certain classes, you should always pass them in the constructor (which will override the line you amend).
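
One way to follow this advice is to store the class list next to the weights and read it back before rebuilding the data. The sketch below uses only the standard library; `save_classes` and `load_classes` are hypothetical helper names, not fastai functions, and the constructor call itself is left out because its exact signature is not shown in this thread.

```python
import json
import tempfile
from pathlib import Path

def save_classes(classes, path):
    """Hypothetical helper: write the class list to a JSON file, preserving order."""
    Path(path).write_text(json.dumps(list(classes)))

def load_classes(path):
    """Hypothetical helper: read the class list back in the saved order."""
    return json.loads(Path(path).read_text())

with tempfile.TemporaryDirectory() as tmp:
    classes_file = Path(tmp) / "classes.json"
    # At training time: save the order the model was trained with.
    save_classes(["cat", "dog", "bird"], classes_file)
    # Later, before loading weights: restore it and pass it to the constructor.
    restored = load_classes(classes_file)

print(restored)
```

JSON arrays keep their order, so the restored list matches the one used during training regardless of how the files in the new folder are arranged.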


Maybe this is not a bug then? It feels like something nice to do for users, but I’m not sure implementing it now makes much sense, since it is a breaking change.

Maybe setting the classes upon loading a Learner would be nicer?


What seems to be quite important here is that we don’t seem to have a way of passing classes into ImageDataBunch.from_x.

On one hand, requiring users to save and pass classes does not seem that great if we can provide a simple automated way of going about this that works most of the time. On the other hand, I really dislike sticking the sorting in uniqueify. sort_and_uniqueify would be better; uniqueify with a default argument sort=False also seems better.
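
For reference, the `sort=False` variant could look roughly like the sketch below. This is an illustration of the idea only, not the code in the PR, and it assumes the existing helper de-duplicates in first-seen order.

```python
from collections import OrderedDict
from typing import Any, Iterable, List

def uniqueify(x: Iterable[Any], sort: bool = False) -> List[Any]:
    """Return the unique elements of x; optionally sort them.

    With sort=False (the default) the first-seen order is kept,
    so existing callers see no change in behaviour.
    """
    res = list(OrderedDict.fromkeys(x))
    if sort:
        res.sort()
    return res

print(uniqueify(["dog", "cat", "dog"]))             # first-seen order
print(uniqueify(["dog", "cat", "dog"], sort=True))  # sorted order
```

Keeping the default at `False` avoids the breaking change for existing callers, while code that builds class lists could opt in with `sort=True`.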

Anyhow, I would really appreciate your opinion on whether this requires fixing. At the very least, I think we need to make classes passable to the builder class methods.

The from_x methods are nice, but all the flexibility will come from the new data block API, in which you can pass this class argument. We’ll rewrite the from_x methods soon-ish in any case (to use the data block API), so I’ll keep this in mind for when we do.
