I am in the process of creating an image classifier to identify mushrooms. I have about 3.5 million images that are categorized down to the species level. I’m currently playing around with a few different ideas for improving my results.
The first thing I tried is similar to what this guy did:
He uses the taxonomy to create a specialized loss function, on the assumption that since breed is a subset of species, the classifier shouldn’t be penalized as much if it gets the species right. I’m using the genus → species hierarchy for my loss function. It seems to work OK, but I feel I can do better.
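For anyone following along, here is a minimal sketch of what a genus → species loss could look like in plain PyTorch. It is not necessarily what the linked post does. It assumes the model outputs species logits, gets genus probabilities by summing the probabilities of the species in each genus, and adds a weighted genus-level term to the usual species cross-entropy. The names `GenusSpeciesLoss`, `species_to_genus` and `genus_weight` are made up for illustration.

```python
import torch
import torch.nn.functional as F


class GenusSpeciesLoss(torch.nn.Module):
    """Species cross-entropy plus a weighted genus-level term (illustrative sketch)."""

    def __init__(self, species_to_genus, genus_weight=0.5):
        super().__init__()
        # species_to_genus[i] is the genus index of species i (hypothetical mapping).
        self.register_buffer(
            "species_to_genus", torch.as_tensor(species_to_genus, dtype=torch.long)
        )
        self.n_genus = int(self.species_to_genus.max()) + 1
        self.genus_weight = genus_weight

    def forward(self, logits, species_target):
        log_p = F.log_softmax(logits, dim=1)  # (batch, n_species)
        species_loss = F.nll_loss(log_p, species_target)

        # Genus probability = sum of the probabilities of its member species.
        p = log_p.exp()
        genus_p = torch.zeros(p.shape[0], self.n_genus, dtype=p.dtype, device=p.device)
        genus_p.index_add_(1, self.species_to_genus, p)

        genus_target = self.species_to_genus[species_target]
        genus_loss = F.nll_loss(torch.log(genus_p.clamp_min(1e-12)), genus_target)
        return species_loss + self.genus_weight * genus_loss


# Toy check: 4 species in 2 genera; the true species is 0 (genus 0).
loss_fn = GenusSpeciesLoss(species_to_genus=[0, 0, 1, 1])
target = torch.tensor([0])
wrong_species_same_genus = torch.tensor([[0.0, 4.0, 0.0, 0.0]])
wrong_species_other_genus = torch.tensor([[0.0, 0.0, 4.0, 0.0]])

loss_same_genus = loss_fn(wrong_species_same_genus, target)
loss_other_genus = loss_fn(wrong_species_other_genus, target)
```

In the toy check, both predictions miss the species, but the one that stays inside the correct genus gets the smaller loss. Since this is an ordinary `nn.Module`, it should be possible to pass it as `loss_func` when building a fastai `Learner`, though I haven't verified how it interacts with fastai's prediction and decoding helpers.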
I was then thinking about training a model to recognize the “family”, then re-training on “genus”, then re-training again on “species”. But I’m not sure of the best way to do this. Should I just train a model, swap out the learner.dls with the genus dataloaders, then do it again for the species dataloaders? Is there a better way to do this?
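One catch with swapping dataloaders is that the number of classes changes at each level, so the model's final layer has to change size too. Swapping the data alone does not resize it. A rough sketch of the idea in plain PyTorch, assuming a model split into a `body` and a linear `head` (`TinyClassifier`, `swap_head` and the class counts are all hypothetical stand-ins, not fastai API):

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Stand-in for a real CNN: a feature-extracting body plus a linear head."""

    def __init__(self, n_features, n_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, n_features), nn.ReLU()
        )
        self.head = nn.Linear(n_features, n_classes)

    def forward(self, x):
        return self.head(self.body(x))


def swap_head(model, n_classes):
    """Replace the head with a freshly initialised one; body weights are kept."""
    model.head = nn.Linear(model.head.in_features, n_classes)
    return model


# Hypothetical class counts for each taxonomic level.
n_family, n_genus, n_species = 5, 20, 100

model = TinyClassifier(n_features=32, n_classes=n_family)
# ... train on family labels here ...
body_before = model.body[1].weight.clone()

model = swap_head(model, n_genus)
# ... train on genus labels here ...
model = swap_head(model, n_species)
# ... train on species labels here ...

out = model(torch.randn(4, 3, 8, 8))  # logits over the species classes
```

The body carries over what it learned at the coarser level, while each new head starts from scratch. That usually means it is worth training the new head for a bit with the body frozen before fine-tuning everything.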
Also, if you have any other ideas to try, let me know. This is my first real deep learning project and I would love any input.
Edit:
As a quick follow-up, I’m also experimenting with changing the loss function between stages, i.e. training a model using a loss function based on the “family”, then training again with a “genus” one, and finally one focused on “species”. This way I’m always using the same dataloaders and am not mixing up the images I am training on over time.
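To make the staged-loss idea concrete, here is a small sketch, assuming the model always outputs species logits and that the coarser levels are scored by summing species probabilities within each group. `level_loss` and the toy taxonomy tensors are hypothetical names for illustration, not anything from fastai.

```python
import torch
import torch.nn.functional as F


def level_loss(logits, species_target, species_to_level):
    """Cross-entropy at a chosen taxonomic level, computed from species logits."""
    n_level = int(species_to_level.max()) + 1
    p = F.softmax(logits, dim=1)
    level_p = torch.zeros(p.shape[0], n_level, dtype=p.dtype, device=p.device)
    level_p.index_add_(1, species_to_level, p)  # sum species probs per group
    level_target = species_to_level[species_target]
    return F.nll_loss(torch.log(level_p.clamp_min(1e-12)), level_target)


# Hypothetical toy taxonomy: 6 species, 3 genera, 2 families.
to_species = torch.arange(6)                     # identity: plain species loss
to_genus = torch.tensor([0, 0, 1, 1, 2, 2])
to_family = torch.tensor([0, 0, 0, 0, 1, 1])

torch.manual_seed(0)
logits = torch.randn(8, 6)                       # stand-in for model output
target = torch.randint(0, 6, (8,))               # species labels

# One loss per training stage, coarse to fine, all on the same batch.
stages = [("family", to_family), ("genus", to_genus), ("species", to_species)]
losses = {name: level_loss(logits, target, mapping) for name, mapping in stages}
```

With the identity mapping this reduces to ordinary species cross-entropy, so the last stage is just normal training. The output layer never changes size, so nothing in the model needs to be swapped between stages.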