Hi everybody,
I’ve been experimenting on a project to help a friend. Right now, I’m only doing multi-label classification (as a kind of warmup), but I want to move on to instance segmentation.
I’ve run into a problem: the model’s predictions on real-world images are inaccurate, despite a high accuracy_multi score.
The real-world images are almost exactly the same format as the training images (taken with a special rig / setup), so I don’t think it’s a case of the test images being different.
I have a hunch that the problem is heavy label imbalance: some labels have far more instances in the training set than others.
I was wondering if you think that’s the problem too?
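To make the hunch concrete, here is a minimal sketch of how a high multi-label accuracy can hide poor predictions on rare labels. The data is a made-up toy set (100 images, 10 labels), not my real dataset, and `accuracy_multi_like` is a plain-Python stand-in that I am assuming behaves like accuracy_multi (threshold each prediction, then average the matches over every image/label cell).

```python
def accuracy_multi_like(preds, targets, thresh=0.5):
    """Fraction of (image, label) cells where the thresholded
    prediction matches the target. Assumed stand-in for accuracy_multi."""
    correct = total = 0
    for p_row, t_row in zip(preds, targets):
        for p, t in zip(p_row, t_row):
            correct += int((p > thresh) == bool(t))
            total += 1
    return correct / total


def per_label_recall(preds, targets, thresh=0.5):
    """For each label, the share of its true instances that were predicted."""
    recalls = []
    for j in range(len(targets[0])):
        positives = [i for i, row in enumerate(targets) if row[j]]
        hits = sum(preds[i][j] > thresh for i in positives)
        recalls.append(hits / len(positives) if positives else float("nan"))
    return recalls


# Hypothetical toy data: label 0 is on every image,
# labels 1..9 each appear on only two images.
n_images, n_labels = 100, 10
targets = [[1] + [0] * (n_labels - 1) for _ in range(n_images)]
for label in range(1, n_labels):
    for k in range(2):
        targets[label * 2 + k][label] = 1

# A "lazy" model that always predicts the common label and nothing else.
preds = [[0.9] + [0.1] * (n_labels - 1) for _ in range(n_images)]

score = accuracy_multi_like(preds, targets)
recalls = per_label_recall(preds, targets)
print(f"accuracy: {score:.3f}")          # 0.982
print(f"recall per label: {recalls}")    # 1.0 for label 0, 0.0 for the rest
```

The lazy model gets 982 of 1000 cells right, yet it never finds a single rare label. If my per-label recall looks anything like this, imbalance is a plausible culprit; if the rare labels are actually predicted well, the issue is probably elsewhere.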
I’ve been blogging my work on this, and here’s the post I wrote, which gives my code and images:
https://lloydjones.io/2020/03/21/salmon-part-three.html
I wanted to ask the community their opinion on balancing the data first, because doing it is quite a big task (unless there’s a fast way to do it that I’m unaware of) and if the real issue is unrelated, it’s not worth doing.
My second thought is: Is there a way to quickly balance the dataset, in either Google Sheets or Python? I’d love to know this if so.
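For reference, the kind of quick fix I have in mind is naive oversampling: duplicate rows that contain rare labels until every label appears at least as often as the most common one. Below is a rough sketch using only the standard library. The file names, the label names and the `oversample_rare_labels` helper are all hypothetical, and I am assuming the data can be read as (filename, list of labels) pairs.

```python
import random
from collections import Counter


def oversample_rare_labels(rows, seed=0):
    """Duplicate rows until each label occurs at least as often as the
    most common label did originally. rows: list of (filename, labels)."""
    rng = random.Random(seed)
    counts = Counter(label for _, labels in rows for label in labels)
    target = max(counts.values())
    balanced = list(rows)
    # Handle the rarest labels first.
    for label, _ in sorted(counts.items(), key=lambda kv: kv[1]):
        pool = [row for row in rows if label in row[1]]
        while counts[label] < target:
            extra = rng.choice(pool)
            balanced.append(extra)
            # A duplicated row raises the count of every label it carries.
            counts.update(extra[1])
    return balanced, counts


# Hypothetical example: label_a is common, label_b and label_c are rare.
rows = [(f"img_{i:03d}.jpg", ["label_a"]) for i in range(8)]
rows += [
    ("img_008.jpg", ["label_a", "label_b"]),
    ("img_009.jpg", ["label_c"]),
]

balanced, new_counts = oversample_rare_labels(rows)
print(len(balanced))       # 26
print(dict(new_counts))    # {'label_a': 17, 'label_b': 9, 'label_c': 9}
```

Two caveats I am aware of. With multi-label data, duplicating a row that carries a rare label also inflates any common label on the same image (label_a grows from 9 to 17 above), so the result is only roughly balanced. And the duplicates should only go into the training split, otherwise copies of the same image end up in both training and validation. An alternative that avoids duplicating anything would be to weight the loss per label, for example via the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`, but I have not tried that here.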