Two questions on using cropped images based on bounding boxes to improve a model

From the lesson 7 notes …

We can extend the efficiency of this model by predicting bounding boxes and then feeding a cropped image of the fish in the bounding box to a classification model.


  1. Would we feed the cropped images into a separate model that we create and train, and then use the output of that classification model for our final predictions/submissions?

  2. How would we handle different sizes of cropped images since they will vary based on the bounding box?

Also, would it be possible/worthwhile to train a classifier on unrelated images of the different kinds of fish we need to identify … and then somehow use that data to find the fish in the fisheries competition images? Perhaps even find multiple fish in the same image?

Q1) Yes, the winners of the right whale recognition competition used that approach.
Q2) You resize the crops to a constant size, maybe 128x128 or 224x224, just as you resized the original uncropped images.
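
To make Q2 concrete, here is a minimal sketch of cropping to a box and resizing to a fixed size. It is not taken from the lesson notebook: the function name `crop_and_resize`, the `(x, y, width, height)` box layout, and the nearest-neighbour resize in plain NumPy are all assumptions for illustration. In practice you would probably use your image library's own resize with better interpolation.

```python
import numpy as np

def crop_and_resize(image, bbox, out_size=(224, 224)):
    """Crop `image` (H x W x C array) to `bbox`, then resize to `out_size`.

    Assumption: bbox is (x, y, width, height) in pixels. Adapt this if
    your annotations or predictions use a different layout.
    """
    img_h, img_w = image.shape[:2]
    x, y, w, h = bbox

    # Clamp the box to the image so a predicted box that spills over
    # the edge still yields a non-empty crop.
    x0 = int(np.clip(round(x), 0, img_w - 1))
    y0 = int(np.clip(round(y), 0, img_h - 1))
    x1 = int(np.clip(round(x + w), x0 + 1, img_w))
    y1 = int(np.clip(round(y + h), y0 + 1, img_h))
    crop = image[y0:y1, x0:x1]

    # Nearest-neighbour resize: choose one source row and column
    # for every output pixel.
    out_h, out_w = out_size
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows[:, None], cols]

# Two boxes of different sizes end up with the same output shape.
image = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)
a = crop_and_resize(image, (100, 50, 300, 120))
b = crop_and_resize(image, (600, 400, 80, 200))
print(a.shape, b.shape)  # (224, 224, 3) (224, 224, 3)
```

Note that resizing this way stretches the fish when the box is not square. Padding the box to a square before cropping is one alternative if the distortion hurts accuracy.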

I think ImageNet has different kinds of fish in it, so using a pre-trained model would suffice. But if you have a large training set containing only fish, a model trained on it should transfer well to the fisheries competition.

for more info check

Cool! Thanks for the link and reply.

What if your starting point is only having images of specific fish … is there a way to say, “See if this particular fish exists in this image from the fisheries competition, and if it does, put a bounding box around where you think it is”?

@melissa.fabros we were just talking about this