I’m trying to train a Faster R-CNN to detect and crop products in ad catalogues (like the image below). The products are small, and although I’ve tried decreasing the anchor_box_scales, my output does not look good. I wonder if I just need to label more data. It’s just a binary (product vs. background) detection, so I was hoping 20 images or so would be enough. Does anyone have an idea of how much data you need?
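One rough way to pick smaller anchor_box_scales is to match them to the sizes of the boxes you have already labelled. Below is a minimal sketch in plain Python: `anchor_shapes` lists the (width, height) of anchors for a set of scales and ratios, and `suggest_scales` proposes scales from quantiles of your labelled box sizes. The helper names, the ratio convention (a width/height factor pair whose product is 1) and the example values are hypothetical and are not taken from the Keras implementation mentioned here. If your implementation rescales images before placing anchors, measure the boxes after that rescaling.

```python
import math
import statistics

def anchor_shapes(scales, ratios):
    """(width, height) in pixels for every scale/ratio pair; area stays ~ scale**2."""
    return [(s * rw, s * rh) for s in scales for rw, rh in ratios]

def box_scale(x1, y1, x2, y2):
    """Side length of a square with the same area as the box."""
    return math.sqrt((x2 - x1) * (y2 - y1))

def suggest_scales(boxes, n=3):
    """Hypothetical helper: n anchor scales spread over the quantiles of labelled box sizes."""
    sizes = [box_scale(*b) for b in boxes]
    cuts = statistics.quantiles(sizes, n=n + 1)  # n cut points between n + 1 groups
    return [round(c) for c in cuts]

# 1:1, 1:2 and 2:1 anchors, written as (width factor, height factor)
r = 1 / math.sqrt(2)
RATIOS = [(1.0, 1.0), (r, 2 * r), (2 * r, r)]

# Example: five labelled square boxes with sides from 20 to 60 px
boxes = [(0, 0, s, s) for s in (20, 30, 40, 50, 60)]
print(suggest_scales(boxes))  # [25, 40, 55]
print(anchor_shapes([25], RATIOS))
```

If the suggested scales are far below the defaults your implementation uses, that mismatch alone can explain poor proposals for small products.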
I am using this Keras implementation:
Are you using pretrained weights? Which dataset have you used for pretraining?
I’m using ResNet weights as the input_weights argument, but not a pre-trained Faster R-CNN network, if that makes sense.
I’d appreciate it if you could clarify my understanding here. Would it make sense to apply a Faster R-CNN network pre-trained on the VOC dataset to my problem? Would that mean using a pre-trained region proposal network AND a pre-trained convolutional network, and does it make sense to logically separate the two? I believe I’m currently using a pre-trained CNN and training the region proposal network, but I’m not 100% sure.
BTW, I just realized I made a silly mistake in my annotation format: I had x1,x2,y1,y2 but it’s supposed to be x1,y1,x2,y2. Still re-training, but hopefully that gives better results…
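For anyone hitting the same mistake, here is a minimal sketch of fixing the column order in place, assuming a hypothetical CSV layout of one box per line as `path,x1,x2,y1,y2,label` (wrong order) that should become `path,x1,y1,x2,y2,label`. It also assumes file paths contain no commas. `box_is_valid` is a hypothetical sanity check, not part of any library.

```python
def fix_line(line):
    """Reorder 'path,x1,x2,y1,y2,label' into 'path,x1,y1,x2,y2,label'."""
    # Assumes no commas inside the file path.
    path, x1, x2, y1, y2, label = line.strip().split(",")
    return ",".join([path, x1, y1, x2, y2, label])

def box_is_valid(x1, y1, x2, y2, width, height):
    """True if the box has positive size and lies inside a width x height image."""
    return 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height

print(fix_line("img/page1.jpg,10,50,20,80,product"))
# img/page1.jpg,10,20,50,80,product
```

Note that a validity check like this catches only degenerate or out-of-image boxes. A box with swapped coordinates can still pass (for example `(10, 50, 20, 80)` is a perfectly valid box, just the wrong one), which is why this kind of bug trains silently. Drawing a few annotations on their images is the more reliable check.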
Yes, I think you have to use a pre-trained network, because you don’t have enough images to train from scratch. However, your data is not very close to the VOC dataset, so you should retrain parts of the model: VOC has only 20 classes, and they aren’t very close to what you give as input. If you can find a pretrained network on , it could help.
In Faster R-CNN the first n layers (the backbone) are usually pre-trained on ImageNet. I didn’t check how it’s done in this implementation, but it’s probably the case. On top of that you have the RPN and the object detector, which are trained on VOC.
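The “keep the early layers, retrain the rest” idea can be sketched with `tf.keras`. This is a minimal sketch assuming a ResNet50 backbone and a hypothetical `n_frozen` cut-off. It is not necessarily how the implementation discussed here wires its backbone, RPN and detector together, and the right number of frozen layers for catalogue images is something to tune.

```python
import tensorflow as tf

def make_backbone(n_frozen, weights="imagenet"):
    """ResNet50 feature extractor whose first n_frozen layers are not trained.

    n_frozen is a hypothetical tuning knob: a smaller value lets more of the
    network adapt to data that looks unlike ImageNet (e.g. ad catalogues).
    """
    base = tf.keras.applications.ResNet50(weights=weights, include_top=False)
    for layer in base.layers[:n_frozen]:
        layer.trainable = False  # keep the generic low-level ImageNet features
    return base

# The RPN and detection heads would be built on top of base.output and trained
# on your own labelled images; those parts are not shown here.
```

Early layers mostly encode generic edges and textures, which transfer well. Later layers are more dataset-specific, so those are the ones worth retraining when your images differ from the pretraining data.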
Hi, were you also able to reduce the false positives? I’m working on a similar project using a TensorFlow version of Faster R-CNN. The results are good, but false positives (FPs) still show up. Any ideas on how to reduce them?
Yeah, I’d suggest training a second model that filters the regions the detector proposes.
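A second-stage filter can be as simple as this minimal sketch. `filter_detections` and the thresholds are hypothetical. `classify` stands in for whatever second model you train: here it receives only the box, but in practice you would crop that region from the image and run a small binary (product vs. background) classifier on it.

```python
def filter_detections(detections, classify, det_thresh=0.5, cls_thresh=0.5):
    """Keep detections that pass both the detector score and a second-stage check.

    detections: list of ((x1, y1, x2, y2), score) pairs from the detector.
    classify: hypothetical callable returning P(product) for a box.
    """
    kept = []
    for box, score in detections:
        if score < det_thresh:
            continue  # cheap first cut on the detector's own confidence
        if classify(box) >= cls_thresh:
            kept.append((box, score))
    return kept

def toy_classifier(box):
    """Stand-in second model: rejects very elongated boxes."""
    x1, y1, x2, y2 = box
    aspect = (x2 - x1) / (y2 - y1)
    return 1.0 if 0.2 <= aspect <= 5 else 0.0

dets = [((0, 0, 50, 50), 0.9), ((0, 0, 100, 5), 0.8), ((10, 10, 40, 40), 0.3)]
print(filter_detections(dets, toy_classifier))  # [((0, 0, 50, 50), 0.9)]
```

Training the second model on the detector’s own false positives (hard negatives) tends to make it focus on exactly the mistakes you want removed.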
You could even show these regions to humans in an MTurk-style queue. There are cases with asymmetric payoffs, for example where the cost of a false negative far outweighs the cost of a false positive.
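With asymmetric costs you can set the keep/drop threshold from the costs instead of defaulting to 0.5, and send only the uncertain band to human reviewers. Keeping a detection with score p costs (1 − p)·cost_fp in expectation, and dropping it costs p·cost_fn, so keeping is cheaper when p > cost_fp / (cost_fp + cost_fn). This minimal sketch assumes the scores behave like calibrated probabilities, which is an assumption you should check, since raw detector scores may need calibrating first. The function names and band limits are hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Score above which keeping a detection has lower expected cost than dropping it."""
    return cost_fp / (cost_fp + cost_fn)

def route(score, low, high):
    """Auto-reject below low, auto-accept at or above high, otherwise ask a human."""
    if score < low:
        return "reject"
    if score >= high:
        return "accept"
    return "review"

# A missed product costing 9x a false alarm gives a keep threshold of 0.1.
print(cost_threshold(1, 9))      # 0.1
print(route(0.5, low=0.1, high=0.9))  # review
```

The width of the review band is a budget decision: a wider band catches more errors but sends more work to the queue.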