General questions about creating images for datasets


While following along with lessons 2 and 3, I’ve been taking pictures of parts with OpenCV and my webcam. I’m getting mixed results when training. Specifically, I have very high accuracy, very low validation loss, and low training loss (but still a factor of 10-50 higher than the validation loss). So I guess I’m underfitting.


I’m using Lego connectors as subjects, in different sizes and different colours. As I said, the accuracy is high, but when I test the model in an OpenCV program, it struggles to identify the part correctly.

I would like to ask what it takes to create good, usable images when you have the physical environment under control.

For example, I currently take pictures on a white surface, with a ring light above.
Should I take different pictures of the same part on different surfaces and in different lighting conditions? For example, print random patterns on the surface I place the part on? Take pictures outside in the sun? In other words, should I provide as much variation as possible?
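
To make the question concrete, here is a minimal sketch of the kind of variation I mean, produced synthetically with NumPy instead of physically. It is not my actual pipeline: the function names, the white threshold of 230 and the jitter ranges are assumptions picked for illustration, and the frame is a synthetic stand-in for a webcam picture.

```python
import numpy as np


def paste_on_background(image, background, white_threshold=230):
    """Replace near-white pixels (the plain surface) with a background image.

    Assumption: the surface is the only near-white region in the frame.
    """
    # A pixel counts as "surface" when all three channels are near white.
    mask = (image >= white_threshold).all(axis=-1)
    result = image.copy()
    result[mask] = background[mask]
    return result


def jitter_lighting(image, rng, max_brightness=40.0, contrast_range=(0.7, 1.3)):
    """Return a copy of a uint8 image with random contrast and brightness."""
    contrast = rng.uniform(*contrast_range)
    brightness = rng.uniform(-max_brightness, max_brightness)
    adjusted = image.astype(np.float32) * contrast + brightness
    # Clip back into the valid 8-bit range before converting.
    return np.clip(adjusted, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)

# Stand-in for a webcam frame: white surface with a coloured square "part".
frame = np.full((64, 64, 3), 255, dtype=np.uint8)
frame[20:40, 20:40] = (200, 30, 30)

# Random pattern playing the role of a printed surface.
pattern = rng.integers(0, 256, size=frame.shape, dtype=np.uint8)

composite = paste_on_background(frame, pattern)
varied = jitter_lighting(composite, rng)
```

Would synthetic variation like this be a reasonable substitute, or is physically varying the surface and lighting better?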

Does image quality make a big difference to the model’s prediction performance? For example, taking pictures with a webcam versus a DSLR camera.
Does high versus low contrast, or more tonal gradation in the pictures, make a noticeable difference?

What’s a ballpark figure for the number of pictures (per part) needed to train the model? I now have some 5400 pictures across 17 classes, so roughly 320 per class on average. Too few?

Thank you for your time,