Classifying unlabelled images

hud · January 4, 2019, 4:30am

Hello

I have gathered some thousands of unlabelled photos of sewers in my hometown and would like to classify them (e.g. broken, cracked, clean, clogged, etc.).

From what I gather, I first need to manually label some (hundreds?) of them, train on those, and then predict the rest, but the manual labelling already seems like a lot of work.

I was wondering if there’s a way to train on the unlabelled images as-is and automatically get them pseudo-labelled / sorted into arbitrary categories?

Hope this makes sense!

Cheers, Hud

Daniel.R.Armstrong · January 4, 2019, 1:34pm

I have been trying to find the fastest way to create instance segmentation masks, and one of the time-saving solutions I found is prodi.gy, which uses a model to minimize how much labeling you have to do. It is from the makers of spaCy. It was too expensive for me at this time ($390). It has some really neat NLP tools, but it also offers a CV tool that picks which images you need to label based on how unsure the model is of the outcome, so that you are labeling the images that will have the greatest impact on your model. Apparently you can give it any Python-based model, but I am not really sure how that works.
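The idea of labeling the images the model is least sure about can be sketched in a few lines. This is only a minimal illustration of uncertainty sampling in general, not how prodi.gy works internally; the file names and probabilities are made up, and the helper names `entropy` and `most_uncertain` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of one predicted class distribution (higher = less sure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(predictions, k):
    """Return the k image names whose predictions have the highest entropy.

    `predictions` maps an image name to its list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
    return ranked[:k]

# Made-up probabilities for four classes: broken, cracked, clean, clogged.
predictions = {
    "sewer_001.jpg": [0.97, 0.01, 0.01, 0.01],  # model is confident
    "sewer_002.jpg": [0.30, 0.30, 0.20, 0.20],  # model is very unsure
    "sewer_003.jpg": [0.60, 0.30, 0.05, 0.05],  # model is somewhat unsure
}

to_label_next = most_uncertain(predictions, k=2)
print(to_label_next)
```

The images returned here would be the ones to label by hand next, since a label for them tells the model the most.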

I am hoping platform.ai does something similar. If you use platform.ai it will cluster the images so that might save you time as well.
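For a rough idea of what clustering unlabelled images could look like, here is a tiny k-means sketch in plain Python. It is not a description of platform.ai. It assumes each image has already been turned into a feature vector (for example by a pretrained network, which is not shown), and the two-number vectors below are invented purely for illustration.

```python
def kmeans(points, k, iterations=20):
    """Tiny k-means; returns a cluster index for every point.

    Uses the first k points as starting centres so the result is deterministic.
    """
    centres = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centre (squared Euclidean distance).
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            assignment[i] = distances.index(min(distances))
        # Move each centre to the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical feature vectors, one per image.
features = {
    "a.jpg": [0.10, 0.20],
    "b.jpg": [0.90, 0.80],
    "c.jpg": [0.15, 0.25],
    "d.jpg": [0.85, 0.90],
}

names = list(features)
clusters = kmeans([features[n] for n in names], k=2)

groups = {}
for name, cluster in zip(names, clusters):
    groups.setdefault(cluster, []).append(name)
print(groups)
```

The clusters have no names of their own; someone still has to look at a few images per group and decide whether a group means "cracked", "clean", and so on.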

There are a ton of labeling tools, but I found that a lot of the projects have been abandoned.

digitalspecialists · January 4, 2019, 1:45pm

Yes, that’s exactly what you could do. Label enough images to get goodish results, then predict the rest and use the predictions as new training labels.
You can manually brute-force review the labels predicted for accuracy, which is usually more rapid than labelling from scratch. Or you can use the pseudo labels to train more fully, then review the ‘most incorrect’ instances from your validation set.
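A minimal sketch of that pseudo-labelling step, assuming a model trained on the hand-labelled images has already produced class probabilities for the unlabelled ones. The 0.9 confidence threshold, the file names, and the helper name `split_by_confidence` are assumptions for illustration, not something prescribed above.

```python
def split_by_confidence(predictions, classes, threshold=0.9):
    """Split predictions into pseudo-labels and images that need manual review.

    `predictions` maps an image name to its list of class probabilities.
    The threshold is an arbitrary choice made for this sketch.
    """
    pseudo_labels = {}
    needs_review = []
    for name, probs in predictions.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            pseudo_labels[name] = classes[best]  # trust the model's guess
        else:
            needs_review.append(name)            # a human should check this one
    return pseudo_labels, needs_review

classes = ["broken", "cracked", "clean", "clogged"]

# Made-up model outputs for three unlabelled photos.
predictions = {
    "sewer_010.jpg": [0.95, 0.03, 0.01, 0.01],
    "sewer_011.jpg": [0.40, 0.35, 0.15, 0.10],
    "sewer_012.jpg": [0.02, 0.02, 0.94, 0.02],
}

pseudo_labels, needs_review = split_by_confidence(predictions, classes)
print(pseudo_labels)
print(needs_review)
```

The pseudo-labelled images can be added to the training set for another round, while the review list is where manual checking pays off most.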

hud · January 6, 2019, 11:11pm

Hey Daniel, thanks for this, but that is 390 dollars too expensive for me. I did find suitable ones from a list here.

Currently working with labelme.

hud · January 6, 2019, 11:14pm

Hi, how do you pseudo-label or review via brute force? Do you have examples, e.g. notebooks, that show this?

Daniel.R.Armstrong · January 7, 2019, 4:55am

Did you try RectLabel? I think it has some great features, it is easy to use, and it is pretty cheap. The only problem is that you have to use a Mac. Are you using polygons or bounding boxes to label specific aspects of the images? If you are just classifying images as broken, cracked, etc., I would just use the folder approach that Jeremy talks about in the class (a different folder for each class). You could then use the fast.ai tools to move the misclassified images to the correct folder.
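For the folder-per-class layout, moving a misclassified image into the right folder is simple enough to do with the Python standard library. The sketch below does not use any fastai function; the helper name `move_to_class` and the file names are hypothetical, and the demo runs in a temporary directory with an empty stand-in file.

```python
import shutil
import tempfile
from pathlib import Path

def move_to_class(image_path, correct_class, root):
    """Move one image into root/<correct_class>/ and return its new path."""
    target_dir = Path(root) / correct_class
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(image_path).name
    shutil.move(str(image_path), str(target))
    return target

# Demo: an image wrongly filed under "clean" is moved to "cracked".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    wrong = root / "clean" / "sewer_007.jpg"
    wrong.parent.mkdir(parents=True)
    wrong.write_bytes(b"")  # empty stand-in for a real photo
    new_path = move_to_class(wrong, "cracked", root)
    moved_ok = new_path.exists() and not wrong.exists()
    relative = new_path.relative_to(root).as_posix()

print(relative, moved_ok)
```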

hud · January 9, 2019, 1:40am

The open-source labelme works similarly to RectLabel.

But labelme has the problem (found after some hours' worth of labelling, ugh) that it creates an individual JSON file for each image, unlike the COCO JSON files, which can be interpreted using fastai.
So far I think there is no solution to this.
Unless fastai has a function for this?
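One possible workaround is to merge the per-image files yourself. The sketch below assumes each labelme JSON file has `imagePath`, `imageHeight`, `imageWidth`, and a `shapes` list with `label` and `points` keys; check this against your own files before relying on it. It writes only bounding boxes derived from the points (no segmentation masks), and the helper name `labelme_to_coco` and the sample data are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def labelme_to_coco(json_dir):
    """Merge every labelme-style *.json in json_dir into one COCO-style dict."""
    coco = {"images": [], "annotations": [], "categories": []}
    category_ids = {}
    for image_id, path in enumerate(sorted(Path(json_dir).glob("*.json")), start=1):
        data = json.loads(path.read_text())
        coco["images"].append({
            "id": image_id,
            "file_name": data["imagePath"],
            "height": data["imageHeight"],
            "width": data["imageWidth"],
        })
        for shape in data["shapes"]:
            label = shape["label"]
            if label not in category_ids:
                category_ids[label] = len(category_ids) + 1
                coco["categories"].append({"id": category_ids[label], "name": label})
            # Bounding box that encloses all points of the shape: [x, y, width, height].
            xs = [x for x, _ in shape["points"]]
            ys = [y for _, y in shape["points"]]
            x0, y0 = min(xs), min(ys)
            w, h = max(xs) - x0, max(ys) - y0
            coco["annotations"].append({
                "id": len(coco["annotations"]) + 1,
                "image_id": image_id,
                "category_id": category_ids[label],
                "bbox": [x0, y0, w, h],
                "area": w * h,  # bounding-box area, not the exact polygon area
                "iscrowd": 0,
            })
    return coco

# Demo with one invented annotation file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "imagePath": "sewer_001.jpg",
        "imageHeight": 480,
        "imageWidth": 640,
        "shapes": [{"label": "cracked", "shape_type": "polygon",
                    "points": [[10, 20], [110, 20], [110, 70], [10, 70]]}],
    }
    (Path(tmp) / "sewer_001.json").write_text(json.dumps(sample))
    coco = labelme_to_coco(tmp)

print(json.dumps(coco["annotations"], indent=2))
```

Saving the returned dict with `json.dump` gives a single annotation file instead of one per image.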

So I am trying out RectLabel. It exports to XML, which can be converted to COCO format, CSV, and labelled PNG files. If it pans out well I might just buy it, since it’s so inexpensive. Thanks for the tip @Daniel.R.Armstrong