Data labeling

Hi,

I have ~350K text examples, of which ~100K are labelled. Can anyone suggest a technique for labelling the remaining examples?

I have tried basic techniques/algorithms like clustering and an ML model, but the results are not good.

Any suggestions?

A labelling tool with active learning should be the way to go. If you have access to Prodigy, you could use that. KNIME also seems to support this:

https://www.knime.com/blog/labeling-with-active-learning

There is also one other tool: https://github.com/RTIInternational/SMART
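
For anyone wondering what "active learning" means in practice, here is a rough sketch of one common selection strategy, uncertainty sampling: the model scores the unlabelled examples, and you hand-label only the ones it is least sure about, then retrain and repeat. The function names and the probability values below are made up for illustration, and the probabilities are assumed to come from whatever classifier you trained on the ~100K labelled examples. This is not how Prodigy, KNIME or SMART implement it, just the general idea.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predicted_probs, batch_size):
    """Return indices of the examples the model is least certain about."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabelled texts, three classes each.
probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low priority
    [0.34, 0.33, 0.33],  # almost uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two examples a human should label next.
to_label = select_for_labeling(probs, batch_size=2)
print(to_label)
```

Entropy is only one possible uncertainty score; the margin between the top two classes is another common choice.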

Try the SageMaker Ground Truth labeling service on AWS.

I hope there will be better answers, because I have a few thousand images and texts waiting to be labelled, and spending days and days on labelling is just a waste of time. Some of the images require multiple labels. Does anyone have a better solution than sorting them into folders or typing more than one label per image into Excel?

Data labelling and cleaning is, IMO, one of the hardest and most time-consuming parts of the work. If you’re nifty with JS or C++ (or any other language you can build a small app in), you could write a quick mini-app that goes through a directory of images and lets you select a few labels for each one.

Something like so: https://datascience.stackexchange.com/questions/14039/tool-to-label-images-for-classification

Or labelbox.io
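
To make the mini-app idea a bit more concrete, here is a minimal sketch in Python (instead of JS or C++) that walks a directory of images, asks for comma-separated labels for each one, and writes one CSV row per image, which also covers the multi-label case without folders or Excel. Everything here is an assumption for illustration: the function names, the list of image extensions, and the choice of `;` as the separator inside the labels column. It only prompts in the terminal and does not display the image.

```python
import csv
from pathlib import Path

# Assumed set of image extensions; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def parse_labels(raw):
    """Turn 'Cat, outdoor' into ['cat', 'outdoor'], dropping blanks and duplicates."""
    labels = []
    for part in raw.split(","):
        label = part.strip().lower()
        if label and label not in labels:
            labels.append(label)
    return labels


def label_directory(image_dir, out_csv, ask=input):
    """Prompt once per image and write 'filename,labels' rows to out_csv.

    `ask` is any function that takes a prompt and returns a string,
    so it can be swapped out for something other than the terminal.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "labels"])
        for path in images:
            raw = ask(f"Labels for {path.name} (comma-separated): ")
            # Several labels for one image go into a single cell, joined by ';'.
            writer.writerow([path.name, ";".join(parse_labels(raw))])
    return len(images)
```

You would call it as something like `label_directory("images", "labels.csv")`, where both paths are placeholders for your own.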

I’ll add, though, that this won’t be a fun process, but data labeling (for the most part) never is.

I’ll add though that the start of this thread was on NLP labeling, not image labeling :slight_smile:

Try out https://platform.ai/
