How to efficiently annotate your own datasets?


(leon yin) #1

I work in a social science lab, and labeling data is always a bottleneck.
Mind you, this is just labeling an initial dataset to validate against crowdsourced labels.

What kind of tools, tricks, or platforms do you use to label data within your institution?

For mutually exclusive images, I send annotators a zip file of images and have them drag each image into a sub-directory for its class. What about non-mutually exclusive images (i.e., where one image has more than one label)? How about metadata and text?
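For reference, the folder-per-class workflow above can be collected back into a label table with a few lines of Python. This is just a sketch (the function name and directory layout are my own assumptions, not a standard tool):

```python
from pathlib import Path

def labels_from_folders(root):
    """Collect {filename: class} from a directory tree where annotators
    dragged each image into one sub-directory per class.

    Assumes mutually exclusive labels: each file appears in exactly
    one class folder.
    """
    root = Path(root)
    labels = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for img in class_dir.iterdir():
            if img.is_file():
                labels[img.name] = class_dir.name  # folder name is the label
    return labels
```

For the non-mutually-exclusive case, one low-tech option is a CSV with one row per image and one column per label, since a file can't live in two folders at once without duplication.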

I hope the question is clear, but more importantly I hope this thread can serve as a resource for others who need to label datasets without (or before) crowd workers.


(Marc) #2

I have not used this myself, but I have heard from people who were very happy with it, although it is not free:

I am in no way affiliated with it; it's just an option to check out (especially for NLP work, it seems). It is by the authors of the spaCy NLP library.
It appears to support image annotation as well (though the happy people mentioned above were from the NLP space).


(Kevin Bird) #3

If you don’t mind spending some money, you can get your labeling done through a crowdsourcing site such as Mechanical Turk.

Cancel that; I see this is for the initial dataset, before crowd workers. Good question that I will keep an eye on.


(Nick) #4

https://github.com/Microsoft/VoTT is OK for images.