Fastai + label studio/speedy labelling

ModdingLeo · January 8, 2021, 9:07am

I’m looking to label data much more quickly by using human-in-the-loop learning.

I’ve come across Label Studio, which seems like a really good tool for the labelling portion of the exercise. It lets you plug in a model to speed up labelling (I believe it will try to automatically predict the label).

I’m curious if anyone has done something similar or if there is another tool/method someone has used?

My current method would be

1. Manually label some data
2. Train and predict with it
3. Get the lowest confidence examples
4. Manually label them
5. Repeat
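
The "get the lowest confidence examples" step above is often called uncertainty (least-confidence) sampling. Here is a minimal sketch of it, assuming the model returns one list of class probabilities per unlabeled example. The function name and the toy numbers are made up for illustration and do not come from any particular library:

```python
def least_confident(probabilities, k):
    """Return indices of the k examples whose top class probability is lowest."""
    # Confidence of a prediction = probability of its most likely class.
    confidences = [max(p) for p in probabilities]
    # Rank example indices by confidence, lowest first, and keep the first k.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:k]


# Toy predictions for four unlabeled examples over three classes (made-up numbers).
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],
]
to_label = least_confident(probs, k=2)
print(to_label)  # indices of the two examples to label next
```

With fastai, the probabilities could plausibly come from learn.get_preds on a DataLoader of unlabeled items, converted to plain lists. The selected indices are then the items to label by hand before retraining.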

darek.kleczek · January 9, 2021, 5:02am

There was a nice demo of a similar capability at the last NeurIPS:
AI-assisted data labeling demo: https://neurips-assistance.mybluemix.net/

I’m also interested in this topic, please share any learnings if you can!

jimmiemunyi · January 9, 2021, 12:44pm

Thanks for the link! I didn’t even know such a system existed. I am now interested in this topic too and will try and do some research

ModdingLeo · January 9, 2021, 1:16pm

Yes I understand the concept of it, looking for some code!

Here’s a Jupyter widget to train a logistic regressor on images. It has a quick demo with MNIST to help you understand it. As long as your model has an sklearn-like fit / predict interface, you can work with different models. The creator gave sklearn and skorch as examples: Superintendent

Here’s a local/web app that handles tonnes of different labelling types (images, text, time series, audio…) and is pretty easy to get going. You can do manual labelling, and there is active learning functionality, but I have yet to try the latter: Label Studio

ChrisRussell · January 14, 2021, 6:19am

Your methods are impressive

Monkeytronic · February 2, 2021, 1:31pm

I have just started using Label Studio, and I got the automatic predictions functionality to work, but I haven’t been able to get active learning working yet. They say you should be able to start active learning using the --sampling prediction-score-min option, but I found that it currently throws an error. Instead, you have to manually make two changes to the config.json file in your server directory:

"sampling": "prediction-score-min"
"experimental_features": true
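
If you would rather not edit the file by hand, the two changes above can be scripted. This is only a sketch: the two keys and values are the ones listed above, while the helper name and the example path are hypothetical, and it assumes config.json is ordinary JSON with a top-level object:

```python
import json
from pathlib import Path


def enable_active_learning(config_path):
    """Set the two keys listed above in a Label Studio config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Values taken from the post; everything else in the file is left alone.
    config["sampling"] = "prediction-score-min"
    config["experimental_features"] = True
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example call (hypothetical path):
# enable_active_learning("my_project/config.json")
```

Restart the Label Studio server afterwards so it picks up the changed file.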

It will run with an ML backend, but when I try to get it to sort by prediction score, it seems like the prediction scores don’t fully make it from the ML backend server to the frontend UI. I will try a few more things and report back if I get anything working, but in the meantime, I’d be happy to know if anyone else got it working!

Edit: To get the predictions from your ML backend onto the tasks in your UI frontend, you have to tell the UI backend API to predict and update the tasks on the server. To do this, send a GET or POST request to /api/models/predictions?mode=all_tasks on your UI server. You can also just open it from your browser’s address bar (i.e. http://localhost:8080/api/models/predictions?mode=all_tasks) if you’re too lazy to customize Label Studio to do it automatically or to add a button that triggers it.
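
The same request can be sent from a script instead of the browser. This is a rough sketch using only the Python standard library. The endpoint and the localhost:8080 address are the ones given above, so they may not exist in other Label Studio versions. The function names are made up for illustration:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def predictions_url(base_url, mode="all_tasks"):
    """Build the predictions endpoint URL described in the post."""
    query = urlencode({"mode": mode})
    return f"{base_url.rstrip('/')}/api/models/predictions?{query}"


def trigger_predictions(base_url="http://localhost:8080", timeout=60):
    """POST to the endpoint and return the HTTP status code."""
    request = Request(predictions_url(base_url), method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.status


# With the Label Studio server running locally:
# print(trigger_predictions())
```

Calling trigger_predictions after each retraining run would refresh the prediction scores on all tasks before the next labelling session.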

ModdingLeo · February 5, 2021, 2:07pm

@Monkeytronic I presume your predictions are from a previous model?

Monkeytronic · February 5, 2021, 2:31pm

I did have predictions from a previous model when initially sorting by prediction score.

However, once I got the model backend working in Label-Studio, I’ve been able to sort by prediction score from that model. I also found that, at least with their latest release, you can sort by any of the columns in the task page and the tasks should be shown to you in that order while labeling, though I’m not sure how that works for remote labelers/non-admins.

asoellinger · February 5, 2021, 3:21pm

I have been having a conversation about label studio with data versioning pipelines with Jimmy at Pachyderm. He has written a demo that integrates label studio with pachyderm to version control all the data artifacts. I think it’s worth posting here at least. I have been working on ways to use fastai with pachyderm. Would love to collaborate if anyone is interested!

kBodolai · February 5, 2021, 4:58pm

Nice ideas, everyone.
A colleague and I have been trying to work on something like this entirely from the comfort of our notebooks. A good idea would be to integrate some kind of widget that lets you label the images inside the notebook and resumes training automatically afterwards; perhaps we could even build on the ClassifierCleaner widget. For now we have adapted pigeon to do this every few epochs. It’s really fast to implement if you want to start playing around. It’s still far from an elegant/automated way of doing active learning, but it seems to be working!

ModdingLeo · February 5, 2021, 5:20pm

@kBodolai I like the look of this for ease of use!

kBodolai · February 5, 2021, 6:40pm

You can get something going really fast, it’s working great for us so far.

ModdingLeo · April 26, 2021, 7:45am

Do you have a rough workflow to get this going, or have you had any success with a fastai model? I’m looking to set up an object detector with “new” labels (not ones in a pretrained COCO-based model). Or is it a case of having to manually label, say, 1k images, train on them, and then load that model in for the predictions?

I’m curious whether anyone else has made good progress on this topic.

asoellinger · April 26, 2021, 6:12pm

We have an integration for segmentation masks discussed here, and you can dm me for the full code:
https://towardsdatascience.com/development-of-a-benchmark-dataset-with-an-interface-to-the-fastai-dataloader-using-label-studio-d3aa3c26661f