Pseudo Labeling in ML

vwrideout · December 1, 2016, 6:42pm

A quick Google search on pseudo-labeling only returns articles and papers pertaining to deep learning. I’m curious whether pseudo-labeling is useful in other machine learning contexts, such as training a random forest classifier.

yad.faeq · December 1, 2016, 9:32pm

Adding “Semi-Supervised Learning” alongside “Pseudo-Label” in the search brings up more results. In the case of random forests, there could be a risk of overfitting.

Here are two papers that take some type of semi-supervised approach (which could be seen as pseudo-labeling, in the sense of making use of existing data):

Unified Face Analysis by Iterative Multi-Output Random Forests: they use two random forests, one to assist the other.
Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests: “propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset”
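
As a rough illustration of what pseudo-labeling with a random forest might look like, here is a minimal sketch using scikit-learn. The synthetic dataset, the 0.9 confidence threshold, and all variable names are assumptions for illustration, not something taken from the papers above. Keeping only high-confidence predictions is one way to try to limit the overfitting risk mentioned earlier, though it does not remove it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real problem (assumption for illustration).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_lab, y_lab = X[:100], y[:100]       # small labeled set
X_unlab = X[100:500]                  # unlabeled pool (its true labels are ignored)
X_test, y_test = X[500:], y[500:]     # held-out set for comparison

# 1. Fit on the labeled data only.
base = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_lab, y_lab)

# 2. Predict on the unlabeled pool and keep only confident predictions.
proba = base.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.9  # threshold is an arbitrary choice
pseudo_y = base.classes_[proba.argmax(axis=1)]

# 3. Refit on labeled + confidently pseudo-labeled data.
X_comb = np.concatenate([X_lab, X_unlab[confident]])
y_comb = np.concatenate([y_lab, pseudo_y[confident]])
pseudo = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_comb, y_comb)

print("labeled only:      ", base.score(X_test, y_test))
print("with pseudo-labels:", pseudo.score(X_test, y_test))
```

Whether the second score beats the first depends on the data, so it is worth comparing both on a held-out set. scikit-learn also ships `sklearn.semi_supervised.SelfTrainingClassifier`, which wraps a similar loop around any classifier that exposes `predict_proba`.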

jeremy · December 5, 2016, 5:46am

I haven’t seen those before - thanks for the links! And a very interesting question, @vwrideout; I’d be interested to hear if you try it…

berkmeister · January 8, 2017, 6:23pm

In the pseudo-labelling paper, it is suggested that:

For unlabeled data, Pseudo-Labels, just picking up the class which has the maximum predicted probability, are used as if they were true labels.

However, in Statefarm.ipynb, we are concatenating the predicted probabilities, and not the MAP estimates, to the training labels.

val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])

Aren’t we supposed to concatenate to_categorical(bn_model.predict_classes(conv_val_feat, ...)) to the training labels here? (Or does it not matter, because in this case the model was overconfident and produced just 0 and 1 probabilities anyway?)
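
To make the difference concrete, here is a small NumPy sketch contrasting the two kinds of target. The probability values are made up for illustration, and `soft_targets` / `hard_targets` are hypothetical names, not variables from the notebook.

```python
import numpy as np

# Hypothetical predicted probabilities: 3 unlabeled samples, 4 classes.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.60, 0.30],
    [0.25, 0.25, 0.25, 0.25],
])

# Soft targets: use the probabilities as-is (what the notebook snippet does).
soft_targets = probs

# Hard pseudo-labels: one-hot of the most probable class (what the paper describes).
hard_targets = np.eye(probs.shape[1])[probs.argmax(axis=1)]

print(hard_targets)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [1. 0. 0. 0.]]
```

The third row shows where the two diverge most: the model is completely unsure, yet the hard label commits fully to class 0 (`argmax` returns the first index on ties), while the soft target keeps that uncertainty. If the model only ever outputs probabilities near 0 and 1, the two are almost identical.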

jeremy · January 9, 2017, 9:35pm

That’s why I describe the approach as being a mix of pseudo labeling and knowledge distillation (which uses the probabilities).

dmhv · February 7, 2017, 11:31am

Subjectively, the idea of pseudo-labeling sounds a bit like putting a car on a highway and placing a brick on the gas pedal - for a while, the car will be driving very well indeed, with significant positive acceleration. But after a while a turn may come, or, in our terminology, the properties of the data stream which you’re putting into your predictive model may change, requiring you to step back and perhaps go through the labeling process once again.

shushi2000 · May 18, 2017, 5:11am

Those predicted pseudo-labels must include some “wrong” labels, because the model is not 100% accurate. So why is it helpful to use pseudo-labeling? I thought the saying goes “garbage in, garbage out”, right?

jeremy · May 19, 2017, 11:41pm

As it turns out, no! Try reading the paper for a deeper understanding: http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf

jeremy · May 19, 2017, 11:41pm

This is true of all models - nothing specific to pseudo-labeling AFAICT!

msp · August 9, 2017, 8:32pm

Actually, if you read the paper, the idea is to continuously re-pseudo-label the unlabelled points based on the current model:

“For unlabeled data, Pseudo-Labels re-calculated every weights update are used for the same loss function of supervised learning task.”
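
A minimal sketch of that idea, assuming a toy logistic regression trained by gradient descent in NumPy: the pseudo-labels are recomputed from the current weights on every update rather than fixed once. The data, the learning rate, and the fixed weight `alpha` on the unlabeled term are all assumptions for illustration; this is not the paper's network or its training schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: two Gaussian blobs (assumption for illustration).
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.zeros(2), 0.0
lr, alpha = 0.1, 0.5  # alpha weights the unlabeled loss; kept fixed for simplicity

for step in range(200):
    # Re-compute pseudo-labels from the *current* weights on every update.
    p_unlab = sigmoid(X_unlab @ w + b)
    pseudo = (p_unlab >= 0.5).astype(float)

    p_lab = sigmoid(X_lab @ w + b)

    # Gradient of mean cross-entropy: labeled term + alpha * pseudo-labeled term.
    g_lab = p_lab - y_lab
    g_unlab = p_unlab - pseudo
    grad_w = X_lab.T @ g_lab / len(X_lab) + alpha * (X_unlab.T @ g_unlab) / len(X_unlab)
    grad_b = g_lab.mean() + alpha * g_unlab.mean()

    w -= lr * grad_w
    b -= lr * grad_b

acc = ((sigmoid(X_lab @ w + b) >= 0.5) == y_lab).mean()
print("labeled-set accuracy:", acc)
```

Because the pseudo-labels always agree with the model's current prediction, the unlabeled term can only push predictions further from 0.5, that is, toward more confident outputs on the unlabeled points; the labeled term is what keeps the decision boundary tied to the true classes.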

swigi · August 13, 2017, 1:53pm

Am I correct in thinking of pseudo-labeling as similar to K-means clustering? Of course, it only makes sense if you apply it many times, i.e. re-guess the labels of the unlabeled data on subsequent iterations.