A quick Google search on pseudo-labeling only returns articles and papers about deep learning. I’m curious whether pseudo-labeling is useful in other machine learning contexts, such as training a random forest classifier.
Try adding the phrase “Semi-Supervised Learning with Pseudo-Label” to the search; it can bring up more results. In the case of random forests, there could be a risk of over-fitting.
Here are two papers that take some type of semi-supervised approach (which could be viewed as pseudo-labeling, in the sense of making use of existing data):
Unified Face Analysis by Iterative Multi-Output Random Forests: they use two random forests, one to assist the other.
Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests: “propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset”
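For a plain random forest, the simplest version of the idea is a self-training loop. The sketch below assumes scikit-learn's RandomForestClassifier; the confidence threshold, the number of rounds, the toy data, and the helper name pseudo_label_forest are illustrative choices, not taken from either paper. Once pseudo-labeled points are in the training set, the forest will tend to reproduce its own labels on them with high confidence, which is one concrete form of the over-fitting risk mentioned above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def pseudo_label_forest(X_lab, y_lab, X_unlab, threshold=0.9, n_rounds=3, seed=0):
    """Hypothetical self-training helper: fit on labeled data, pseudo-label the
    unlabeled pool, keep only confident predictions, and refit. Pseudo-labels
    are recomputed from the latest forest on every round."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_lab, y_lab)
    n_used = 0
    for _ in range(n_rounds):
        proba = forest.predict_proba(X_unlab)
        # Pseudo-label = class with the maximum predicted probability.
        pseudo = forest.classes_[proba.argmax(axis=1)]
        # Assumed safeguard: only trust predictions above the threshold.
        keep = proba.max(axis=1) >= threshold
        n_used = int(keep.sum())
        X_comb = np.vstack([X_lab, X_unlab[keep]])
        y_comb = np.concatenate([y_lab, pseudo[keep]])
        forest = RandomForestClassifier(n_estimators=100, random_state=seed)
        forest.fit(X_comb, y_comb)
    return forest, n_used


# Toy data: two Gaussian blobs, with only 10 labeled points per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
lab = np.r_[0:10, 200:210]
unlab = np.setdiff1d(np.arange(400), lab)

forest, n_used = pseudo_label_forest(X[lab], y[lab], X[unlab])
print(n_used, forest.score(X[unlab], y[unlab]))
```

Raising the threshold trades fewer pseudo-labeled points for fewer wrong ones.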
I haven’t seen those before, thanks for the links! And a very interesting question, @vwrideout; I’d be interested to hear if you try it…
In the pseudo-labelling paper, it is suggested that:
For unlabeled data, Pseudo-Labels, just picking up the class which has the maximum predicted probability, are used as if they were true labels.
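In NumPy terms, the paper's hard pseudo-labels are an argmax followed by one-hot encoding, i.e. what to_categorical(...predict_classes(...)) would produce in the notebook's Keras setting. The function name and the probability values are made up for illustration.

```python
import numpy as np


def hard_pseudo_labels(probs):
    """Turn predicted class probabilities (n_samples, n_classes) into one-hot
    pseudo-labels by picking the class with the maximum probability."""
    n_classes = probs.shape[1]
    return np.eye(n_classes)[probs.argmax(axis=1)]


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
print(hard_pseudo_labels(probs))
# [[1. 0. 0.]
#  [0. 0. 1.]]
```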
However, in Statefarm.ipynb, we are concatenating the predicted probabilities, not the MAP estimates (the argmax classes), to the training labels.
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])
Aren’t we supposed to concatenate to_categorical(bn_model.predict_classes(conv_val_feat, ...)) to the training labels here? (Or does it not matter because, in this case, the model was overconfident and produced probabilities of just 0 and 1 anyway?)
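One way to check this empirically is to build the combined labels both ways and measure the largest element-wise gap. The helper names and example probabilities are hypothetical; trn stands in for da_trn_labels and the probability arrays for val_pseudo. If the model really outputs (near-)0/1 probabilities, the gap is (near-)zero and the choice hardly matters; for hesitant predictions the two versions differ substantially.

```python
import numpy as np


def combine_labels(trn_labels, val_probs, hard=False):
    """Stack training labels with labels for the unlabeled set.
    hard=False mirrors the notebook (raw predicted probabilities);
    hard=True uses one-hot argmax pseudo-labels as in the paper."""
    if hard:
        val_probs = np.eye(val_probs.shape[1])[val_probs.argmax(axis=1)]
    return np.concatenate([trn_labels, val_probs])


def label_gap(trn_labels, val_probs):
    """Largest absolute difference between the soft and hard versions."""
    soft = combine_labels(trn_labels, val_probs)
    hard = combine_labels(trn_labels, val_probs, hard=True)
    return float(np.abs(soft - hard).max())


trn = np.eye(3)[[0, 1, 2]]  # one-hot training labels (made up)
overconfident = np.array([[0.999, 0.001, 0.0],
                          [0.0, 0.002, 0.998]])
hesitant = np.array([[0.5, 0.4, 0.1],
                     [0.2, 0.35, 0.45]])

print(label_gap(trn, overconfident), label_gap(trn, hesitant))
```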
That’s why I describe the approach as a mix of pseudo-labeling and knowledge distillation (which uses the probabilities).
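The difference between the two kinds of target shows up directly in training. For a softmax output trained with cross-entropy, the gradient of the loss with respect to the logits is p − t (prediction minus target), so a hard pseudo-label pushes every non-argmax class toward zero, while a soft target, as in knowledge distillation, pulls the prediction toward the whole predicted distribution. The numbers below are made up for illustration.

```python
import numpy as np

student = np.array([0.6, 0.3, 0.1])            # current prediction p (made up)
soft_target = np.array([0.5, 0.4, 0.1])        # predicted probabilities, as in the notebook (made up)
hard_target = np.eye(3)[soft_target.argmax()]  # hard pseudo-label: [1, 0, 0]


def logit_grad(p, t):
    """Gradient of softmax cross-entropy w.r.t. the logits: p - t."""
    return p - t


def cross_entropy(p, t, eps=1e-12):
    """-sum(t * log p); valid for both one-hot and soft targets."""
    return float(-(t * np.log(p + eps)).sum())


print("hard:", logit_grad(student, hard_target), cross_entropy(student, hard_target))
print("soft:", logit_grad(student, soft_target), cross_entropy(student, soft_target))
```

A gradient-descent step moves the logits against this gradient, so with the hard target the second class's probability falls, while with the soft target it rises toward 0.4.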
Subjectively, the idea of pseudo-labeling sounds a bit like putting a car on a highway and placing a brick on the gas pedal: for a while the car will drive very well indeed, with significant positive acceleration. But eventually a turn may come or, in our terminology, the properties of the data stream you’re feeding into your predictive model may change, requiring you to step back and perhaps go through the labeling process again.
Those predicted pseudo-labels must include some “wrong” labels, because the model is not 100% accurate. So why is pseudo-labeling helpful? Isn’t there a saying, “garbage in, garbage out”?
As it turns out, no! Try reading the paper for a deeper understanding: http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf
This is true of all models - nothing specific to pseudo-labeling AFAICT!
Actually, if you read the paper, the idea is to continuously re-pseudo-label the unlabelled points based on the current model:
“For unlabeled data, Pseudo-Labels re-calculated every weights update are used for the same loss function of supervised learning task.”
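A minimal sketch of that scheme, assuming a softmax-regression model trained by full-batch gradient descent in NumPy rather than the paper's neural network: before each weight update, the unlabeled points are re-labeled with the current model's argmax, and those pseudo-labels are then treated as fixed targets in the same cross-entropy loss. The constant alpha stands in for the paper's balancing coefficient between the labeled and unlabeled terms; it, the learning rate, the toy data, and the function names are all assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_with_pseudo_labels(X_lab, Y_lab, X_unlab, n_classes,
                             alpha=0.5, lr=0.1, steps=200, seed=0):
    """Softmax regression where pseudo-labels are re-computed before every
    weight update and then used as if they were true labels."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X_lab.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(steps):
        # Re-compute pseudo-labels with the current weights.
        P_unlab = softmax(X_unlab @ W + b)
        Y_pseudo = np.eye(n_classes)[P_unlab.argmax(axis=1)]
        # Gradient of mean softmax cross-entropy w.r.t. logits is (p - target) / n.
        G_lab = (softmax(X_lab @ W + b) - Y_lab) / len(X_lab)
        G_unlab = alpha * (P_unlab - Y_pseudo) / len(X_unlab)
        W -= lr * (X_lab.T @ G_lab + X_unlab.T @ G_unlab)
        b -= lr * (G_lab.sum(axis=0) + G_unlab.sum(axis=0))
    return W, b


# Toy data: two Gaussian blobs, only 5 labeled points per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
lab = np.r_[0:5, 100:105]
unlab = np.setdiff1d(np.arange(200), lab)

W, b = train_with_pseudo_labels(X[lab], np.eye(2)[y[lab]], X[unlab], n_classes=2)
acc = ((X[unlab] @ W + b).argmax(axis=1) == y[unlab]).mean()
print(acc)
```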
Am I right to think of pseudo-labeling as similar to k-means clustering? Of course, that only makes sense if you apply it many times, i.e., re-guess the labels of the unlabeled data on subsequent iterations.
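For one simple model the analogy is exact. If the classifier is a nearest-centroid rule, iterative pseudo-labeling alternates an assignment step (label each unlabeled point by its nearest centroid) with an update step (recompute centroids from labeled plus pseudo-labeled points). That is a k-means loop in which labeled points never change cluster, essentially the "constrained k-means" variant that holds seed labels fixed. The sketch assumes every class has at least one labeled point; the function name and data are illustrative.

```python
import numpy as np


def seeded_kmeans(X_lab, y_lab, X_unlab, n_classes, n_iter=20):
    """Nearest-centroid pseudo-labeling: a k-means loop whose centroids start
    from the labeled points and whose labeled points keep their classes."""
    centroids = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(n_classes)])
    pseudo = None
    for _ in range(n_iter):
        # Assignment step = pseudo-labeling: the nearest centroid wins.
        d = ((X_unlab[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_pseudo = d.argmin(axis=1)
        if pseudo is not None and np.array_equal(new_pseudo, pseudo):
            break  # assignments stopped changing
        pseudo = new_pseudo
        # Update step = refit on labeled + pseudo-labeled data.
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, pseudo])
        centroids = np.array([X_all[y_all == k].mean(axis=0) for k in range(n_classes)])
    return centroids, pseudo


rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
lab = np.r_[0:3, 100:103]  # only 3 labeled points per class
unlab = np.setdiff1d(np.arange(200), lab)

centroids, pseudo = seeded_kmeans(X[lab], y[lab], X[unlab], n_classes=2)
print(centroids.round(2), (pseudo == y[unlab]).mean())
```

Swapping the centroid rule for a more flexible model turns this into the general re-guess-and-refit loop described above.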