A quick Google search on pseudo labeling only returns articles and papers pertaining to deep learning. I’m curious whether pseudo labeling is useful in other machine learning contexts, such as training a random forest classifier.
Add “semi-supervised learning” along with “pseudo-label” to the search; that can bring up more results. In the case of random forests, there could be a risk of overfitting.
Here are two papers that take some type of semi-supervised approach (which could be viewed as pseudo-labeling, in the sense of making use of existing data):

Unified Face Analysis by Iterative Multi-Output Random Forests: they use two random forests, one to assist the other.

Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests: “propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset”
I haven’t seen those before, thanks for the links! And very interesting question @vwrideout; I’d be interested to hear if you try it…
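For anyone who wants to try it, here is a minimal sketch of one round of pseudo-labeling with a random forest in scikit-learn. Everything in it is an assumption made for illustration: the synthetic dataset, the hypothetical helper `pseudo_label_forest`, and the confidence `threshold` (keeping only confident predictions is a common variant, not something taken from the posts above). It is not the method of either linked paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real problem (assumption).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=0)
X_lab, y_lab = X[:100], y[:100]      # small labeled set
X_unlab = X[100:800]                 # treated as unlabeled
X_test, y_test = X[800:], y[800:]    # held out for evaluation


def pseudo_label_forest(X_lab, y_lab, X_unlab, threshold=0.8, seed=0):
    """One round of pseudo-labeling (hypothetical helper).

    Fits a forest on the labeled data, labels the unlabeled points it is
    confident about, then refits a fresh forest on the combined set.
    """
    base = RandomForestClassifier(n_estimators=200, random_state=seed)
    base.fit(X_lab, y_lab)

    proba = base.predict_proba(X_unlab)
    confident = proba.max(axis=1) >= threshold       # boolean mask
    pseudo = base.classes_[proba.argmax(axis=1)]     # hard pseudo-labels

    X_comb = np.concatenate([X_lab, X_unlab[confident]])
    y_comb = np.concatenate([y_lab, pseudo[confident]])

    final = RandomForestClassifier(n_estimators=200, random_state=seed)
    final.fit(X_comb, y_comb)
    return base, final, int(confident.sum())


base, final, n_added = pseudo_label_forest(X_lab, y_lab, X_unlab)
print("pseudo-labeled points added:", n_added)
print("baseline accuracy:    ", base.score(X_test, y_test))
print("pseudo-label accuracy:", final.score(X_test, y_test))
```

Comparing the two scores on a held-out set is the simplest way to check whether the extra pseudo-labeled points help or just reinforce the first forest's mistakes.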
In the pseudo-labeling paper, it is suggested that:
“For unlabeled data, Pseudo-Labels, just picking up the class which has the maximum predicted probability, are used as if they were true labels.”
However, in Statefarm.ipynb, we are concatenating the predicted probabilities, and not the MAP estimates, to the training labels.
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])
Aren’t we supposed to concatenate to_categorical(bn_model.predict_classes(conv_val_feat, ...))
to the training labels here? (Or does it not matter, because in this case the model was overconfident and produced just 0 and 1 probabilities anyway?)
That’s why I describe the approach as being a mix of pseudo labeling and knowledge distillation (which uses the probabilities).
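To make the distinction concrete, here is a small NumPy sketch of the two kinds of target. The array `probs` is made-up data standing in for the output of `bn_model.predict(...)`, and `cross_entropy` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np

# Hypothetical predicted probabilities: 3 unlabeled images, 4 classes.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.60, 0.30],
    [0.00, 1.00, 0.00, 0.00],   # an overconfident prediction
])

# Hard pseudo-labels: one-hot encoding of the most probable class
# (the MAP estimate described in the paper's quote).
hard = np.eye(probs.shape[1])[probs.argmax(axis=1)]

# Soft targets: the probabilities themselves, as in knowledge distillation.
soft = probs


def cross_entropy(targets, preds, eps=1e-12):
    """Per-example cross-entropy between target and predicted distributions."""
    return -(targets * np.log(preds + eps)).sum(axis=1)


print("hard targets:\n", hard)
print("loss against hard targets:", cross_entropy(hard, probs))
print("loss against soft targets:", cross_entropy(soft, probs))

# For the overconfident row the two kinds of target coincide.
print("row 2 identical:", np.array_equal(hard[2], soft[2]))
```

When the model outputs only 0s and 1s, as suggested in the question, the hard and soft targets are the same array, so the choice makes no difference; they diverge only on uncertain predictions.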
Subjectively, the idea of pseudo-labeling sounds a bit like putting a car on a highway and placing a brick on the gas pedal: for a while, the car will be driving very well indeed, with significant positive acceleration. But after a while a turn may come, or, in our terminology, the properties of the data stream which you’re putting into your predictive model may change, requiring you to step back and perhaps go through the labeling process once again.
Those predicted pseudo labels must include some “wrong” labels, because the model is not 100% accurate. So why is it helpful to use pseudo labeling? I thought the saying goes “garbage in, garbage out”, right?
As it turns out, no! Try reading the paper for a deeper understanding: http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf
This is true of all models; nothing specific to pseudo-labeling AFAICT!
Actually, if you read the paper, the idea is to continuously re-pseudo-label the unlabelled points based on the current model:
“For unlabeled data, Pseudo-Labels recalculated every weights update are used for the same loss function of supervised learning task.”
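As a rough illustration of what “recalculated every weights update” means, here is a toy NumPy sketch using softmax regression on two synthetic blobs. The data, the learning rate `lr`, and the fixed weight `alpha` on the unlabeled loss are all assumptions chosen for the example; this is not the paper's network or its training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): two Gaussian blobs, only 20 points keep labels.
n_lab, n_unlab, n_classes = 20, 400, 2
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_all = rng.integers(0, n_classes, size=n_lab + n_unlab)
X_all = centers[y_all] + rng.normal(size=(n_lab + n_unlab, 2))
X_lab, y_lab = X_all[:n_lab], y_all[:n_lab]
X_unlab, y_unlab_true = X_all[n_lab:], y_all[n_lab:]  # truth kept for checking only


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def grad(X, targets, W, b):
    """Gradient of the mean cross-entropy for softmax regression."""
    diff = softmax(X @ W + b) - targets
    return X.T @ diff / len(X), diff.mean(axis=0)


W = np.zeros((2, n_classes))
b = np.zeros(n_classes)
onehot = np.eye(n_classes)
lr, alpha = 0.5, 0.5   # alpha: fixed weight on the unlabeled loss (assumption)

for step in range(200):
    # Pseudo-labels are recomputed from the *current* weights at every update.
    pseudo = onehot[softmax(X_unlab @ W + b).argmax(axis=1)]

    gW_l, gb_l = grad(X_lab, onehot[y_lab], W, b)   # supervised term
    gW_u, gb_u = grad(X_unlab, pseudo, W, b)        # same loss, pseudo targets

    W -= lr * (gW_l + alpha * gW_u)
    b -= lr * (gb_l + alpha * gb_u)

acc = (softmax(X_unlab @ W + b).argmax(axis=1) == y_unlab_true).mean()
print("accuracy on the unlabeled points:", acc)
```

The key line is the first one inside the loop: the pseudo-labels are never frozen, so a point mislabeled early on can be relabeled once the model improves.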
Am I correct in thinking of pseudo-labeling as similar to K-means clustering? Of course, it only makes sense if you apply it many times, i.e. re-guess the unlabeled data on the subsequent iterations.