Well, you are inserting true labels for data it already got right anyway; on the other hand, you're making it harder for it to correct the labels it got wrong, since you're pushing it to stay wrong. I think in general it's better when the validation/test sets come from a different distribution than the training data, as in State Farm, since in that case there is something important to learn outside the training set.
The end-goal is to use pseudo-labeling on the test-set, not the validation set. You use the validation set only to get an indication that your pseudo-labeling parameters are good.
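To make that workflow concrete, here is a minimal sketch of pseudo-labeling with a confidence threshold, using a scikit-learn logistic regression on toy data (the 0.9 threshold, the model choice, and the dataset sizes are all illustrative assumptions, not a prescribed recipe):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for the train / validation / unlabeled test sets
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.6, random_state=0)
X_val, X_test, y_val, _ = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 1. Fit on the labeled training data only
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
base_val_acc = model.score(X_val, y_val)

# 2. Pseudo-label the test set, keeping only confident predictions
#    (the 0.9 threshold is one of the pseudo-labeling parameters to tune)
probs = model.predict_proba(X_test)
confident = probs.max(axis=1) > 0.9
pseudo_y = probs.argmax(axis=1)[confident]

# 3. Retrain on train + pseudo-labeled test examples
X_aug = np.vstack([X_train, X_test[confident]])
y_aug = np.concatenate([y_train, pseudo_y])
model2 = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

# 4. The validation set is used only as an indicator of whether the
#    pseudo-labeling parameters helped or hurt
new_val_acc = model2.score(X_val, y_val)
print(base_val_acc, new_val_acc)
```

The comparison in step 4 is the point: the validation score tells you whether the chosen threshold was reasonable, while the pseudo-labels themselves are only ever applied to the test pool.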
I think in the end it depends on your goal. If your goal is to do well on the test-set, as in competitions, then it can help. But even in Kaggle competitions that isn't really the end-goal. The end-goal there is for the competition creator to get a model that works well in the real world, where there are no pseudo-labels. They can even take a winning model and retrain it on the whole set of data (including test data with real labels) to get a production model. In that case, by giving them a model that got a good competition result by using pseudo-labels, aren't we sort of cheating them? I'd prefer a production model that generalizes well without pseudo-labels, since that seems to have more potential on never-before-seen data.