Data Distillation: Towards Omni-Supervised Learning

Any thoughts on this paper from Facebook AI Research?

They mention four steps:
(1) training a model on manually labeled data (just as in normal supervised learning);
(2) applying the trained model to multiple transformations of unlabeled data;
(3) converting the predictions on the unlabeled data into labels by ensembling the multiple predictions;
(4) retraining the model on the union of the manually labeled data and the automatically labeled data.
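
To make steps 2 and 3 concrete, here is a rough sketch of how I read them for a plain classification setting. It is not the paper's code: `toy_model`, `distill_labels`, and the `threshold` parameter are names and choices I made up for illustration, and the "model" is just a stand-in function so the snippet runs on its own with NumPy.

```python
import numpy as np

def toy_model(images):
    """Hypothetical stand-in for a trained classifier.

    Takes images of shape (N, H, W) and returns class probabilities of
    shape (N, 2). It scores class 1 by the brightness of the left half,
    so its output changes when the image is flipped horizontally.
    """
    left = images[:, :, : images.shape[2] // 2].mean(axis=(1, 2))
    p1 = 1.0 / (1.0 + np.exp(-10.0 * (left - 0.5)))
    return np.stack([1.0 - p1, p1], axis=1)

def distill_labels(model, images, transforms, threshold=0.8):
    """Turn predictions on unlabeled images into pseudo-labels.

    threshold is an assumed confidence cutoff, not a value from the paper.
    """
    # Step 2: run the same model on every transformed copy of the data.
    probs = np.stack([model(t(images)) for t in transforms])  # (T, N, C)
    # Step 3: ensemble by averaging the predictions across transforms.
    mean_probs = probs.mean(axis=0)                            # (N, C)
    labels = mean_probs.argmax(axis=1)
    # Keep only the images where the ensemble is confident.
    keep = mean_probs.max(axis=1) >= threshold
    return labels, keep

# Identity and horizontal flip as the two transformations.
transforms = [lambda x: x, lambda x: x[:, :, ::-1]]
```

The images where `keep` is true, together with their `labels`, would then be added to the manually labeled set for step 4. For spatial outputs such as keypoints or boxes, the predictions would also have to be mapped back to the original image frame before averaging, which this sketch skips.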

They use flipping and scaling for step 2, as does the code we used in the lessons. But what about step 3: is it something we already do, or something we could benefit from?