Great idea! In the last few days I have been implementing (or trying to) the Unsupervised Data Augmentation and MixMatch papers. I still cannot share the code as I am doing it at work, but I should be able to do it soon enough and would love to get a few eyes on it.
What I have got so far (keeping in mind that I suspect everything will be much easier with the next release of fast.ai and when part 2 of the course is made public):
- Just completed a working version of the basic UDA cycle which should be usable with any classification task that has some kind of data augmentation.
- Got midway through a working version of Training Signal Annealing (TSA, from the UDA paper). Currently missing a modified version of the cross-entropy loss. The main challenge is finding a way to implement it so that it also applies to the MixUp loss, or a modified version of it (I hope to run some experiments with MixMatch and UDA at the same time).
- Did a few experiments on CIFAR-10 with 250 and 4000 labels. So far it looks like UDA makes training more stable and squeezes out 1-2 percentage points of improvement in error rate (but I am still miles away from the results in the paper: I am at around 40% error rate at the moment with 4000 labels). It also seems to be a good regularisation technique.
- Halfway through refactoring the code so that all the supervised training code that is not specific to UDA can be reused for MixMatch and other experiments.
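For reference, the core of that UDA cycle is a consistency loss between the model's predictions on an unlabelled example and on an augmented copy of it. Here is a minimal sketch of how I understand it (PyTorch; the function name is my own):

```python
import torch
import torch.nn.functional as F

def uda_consistency_loss(logits_orig, logits_aug):
    """Consistency loss from the UDA paper: KL divergence between the
    predicted distribution on the original unlabelled example and the
    one on its augmented copy. The original branch is detached so its
    predictions act as a fixed target."""
    target = F.softmax(logits_orig.detach(), dim=-1)
    log_pred = F.log_softmax(logits_aug, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```

This is the term that gets added (with a weight) to the ordinary supervised cross entropy on the labelled batch.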
Still to do:
- Finish implementing TSA.
- Try to replicate the ablation studies in the UDA paper (there’s no way in hell I will manage to use AutoAugment, so I will have to try with simple cropping and flipping and see if I can get to a 16-17% error rate).
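For anyone curious about TSA: the idea is a threshold on the predicted probability of the correct class, annealed from 1/K up to 1 over training, and examples the model already gets right with probability above the threshold are masked out of the cross entropy. A sketch of my current understanding (PyTorch; the schedule constants follow my reading of the paper, so treat this as an approximation):

```python
import math
import torch
import torch.nn.functional as F

def tsa_threshold(step, total_steps, n_classes, schedule="linear"):
    """TSA threshold eta(t), annealed from 1/K up to 1."""
    t = step / total_steps
    if schedule == "log":
        alpha = 1 - math.exp(-5 * t)
    elif schedule == "exp":
        alpha = math.exp(5 * (t - 1))
    else:  # linear
        alpha = t
    return alpha * (1 - 1 / n_classes) + 1 / n_classes

def tsa_cross_entropy(logits, targets, threshold):
    """Cross entropy that ignores examples whose correct-class
    probability already exceeds the TSA threshold."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        correct_p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        mask = (correct_p < threshold).float()
    return (ce * mask).sum() / mask.sum().clamp(min=1.0)
```

Making this play nicely with MixUp is exactly the open question above: MixUp targets are soft, so the "correct-class probability" would have to be replaced with something like the probability mass on the mixed target.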
Main difficulties:
- AutoAugment. From what I understand, it requires way too much computing power, and UDA seems to rely heavily on the augmentation policies.
- To make good use of unsupervised training I have to use large batch sizes on the unlabelled examples, and I soon run into GPU memory problems with anything larger than a WideResNet 28-2 (I also suspect I have a memory leak somewhere).
- No mention of hyperparameter tuning in the paper, so some experimenting will be needed.
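On the batch-size problem, one workaround I am considering (not from the paper, just standard practice) is gradient accumulation: split the big unlabelled batch into chunks and accumulate gradients, so the effective batch size stays large without the memory cost. A sketch, with hypothetical names:

```python
import torch

def backward_in_chunks(model, loss_fn, big_batch, n_chunks):
    """Accumulate gradients over equal-sized chunks of a large batch.
    Scaling each chunk loss by 1/n_chunks makes the accumulated
    gradients match a single backward pass over the full batch."""
    for chunk in big_batch.chunk(n_chunks):
        loss = loss_fn(model(chunk)) / n_chunks
        loss.backward()
    # the caller runs optimizer.step() and zero_grad() afterwards
```

Note the chunks need to be equal-sized for the 1/n_chunks scaling to be exact when the loss is a mean over the batch.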
To address the memory problems, I am also thinking of splitting the unsupervised and supervised training into different cycles, NoGAN style, keeping the semi-supervised training for the last bit.
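The phase plan I have in mind looks roughly like this (purely illustrative, all names mine):

```python
def nogan_style_phases(n_cycles):
    """Hypothetical training plan: alternate unsupervised-only and
    supervised-only cycles, saving the combined semi-supervised loss
    for the final cycle (a NoGAN-style alternation)."""
    phases = []
    for _ in range(n_cycles - 1):
        phases += ["unsupervised", "supervised"]
    phases.append("semi-supervised")
    return phases
```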
Not sure if I’ll have the time though (I am still doing all of this at work, after all).