I’ve been trying to implement “MixMatch”, a new semi-supervised learning method, in Fast.ai. It is described in a paper here: link.

Unfortunately, I have no idea how to handle batching when the training set has to be divided into two parts: one labeled and the other unlabeled.

According to the paper, each batch has to be split equally, so that exactly half of its items are labeled (basically, they have to come from the “labeled” folder) and the other half are unlabeled. It’s unclear to me, however, how to implement this behaviour in the Fast.ai library.

I know that it’s possible to implement a custom PyTorch data loader, but I’m not sure whether that’s the right approach (and if it is, I’m unsure where I should put it).
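To make the custom-loader idea concrete, here is a minimal sketch in plain Python of a batch sampler that builds half-labeled, half-unlabeled batches. `HalfLabeledBatchSampler` is a hypothetical helper, not part of Fast.ai or PyTorch. It assumes the labeled and unlabeled items live in one concatenated dataset with the labeled items first. Whether and how this plugs into Fast.ai is exactly the open question, so treat it as a starting point only.

```python
import random


class HalfLabeledBatchSampler:
    """Yield index batches: first half labeled, second half unlabeled.

    Hypothetical helper. Assumes labeled items occupy indices
    [0, n_labeled) and unlabeled items occupy
    [n_labeled, n_labeled + n_unlabeled) of one concatenated dataset.
    """

    def __init__(self, n_labeled, n_unlabeled, batch_size, seed=None):
        if batch_size % 2 != 0:
            raise ValueError("batch_size must be even to split it in half")
        self.half = batch_size // 2
        if n_labeled < self.half or n_unlabeled < self.half:
            raise ValueError("each part needs at least batch_size // 2 items")
        self.n_labeled = n_labeled
        self.n_unlabeled = n_unlabeled
        self.rng = random.Random(seed)

    def __len__(self):
        # One epoch is one pass over the larger part; the smaller part
        # is reshuffled and reused as often as needed.
        return max(self.n_labeled, self.n_unlabeled) // self.half

    def _stream(self, start, count):
        # Endless stream of indices from [start, start + count),
        # reshuffled each time the range is exhausted.
        while True:
            idxs = list(range(start, start + count))
            self.rng.shuffle(idxs)
            yield from idxs

    def __iter__(self):
        labeled = self._stream(0, self.n_labeled)
        unlabeled = self._stream(self.n_labeled, self.n_unlabeled)
        for _ in range(len(self)):
            batch = [next(labeled) for _ in range(self.half)]
            batch += [next(unlabeled) for _ in range(self.half)]
            yield batch


# Example: 10 labeled and 50 unlabeled items, batches of 8 (4 + 4).
sampler = HalfLabeledBatchSampler(n_labeled=10, n_unlabeled=50, batch_size=8, seed=0)
batches = list(sampler)
```

In plain PyTorch, an object like this can be passed as `batch_sampler=` to `torch.utils.data.DataLoader` over a `ConcatDataset` of the labeled and unlabeled datasets, since `batch_sampler` accepts any iterable of index lists. The unlabeled items would still need a placeholder target so that the default collation works.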

I’d appreciate any help or feedback on this, as a bunch of my colleagues and I have been trying to figure it out for a while.