My critique would be that you picked an easy data set. MNIST is centered and has no rotation, but data sets in the wild are rarely so clean (off-center, rotated, and full of other artifacts).
Additionally, you trained to predict the rotation using a lot of images, but could you learn the relevant convolutions with just a few images (maybe 50)? In the applications where self-supervision would be most helpful, the data set is normally small in the first place, which is exactly why self-supervision is appealing there.
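For context, the rotation pretext task being discussed boils down to rotating each image by a random multiple of 90° and training a classifier to predict which rotation was applied. A minimal numpy sketch of the data-generation step (the function name is my own, not from fastai or any library):

```python
import numpy as np

def make_rotation_batch(images, rng=None):
    """Rotation pretext task: rotate each image by a random multiple
    of 90 degrees; the label is the rotation index (0..3)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, int(k)) for img, k in zip(images, labels)])
    return rotated, labels

# usage: fake 28x28 "MNIST-like" images
imgs = np.arange(8 * 28 * 28, dtype=np.float32).reshape(8, 28, 28)
x, y = make_rotation_batch(imgs)
```

With only ~50 images you would feed batches like this to a small CNN; whether the learned convolutions transfer usefully at that scale is exactly the open question.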
I do plan on playing with other (harder) datasets, and perhaps some other self-supervised learning methods as well, such as CPC (Contrastive Predictive Coding).
Traditionally, 3D datasets of CT or MRI are expensive to produce, since they require highly specialized tools and technicians to run the machines. As a result, they are generally small.
So, maybe a 3D dataset of less than 50 samples.
I would also be curious to see the jigsaw paper implemented in fastai.
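For anyone curious what that task looks like: the jigsaw pretext task splits an image into a grid of tiles, shuffles them by a permutation, and trains the network to predict which permutation was used (the paper restricts training to a fixed subset of permutations). A minimal sketch of the tile-shuffling step, with the function name my own invention:

```python
import numpy as np

def make_jigsaw_sample(img, perm, grid=3):
    """Jigsaw pretext task: split img into grid x grid tiles and
    reassemble them in the order given by perm; the classification
    label would be the index of perm in a fixed permutation set."""
    h, w = img.shape[0] // grid, img.shape[1] // grid
    tiles = [img[r*h:(r+1)*h, c*w:(c+1)*w]
             for r in range(grid) for c in range(grid)]
    shuffled = [tiles[i] for i in perm]
    rows = [np.concatenate(shuffled[r*grid:(r+1)*grid], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0)

# usage: a 6x6 toy image split into 3x3 tiles of 2x2 pixels
img = np.arange(36, dtype=np.float32).reshape(6, 6)
puzzle = make_jigsaw_sample(img, (8, 7, 6, 5, 4, 3, 2, 1, 0))
```

Porting this into a fastai `Datasets`/transform pipeline would mostly be a matter of wrapping it as an item transform.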