Nice work. I really like it a lot.
My critique would be that you picked an easy data set. MNIST digits are centered and upright, whereas real-world data sets are rarely so clean (off-center, rotated, and full of other artifacts).
Additionally, you trained the rotation-prediction task on a large number of images, but could you learn the relevant convolutions with just a few (maybe 50)? In the applications where self-supervision would be most helpful, the labeled data set is normally small, which is the whole motivation for using self-supervision in the first place.
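To make the suggestion concrete, here is a minimal sketch of the rotation-prediction pretext task restricted to 50 images. I don't have your data loader or model, so random arrays stand in for the images and a tiny softmax classifier stands in for your conv net; the point is only the data pipeline: each image yields four training pairs, one per 90-degree rotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ~50 unlabeled 28x28 images (replace with your MNIST subset).
images = rng.random((50, 28, 28))

# Rotation pretext task: each image produces 4 (rotated image, k) pairs,
# where k in {0, 1, 2, 3} counts 90-degree rotations and serves as the label.
X, y = [], []
for img in images:
    for k in range(4):
        X.append(np.rot90(img, k).ravel())
        y.append(k)
X = np.asarray(X)  # shape (200, 784)
y = np.asarray(y)  # shape (200,)

# Tiny softmax classifier trained by gradient descent -- a placeholder for
# whatever conv net you used; only the self-supervised labeling matters here.
W = np.zeros((784, 4))
b = np.zeros(4)
onehot = np.eye(4)[y]
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(X)  # gradient of mean cross-entropy
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
print(f"rotation-prediction train accuracy on 200 samples: {acc:.2f}")
```

With only 50 source images you get 200 self-supervised examples for free, which is exactly where I'd be curious whether the learned filters still transfer.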
Again, this is great work. Thanks for sharing.