I’ve noticed something ‘weird’ about create_head.
It creates this sequence of layers: BatchNorm1d -> Linear -> ReLU. Is that correct?
I was taught that BN should come before the non-linearity. If you look at pre-activation ResNet blocks, for example, they are arranged as BatchNorm2d -> ReLU -> Conv2d.
So is this a bug or a design choice? Shouldn't it be BatchNorm1d -> ReLU -> Linear? If it's not a bug, what's the reasoning behind it?
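To make the comparison concrete, here is a minimal PyTorch sketch of the two orderings I'm talking about (this is my own reconstruction, not fastai's actual `create_head` source, which also includes pooling, flattening, and dropout):

```python
import torch
import torch.nn as nn

# The ordering create_head appears to use: BN before the Linear layer.
head_style = nn.Sequential(
    nn.BatchNorm1d(512),      # normalize the incoming features
    nn.Linear(512, 256),      # linear transform of normalized inputs
    nn.ReLU(inplace=True),    # non-linearity applied last
)

# The ordering I expected, by analogy with pre-activation ResNet blocks.
preact_style = nn.Sequential(
    nn.BatchNorm1d(512),      # normalize
    nn.ReLU(inplace=True),    # activate
    nn.Linear(512, 256),      # then apply the linear transform
)

x = torch.randn(4, 512)       # dummy batch of pooled features
print(head_style(x).shape, preact_style(x).shape)
```

Both produce the same output shape, of course; the question is purely about where the normalization sits relative to the non-linearity.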