One approach I've read about for jump-starting a model in a new domain (provided you have enough data) is to use its convolutional layers as the encoder in an encoder-decoder configuration.
You can train this encoder-decoder on unlabeled data, so you don’t need ground truth.
That way the convolutional layers learn to extract meaningful features from the unlabeled data, so that the decoder can reconstruct the input image from them.
You can then detach the encoder, attach a dense head to it (if the model needs one), and train on labeled data starting from there.
Makes a lot of sense to me, but it also sounds pretty labor-intensive.
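For what it's worth, the two-stage idea above can be sketched in a few lines of PyTorch. This is just a minimal illustration, not a recipe: the layer sizes, the 28×28 single-channel input, the random stand-in tensors, and the 10-class head are all made up for the example.

```python
import torch
import torch.nn as nn

# Stage 1: convolutional encoder + decoder, trained to reconstruct
# unlabeled inputs (no ground-truth labels required).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
    nn.Sigmoid(),
)
autoencoder = nn.Sequential(encoder, decoder)

# One unsupervised pretraining step: reconstruction loss on unlabeled data.
x_unlabeled = torch.rand(8, 1, 28, 28)  # stand-in for real unlabeled images
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(autoencoder(x_unlabeled), x_unlabeled)
opt.zero_grad()
loss.backward()
opt.step()

# Stage 2: reuse the pretrained encoder, bolt a dense head on top,
# and fine-tune on (much less) labeled data.
classifier = nn.Sequential(
    encoder,
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),  # hypothetical 10-class problem
)
x_labeled = torch.rand(4, 1, 28, 28)
logits = classifier(x_labeled)
```

Because `classifier` holds the same `encoder` object, fine-tuning updates the pretrained weights in place; you could also freeze them (`p.requires_grad = False`) and train only the head at first.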