How to make a ResNet do Black and White Images

muellerzr · February 18, 2020, 4:26am

I recently had to tackle this. The solution is to replace the first conv layer with one whose ni (number of input channels) is 1. Assume we have made an encoder with create_body (this example is a resnet34):

body = create_body(resnet34, pretrained=True)
body[0] = nn.Conv2d(1, 64, kernel_size=(7,7), stride=(2,2), padding=(3,3), bias=False)

From here we’d probably want to make this particular layer trainable. Hope this helps someone!
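A minimal sketch of the swap and the "make it trainable" step, using plain PyTorch. The small nn.Sequential below is only a hypothetical stand-in for what create_body returns (it is not the real resnet34 body), and freezing every parameter first is an assumption about how a pretrained encoder would normally be set up:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the body returned by create_body: an nn.Sequential
# whose first module is the 3-channel stem conv of a ResNet.
body = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
)

# Freeze everything, as one typically would for a pretrained encoder.
for p in body.parameters():
    p.requires_grad_(False)

# Swap in a 1-channel stem. A freshly constructed layer is trainable by
# default, but its weights are randomly initialised (the old stem is discarded).
body[0] = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Only the new first conv should now receive gradients.
trainable = [name for name, p in body.named_parameters() if p.requires_grad]

# A single-channel batch now passes through the body.
out = body(torch.randn(2, 1, 224, 224))
```

Because the replacement layer starts from random weights, it usually makes sense to keep it unfrozen even while the rest of the encoder stays frozen.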

bwarner · February 18, 2020, 4:34am

You also can convert and use the original weights, as Ross Wightman does in pytorch-image-models:

github.com/rwightman/pytorch-image-models/blob/f098fda2ca48bf66b611237787b52b9614003a8b/timm/models/helpers.py#L74

def load_pretrained(model, cfg=None, num_classes=1000, in_chans=3, filter_fn=None, strict=True):
    if cfg is None:
        cfg = getattr(model, 'default_cfg')
    if cfg is None or 'url' not in cfg or not cfg['url']:
        logging.warning("Pretrained model URL is invalid, using random initialization.")
        return


    state_dict = model_zoo.load_url(cfg['url'], progress=False, map_location='cpu')


    if in_chans == 1:
        conv1_name = cfg['first_conv']
        logging.info('Converting first conv (%s) from 3 to 1 channel' % conv1_name)
        conv1_weight = state_dict[conv1_name + '.weight']
        state_dict[conv1_name + '.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif in_chans != 3:
        assert False, "Invalid in_chans for pretrained weights"


    classifier_name = cfg['classifier']
    if num_classes == 1000 and cfg['num_classes'] == 1001:
        # special case for imagenet trained models with extra background class in pretrained weights

Pomo · February 18, 2020, 6:12am

It seems like you’d want to use the mean across the three input channels of the pretrained weights.

Is that what this code is doing?
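For what it's worth, the excerpt calls conv1_weight.sum(dim=1, keepdim=True), so it sums over the three input channels rather than averaging them. A small sketch of the difference, assuming a randomly initialised conv as a stand-in for the pretrained one; to_one_channel is a hypothetical helper written for this illustration, not part of timm:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained 3-channel first conv (random weights, not real ones).
conv3 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

def to_one_channel(conv, reduce="sum"):
    """Hypothetical helper: collapse the input-channel axis of conv's weight."""
    w = conv.weight.detach()
    w = w.sum(dim=1, keepdim=True) if reduce == "sum" else w.mean(dim=1, keepdim=True)
    new = nn.Conv2d(1, conv.out_channels, conv.kernel_size,
                    conv.stride, conv.padding, bias=False)
    with torch.no_grad():
        new.weight.copy_(w)
    return new

gray = torch.randn(2, 1, 32, 32)
rgb_like = gray.expand(-1, 3, -1, -1)  # the same gray plane in all three channels

with torch.no_grad():
    ref = conv3(rgb_like)                            # 3-channel conv on replicated gray
    out_sum = to_one_channel(conv3, "sum")(gray)     # summed weights, 1-channel input
    out_mean = to_one_channel(conv3, "mean")(gray)   # averaged weights, 1-channel input

same_as_rgb = torch.allclose(out_sum, ref, atol=1e-4)
mean_is_third = torch.allclose(out_mean * 3, ref, atol=1e-4)
```

Since convolution is linear, summing the weights reproduces what the original layer would output for a grayscale image copied into all three channels, while the mean gives the same activations scaled by one third. Which of the two trains better in practice is not something this sketch can show.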

jeffbiss · October 11, 2021, 4:56pm

OK, I’m working on B&W spectrograms and found that ResNet expects RGB, so I copied the single channel into two additional channels, as shown by my programmatic checks of my images:

the shape of pix is:  (224, 224, 3)
image format:  PNG
image size:  (224, 224)
image mode:  RGB
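As a rough sketch of that channel duplication in PyTorch terms (the random tensor is a hypothetical placeholder for a real spectrogram, and the variable names are only illustrative):

```python
import torch

# Hypothetical single-channel spectrogram, height x width.
spec = torch.rand(224, 224)

# Copy the one channel three times along a new last axis, giving the
# channel-last (224, 224, 3) shape printed above.
pix = torch.stack([spec, spec, spec], dim=-1)

# PyTorch models expect channel-first batches: (N, 3, H, W).
batch = pix.permute(2, 0, 1).unsqueeze(0)
```

All three channels carry identical data here, so the network sees no extra information compared with a single channel; the duplication only satisfies the 3-channel input shape.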

I am getting what appear to be poor loss results from resnet18 and 34 and am not sure what you are saying here. create_body states:

"Cut off the body of a typically pretrained arch as determined by cut"

What would ever make me think that that would solve my problems? Can you point to a source that discusses this type of detail? The API doesn’t provide any information that is intelligible to me.