Change activation function in ResNet model

Hi,

I want to create a U-Net model and train it on solar data, which contains negative values as well. I want to use a ResNet as the base model, since it extracts features very well. The only problem is that it uses ReLU as the activation function, which zeroes out negative values. So is there some way I can change the activation function to tanh or something else? I’m going to use an untrained ResNet model.

Please let me know.

1 Like

You can change the activation function, but you probably shouldn’t, and there really is no need to.

Even for image models, we use negative inputs. Often the pixel values (0-255) are scaled to the range [-1, 1] or to roughly [-3, 3] when using mean/std normalization.

You don’t really “lose” the negative values in your data because the convolution weights can also be negative, resulting in a positive activation.
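
To make this concrete, here is a minimal PyTorch sketch (the weight and input values are made up purely for illustration): a negative convolution weight turns a negative input into a positive pre-activation, which ReLU then passes through unchanged.

import torch
import torch.nn as nn

# Toy 1x1 convolution with a hand-set negative weight (illustrative values only)
conv = nn.Conv2d(1, 1, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight.fill_(-0.5)

x = torch.full((1, 1, 2, 2), -3.0)  # all-negative input, e.g. normalized solar data
out = torch.relu(conv(x))           # (-3.0) * (-0.5) = 1.5, which ReLU keeps
print(out)                          # all 1.5 -- the negative input was not "lost"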

2 Likes

This function can change all the activations in any model. I recommend Mish, as of course it’s new and does awesome :wink:

def convert_act_cls(model, layer_type_old, layer_type_new):
    # Walk the module tree and swap every layer of type layer_type_old
    # for layer_type_new. Note that layer_type_new is passed in as an
    # instance, so the same object is reused at every replacement site
    # (fine for stateless activations such as Mish).
    for name, module in reversed(model._modules.items()):
        if len(list(module.children())) > 0:
            # recurse into containers (Sequential, BasicBlock, ...)
            model._modules[name] = convert_act_cls(module, layer_type_old, layer_type_new)

        if type(module) == layer_type_old:
            model._modules[name] = layer_type_new

    return model

This was taken from convert_MP_to_blurMP in ImageWoof/Nette and adapted to replace any layer type. For instance:

learn.model = convert_act_cls(learn.model, nn.ReLU, Mish())

Technically we can pass an act_cls to unet_learner; however, that will only change the U-Net part, not the encoder. This will do both.
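
For example, a rough sketch of combining the two, assuming a fastai v2 setup where dls is an existing DataLoaders object and unet_learner forwards act_cls through to DynamicUnet:

from fastai.vision.all import *

learn = unet_learner(dls, resnet34, act_cls=Mish)            # swaps activations in the U-Net head only
learn.model = convert_act_cls(learn.model, nn.ReLU, Mish())  # also swaps the ReLUs in the ResNet encoder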

5 Likes

Hmmm. Okay, I can try that. But I have a question here. Let’s say we have multiple hidden layers, and in one of the earlier hidden layers we get a negative output. This negative output passes through ReLU and becomes zero. That zero will then get multiplied by the weight (which could be negative) in the next layer (and a bias gets added, but that could be a small value), so the output of that convolution will be zero, right? And this will continue until the final layer, won’t it?

Yes, that could happen. But remember that each convolution layer has many different filters. In some filters the weight might be negative, in others it will be positive. So even if you have a negative output from one layer, it will still be used by many of the filters in the next layer.
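
A tiny PyTorch sketch of that idea (the weights here are hand-picked just for illustration): even when one input channel has been zeroed out by ReLU, the other channels still drive the next convolution, so the output does not collapse to zero.

import torch
import torch.nn as nn

# One output filter looking at two input channels (illustrative weights only)
conv = nn.Conv2d(2, 1, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight[:] = torch.tensor([[[[0.7]], [[-0.4]]]])  # one positive, one negative weight

x = torch.zeros(1, 2, 2, 2)
x[:, 1] = 1.0   # channel 0 is "dead" (all zeros after ReLU), channel 1 still carries signal
print(conv(x))  # -0.4 everywhere: the zeroed channel did not zero out the whole output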

Thank you very much. I will check this out.

@muellerzr Thanks for pointing to Mish. @sarvagya1991 Feel free to give it a try. Mish is available in FastAI itself. If you want to check out my work, feel free to visit this page - it contains the links to my code repository and the official BMVC paper. I’m happy to answer any questions regarding Mish if you have any :slight_smile:
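
For reference, a minimal sketch of using the fastai-provided Mish (assuming fastai v2, where Mish is exported through the usual vision.all namespace):

from fastai.vision.all import Mish
import torch

act = Mish()
x = torch.linspace(-3, 3, 7)
print(act(x))  # mish(x) = x * tanh(softplus(x)); negative inputs are not hard-zeroed like with ReLU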

2 Likes

I have read about Mish activation function. I am definitely going to try it. Thanks a lot.

1 Like

@machinethink @muellerzr @Diganta, just one more query though. I want to use the ResNet model on (1, 512, 512) images with a batch size of 16 to 32 (my GPU isn’t able to handle a batch size of 64). But the ResNet was trained with a batch size of 64. So what do you recommend I do?

Sorry, I found the solution.

I see that there are a few variants of Mish (the Fastai JIT-compiled Mish, and mish_cuda). Which one is the fastest as of now?

1 Like

FYI, this function seems not to be doing what it should. I get some errors with Mish that I need to look into. In the meantime I’ve been seeing improvement just from passing in an act_cls. (Once I have a working technique for swapping the activation function I will update this comment.)

1 Like

Mish_CUDA is the fastest in terms of computation time, while @rwightman’s memory-efficient Mish is the cheapest in terms of memory.
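
For reference, all of these variants compute the same function; they differ in how the forward and backward passes are implemented and how much intermediate state they keep around. A plain PyTorch sketch of the reference formula:

import torch
import torch.nn.functional as F

def mish_reference(x):
    # mish(x) = x * tanh(softplus(x)); naive version that keeps extra intermediates in memory
    return x * torch.tanh(F.softplus(x))

print(mish_reference(torch.tensor([-2.0, 0.0, 2.0])))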

2 Likes

Is there any difference between your convert_act_cls function and the following? Thanks : )

Yijin

def replace_relu_to_mish(model):
    for child_name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, child_name, Mish())
        else:
            # recurse
            replace_relu_to_mish(child)

replace_relu_to_mish(learn.model)

3 Likes

That one likely works better (mine didn’t really, looking back on it :slight_smile:).

1 Like