I want to create a UNet model and train it on solar data, which contains negative values as well. I want to use ResNet as the base model since it extracts features very well. The only problem is that it uses ReLU as its activation function, which zeroes out negative values. So is there some way I can change the activation function to tanh or something else? I'm going to use an untrained ResNet model.
You can change the activation function, but you probably shouldn’t, and there really is no need to.
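That said, if you do want to experiment, swapping activations in a pretrained-architecture definition is straightforward. Here is a minimal sketch (the helper name `swap_activation` is mine, not a library function) that recursively replaces every `nn.ReLU` module in a PyTorch model with `nn.Tanh`:

```python
import torch.nn as nn

def swap_activation(model, old=nn.ReLU, new=nn.Tanh):
    # Walk the module tree and replace every instance of `old` with a
    # fresh instance of `new`. Works on any nn.Module, e.g. a torchvision
    # resnet, since submodules are registered by name.
    for name, child in model.named_children():
        if isinstance(child, old):
            setattr(model, name, new())
        else:
            swap_activation(child, old, new)
    return model
```

You could call this as `swap_activation(torchvision.models.resnet34())`, but as noted above, for an untrained ResNet it's usually unnecessary.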
Even for image models, we use negative inputs. Often the pixel values (0-255) are scaled to the range [-1, 1] or to roughly [-3, 3] when using mean/std normalization.
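To make that concrete, here is a small NumPy sketch of standard mean/std normalization (using the commonly quoted ImageNet mean/std for one channel as an illustration); note that the darker pixels end up as negative network inputs:

```python
import numpy as np

# A few sample 8-bit pixel values from one channel.
pixels = np.array([0, 64, 128, 192, 255], dtype=np.float32)

# Scale to [0, 1], then apply mean/std normalization.
# 0.485 / 0.229 are the widely used ImageNet stats for the red channel.
mean, std = 0.485, 0.229
normalized = (pixels / 255.0 - mean) / std

# normalized now spans roughly [-2.1, 2.2]: negative inputs are normal.
```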
You don’t really “lose” the negative values in your data because the convolution weights can also be negative, resulting in a positive activation.
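A tiny worked example of that point: an all-negative input patch dotted with a filter whose weights are also negative gives a positive pre-activation, which ReLU passes through untouched.

```python
import numpy as np

x = np.array([-2.0, -1.0, -3.0])   # all-negative input patch
w = np.array([-0.5, -1.0, -0.2])   # learned filter weights, also negative

pre_activation = np.dot(x, w)       # 1.0 + 1.0 + 0.6, approx. 2.6
relu_out = max(pre_activation, 0.0) # positive, so ReLU keeps it
```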
Hmmm. Okay, I can try that. But I have a question here. Let's say we have multiple hidden layers, and one of the earlier hidden layers produces a negative output. That negative output passes through ReLU and becomes zero. This zero then gets multiplied by the weight (which could be negative) in the next layer (and added to a bias, but that could be a small value), so the output of that convolution will be zero too, right? And this will continue all the way to the final layer. Won't it?
Yes, that could happen for a single filter. But remember that each convolution layer has many different filters: in some filters the weight might be negative, in others it will be positive. So even if one filter's output gets zeroed by ReLU, the same input is still being used by many other filters in the next layer, and their outputs won't all be zero.
@muellerzr Thanks for pointing to Mish. @sarvagya1991 Feel free to give it a try; Mish is available in fastai itself. If you wanna check out my work, feel free to visit this page - it contains the links to my code repository and the official BMVC paper. I'm happy to answer any questions regarding Mish if you have any.
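For anyone following along, the activation itself is simple to write down: Mish(x) = x * tanh(softplus(x)). Unlike ReLU, it stays smooth and non-zero for negative inputs. A minimal NumPy version:

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def mish(x):
    # Mish(x) = x * tanh(softplus(x)).
    # Negative inputs are damped but not hard-zeroed like ReLU.
    return x * np.tanh(softplus(x))

# mish(-2.0) is roughly -0.25: the negative signal still flows through.
```

In practice you would use the framework's own implementation (e.g. fastai's Mish, or `torch.nn.Mish` in recent PyTorch versions) rather than this sketch.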
@machinethink @muellerzr @Diganta, just one more query though. I want to use the ResNet model on (1, 512, 512) images with a batch size of 16 to 32 (my GPU can't handle a batch size of 64). But ResNet is usually trained with a batch size of 64. So what do you recommend I should do?
FYI, this function seems to not be doing what it should… I get some errors with Mish that I need to look into. In the meantime I've been seeing success with just passing in an act_cls and seeing improvement. (Once I have a technique for adjusting this activation function, I will update this comment.)