Implementing new activation functions in FastAI library

(Theodoros Galanos) #1

Hi everyone,

I have to start by saying I was not sure where to post this, hope this section is ok.

I have been experimenting lately with several recently published activation functions. I wanted to share some of that here and see if anyone feels they could be an interesting addition to fastai. I should say that I’m not that familiar with PyTorch, so I really hope that the implementations I provide below are correct and accurate enough to be useful to some of you (I have only implemented them in Keras before). If anyone can share more efficient code please let me know!

The first activation function comes from Google (source: ) and is called swish. As often happens, the idea behind it is incredibly simple!

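    import torch
    import torch.nn as nn

    # Swish activation function, a special case of ARiA,
    # for ARiA = f(x, 1, 0, 1, 1, b, 1)
    class swish(nn.Module):

        def __init__(self, b=1.):
            super(swish, self).__init__()
            self.b = b

        def forward(self, x):
            # As posted: sigmoid(x) ** b. Reply #4 below gives the
            # corrected form, sigmoid(b * x); both agree when b = 1.
            sigmoid = torch.sigmoid(x) ** self.b
            return x * sigmoid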

That’s all it is, multiplying the sigmoid function with the raw activations!

The next two functions (ARiA and ARiA2) were published recently (source: ) and showed promising results, at least on the specific datasets and architectures listed in the paper.

    import torch
    import torch.nn as nn

    # ARiA activation function
    class Aria(nn.Module):

        def __init__(self, A=0., K=1., B=1., v=1., C=1., Q=1.):
            super(Aria, self).__init__()
            # ARiA parameters
            self.A = A  # lower asymptote, values tested were A = -1, 0, 1
            self.K = K  # upper asymptote, values tested were K = 1, 2
            self.B = B  # exponential rate, values tested were B = [0.5, 2]
            self.v = v  # v > 0 the direction of growth, values tested were v = [0.5, 2]
            self.C = C  # constant set to 1
            self.Q = Q  # related to initial value, values tested were Q = [0.5, 2]

        def forward(self, x):
            # generalized logistic gate, multiplied by the raw activation
            aria = self.A + (self.K - self.A) / ((self.C + self.Q * torch.exp(-x) ** self.B) ** (1 / self.v))
            return x * aria
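A quick way to sanity-check the formula is to evaluate the gating term on plain floats. The sketch below restates the expression from `forward` for a single number, using only the standard `math` module; the helper names `aria_gate` and `sigmoid` are my own and are not part of PyTorch or fastai. With the default parameters the gate should reduce to the ordinary sigmoid, so ARiA with defaults behaves like x * sigmoid(x).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aria_gate(x, A=0.0, K=1.0, B=1.0, v=1.0, C=1.0, Q=1.0):
    # Same expression as the gating term in Aria.forward, for one float.
    return A + (K - A) / ((C + Q * math.exp(-x) ** B) ** (1.0 / v))

# With the defaults, the gate is just the logistic sigmoid.
for x in (-2.0, 0.0, 3.0):
    assert math.isclose(aria_gate(x), sigmoid(x))

# Moving the lower asymptote to A = -1 centres the gate on zero.
print(aria_gate(0.0, A=-1.0, K=1.0))  # 0.0
```

The full activation is then `x * aria_gate(x, ...)`, as in the module.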

ARiA2 is a special case of ARiA, which runs a bit faster since it has fewer parameters.

    import torch
    import torch.nn as nn

    # ARiA2 activation function, a special case of ARiA,
    # for ARiA = f(x, 1, 0, 1, 1, b, 1/a)
    class Aria2(nn.Module):

        def __init__(self, a=1.5, b=2.):
            super(Aria2, self).__init__()
            self.a = a
            self.b = b

        def forward(self, x):
            # As posted; reply #4 below gives the corrected form,
            # x * ((1 + torch.exp(-x) ** self.b) ** (-self.a))
            aria2 = 1 + ((torch.exp(-x) ** self.b) ** (-self.a))
            return x * aria2

Finally, I also tried to develop an activation function of my ‘own’. It is actually inspired by the Gompertz function (source: ), which is derived from the Gompertz–Makeham law of mortality (source:–Makeham_law_of_mortality), developed in the early 19th century. In the 1960s it was used for the first time to model tumor growth.

Note: I have only really tried it on one dataset (with an RNN seq2seq model), so please use it at your own risk! It seemed to perform quite well and might be interesting to experiment with. However, multiplying by the raw activation (as in swish above) didn’t seem to work, at least for this problem and architecture.

    import torch
    import torch.nn as nn

    class Gompertz(nn.Module):

        def __init__(self, a=1., b=0.5, c=0.5):
            super(Gompertz, self).__init__()
            self.a = a
            self.b = b
            self.c = c

        def forward(self, x):
            # bounded S-shaped curve; not multiplied by the raw activation
            gompertz = self.a * torch.exp(-self.b * torch.exp(-self.c * x))
            return gompertz
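To get a feel for the shape of this curve, here is a small scalar sketch of the same formula using only the standard `math` module. The function name `gompertz` and the sampled inputs are chosen just for illustration. In the usual reading of the Gompertz function, `a` is the upper asymptote, `b` shifts the curve along the x axis, and `c` sets the growth rate, so for positive parameters the output should rise monotonically from near 0 towards `a`.

```python
import math

def gompertz(x, a=1.0, b=0.5, c=0.5):
    # Same expression as Gompertz.forward, evaluated for one float.
    return a * math.exp(-b * math.exp(-c * x))

# Sample the curve on a moderate range (very negative x would overflow math.exp).
values = [gompertz(float(x)) for x in range(-10, 11)]

# Bounded between 0 and the upper asymptote a = 1 ...
assert all(0.0 < v < 1.0 for v in values)
# ... and strictly increasing, like sigmoid or tanh.
assert all(lo < hi for lo, hi in zip(values, values[1:]))

# At x = 0 the value is a * exp(-b).
assert math.isclose(gompertz(0.0), math.exp(-0.5))
```

Unlike sigmoid, the curve is not symmetric around its inflection point, which may or may not matter for a given architecture.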

I hope all this was interesting for some of you. It would be nice to experiment with some of this stuff in FastAI. I haven’t really looked into how to implement these in the library yet (I know some modules in PyTorch might not support new functions?), but it looks as if it should be fairly straightforward.

Thanks for reading!

Kind regards,

(Jeremy Howard) #2

They should work just fine with fastai - thanks for sharing! Since there’s nothing fastai-specific about them, you may even want to try submitting a PR to PyTorch directly, so that everyone using PyTorch can use these.
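As a rough illustration of why nothing library-specific is needed: any `nn.Module` can be placed between layers like a built-in activation. The sketch below uses plain PyTorch only; the toy model and its layer sizes are hypothetical, and the `Swish` class uses the x * sigmoid(b * x) form given in reply #4 of this thread.

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """x * sigmoid(b * x), with b a fixed (non-learned) constant."""

    def __init__(self, b=1.0):
        super().__init__()
        self.b = b

    def forward(self, x):
        return x * torch.sigmoid(self.b * x)

# Hypothetical toy model: sizes are arbitrary, chosen only for illustration.
model = nn.Sequential(
    nn.Linear(10, 32),
    Swish(),          # used exactly where nn.ReLU() would normally go
    nn.Linear(32, 2),
)

x = torch.randn(4, 10)  # batch of 4 random inputs
out = model(x)
print(out.shape)        # torch.Size([4, 2])
```

Since fastai models are ordinary PyTorch modules, the same module should be usable inside a custom architecture there, though I have not shown any fastai-specific code here.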

(Theodoros Galanos) #3

Thanks Jeremy, I will try. I’m not very comfortable with the PyTorch specifics, so I’ll first look for best practices on how they implement these.


(Narendra Patwardhan) #4

Hello Theodore,

Thank you for going through my paper. Like you, I am a FastAI student. It is nice that you are trying out new activations. I found a couple of small errors in the implementations you posted, so here are my own versions of the forward methods for Swish and ARiA2.

    # Swish
    def forward(self, x):
        return x * torch.sigmoid(self.b * x)

    # ARiA2
    # Alternatively: x * (torch.sigmoid(self.b * x) ** self.a)
    def forward(self, x):
        return x * ((1 + torch.exp(-x) ** self.b) ** (-self.a))
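The two ARiA2 forms above are algebraically the same, since (1 + exp(-x)^b)^(-a) = sigmoid(b x)^a. Here is a small scalar check of that identity with the standard `math` module; the function names are illustrative only, and the defaults a = 1.5, b = 2 are taken from the `Aria2` class in the first post.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aria2_exp_form(x, a=1.5, b=2.0):
    return x * ((1.0 + math.exp(-x) ** b) ** (-a))

def aria2_sigmoid_form(x, a=1.5, b=2.0):
    return x * (sigmoid(b * x) ** a)

def swish(x, b=1.0):
    return x * sigmoid(b * x)

for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
    # Both ARiA2 forms agree up to floating-point rounding.
    assert math.isclose(aria2_exp_form(x), aria2_sigmoid_form(x),
                        rel_tol=1e-9, abs_tol=1e-12)
    # With a = 1, ARiA2 reduces to Swish with the same b.
    assert math.isclose(aria2_exp_form(x, a=1.0, b=2.0), swish(x, b=2.0),
                        rel_tol=1e-9, abs_tol=1e-12)
```

The sigmoid form is probably the more convenient one in PyTorch, since it reuses `torch.sigmoid` rather than raising an exponential to a power.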

I think one of the reasons the Gompertz function would work well in RNNs is its similarity to tanh/sigmoid. Multiplying by the preactivation removes the upper bound of 1 on the curve, and that bound is desirable in RNNs. I can see it working well in a CNN or DNN architecture if you do multiply it by the preactivation. I hope you do try it out :)

(Theodoros Galanos) #5

Thanks a lot for your input @narendra_patwardhan and for correcting my code.

Perhaps you would like to submit those to PyTorch; I’m sure the community would appreciate it. As for the Gompertz, yes, that was the idea. It looked like a parametric version of those, much like ARiA I guess. I’m really not sure it works, of course, since I haven’t tested it properly, but it looked interesting. I will do so once I find some free time.

Kind regards,