Hi everyone,
I have to start by saying I was not sure where to post this, hope this section is ok.
I have been experimenting lately with several recently published activation functions. I wanted to share some of them here and see whether anyone thinks they could be an interesting addition to fastai. I should say that I’m not that familiar with PyTorch, so I really hope that the implementations below are correct and accurate enough to be useful to some of you (I have only implemented them in Keras before). If anyone can share more efficient code, please let me know!
The first activation function comes from Google (source: https://arxiv.org/abs/1710.05941) and is called swish. As often happens, the idea behind it is incredibly simple!
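# Swish activation function, a special case of ARiA
# with A=0, K=1, B=b, v=1, C=1, Q=1
import torch
import torch.nn as nn

class swish(nn.Module):
    def __init__(self, b=1.):
        super(swish, self).__init__()
        self.b = b  # scale applied to the input of the sigmoid

    def forward(self, x):
        # x * sigmoid(b * x); note that sigmoid(x) ** b is a different function
        return x * torch.sigmoid(self.b * x)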
That’s all it is: multiplying the raw activations by their sigmoid!
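For a quick check of the formula without PyTorch, here is a minimal scalar sketch in plain Python. The helper names `sigmoid` and `swish_scalar` are only illustrative and do not come from the paper. It shows two limiting cases of x * sigmoid(b * x): with b = 0 the function is simply x / 2, and for large b it gets close to ReLU.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function for a scalar input
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def swish_scalar(x, b=1.0):
    # Swish on a single number: x * sigmoid(b * x)
    return x * sigmoid(b * x)

# b = 0: sigmoid(0) = 0.5, so swish is the linear function x / 2
print(swish_scalar(3.0, b=0.0))

# Large b: positive inputs pass through, negative inputs are squashed towards 0
print(swish_scalar(3.0, b=50.0), swish_scalar(-3.0, b=50.0))
```

With the default b = 1 the curve sits between these two extremes, which is the version most people mean by "swish".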
The next two functions (ARiA and ARiA2) were published recently (source: https://arxiv.org/abs/1805.08878) and showed promising results, at least on the specific datasets and architectures listed in the paper.
# ARiA activation function
import torch
import torch.nn as nn

class Aria(nn.Module):
    def __init__(self, A=0, K=1., B=1., v=1., C=1., Q=1.):
        super(Aria, self).__init__()
        # ARiA parameters
        self.A = A  # lower asymptote, values tested were A = -1, 0, 1
        self.K = K  # upper asymptote, values tested were K = 1, 2
        self.B = B  # exponential rate, values tested were B = [0.5, 2]
        self.v = v  # v > 0, the direction of growth, values tested were v = [0.5, 2]
        self.C = C  # constant set to 1
        self.Q = Q  # related to initial value, values tested were Q = [0.5, 2]

    def forward(self, x):
        # Richards curve: A + (K - A) / (C + Q * exp(-B * x)) ** (1 / v)
        aria = self.A + (self.K - self.A) / ((self.C + self.Q * torch.exp(-self.B * x)) ** (1 / self.v))
        return x * aria
ARiA2 is a special case of ARiA, which runs a bit faster since it has fewer parameters.
# ARiA2 activation function, a special case of ARiA
# with A=0, K=1, B=b, v=1/a, C=1, Q=1
import torch
import torch.nn as nn

class Aria2(nn.Module):
    def __init__(self, a=1.5, b=2.):
        super(Aria2, self).__init__()
        self.a = a
        self.b = b

    def forward(self, x):
        # (1 + exp(-b * x)) ** (-a); the exponent applies to the whole sum
        aria2 = (1 + torch.exp(-self.b * x)) ** (-self.a)
        return x * aria2
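To make the "special case" relationships concrete, here is a small scalar sketch in plain Python. It takes the general ARiA formula from the Aria class above and the parameter settings given in the code comments (swish: A=0, K=1, C=1, Q=1, B=b, v=1; ARiA2: the same but with v=1/a). The function names are only illustrative, and this is a numeric sanity check, not a reference implementation.

```python
import math

def aria_scalar(x, A=0.0, K=1.0, B=1.0, v=1.0, C=1.0, Q=1.0):
    # General ARiA on a single number: x times a Richards curve
    richards = A + (K - A) / ((C + Q * math.exp(-B * x)) ** (1.0 / v))
    return x * richards

def swish_scalar(x, b=1.0):
    # x * sigmoid(b * x), written out explicitly
    return x / (1.0 + math.exp(-b * x))

def aria2_scalar(x, a=1.5, b=2.0):
    # x * (1 + exp(-b * x)) ** (-a)
    return x * (1.0 + math.exp(-b * x)) ** (-a)

x = 0.7
# With v = 1, ARiA collapses to swish with b = B
swish_matches = math.isclose(aria_scalar(x, B=1.3), swish_scalar(x, b=1.3))
# With v = 1 / a, ARiA collapses to ARiA2
aria2_matches = math.isclose(aria_scalar(x, B=2.0, v=1 / 1.5), aria2_scalar(x, a=1.5, b=2.0))
print(swish_matches, aria2_matches)
```

If either comparison came out False, that would point to a misplaced exponent or parenthesis in one of the formulas.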
Finally, I also tried to develop an activation function of my ‘own’. It is actually inspired by the Gompertz function (source: https://en.wikipedia.org/wiki/Gompertz_function), which is derived from the Gompertz–Makeham law of mortality (source: https://en.wikipedia.org/wiki/Gompertz–Makeham_law_of_mortality) developed in the early 19th century. In the 1960s it was used for the first time to model tumor growth.
Note: I have only really tried it on one dataset (with an RNN seq2seq model), so please use it at your own risk! It seemed to perform quite well and might be interesting to experiment with. However, multiplying by the raw activation (as in swish above) didn’t seem to work, at least for this problem / architecture.
import torch
import torch.nn as nn

class Gompertz(nn.Module):
    def __init__(self, a=1., b=0.5, c=0.5):
        super(Gompertz, self).__init__()
        self.a = a  # upper asymptote
        self.b = b  # displacement along x
        self.c = c  # growth rate

    def forward(self, x):
        gompertz = self.a * torch.exp(-self.b * torch.exp(-self.c * x))
        return gompertz
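As a rough illustration of the shape of this function, here is a scalar version in plain Python (the name `gompertz_scalar` is only illustrative). With positive b and c the output rises monotonically from 0 towards the upper asymptote a, so it behaves as a bounded squashing function, a bit like an asymmetric sigmoid.

```python
import math

def gompertz_scalar(x, a=1.0, b=0.5, c=0.5):
    # a: upper asymptote, b: displacement along x, c: growth rate
    return a * math.exp(-b * math.exp(-c * x))

at_zero = gompertz_scalar(0.0)      # equals a * exp(-b)
far_left = gompertz_scalar(-10.0)   # tends to 0 for very negative inputs
far_right = gompertz_scalar(10.0)   # tends to a for very positive inputs
print(at_zero, far_left, far_right)
```

Unlike the logistic sigmoid, the value at x = 0 is a * exp(-b) rather than a / 2, so the curve is not symmetric around its midpoint.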
I hope all this was interesting for some of you. It would be nice to experiment with some of this in fastai. I haven’t really looked into how to implement these in the library yet (I know some modules in PyTorch might not support new functions?), but it looks as if it should be fairly straightforward.
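On the plain PyTorch side at least, a custom activation written as an nn.Module out of ordinary differentiable tensor operations can be dropped into a model like any built-in layer, and autograd takes care of the backward pass. Below is a minimal sketch of that. The toy model and its layer sizes are made up for illustration, and it says nothing about how fastai itself wires in activation functions.

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    # Same idea as the swish module earlier in the post: x * sigmoid(b * x)
    def __init__(self, b=1.0):
        super().__init__()
        self.b = b

    def forward(self, x):
        return x * torch.sigmoid(self.b * x)

# Hypothetical toy model; the layer sizes are arbitrary
model = nn.Sequential(nn.Linear(10, 32), Swish(), nn.Linear(32, 1))

x = torch.randn(4, 10)       # batch of 4 random inputs
out = model(x)
loss = out.pow(2).mean()     # dummy loss, only there to trigger backward()
loss.backward()              # autograd differentiates through the custom activation

print(out.shape, model[0].weight.grad.shape)
```

The same pattern should work for the Aria, Aria2 and Gompertz modules, since they only use exp, powers and arithmetic.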
Thanks for reading!
Kind regards,
Theodore.