Learning GeneralRelu Params: Here is LearnedRelu

whamp · April 25, 2019, 3:40pm

So going back to lesson 10 and studying batchnorm and its learnable parameters gave me an idea. Thanks to the more general and abstract representation in GeneralRelu, it seems clear that we should just learn GeneralRelu's parameters too. That lets them change across the phases of training and vary from layer to layer. On my own I have no intuition for how leaky the ReLU should be, or how much shift it needs, in the first layer versus the middle versus the end, but the network probably knows. So why not start with some sensible general parameters and let the model adjust them as needed?

That’s what I did here:

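Disclaimer: this code didn’t work as anticipated. The leak and maxv parameters weren’t training, because I had to call .item() on them to get the associated functions (F.leaky_relu and clamp_max_) to accept them. .item() turns a Parameter into a plain Python number that autograd can’t track.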

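import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedRelu(nn.Module):
    def __init__(self, leak=0.1, sub=0.25, maxv=100):
        super().__init__()
        # one learnable scalar (shape (1,)) per GeneralRelu setting
        self.leak = nn.Parameter(torch.ones(1)*leak)
        self.sub  = nn.Parameter(torch.zeros(1)+sub)
        self.maxv = nn.Parameter(torch.ones(1)*maxv)

    def forward(self, x):
        # .item() returns a plain Python float, so no gradient ever reaches self.leak
        x = F.leaky_relu(x,self.leak.item())
        x.sub_(self.sub)  # the Parameter itself is used here, so self.sub does get a gradient
        # same problem as leak: .item() detaches self.maxv from the graph
        x.clamp_max_(self.maxv.item())
        return x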
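A quick way to confirm which parameters actually receive gradients is to run one backward pass and inspect .grad. The check below uses the first version of LearnedRelu above, copied so the snippet runs on its own. The random input is an arbitrary placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedRelu(nn.Module):
    # copy of the first version above, so this check runs on its own
    def __init__(self, leak=0.1, sub=0.25, maxv=100):
        super().__init__()
        self.leak = nn.Parameter(torch.ones(1) * leak)
        self.sub = nn.Parameter(torch.zeros(1) + sub)
        self.maxv = nn.Parameter(torch.ones(1) * maxv)

    def forward(self, x):
        x = F.leaky_relu(x, self.leak.item())
        x.sub_(self.sub)
        x.clamp_max_(self.maxv.item())
        return x

torch.manual_seed(0)
act = LearnedRelu()
x = torch.randn(8, requires_grad=True)
act(x).sum().backward()

for name, p in act.named_parameters():
    # leak and maxv show None: .item() hid them from autograd
    # sub shows a gradient because sub_ received the Parameter itself
    print(name, p.grad)
```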

In my initial experiments it has worked great so far. The telemetry is much smoother, and on MNIST the model reaches higher validation accuracy more quickly than with the predefined ReLU parameters.

The MNIST task seems too easy, though, so I’m going to apply this to Imagenette next and then to Imagewoof to see whether the impact shows up more clearly.

For now here is a gist of my work as applied to the 07_batchnorm notebook:

https://gist.github.com/Whamp/7bcb0c6b2b3db5125a04bf354f68649d

07_batchnorm-LearnReLu.ipynb


I think a logical next step is to switch to a running average of the learned parameters, but for now I’m focusing on this version. Another idea is to freeze the ReLU parameters after, for example, half the epochs. That way the network has a chance to find good ReLU parameters and then gets the benefit of stability once “good” ones have been learned.
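The freezing idea could look roughly like this in plain PyTorch. This is a sketch under several assumptions. set_relu_params_trainable is a hypothetical helper, and the tiny model and random data are placeholders. LearnedRelu here is a stripped-down stand-in: it keeps only leak and shift, and is written with tensor ops so that leak is actually trainable. Freezing works by switching off requires_grad. Passing set_to_none=True to zero_grad (PyTorch 1.7+) leaves the frozen parameters with no gradient, so the built-in optimizers skip them entirely. In a callback-based training loop, the same toggle could live in an epoch-level callback.

```python
import torch
import torch.nn as nn

class LearnedRelu(nn.Module):
    # stand-in: leak and shift only, written with tensor ops so both get gradients
    def __init__(self, leak=0.05, sub=0.25):
        super().__init__()
        self.leak = nn.Parameter(torch.ones(1) * leak)
        self.sub = nn.Parameter(torch.zeros(1) + sub)

    def forward(self, x):
        return x.clamp_min(0) + self.leak * x.clamp_max(0) - self.sub

def set_relu_params_trainable(model, trainable):
    """Hypothetical helper: toggle requires_grad on every LearnedRelu parameter."""
    for m in model.modules():
        if isinstance(m, LearnedRelu):
            for p in m.parameters():
                p.requires_grad_(trainable)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), LearnedRelu(), nn.Linear(8, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
n_epochs = 4
for epoch in range(n_epochs):
    if epoch == n_epochs // 2:  # freeze after half the epochs
        set_relu_params_trainable(model, False)
        frozen_leak = model[1].leak.detach().clone()  # value at freeze time
    # one random batch per "epoch" as a placeholder for a real data loader
    x, y = torch.randn(16, 4), torch.randn(16, 2)
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad(set_to_none=True)  # frozen params keep grad=None and are skipped
    loss.backward()
    opt.step()
```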

whamp · April 25, 2019, 9:10pm

So, just as an update: I’ve noticed, not too surprisingly, that during longer training runs I sometimes get an error that I think is caused by a learned ReLU parameter “blowing out”. My current fix is to restrict the parameters to acceptable ranges, but that’s a hacky solution. It’s looking more like the better approach is an exponentially weighted moving average of the learned ReLU parameters, so that one bad batch can’t send things off the rails.
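Here is one possible reading of the moving-average idea, sketched under assumptions. After each optimizer step, each learned ReLU parameter is blended into a running average and then reset to that average. A single bad update therefore only moves it by a fraction (1 - beta) of the jump. SmoothedLearnedRelu, smooth_ and beta are hypothetical names, and the module keeps only leak and sub for brevity.

```python
import torch
import torch.nn as nn

class SmoothedLearnedRelu(nn.Module):
    """Hypothetical variant: leak/sub are learned, but after every optimizer step
    they are replaced by an exponential moving average of their history."""
    def __init__(self, leak=0.05, sub=0.25, beta=0.9):
        super().__init__()
        self.beta = beta
        self.leak = nn.Parameter(torch.ones(1) * leak)
        self.sub = nn.Parameter(torch.zeros(1) + sub)
        # running averages: stored in state_dict but not trained
        self.register_buffer('leak_avg', self.leak.detach().clone())
        self.register_buffer('sub_avg', self.sub.detach().clone())

    def forward(self, x):
        # leaky ReLU written with tensor ops so gradients reach self.leak
        return x.clamp_min(0) + self.leak * x.clamp_max(0) - self.sub

    @torch.no_grad()
    def smooth_(self):
        # meant to be called right after optimizer.step()
        for p, avg in ((self.leak, self.leak_avg), (self.sub, self.sub_avg)):
            avg.mul_(self.beta).add_(p, alpha=1 - self.beta)
            p.copy_(avg)  # write the averaged value back into the parameter

act = SmoothedLearnedRelu(leak=0.05, beta=0.9)
with torch.no_grad():
    act.leak.fill_(1.05)  # pretend one bad batch pushed leak far away
act.smooth_()
print(act.leak.item())  # pulled back to roughly 0.15 instead of staying at 1.05
```

With plain SGD, writing the average back after every step works out the same as scaling these parameters' learning rate by (1 - beta). Putting them in their own optimizer parameter group with a smaller learning rate would therefore be a simpler way to get a similar damping effect.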

In terms of performance, I’ve been focusing on Imagenette at size 128 with 5 and 20 epochs. So far I’ve seen results 2% better than the current leaderboard using Xresnet18, 34 and 50 with the normal ReLU replaced by LearnedRelu (remarkably architecture-agnostic performance). The problem right now seems to be stability, hence the improvements mentioned above.

whamp · April 26, 2019, 1:09am

So here is the new version of LearnedRelu I’ve been using to train Imagewoof:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedRelu(nn.Module):
    def __init__(self, leak=0.05, sub=0.25, maxv=10):
        super().__init__()
        self.leak = nn.Parameter(torch.ones(1)*leak)
        self.sub  = nn.Parameter(torch.zeros(1)+sub)
        self.maxv = nn.Parameter(torch.ones(1)*maxv)

    def forward(self, x):
        if self.training:
            # keep the parameters inside hand-picked ranges so they can't blow out
            with torch.no_grad():
                self.leak.clamp_(0,.5)
                self.sub.clamp_(0,1)
                self.maxv.clamp_(5,100)
        # note: .item() still detaches leak and maxv, so only sub gets gradients here
        x = F.leaky_relu(x,self.leak.item())
        x.sub_(self.sub)
        x.clamp_max_(self.maxv.item())
        return x

So far the results top the leaderboard at 5 and 20 epochs for image size 128 with both Xresnet18 and Xresnet34. The model seems to struggle when I bump the epochs up to 80, and I’m not sure why, which means it’s probably time to add some telemetry to see what’s going on. I also think I should build a callback that stops updating the ReLU parameters after a certain amount of training.

Here is the gist with the Imagewoof results:

https://gist.github.com/Whamp/aa68d774450bffe8ef269133942014fe

imagewoof-train-LearnedRelu.ipynb


whamp · April 26, 2019, 2:47am

One final batch of results (LearnedRelu accuracy in % vs the current leaderboard). By the way, these were all run on a single 1080 Ti.

Xresnet18_LR:
5 epochs: 56.4 vs 60.2
20 epochs: 81.4 (best epoch 82.4) vs 82.4

Xresnet34_LR:
5 epochs: 63 vs 60.2
20 epochs: 82.6 vs 82.4

Xresnet50_LR:
5 epochs: 66.2 vs 60.2
20 epochs: 85.2 vs 82.4

Gist here:

https://gist.github.com/Whamp/c2e2836666b86027740a6003ed1a844f

imagewoof-train-LearnedRelu-size192.ipynb


whamp · April 26, 2019, 6:12pm

It seems the notebooks may not render properly in the gists, for reasons I can’t figure out. Here is a picture of results using Xresnet34_LR for 5 epochs on Imagewoof at size 192. The current leaderboard is 60.2% for size 192, and it seems to use resnet50 if I’m reading the script right. The LearnedRelu modification gets to 68.2% in 5 epochs:

whamp · April 29, 2019, 3:38pm

Current version of LearnedRelu:

Note: it doesn’t apply a max value threshold, because clamp doesn’t accept parameters as input and I haven’t figured out a workaround yet:

import torch
import torch.nn as nn

class LearnedRelu(nn.Module):
    def __init__(self, nf, leak=0.05, sub=0.25, maxv=10):
        super().__init__()
        # shape (nf,1,1): one value per channel, broadcast over (batch, nf, height, width)
        self.leak = nn.Parameter(torch.ones(nf,1,1)*leak)
        self.sub  = nn.Parameter(torch.zeros(nf,1,1)+sub)
        self.maxv = nn.Parameter(torch.ones(nf,1,1)*maxv)  # currently unused, see note above

    def forward(self, x):
        x = x.clamp_min(0)+self.leak*x.clamp_max(0)  # had to rewrite leaky ReLU because F.leaky_relu wouldn't accept parameters as input
        x = x.sub(self.sub)  # sub seems to have no problem accepting a parameter as input
        # x = x.clamp_max(self.maxv)  # clamp_max doesn't allow a parameter as input
        return x
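One workaround for the missing max threshold is to take the element-wise minimum against the maxv Parameter instead of calling clamp_max. This assumes a PyTorch version that has torch.minimum (1.7 or later); the older torch.min(x, other) form does the same element-wise job. Because both arguments are tensors, autograd passes a gradient to maxv wherever an activation gets capped. This is a minimal sketch, and how it behaves in the training runs above is untested.

```python
import torch
import torch.nn as nn

class LearnedRelu(nn.Module):
    """Per-channel learned leaky ReLU with shift and a learnable ceiling (sketch)."""
    def __init__(self, nf, leak=0.05, sub=0.25, maxv=10):
        super().__init__()
        # shape (nf, 1, 1) broadcasts over (batch, nf, height, width) activations
        self.leak = nn.Parameter(torch.ones(nf, 1, 1) * leak)
        self.sub = nn.Parameter(torch.zeros(nf, 1, 1) + sub)
        self.maxv = nn.Parameter(torch.ones(nf, 1, 1) * maxv)

    def forward(self, x):
        x = x.clamp_min(0) + self.leak * x.clamp_max(0)  # leaky ReLU, differentiable in leak
        x = x - self.sub                                  # learned shift
        # element-wise min against a tensor: gradient flows to maxv where x is capped
        return torch.minimum(x, self.maxv)

torch.manual_seed(0)
act = LearnedRelu(nf=3, maxv=1.0)
x = torch.randn(2, 3, 4, 4) * 5  # large values so some activations hit the cap
out = act(x)
out.sum().backward()
print(act.maxv.grad.flatten())  # per-channel count of capped activations
```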

And here is how I tracked the parameter values:

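import torch.nn as nn

class SequentialModel(nn.Module):
    def __init__(self, *layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        # one history list per layer
        self.leaks, self.subs = [[] for _ in layers], [[] for _ in layers]

    def forward(self, x):  # forward rather than __call__, so nn.Module hooks still run
        for i,l in enumerate(self.layers):
            x = l(x)
            # assumes each conv block is nn.Sequential(conv, LearnedRelu, ...), so the activation is l[1]
            if hasattr(l,'__getitem__') and hasattr(l[1],'leak'):
                # store detached copies: appending the Parameter itself only stores
                # references to one tensor, so every entry would show the latest value
                self.leaks[i].append(l[1].leak.detach().cpu().clone())
                self.subs[i].append(l[1].sub.detach().cpu().clone())
        return x

    def __iter__(self): return iter(self.layers)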
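An alternative to a custom container is to use PyTorch forward hooks (register_forward_hook), which can record snapshots from any model without changing its class. This is a minimal sketch assuming the per-channel LearnedRelu from this post, with maxv dropped for brevity. track_relu_params is a hypothetical helper. Each snapshot is a detached CPU copy, so later updates don't overwrite earlier entries.

```python
import torch
import torch.nn as nn

class LearnedRelu(nn.Module):
    # per-channel version from the post, maxv omitted
    def __init__(self, nf, leak=0.05, sub=0.25):
        super().__init__()
        self.leak = nn.Parameter(torch.ones(nf, 1, 1) * leak)
        self.sub = nn.Parameter(torch.zeros(nf, 1, 1) + sub)

    def forward(self, x):
        return x.clamp_min(0) + self.leak * x.clamp_max(0) - self.sub

def track_relu_params(model):
    """Hypothetical helper: record leak/sub snapshots on every forward pass."""
    history, handles = {}, []
    for name, m in model.named_modules():
        if isinstance(m, LearnedRelu):
            history[name] = {'leak': [], 'sub': []}
            # default argument binds this module's record list to the hook
            def hook(mod, inp, out, rec=history[name]):
                rec['leak'].append(mod.leak.detach().cpu().clone())
                rec['sub'].append(mod.sub.detach().cpu().clone())
            handles.append(m.register_forward_hook(hook))
    return history, handles

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), LearnedRelu(8),
                      nn.Conv2d(8, 16, 3, padding=1), LearnedRelu(16))
history, handles = track_relu_params(model)
for _ in range(3):
    model(torch.randn(2, 1, 8, 8))
for h in handles:
    h.remove()  # stop tracking

leak_1 = torch.stack(history['1']['leak'])  # steps x channels x 1 x 1
print(leak_1.shape)
```

Stacking a layer's history this way gives a tensor that can be averaged over channels or plotted per layer.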