Chapter 4 - Further Research (Building a Learner from Scratch)

sahilk1610 · September 8, 2020, 3:00pm

I was trying to build the learner from scratch, but something weird seems to be happening. Below is the code for my Learner, and I have also included the other necessary information.

class Learner_try:
    def __init__(self, dl, model, opt):
        self.dl_train = dl[0]
        self.dl_valid = dl[1]
        self.model = model
        self.opt = opt(self.model.parameters(), lr = 0.1)

    def mnist_loss(self, preds, targets):
        preds = preds.sigmoid()
        return torch.where(targets==1, 1 - preds, preds).mean()

    def batch_accuracy(self, x, y):
        preds = x.sigmoid()
        correct = (preds>0.5) == y
        return correct.float().mean()

    def validate_epoch(self):
        accs = [self.batch_accuracy(self.model(x), y) for x,y in self.dl_valid]
        return round(torch.stack(accs).mean().item(), 4)

    def cal_grad(self, x, y):
        preds = self.model(x)
        loss = self.mnist_loss(preds, y)
        loss.backward()

    def train_epoch(self):
        for x, y in self.dl_train:
            self.cal_grad(x, y)
            self.opt.step()
            #self.opt.zero_grad()      #This is the step which is acting weird

    def fit(self, epochs):
        for i in range(epochs):
            self.train_epoch()
            print(self.validate_epoch(), end = " ")
    
simple_net = nn.Sequential(nn.Linear(28 * 28, 30),
                      nn.ReLU(),
                      nn.Linear(30, 1),
                      nn.Sigmoid())

opt = SGD

learn = Learner_try(dls, simple_net, opt = opt)

learn.fit(20)

I used SGD directly as my optimizer. When I fit the learner without the “self.opt.zero_grad” step in the “train_epoch” method, it works fine (reaching an accuracy of about 0.96 to 0.97, as it should). But when I run it with the “self.opt.zero_grad” step, it gets stuck, reporting the same value (e.g. 0.4957) for every epoch, however many epochs I run.

I might have done something wrong or missed something. At first I thought that SGD must be handling step() and zero_grad() together, but when I checked the source code, SGD returns an Optimizer, and Optimizer does have its own zero_grad() method.

Any help here would be appreciated.
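
For readers following the zero_grad() question, here is a minimal plain-Python sketch of the mechanism involved. It is not the code from this post: the function name and the toy loss (weight squared) are assumptions made for illustration. It relies on one fact about PyTorch, namely that backward() adds to an existing gradient instead of overwriting it, so skipping zero_grad() means each step uses the running sum of all gradients so far.

```python
def grads_used(zero_each_step, lr=0.1, steps=3):
    """Return the gradient value each optimizer step actually applies.

    Toy setup (hypothetical): one weight, loss = weight ** 2,
    so the fresh gradient at any point is 2 * weight.
    """
    weight, grad = 1.0, 0.0
    used = []
    for _ in range(steps):
        grad += 2.0 * weight      # "backward": ADDS the fresh gradient to the stored one
        used.append(grad)
        weight -= lr * grad       # "step": plain SGD update
        if zero_each_step:
            grad = 0.0            # "zero_grad": discard the stored gradient
    return used


fresh = grads_used(zero_each_step=True)         # each step sees only its own gradient
accumulated = grads_used(zero_each_step=False)  # each step sees the running sum

print("with zero_grad:   ", fresh)
print("without zero_grad:", accumulated)
```

In this toy run the two settings agree on the first step and diverge afterwards, with the accumulated version taking increasingly large steps. That alone does not explain the accuracy numbers reported above, but it shows that the two training loops are doing quite different things.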

alex.larrimore · October 2, 2020, 7:45pm

I just went through this assignment and your code definitely helped me in a few places. Hopefully you got yours working as well? I used the basic optimizer covered in Lesson 4 instead of SGD and mine gave me no problems; I got my accuracy up over 0.98. Here’s my __init__ function if it helps. Most of the rest of my class is very similar to what you have:
def __init__(self, dls, model):
    self.model = model
    self.opt = BasicOptim(self.model.parameters(), 0.1)

tharinda · October 3, 2020, 2:26pm

Hi, I need your assistance. In Lesson 4, when running BasicOptim inside the training epoch function, I keep getting the error ‘NoneType’ object has no attribute ‘data’; the problem occurs at .step(). Furthermore, even when I run the SGD class, the optimization gets stuck in one place.

Would you know why this happens and how to correct it? Thank you
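
One common way to hit that error is calling step() while a parameter's gradient is still None, for example before any backward pass has run or right after zero_grad() has cleared it. The sketch below reproduces this in plain Python. FakeTensor and BasicOptimLike are hypothetical stand-ins written for this illustration; they only mirror the general shape of the lesson's optimizer and are not its exact code.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor: a value plus an optional gradient."""
    def __init__(self, data, grad=None):
        self.data = data
        self.grad = grad   # stays None until a backward pass fills it in


class BasicOptimLike:
    """Assumed shape of a basic optimizer: step() reads p.grad.data."""
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self):
        for p in self.params:
            p.data -= p.grad.data * self.lr   # raises if p.grad is None

    def zero_grad(self):
        for p in self.params:
            p.grad = None


def try_step(opt):
    """Run opt.step() and return the error message, or None on success."""
    try:
        opt.step()
    except AttributeError as err:
        return str(err)
    return None


w = FakeTensor(1.0)
opt = BasicOptimLike([w], lr=0.1)

# Wrong order: nothing has filled in w.grad yet, so step() fails.
message_without_backward = try_step(opt)

# Usual order: backward pass, then step(), then zero_grad().
w.grad = FakeTensor(2.0)              # stands in for loss.backward()
message_after_backward = try_step(opt)
opt.zero_grad()

print(message_without_backward)
print(message_after_backward, w.data)
```

If this matches what is happening, things to check in the real code would be that loss.backward() runs before step() on every batch, that zero_grad() comes after step() and not before it, and that the parameters were created with gradients enabled.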

SaintAardvark · November 3, 2020, 12:29am

Hi @sahilk1610 – thank you very much for posting your code! I made my own effort at doing this exercise, and frankly it was a bit of a mess…I’m grateful for the chance to learn from your efforts.

Your question about what happens when zero_grad() is toggled interested me, and I dug into it a bit. I also noticed that you have two instances of sigmoid() in your code – one in the model (where it’s the final layer), and one in the Learner class (where it’s applied to the predictions when calculating loss). I wanted to dig into that too.

In the end, I came up with a notebook that re-implemented your code and went through those cases:

Sigmoid: used in the model, used in the Learner class, used in both, or not used at all
zero_grad(): used, or not used

You can find the whole notebook in GitHub, but the main conclusions are:

The best performance came from using the sigmoid function just in the model, with zero-gradient. It was closely followed by using the sigmoid function just in the Learner class (as you have up above), with zero-gradient. However, note that having the sigmoid in the Learner class results in performance starting quite poor, and only getting as good as sigmoid-in-the-model after ~ 10-20 epochs.

The sigmoid function appears to make the model work pretty well even without using zero-gradient.
However, using the sigmoid function twice (so that the predictions become sigmoid(sigmoid(x))) does poorly. Looking at a graph of that function, this makes sense: the inner sigmoid only produces values between 0 and 1, so the composed function is confined to roughly 0.5 to 0.73. In other words, it predicts, though not confidently, that every character is a 3.
Leaving out the sigmoid function entirely results in models that don’t do better than chance.
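
The double-sigmoid observation can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library, not code from the notebook; the helper names are assumptions. It evaluates sigmoid(sigmoid(x)) over a range of inputs and shows that every value lands above 0.5, so a "greater than 0.5" threshold marks every example as positive.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Raw model outputs from strongly negative to strongly positive.
raw_outputs = [x / 2.0 for x in range(-20, 21)]   # -10.0 ... 10.0

single = [sigmoid(x) for x in raw_outputs]
double = [sigmoid(s) for s in single]             # sigmoid applied twice

print("single sigmoid range:", min(single), max(single))
print("double sigmoid range:", min(double), max(double))

# With a 0.5 threshold, the double sigmoid calls every input positive.
all_predicted_positive = all(d > 0.5 for d in double)
print("every example predicted positive:", all_predicted_positive)
```

In the first post's code the model ends in nn.Sigmoid() and batch_accuracy applies .sigmoid() again before thresholding, so this effect may be part of why the reported accuracy can sit at a single value.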

Thanks again for your post!

amiyo · November 29, 2020, 5:16am

Thanks, @sahilk1610, for the original problem and @SaintAardvark for the GitHub repo.

I’m running into the same issue. In my case I’m using a linear model, nn.Linear. It doesn’t have a sigmoid function within it per se. I do have a sigmoid function in my version of the learner class.

With zero grad, it starts with poorish accuracy and gets a little worse. Without zero grad, it goes from bad to very good. I’m unable to explain this; if you have any insights, do let me know. My code is below:

class akc_Learn:
    def __init__(self, dls, model, lr):
        self.dls = dls
        self.train_dl = dls[0]
        self.valid_dl = dls[1]
        self.model = model
        self.lr = lr

    def show_mod(self):
        print(self.model)

    def train_model1(self, epochs):
        for i in range(epochs):
            self.train_epoch1()
            print(self.validate_epoch1(), end=' ')

    def validate_epoch1(self):
        accs = [self.batch_accuracy1(self.model(xb), yb) for xb,yb in self.valid_dl]
        return round(torch.stack(accs).mean().item(), 6)

    def train_epoch1(self):
        for xb,yb in self.train_dl:
            self.calc_grad1(xb, yb)
            opt = SGD(self.model.parameters(), self.lr)
            opt.step()
            #opt.zero_grad()

    def calc_grad1(self, xb, yb):
        preds = self.model(xb)
        loss = self.mnist_loss1(preds, yb)
        loss.backward()

    def batch_accuracy1(self, x, y):
        preds = x.sigmoid()
        correct = (preds>0.5) == y
        return correct.float().mean()

    def mnist_loss1(self, predictions, targets):
        # Note: no sigmoid here, so the loss sees the raw model outputs
        return torch.where(targets==1, 1-predictions, predictions).mean()
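
One thing that may be worth checking in the class above: mnist_loss1 is applied to the raw nn.Linear outputs, and the sigmoid is only used in batch_accuracy1. The plain-Python sketch below (standard library only; the function names and sample numbers are assumptions for illustration) shows that this loss formula has no lower bound on unsquashed predictions, whereas it stays between 0 and 1 once the predictions go through a sigmoid first.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def loss_raw(preds, targets):
    """Same formula as the where(...).mean() loss, on plain Python lists."""
    per_item = [(1 - p) if t == 1 else p for p, t in zip(preds, targets)]
    return sum(per_item) / len(per_item)


def loss_squashed(preds, targets):
    """The same loss, but with a sigmoid applied to the predictions first."""
    return loss_raw([sigmoid(p) for p in preds], targets)


targets = [1, 0]
modest = [2.0, -2.0]        # correct side of zero for both examples
extreme = [200.0, -200.0]   # same signs, far larger magnitude

raw_modest, raw_extreme = loss_raw(modest, targets), loss_raw(extreme, targets)
sq_modest, sq_extreme = loss_squashed(modest, targets), loss_squashed(extreme, targets)

print("raw loss:     ", raw_modest, raw_extreme)
print("squashed loss:", sq_modest, sq_extreme)
```

Without the sigmoid, the loss keeps falling as the outputs grow in magnitude, so the optimizer is rewarded for making the weights larger without limit. Whether that accounts for the behavior described above is not something this sketch establishes; it only shows that the two losses are not equivalent.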

D_SM · March 23, 2021, 3:21pm

Yes, this is correct. I also used the basic optimizer and it worked for me; the rest of my code was nearly the same.

I suggest trying the BasicOptim class.