Lesson 4 Further Research Task 1 - Help

Hi,

I’ve been working on the lesson 4 further research section (at the end of the lesson) and I’ve hit a wall trying to implement the first task, which is to implement fastai’s Learner class myself.

For this I tried to create everything by myself, which includes creating a model class for my model, an optimization function, … The only thing I’m really using from fastai is the DataLoaders class.

My problem seems to be that my gradients are always zero (or really close to zero), which means the weights never update, which in turn means my model never improves. I’ve been trying to debug this all day but haven’t managed to.

Code below:

import torch

def init_params(size):
    # Draw initial parameters uniformly from [0, 1) and track gradients
    return torch.rand(size).requires_grad_()

class Model:
    def __init__(self):
        self.w1 = init_params((28*28, 1))
        self.b1 = init_params(1)
        self.params = self.w1, self.b1
    
    def predict(self, x):
        predictions = x@self.w1 + self.b1
        return predictions

def loss_function(labels, preds):
    # Same idea as mnist_loss in the chapter: distance of the sigmoid output from the target
    preds = preds.sigmoid()
    return torch.where(labels==1, 1-preds, preds).mean()

def calculate_gradient(loss, learning_rate, params):
    # Backpropagate, take one SGD step in place, then zero the gradients
    loss.backward()
    for weight in params:
        weight.data -= learning_rate * weight.grad.data
        weight.grad.zero_()
       
def accuracy(label, pred):
    # Threshold the sigmoid output at 0.5 and compare against the labels
    pred = pred.sigmoid() > 0.5
    return (label == pred).float().mean()

class Tr_loop:
    def __init__(self, dls, model, opt_func, loss_func, metric):
        self.dls = dls
        self.model = model
        self.opt_func = opt_func
        self.loss_func = loss_func
        self.metric = metric
        
    def train(self, learning_rate):
        # dls[0] is the training DataLoader
        for x, y in self.dls[0]:
            preds = self.model.predict(x)
            loss = self.loss_func(y, preds)
            self.opt_func(loss, learning_rate, self.model.params)

    def validation(self):
        # dls[1] is the validation DataLoader
        accs_list = [self.metric(y, self.model.predict(x)) for x, y in self.dls[1]]
        return torch.FloatTensor(accs_list).mean()
    
    def fit(self, epochs, learning_rate):
        for _ in range(epochs):
            self.train(learning_rate)
            metric = self.validation()
            print(metric, end=' ')

model = Model()
learner = Tr_loop(dls, model, calculate_gradient, loss_function, accuracy)

Here dls is a fastai DataLoaders object (training and validation DataLoaders).

The current functions are somewhat hard-coded, in the sense that they only work for a two-class classification problem; in this case it’s the same one as in lesson 4, classifying a digit as either a 3 or a 7.
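For anyone who wants to reproduce this, dls can be built as in the chapter 4 notebook, roughly like below (a sketch: the helper name, batch size and label convention are just the ones from the notebook):

from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)

def stacked(folder):
    # Load every image in the folder as a normalised float tensor
    files = (path/folder).ls().sorted()
    return torch.stack([tensor(Image.open(f)) for f in files]).float()/255

train_x = torch.cat([stacked('train/3'), stacked('train/7')]).view(-1, 28*28)
train_y = tensor([1]*len((path/'train/3').ls()) + [0]*len((path/'train/7').ls())).unsqueeze(1)
valid_x = torch.cat([stacked('valid/3'), stacked('valid/7')]).view(-1, 28*28)
valid_y = tensor([1]*len((path/'valid/3').ls()) + [0]*len((path/'valid/7').ls())).unsqueeze(1)

# label 1 = 3 and label 0 = 7, as in the chapter
dls = DataLoaders(
    DataLoader(list(zip(train_x, train_y)), batch_size=256),
    DataLoader(list(zip(valid_x, valid_y)), batch_size=256),
)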


Hi @delrosario I believe the issue is that you used torch.rand, which generates only positive values between 0 and 1. If you use torch.randn instead, it generates both positive and negative values, and you should at least see the loss decrease and the accuracy increase each epoch. I’ve created a colab here as an example:
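The change itself is just one line; here’s a sketch following the version of init_params used in the book (std is an optional scaling factor):

def init_params(size, std=1.0):
    # torch.randn samples from a standard normal, so values are centred on 0 and can be negative
    return (torch.randn(size)*std).requires_grad_()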


I came here to write this same thing and just saw your answer!

Yep, but now I’m confused about why using torch.rand gives such weird results, with the loss never getting optimized. I tried torch.zeros and the optimization went through fine as well.

All things aside, @vbakshi thank you so much for taking the time to go over my question and coming up with a solution.


You’re welcome!

I’m relatively new to these concepts, so maybe someone more experienced can clarify/simplify, but I think what happens is this: if all the parameters are positive, the predictions are large enough that every predictions.sigmoid() value is 1.0 (which is a 7 if you used the same dataset as the book/notebook example) for a large number of epochs, and so the loss function is “stuck” at 1.0.

Whereas if some of the parameters are negative when you initialize them, you get a variety of prediction values, some positive and some negative, so predictions.sigmoid() yields some values closer to 0 and the loss function doesn’t stay stuck at 1.0 the way it does with torch.rand, even when all the batches are of the same digit.
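A quick way to see this (just a sketch with a random stand-in batch, not real MNIST pixels):

import torch

x = torch.rand((4, 28*28))                                         # stand-in batch of pixel values in [0, 1)
w_rand, w_randn = torch.rand((28*28, 1)), torch.randn((28*28, 1))

print((x @ w_rand).sigmoid())    # sums of ~784 positive terms, so the sigmoid is pinned at 1.0
print((x @ w_randn).sigmoid())   # terms of both signs, so the outputs are no longer all 1.0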

Which makes me think: maybe you could try shuffling the images in the batches and see if that helps? I assume the overall problem of the loss function getting stuck at 1.0 would probably still exist, though.
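Something like this, if the DataLoaders are built the way the notebook does it (a sketch; only the training side needs shuffle=True):

dls = DataLoaders(
    DataLoader(list(zip(train_x, train_y)), batch_size=256, shuffle=True),  # mix 3s and 7s in every batch
    DataLoader(list(zip(valid_x, valid_y)), batch_size=256),                # validation order doesn't matter
)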

You are definitely right: since all the values are positive, the sigmoid is for all practical purposes equal to 1, and looking at the sigmoid curve, the gradient there is essentially zero.
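A tiny check of that (sketch): the gradient of the sigmoid at a large positive input is effectively zero, which is why the weights barely move.

import torch

z = torch.tensor(50.0, requires_grad=True)
s = z.sigmoid()
s.backward()
print(s.item(), z.grad.item())   # ~1.0 and ~0.0, since sigmoid'(z) = s*(1-s)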

And I think you are correct on your second point too: shuffling would definitely have helped keep the loss from getting stuck at 1.0 for certain batches.