The gradients in my model don't seem to be saving

Hello!

I’m implementing my own basic version of a Learner class, in a similar vein to the one in the fastai library. It’s one of the challenges provided in the fastbook.

My class is shown below.

import torch


class Learner:
    """A simple attempt to model a learner. Works with PyTorch models."""

    def __init__(self, train_dataloader, valid_dataloader, model,
                 loss_function, metric_function, learning_rate=1.0):

        self._train_dataloader = train_dataloader
        self._valid_dataloader = valid_dataloader
        self._model = model
        self._loss_function = loss_function
        self._metric_function = metric_function
        self._learning_rate = learning_rate

        self._parameters = list(self._model.parameters())

        self._accuracy = None

    # Private methods.

    def _make_predictions(self, inputs):
        return self._model(inputs)

    def _calculate_loss(self, predictions, targets):
        return self._loss_function(predictions, targets)

    def _calculate_gradients(self, loss):
        loss.backward()

    def _update_parameters(self):
        for parameter in self._parameters:
            # print(parameter.grad.data)
            parameter.data -= parameter.grad.data * self._learning_rate
            # print(parameter.data)

        # Reset gradients.
        for parameter in self._parameters:
            parameter.grad = None

    def _calculate_accuracy(self):
        accuracies = [self._metric_function(inputs, targets)
                      for inputs, targets in self._valid_dataloader]
        self._accuracy = round(torch.stack(accuracies).mean().item(), 4)
        # print(self._accuracy)

    def _output_accuracy(self, epoch):
        print(f"Epoch: {epoch}; Accuracy: {self._accuracy*100:.2f}%")

    # Public methods.

    def get_parameters(self):
        """Return the model's parameters; the unit tests rely on this."""
        return self._parameters

    def train_model(self, epochs):
        for epoch in range(epochs):
            # print(self._accuracy)
            for x_batch, y_batch in self._train_dataloader:
                predictions = self._make_predictions(x_batch)
                loss = self._calculate_loss(predictions, y_batch)
                self._calculate_gradients(loss)
                self._update_parameters()
            self._calculate_accuracy()
            self._output_accuracy(epoch)
            # print(self._parameters[0])

However, when testing this class with a very simple model, the gradients of my parameters remain None, even though the backward method is called and requires_grad_ is set to True.

I am using PyTorch’s Linear class with a single parameter of value 7 and no bias. The goal is to get the parameter to 4.

Below is the setUp method of my unittest class.

import unittest

import torch


class TestLearner(unittest.TestCase):
    """Tests for Learner."""

    def setUp(self):
        """Create an instance of the Learner class."""

        # Create data.
        train_x = torch.tensor([[1], [2], [3], [4], [5]]).float()
        train_y = torch.tensor([4, 8, 12, 16, 20]).float()

        valid_x = torch.tensor([[6], [7], [8], [9], [10]]).float()
        valid_y = torch.tensor([24, 28, 32, 36, 40]).float()

        # Create datasets.
        train_dataset = list(zip(train_x, train_y))
        valid_dataset = list(zip(valid_x, valid_y))

        # Create dataloaders.
        self.train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                                            batch_size=64)
        self.valid_dataloader = torch.utils.data.DataLoader(valid_dataset,
                                                            batch_size=64)

        # Create model.
        linear_model = torch.nn.Linear(1, 1, bias=False)
        linear_model.weight = torch.nn.Parameter(torch.tensor([7.0]))

        # Define loss function.
        def l1_norm(predictions, targets):
            return torch.nn.functional.l1_loss(predictions, targets)

        # Define metric function.
        def accuracy(inputs, targets):
            predictions = linear_model(inputs).sigmoid()
            correct_predictions = (predictions > 0.5) == targets
            return correct_predictions.float().mean()

        # Define learner.
        self.learner = Learner(
            self.train_dataloader,
            self.valid_dataloader,
            linear_model,
            l1_norm,
            accuracy,
        )

Gradients are calculated as shown below.

    def test__calculate_gradients(self):
        """Tests whether the gradients for each weight are correctly
        calculated."""
        # TODO: Why is the gradient 3?
        # Make predictions.
        for x_batch, y_batch in self.train_dataloader:
            predictions = self.learner._make_predictions(x_batch)
            loss = self.learner._calculate_loss(predictions, y_batch)

        # Calculate gradients.
        loss.backward()
        for parameter in self.learner.get_parameters():
            print(parameter.grad.data)

In the above test case, the parameter’s gradient is 3.
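
Incidentally, I suspect the 3 comes from the L1 loss itself: dL/dw = mean(sign(w*x - y) * x), and with w = 7, y = 4x, and all-positive inputs, that reduces to mean(x) = (1 + 2 + 3 + 4 + 5) / 5 = 3. A quick stand-alone check, using nothing but torch:

import torch

w = torch.tensor([7.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
y = 4 * x  # targets produced by the true weight of 4

loss = torch.nn.functional.l1_loss(w * x, y)
loss.backward()
print(w.grad)  # tensor([3.]), i.e. mean(x)

I update the parameters as shown below…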

    def test__update_parameters(self):
        for parameter in self.learner.get_parameters():
            print(parameter.grad.data)
        self.learner._update_parameters()

…and get the following error; the parameter has a gradient of None.

...E
======================================================================
ERROR: test__update_parameters (__main__.TestLearner)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/[REDACTED]/PycharmProjects/simpleLearner/test_learner.py", line 96, in test__update_parameters
    print(parameter.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'

----------------------------------------------------------------------
Ran 4 tests in 0.037s

FAILED (errors=1)

Does anyone have any idea why the gradient from my parameter isn’t being stored? I would really appreciate any suggestions!

Hello,

Your code appears fine to me; the only issue is that test__update_parameters itself calculates neither the loss nor the gradients, so it cannot run as a stand-alone test. Concretely, another method, such as test__calculate_gradients, would have to be called before it to ensure the gradients have been populated. Alternatively, you could add the lines below, extracted from test__calculate_gradients, to the beginning of test__update_parameters.

for x_batch, y_batch in self.train_dataloader:
    predictions = self.learner._make_predictions(x_batch)
    loss = self.learner._calculate_loss(predictions, y_batch)

loss.backward()
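
Putting it together, test__update_parameters might then look like the sketch below. The assertions are just one way to verify the step, assuming the learning rate stays at the default of 1.0: with a gradient of 3, the weight should move from 7.0 to exactly 4.0.

    def test__update_parameters(self):
        """Tests that one gradient step moves the weight toward 4."""
        # Each test must populate the gradients itself.
        for x_batch, y_batch in self.train_dataloader:
            predictions = self.learner._make_predictions(x_batch)
            loss = self.learner._calculate_loss(predictions, y_batch)
        loss.backward()

        self.learner._update_parameters()

        for parameter in self.learner.get_parameters():
            # _update_parameters resets the gradients afterwards...
            self.assertIsNone(parameter.grad)
            # ...and applies weight -= grad * lr: 7.0 - 3 * 1.0 = 4.0.
            self.assertAlmostEqual(parameter.data.item(), 4.0)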

Is that helpful?

Thank you for the response!

That is a solution I could use.

However, since the unit tests run sequentially, shouldn’t the same gradients be accessible in the later unit test? test__calculate_gradients() is defined right before test__update_parameters().

Also, the parameters in both of those unit tests are accessed via the learner.get_parameters() method, which returns the same self._parameters attribute that exists on the learner object. This means the exact same parameters are being accessed in both unit tests, unless I’m missing something.

Hello,

Python’s unittest module creates a separate instance of the TestLearner class for every test. In other words, there is one instance of TestLearner for test__calculate_gradients, another for test__update_parameters, and so on, and each test runs in isolation. This is good practice because one test should not rely on another; they should all run independently. The logic of each test should therefore be self-contained within its definition, e.g., in test__update_parameters, the gradients should be calculated regardless of test__calculate_gradients.
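
You can see this with a tiny, self-contained demo (hypothetical class and test names); the two printed ids differ because the loader builds one instance per test method:

import unittest


class Demo(unittest.TestCase):
    def test_a(self):
        print(id(self))  # one Demo instance...

    def test_b(self):
        print(id(self))  # ...and a different one


if __name__ == "__main__":
    unittest.main()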

Not quite - as mentioned above, the tests run in distinct instances, and setUp runs again before each one, so the model and its parameters are re-created for every test; accessing the model’s parameters from each test therefore points to a different object. Consequently, if the parameters are altered in one test, that has no effect whatsoever on the others.
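
Here is a minimal illustration (again with hypothetical names): setUp rebuilds the model before every test, so a mutation made in one test never leaks into another.

import unittest

import torch


class SetUpDemo(unittest.TestCase):
    def setUp(self):
        # Runs before *each* test, on a fresh instance.
        self.model = torch.nn.Linear(1, 1, bias=False)
        with torch.no_grad():
            self.model.weight.fill_(7.0)

    def test_mutate(self):
        # Overwrite the weight in this test only.
        with torch.no_grad():
            self.model.weight.fill_(0.0)

    def test_fresh(self):
        # setUp ran again, so the weight is 7.0 no matter what
        # test_mutate did to its own copy of the model.
        self.assertEqual(self.model.weight.item(), 7.0)


if __name__ == "__main__":
    unittest.main()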

Does that make sense?

Oooh, I see. Yeah, that makes sense; unittest creates an individual instance for every test.

Thank you for the explanation. 🙂
