Lesson 5: Error on Manually Creating a Neural Network (PyTorch)

Hello, I’m trying to manually create a neural network using PyTorch. I generated a synthetic dataset and created a 3-layer neural network with the following:

import torch
import torch.nn.functional as F

# Generate synthetic data
X = torch.randint(1, 100, (32, 1), dtype=torch.float32)
y = ((X.squeeze() * 2) + 30) + torch.rand(32,)
X.shape, y.shape

# Arch
layers = [
    torch.rand((1, 50), dtype=torch.float32), # latent layer
    torch.rand((50, 1), dtype=torch.float32), # output layer
]
layers = [torch.nn.Parameter(l) for l in layers]

# Loss Function (MSE)
def mse(y, y_hat): return ((y - y_hat)**2).mean()

t = 0
def update():
    x = X @ layers[0] # input -> latent
    x = F.relu(x) # latent -> activation_layer
    y_hat = x @ layers[1] # activation_layer -> output_layer
    
    # Calculate the loss
    loss = mse(y, y_hat)
    if t % 10 == 0: print("loss =", loss)
    
    # Backpropagation
    loss.backward()
    with torch.no_grad():
        for layer in reversed(layers):
            layer.sub_(1e-3 * layer.grad)
            layer.grad.zero_()

for t in range(300):
    update()

But my loss got stuck at a high number:

loss = tensor(572020.6250, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)
loss = tensor(23780.6582, grad_fn=<MeanBackward0>)

What am I doing wrong?

Hi Bruno. I like that you are creating a model directly in PyTorch. It’s a great way to get an intuitive feeling for what actually goes on in the machine learning process.

Since no one responded, and I’m stuck indoors sick on a rainy day, I took a look at your question. My observation about these forums, though, is that in general no one is going to take the time to debug your code. It’s therefore important to develop your own ways of investigating what is going wrong when things don’t go as you expect. That way, you both learn how to debug a model and develop a sense of how models and learning actually work.

A primary debugging tool is matplotlib. You can plot y and y_hat on the same graph to compare the prediction against the target (after making y_hat a global so it is accessible outside update):

import matplotlib.pyplot as plt

plt.plot(y, marker='.', linestyle='None')                # target
plt.plot(y_hat.detach(), marker='.', linestyle='None')   # prediction (detach so matplotlib can convert it to numpy)
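For concreteness, the smallest change that makes the snippet above work is to declare y_hat global inside your function; everything else below is your update() unchanged:

def update():
    global y_hat                  # expose the latest prediction for plotting
    x = X @ layers[0]             # input -> latent
    x = F.relu(x)                 # latent -> activation_layer
    y_hat = x @ layers[1]         # activation_layer -> output_layer

    # Calculate the loss
    loss = mse(y, y_hat)
    if t % 10 == 0: print("loss =", loss)

    # Backpropagation
    loss.backward()
    with torch.no_grad():
        for layer in reversed(layers):
            layer.sub_(1e-3 * layer.grad)
            layer.grad.zero_()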

You will instantly see that y_hat converges to all zeros, and therefore the loss stays constantly high. A loss that diverges or never converges usually indicates a learning rate issue. Indeed, changing the learning rate from 1e-3 to 1e-5 lets the model converge to a much lower loss.
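Concretely, the only change is the step size used in the weight update; 1e-5 is just the value that worked for me on this data, so treat it as a starting point to experiment with:

lr = 1e-5   # a 1e-3 step overshoots: the weights are pushed strongly negative, ReLU then
            # zeroes every activation, y_hat collapses to 0, and the gradients die with it

    with torch.no_grad():
        for layer in reversed(layers):
            layer.sub_(lr * layer.grad)   # gradient descent step
            layer.grad.zero_()            # clear the gradients for the next iteration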

But there is still a problem. The graph now clearly shows the prediction sitting below the target by a nearly constant amount. Let’s see how much:

y - y_hat.flatten()

It’s about 33 or so units below. That observation suggests two things. First, the number is close to the offset you added to X when creating the target. Second, and most important, no matter how hard the model tries, it tracks the target but can’t match it. This model is not capable of learning a simple affine function, because it has no way to represent the offset! What could possibly be missing? The answer should suggest how to alter the model so that it also has the capacity to learn an offset (a bias).
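In case you want to compare notes after trying it yourself, here is one possible sketch of what adding a bias could look like. The names (biases, forward) are mine, not from your code, and the zero initialisation is just a placeholder:

# Give each layer a bias so the network computes an affine map (W x + b), not just W x
layers = [torch.nn.Parameter(torch.rand((1, 50), dtype=torch.float32)),   # latent weights
          torch.nn.Parameter(torch.rand((50, 1), dtype=torch.float32))]   # output weights
biases = [torch.nn.Parameter(torch.zeros(50)),                            # latent bias
          torch.nn.Parameter(torch.zeros(1))]                             # output bias

def forward(X):
    x = F.relu(X @ layers[0] + biases[0])   # affine transform, then ReLU
    return x @ layers[1] + biases[1]        # affine output: the model can now learn the +30 offset

The update step then has to take a gradient step on the biases as well, e.g. by looping over layers + biases instead of just layers.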

I hope this is helpful, and not too “lectury”. Now I am off to debug time series models that have me totally stumped. Good luck with your learning!
