Part 1, online study group

Lesson 5

I shared the image below in our previous meetup. This is an updated version, along with annotated code and notes.

Building a minimal Neural Network (Logistic Regression with no hidden layer) from scratch

Let’s walk through it step by step and see how we code each block from the image below.


Source: Natural Language Processing with PyTorch by Delip Rao et al.
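The list and code below refer to model, loss_func and data.train_dl without defining them. Here is a minimal sketch of what they could look like, assuming an MNIST-style setup with 784 input features and 10 classes (the shapes and the loss choice are assumptions for illustration; the original notebook may differ). data.train_dl is assumed to be any iterable that yields (x, y) minibatches, such as a PyTorch DataLoader.

import torch
import torch.nn as nn

# "our own model": a single linear layer, i.e. logistic regression with no hidden layer
# (784 inputs -> 10 classes is an assumption for illustration)
model = nn.Linear(784, 10)

# a standard classification loss; applies log-softmax + negative log-likelihood internally
loss_func = nn.CrossEntropyLoss()

# data.train_dl is assumed to be an iterable of (x, y) minibatches,
# e.g. a torch.utils.data.DataLoader over the training set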

  • Predictions: y_hat = model(x); here we are using our own model (see the sketch above).
  • Loss function: loss_func(y_hat, y). In addition, we add the weight-decay term w2*wd to it.
  • Gradients: parameter.sub_(learning_rate * gradient), an in-place subtraction of learning_rate * gradient from each parameter. Since our model has multiple parameters (weights and biases), we loop over them with model.parameters().
  • Extras:
    • Weight Decay:
      • a) w2: the sum of squared weights across all parameters: for p in model.parameters(): w2 += (p**2).sum()
      • b) wd: a small constant (1e-5)
      • c) multiply w2 by wd and add the product to the regular loss_func result
  • Combined
    • We calculate the loss for each minibatch by calling update(x, y, lr) on it: losses = [update(x,y,lr) for x,y in data.train_dl]. A fuller training-loop sketch follows the update function below.
    • .item() converts the loss tensor into a plain Python number so that we can collect the losses and plot them.
def update(x, y, learning_rate):
  wd = 1e-5
  # prediction (forward pass)
  y_hat = model(x)
  # sum of squared weights, for weight decay
  w2 = 0.
  for p in model.parameters():
    w2 = w2 + (p**2).sum()
  # regular loss plus the weight-decay penalty
  loss = loss_func(y_hat, y) + w2*wd
  # compute the gradients of the loss with respect to every model parameter
  loss.backward()
  # tell PyTorch not to track the parameter updates in the next gradient calculation
  with torch.no_grad():
    for p in model.parameters():
      # gradient descent step: subtract learning_rate * gradient in place
      p.sub_(learning_rate * p.grad)
      # zero the gradient so it does not accumulate into the next minibatch
      p.grad.zero_()
  return loss.item()
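
Putting it together, here is a rough training-loop sketch that collects the per-minibatch losses returned by update and plots them. The learning rate, number of epochs and the plotting code are assumptions, not from the original notebook.

import matplotlib.pyplot as plt

lr = 1e-1       # assumed learning rate
epochs = 3      # assumed number of passes over the training data

losses = []
for epoch in range(epochs):
  # one plain Python float per minibatch, thanks to .item()
  losses += [update(x, y, lr) for x, y in data.train_dl]

# visualise how the training loss evolves over minibatches
plt.plot(losses)
plt.xlabel('minibatch')
plt.ylabel('loss')
plt.show()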

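As a side note, PyTorch's built-in optimizers expose the same idea through a weight_decay argument, which adds weight_decay * p to each parameter's gradient during the step (the same effect as our manual w2*wd penalty, up to a constant factor). A sketch for comparison (update_with_optimizer is a hypothetical name used only for illustration):

import torch

# the optimizer applies both the gradient step and the weight decay for us
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, weight_decay=1e-5)

def update_with_optimizer(x, y):
  y_hat = model(x)
  loss = loss_func(y_hat, y)   # no explicit w2*wd term needed
  loss.backward()
  optimizer.step()             # gradient step, including weight decay
  optimizer.zero_grad()        # reset gradients for the next minibatch
  return loss.item()
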
Resources

Feedback is welcome.
