I was able to solve this issue. My problem was basically a misunderstanding of how everything was implemented. I still needed to do the backward computation inside one_batch, but move the optimizer step and zero_grad up into _do_epoch so they only run once per epoch. Here is the code that I believe is working now:
def _do_epoch_BGD(self):
    # Gradients accumulate across all batches during the training pass
    self._do_epoch_train()
    # Single optimizer step per epoch, then reset the accumulated gradients
    self._step()
    self('after_step')
    self.opt.zero_grad()
    self._do_epoch_validate()
def _do_one_batch_BGD(self):
    self.pred = self.model(*self.xb)
    self('after_pred')
    if len(self.yb): self.loss = self.loss_func(self.pred, *self.yb)
    self('after_loss')
    if not self.training or not len(self.yb): return
    # Still backpropagate per batch so gradients accumulate,
    # but leave the step/zero_grad to _do_epoch_BGD above
    self('before_backward')
    self._backward()
    self('after_backward')
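For anyone trying to reproduce this, one way to wire the two methods into a Learner is to override the stock _do_epoch/_do_one_batch, e.g. by subclassing. This is just a minimal sketch, not necessarily how you have to hook it up; dls and model stand in for your own data and model:

from fastai.learner import Learner

class BGDLearner(Learner):
    "Learner that backpropagates every batch but only steps the optimizer once per epoch."
    # Assumes the two defs above are in scope at module level
    _do_epoch     = _do_epoch_BGD
    _do_one_batch = _do_one_batch_BGD

# learn = BGDLearner(dls, model)
# learn.fit(5)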
Here is what my result looks like now: