Training too slow

I’m using a tabular learner to fit a neural net with two outputs, which then get passed to a custom loss function.
My training data has about 5M samples, and each learner.fit_one_cycle(1, 1e-2) epoch takes around 12 minutes to run.

Is it normal that it takes this long?


My model and custom loss:

  (0): Linear(in_features=21, out_features=10, bias=True)
  (1): ReLU(inplace)
  (2): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Linear(in_features=10, out_features=2, bias=True)
  (4): ReLU(inplace)

import math
import torch

def custom_loss(input_, target):
    target = target.float() + 1e-3  # .float() keeps the tensor on its current device (torch.FloatTensor would force CPU)
    en = torch.pow(torch.log(target) - input_[:, 0], 2)   # squared error in log space
    de = torch.mul(torch.pow(input_[:, 1] + 1e-3, 2), 2)  # 2 * sigma^2
    # torch.mul/torch.sqrt on two plain Python scalars fails; use math.sqrt for the constant
    nc = torch.div(1, target * (input_[:, 1] + 1e-3) * math.sqrt(2 * math.pi))
    return torch.mean(-torch.log(torch.mul(nc, torch.exp(-torch.div(en, de))) + 1e-3))
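As a hedged aside: this loss is the negative log-likelihood of a normal over log(target), and the same quantity can be computed more stably by expanding the outer log analytically instead of exponentiating and then re-logging. The sketch below assumes that reformulation; it drops the +1e-3 guard inside the outer log (which only exists to avoid log(0)), so it agrees with the original up to that epsilon.

```python
import math
import torch

def custom_loss_stable(input_, target):
    # Assumed equivalent reformulation of the loss above:
    # -log(nc * exp(-en/de)) = log(target) + log(sigma) + 0.5*log(2*pi) + en/de
    target = target.float() + 1e-3       # keep on the same device as the model
    mu = input_[:, 0]                    # predicted mean of log(target)
    sigma = input_[:, 1] + 1e-3          # predicted std (epsilon avoids division by zero)
    en = (torch.log(target) - mu) ** 2   # squared error in log space
    de = 2 * sigma ** 2                  # 2 * sigma^2
    nll = torch.log(target) + torch.log(sigma) + 0.5 * math.log(2 * math.pi) + en / de
    return nll.mean()
```

Working in log space throughout avoids underflow in `exp(-en/de)` when the residual is large, which the +1e-3 guard in the original only papers over.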

I’m unfamiliar with tabular applications, but 12 minutes doesn’t sound so bad to me, and 5M samples is nothing to sneeze at. What batch size are you using? Have you tried increasing it to speed up training?
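To make the batch-size question concrete, here is a minimal sketch (plain PyTorch, with a synthetic stand-in for your data and a placeholder loss, so none of the names come from your setup) that measures training throughput at a few batch sizes:

```python
import time
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the tabular model above (21 inputs -> 2 outputs)
model = nn.Sequential(
    nn.Linear(21, 10), nn.ReLU(),
    nn.BatchNorm1d(10),
    nn.Linear(10, 2), nn.ReLU(),
)

x = torch.randn(20_000, 21)  # small synthetic stand-in for the 5M rows
y = torch.rand(20_000)

def samples_per_sec(bs):
    # drop_last avoids a size-1 batch, which BatchNorm rejects in train mode
    dl = DataLoader(TensorDataset(x, y), batch_size=bs, shuffle=True, drop_last=True)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    n = 0
    start = time.perf_counter()
    for xb, yb in dl:
        opt.zero_grad()
        out = model(xb)
        loss = ((out[:, 0] - yb) ** 2).mean()  # placeholder loss for timing only
        loss.backward()
        opt.step()
        n += len(xb)
    return n / (time.perf_counter() - start)

for bs in (64, 1024, 8192):
    print(f"bs={bs}: {samples_per_sec(bs):,.0f} samples/s")
```

With a model this small, per-batch Python/optimizer overhead usually dominates, so larger batches tend to raise samples/s substantially until memory or convergence becomes the constraint.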

As a quick sanity check: in DL1’s Tabular notebook, Jeremy gets through 1,000 samples in about 3 seconds. I’m not sure how well that performance extrapolates, but given you have 5M samples, 12 minutes per epoch doesn’t sound bizarrely slow to me.
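Extrapolating that 1,000-samples-in-3-seconds figure naively:

```python
# Back-of-the-envelope extrapolation from the DL1 notebook figure
samples, seconds = 1000, 3
rate = samples / seconds              # ~333 samples/s
est_min = 5_000_000 / rate / 60       # naive estimate for one 5M-sample epoch
print(f"naive estimate: {est_min:.0f} min per epoch")  # ~250 min
```

By that rough measure, a 12-minute epoch on 5M samples is actually much faster than a straight-line extrapolation would predict, though the notebook's batch size and model likely differ.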
