Reproducing "How to train your ResNet" using fastai

The original defaults to num_workers=0, so that's a fair comparison.

OK, so this means we are probably not going to find the bottleneck in the data loading. Good to know, though!

Batch size = 128.

Cifar10-fast (didn't look into workers):

First iteration: 5-30ms
Second iteration: same order of magnitude

Fast.ai:

  1. Workers = 0
    First iteration: 150ms, up to 1.6s
    Second iteration: xxx ms

  2. Default workers (16?)
    First iteration: 3s
    Second: 2ms

Ranges are a bit guesstimated.
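
For reference, a rough sketch of how the fastai numbers above were taken (my reconstruction, not the exact code; the dataset download, folder layout and from_folder call are assumptions on my part):

from fastai.vision import *  # fastai v1

path = untar_data(URLs.CIFAR)  # CIFAR-10 in train/test folder layout
# num_workers=0 for case 1; leave it unset to get the default for case 2
data = ImageDataBunch.from_folder(path, valid='test', bs=128, num_workers=0)

it = iter(data.train_dl)
%time next(it)   # first iteration (includes worker/transform warm-up)
%time next(it)   # second iteration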

Did you run this with cifar10-fast as well?

This should be comparable.

I don’t believe this is comparable; you would need to do it = iter(train_batches) and then run next(it), just like you did with fastai’s DataBunch.

Sorry, you probably meant timing only next() on cifar10-fast. Here it is:

# Setup taken from the cifar10-fast demo (PiecewiseLinear, Crop, FlipLR, Cutout,
# Transform and Batches are cifar10-fast helpers; Batches wraps a torch DataLoader).
epochs = 24
lr_schedule = PiecewiseLinear([0, 5, epochs], [0, 0.4, 0])
batch_size = 128
transforms = [Crop(32, 32), FlipLR(), Cutout(8, 8)]
N_runs = 5
train_batches = Batches(Transform(train_set, transforms), batch_size, shuffle=True, set_random_choices=True, drop_last=True)
test_batches = Batches(test_set, batch_size, shuffle=False, drop_last=False)

# Time pulling a single batch from the training iterator.
it = iter(train_batches)
%time next(it)

This outputs:

CPU times: user 7.95 ms, sys: 961 µs, total: 8.91 ms
Wall time: 8.71 ms

and running %time next(it) again:

CPU times: user 3 ms, sys: 4.73 ms, total: 7.73 ms
Wall time: 9.21 ms

So it seems like cifar10-fast is actually much slower on the data iterator itself. That makes it all the more interesting that it is faster than fastai v1 overall.

I’d say this is a bit inconclusive. Fastai’s first “next” was 88ms, which is much slower, but the second one was much faster.

I think we should compare a full loop of batches.

I wrote this ugly code :grimacing:

%%time

# Drain the remaining batches, one next() at a time.
noerror = True
while noerror:
    try:
        next(it)
    except StopIteration:
        noerror = False

cifar10-fast: 4.44s
fastai (workers = 0): 1min 48s
fastai (workers = default): 11s
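
For what it's worth, the same full-pass timing can be written a bit more simply (same idea, just letting the for loop absorb the StopIteration; swap in iter(data.train_dl) for the fastai case):

%%time
# Exhaust the rest of the iterator without doing any work on the batches.
for _ in it:
    pass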

I wonder if your fastai epochs are realllly slow when using workers=0.

But I am not setting workers in any of my code.

Just fyi: I am going to take a few steps back and learn some general performance profiling skills and then come back to this.


I believe fastai defaults to defaults.cpus if you don't specify num_workers; that's 16 on my machine.
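
If you want to check that default on your own machine, something like this should show it (defaults lives in fastai.core in v1; treat the exact import path as my assumption):

from fastai.core import defaults
print(defaults.cpus)  # workers fastai v1 uses when num_workers isn't passed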

@gkk I just saw your post about speeding up dataloaders and was wondering if you could have a look at the above to get your impression of whether the performance slowdown in fastai compared to the myrtle.ai PyTorch model could be related to dataloader speed. Thanks, Greg!

Have you looked at GPU utilization? Is it low? See my comment here:

If you see GPU utilization being low and CPU utilization being high, it’s easier to believe the training is CPU-bound.
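
For reference, one easy way to watch GPU utilization from a notebook cell while an epoch runs elsewhere (assumes nvidia-smi is on the PATH):

# Print GPU utilization and memory once per second; interrupt the kernel to stop.
!nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1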

Just came across an interesting post (and possibly thread) on the PyTorch forums while looking at another issue. It looks at small-file performance, so not all of it may apply, but of note:

Myrtle used pin_memory=True, so it's not an explanation for the comparison, but it matters for general performance (and, given the link above to the categorical case, I think this would likely help there especially). pin_memory=True means, I think, that each item has to be copied from its original memory into the pinned memory area, so it may especially affect lots of small items.
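
For context, pin_memory is a standard torch.utils.data.DataLoader argument; a minimal sketch of how it is typically set (train_set here is a placeholder dataset, and the worker count is arbitrary):

from torch.utils.data import DataLoader

# pin_memory=True copies each batch into page-locked host memory, which speeds up
# host-to-GPU transfers (especially with non_blocking=True) at the cost of an extra
# host-side copy per batch, so lots of small items feel it the most.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True,
                          num_workers=8, pin_memory=True, drop_last=True)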