How to create a callback using torch.multiprocessing (TPU)

Yeah, I think this should be OK generally. TensorFlow 2.0 has an eager mode mirroring the PyTorch approach (though I'm not sure whether it's used for TPU or just GPU). This isn't actually as different from PyTorch as it appears, because while PyTorch is eager, the results still aren't immediate on GPU: operations get started immediately but are basically just added to a queue in the GPU drivers/runtime. So this is more an issue the torch_xla stuff has to deal with.
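As a rough illustration of that queueing behavior (a minimal sketch; assumes a CUDA device is available, and the matrix size is arbitrary), the kernel launch returns almost immediately while the actual wait happens only when the host forces a sync:

```python
import time
import torch

if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")

    start = time.perf_counter()
    y = x @ x  # queued on the CUDA stream; the call returns right away
    launch_time = time.perf_counter() - start

    start = time.perf_counter()
    torch.cuda.synchronize()  # block until the queued work actually finishes
    sync_time = time.perf_counter() - start

    print(f"launch returned in {launch_time * 1000:.3f} ms")
    print(f"waiting for the result took {sync_time * 1000:.3f} ms")
```

Anything that needs the value on the host (`.cpu()`, `.item()`, printing) triggers the same implicit wait as the explicit `synchronize()` above.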

I’d think the main concern is potential performance issues, because it takes longer to get results back from a TPU than from a GPU. So the pause from calling .cpu() on a GPU tensor may be shorter than on a TPU tensor, and fastai’s access patterns may be particularly non-optimal for TPU. In fact, I have a bit of a hunch that the pretty big performance gap between fastai and plain PyTorch training is due to fastai accessing results too quickly. In particular, after calculating batch losses, fastai immediately accesses the item for smoothing and callbacks, making everything block until the forward pass is complete (and probably something similar happens with gradients). I plan to have a look at this, though it’s hard to know what can be done (and it’s almost certainly a v2 thing at best, if not v3 or later).
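To make the suspected access pattern concrete, here's a hypothetical sketch (the model, loop, and the interval of 50 steps are all made up for illustration, not fastai's actual code) contrasting a per-batch `.item()` call, which forces a host sync every step, with deferring the device-to-host copy:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 1).to(device)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

losses = []  # stays on-device, so appending doesn't block
for step in range(100):
    xb = torch.randn(32, 128, device=device)
    yb = torch.randn(32, 1, device=device)
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()

    # Blocking pattern (what the post suspects fastai does for smoothing):
    # smoothed = 0.98 * smoothed + 0.02 * loss.item()  # syncs every batch

    # Deferred pattern: keep the loss on-device and read back occasionally.
    losses.append(loss.detach())
    if (step + 1) % 50 == 0:
        mean_loss = torch.stack(losses).mean().item()  # one sync per 50 steps
        print(f"step {step + 1}: mean loss {mean_loss:.4f}")
        losses.clear()
```

The deferred version lets the device queue stay full between read-backs, which would matter even more on TPU, where each sync is costlier.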
