When running in distributed training mode, TensorBoardCallback seems to stall on this line (verified by monkey-patching the method). Note that the callback is set up in before_fit to run on only one GPU, so perhaps self.learn.one_batch(0, b) kicks off a step that expects all of them (two in my case) to participate?
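
If that guess is right, the stall would be the classic DDP deadlock: one_batch runs a full forward/backward pass, and under DistributedDataParallel the backward performs a gradient all-reduce that every rank must join. Below is a minimal standalone sketch (plain PyTorch, not the fastai source; it assumes 2 GPUs and the nccl backend) that reproduces the same hang with a rank-0-only training step:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(8, 2).to(rank), device_ids=[rank])
    x = torch.randn(4, 8, device=rank)

    if rank == 0:                # mirrors the callback's rank-0-only guard
        loss = model(x).sum()
        loss.backward()          # backward enqueues a gradient all-reduce
                                 # that waits for rank 1, which never joins
    dist.barrier()               # ranks end up in mismatched collectives -> hang

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```

Run with 2 GPUs: rank 1 skips the training step and sits in the barrier, while rank 0's all-reduce waits forever for a peer, which would match the stall observed in the callback.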