TensorBoardCallback incompatible with distributed learning

When running distributed training, TensorBoardCallback seems to stall on this line (checked by monkey patching the method). Note that the callback is set up in before_fit to run only on one GPU, so perhaps self.learn.one_batch(0, b) tries to fire up all of them (two in my case)?
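
For reference, this is roughly the kind of monkey patch used to narrow it down (a sketch, not the exact code; it assumes the offending line sits in TensorBoardCallback.before_fit and that the job is launched with `python -m fastai.launch`, so each process can report how far it gets):

```python
# Sketch of a per-rank trace around TensorBoardCallback.before_fit.
# If the hang is inside this method, the "leaving" message never appears
# for the rank that actually runs the callback.
from fastai.torch_core import rank_distrib
from fastai.callback.tensorboard import TensorBoardCallback

_orig_before_fit = TensorBoardCallback.before_fit

def _traced_before_fit(self):
    print(f"[rank {rank_distrib()}] entering TensorBoardCallback.before_fit", flush=True)
    _orig_before_fit(self)
    print(f"[rank {rank_distrib()}] leaving TensorBoardCallback.before_fit", flush=True)

TensorBoardCallback.before_fit = _traced_before_fit
```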