Fastai v2 TPU support

tyoc213 · July 24, 2020, 9:49pm

Hi there, Im collaborating with @butchland in fastai_xla_extensions which originated from the invitation Global pytorch hackatoh and we get to know each other from the SF Study group. After a lot of trial and error about how to do the optimizer step we have found that doing an optimizer that just do the required step makes it work on TPU. if you like the next week we can (g)meet, even now we can use some like the discord channel if needed and allowed if you did like to change “notes” on the different approaches.

github.com

butchland/fastai_xla_extensions/blob/master/fastai_xla_extensions/core.py#L32


    def fake_device():
        gpu_available = torch.cuda.is_available()
        return torch.device(torch.cuda.current_device()) if gpu_available else torch.device('cpu')
    xm = SimpleNamespace(
        optimizer_step = fake_opt_step,
        xla_device = fake_device
    )




# Cell
class XLAOptimProxy:
    "Proxy optimizer to override `opt.step` with Pytorch XLA sync method `xm.optimizer_step` "
    def __init__(self,opt):
        self.opt = opt


    def xla_step(self):
        xm.optimizer_step(self.opt,barrier=True) # sync on gradient update


    def __getattr__(self,name):
        if name == 'step': # override proxying for step method
                return getattr(self,'xla_step')

The other things, are mostly just things that need to be done. But as jeremy once said some like “it should be easy”.

Currently it works on single TPU, but we have found some “problems” or slow parts that is run on TPU, so if anybody out there reading knows about TPUs and can link some optimization documents, or how to track specific performance issues on TPU, it would be great. So later we can start with distributed trainning.

We havent yet asked people making fastai2 for help, but hopefully now that we have more attention and jeremy is back we can start to ask a lot of things.