Hi there, I'm collaborating with @butchland on fastai_xla_extensions, which originated from the Global PyTorch Hackathon invitation; we got to know each other through the SF Study Group. After a lot of trial and error on how to do the optimizer step, we found that an optimizer that just performs the required step makes it work on TPU. If you like, we could (g)meet next week, or even sooner use something like the Discord channel (if needed and allowed) to exchange notes on the different approaches.
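To make "just the required step" concrete, here is a minimal sketch (not our actual code; the class name `PlainSGD` is only for illustration): the optimizer does nothing but the bare parameter update, and calling `xm.mark_step()` afterwards lets XLA compile and run the accumulated graph on the TPU.

```python
import torch
import torch_xla.core.xla_model as xm

class PlainSGD:
    "Bare-bones SGD: only the required update, no extra hyper/state machinery."
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self):
        with torch.no_grad():
            for p in self.params:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-self.lr)  # the "required step" and nothing else

    def zero_grad(self):
        for p in self.params:
            if p.grad is not None:
                p.grad.zero_()

device = xm.xla_device()                  # single TPU core
model = torch.nn.Linear(10, 2).to(device)
opt = PlainSGD(model.parameters(), lr=1e-2)

xb = torch.randn(8, 10, device=device)
loss = model(xb).pow(2).mean()
loss.backward()
opt.step()
xm.mark_step()                            # ask XLA to compile/execute the pending graph
```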
The other items are mostly just things that need to be done. But as Jeremy once said, something like "it should be easy".
Currently it works on a single TPU core, but we have found some "problems", or parts that run slowly on TPU. So if anybody reading this knows about TPUs and can link to some optimization documents, or explain how to track down specific performance issues on TPU, that would be great. Later on we can start with distributed training.
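(For reference, one built-in option we know of is torch_xla's metrics report, which counts graph recompilations and lists ops that fall back to the CPU; a minimal sketch, assuming the standard `torch_xla.debug.metrics` module:)

```python
import torch_xla.debug.metrics as met

# After a few training steps, print the XLA metrics report: it shows how many
# graph compilations happened and which "aten::" ops fell back to slow CPU paths.
print(met.metrics_report())
```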
We haven't yet asked the people making fastai2 for help, but hopefully now that we have more attention and Jeremy is back, we can start asking a lot of questions.