fastai v2 TPU support development thread
This is a thread documenting my efforts to add TPU support to fastai v2. This GitHub repository will be updated with the necessary code.
History
Sometime in October, I discovered the existence of PyTorch XLA (even before the public announcement at PyTorch DevCon 2019). Since then, I have been working on adding TPU support to fastai v1. See here for the original discussion. Originally, I decided to work on fastai v1 first and then move to fastai v2. I documented my efforts on fastai v1 over here. While I successfully developed code for single-core and multi-core TPU training with fastai v1, it was much slower than expected and no more efficient than a multi-GPU setup. I received a lot of help from @TomB, @sgugger, and people from the PyTorch XLA team.
After a while, I got busy with classes and research. At that point I decided to switch to fastai v2, since it was becoming much more popular and everybody was likely to migrate over anyway. Thankfully, much of the code was transferable. However, I ran into some issues due to changes in the PyTorch XLA API and differences between fastai v1 and fastai v2. If I remember correctly, the next step was to create a new type of DataLoader (similar to DistributedDL) that is compatible with PyTorch XLA. I was last able to work on this in April, since I have been busy with classes, research, and more.
I had some discussions with @TomB, which unfortunately we kept private since we weren't sure how interested the community would be and since Jeremy and Sylvain were busy with other work. But now that the community has shown much more interest (e.g., some discussion here and recent discussion in the Discord channel), I figured I would open the discussion up again and document my efforts, as well as get help from the community and maybe discuss the best route (e.g., a complicated callback vs. a different training loop) for including TPU support in fastai v2.
I look forward to working with the fast.ai community in adding TPU support to fastai v2, in order to make it one of the very few deep learning libraries with such capabilities!
NOTE: Later today or tomorrow, I will add details about the kinds of tasks that are needed and the next steps.