I’d like to share a tool I built to enable interactive distributed training of fastai models in Jupyter notebooks. It is an IPython/Jupyter notebook extension of line and cell magics, and uses ipyparallel to manage the multiprocess PyTorch DistributedDataParallel (DDP) group.
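To illustrate the mechanism, here is a minimal sketch of the kind of plumbing involved (not Ddip’s actual code): using ipyparallel to form a DDP group from a notebook, assuming one engine per GPU was started with `ipcluster start -n 3` and that engine i drives GPU i.

```python
import ipyparallel as ipp

rc = ipp.Client()          # connect to the engines started via `ipcluster start -n 3`
world_size = len(rc.ids)   # one engine per GPU in this sketch

def init_ddp(rank, world_size, master_addr="127.0.0.1", master_port="29500"):
    """Runs on one engine: pin one GPU and join the DDP process group."""
    import os
    import torch
    import torch.distributed as dist
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    torch.cuda.set_device(rank)          # assumes engine rank maps to GPU index
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    return f"rank {rank}/{world_size} joined the group on GPU {rank}"

# Hand each engine its rank and initialize the process group everywhere.
results = [rc[i].apply_async(init_ddp, i, world_size) for i in range(world_size)]
print([r.get() for r in results])

# From here, ipyparallel's %%px cell magic runs a notebook cell on every
# engine, i.e. inside the DDP group; Ddip wraps this kind of plumbing
# behind its own magics.
```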
The main objective of the tool is to let fastai users play with distributed training in fastai’s lesson notebooks with minimal changes, and without any change to the fastai code base. A few features (from the README):
- Switch execution easily between PyTorch’s multiprocess DDP group and local notebook namespace.
- Automatically empties the CUDA cache after executing a cell in the DDP group, to reduce the likelihood of OOM errors in a long notebook session.
- Takes only 3 to 5 lines of IPython magics to port a fastai course v3 notebook to run in DDP (see the sketch after this list).
- Extensible architecture. Future support for fastai v2 could be implemented as a loadable module, like the existing one for fastai v1.
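To give a sense of the porting effort, a ported notebook adds just a handful of magic lines like the ones below. The magic names and flags shown here are illustrative and may not match the current version exactly; the repo’s README is the authoritative reference.

```python
# In one cell near the top of the notebook: load the extension and
# form the DDP group (names/flags are illustrative, see the README).
%load_ext Ddip
%makedip -g all --appname fastai_v1   # one DDP process per GPU, fastai v1 module

# Then either mark individual cells to run in the DDP group ...
%%dip
learn.fit_one_cycle(4)                # executes on every DDP process

# ... or switch all subsequent cells over to the group:
%autodip on
```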
Here is a summary of the speedups observed in fastai notebooks when trained with 3 GPUs.
The repository of the tool, Ddip (“Dee dip”, for Distributed Data “interactive” Parallel), is at:
Ddip is far from perfect, and a few fun puzzles are yet to be solved: some models don’t see a speed-up, and I suspect some features may be better implemented using fastai’s callback architecture.
As multi-GPU machines become more common, I hope Ddip can help more fastai notebook users speed up training. I have ported and uploaded most of the course v3-dl1 notebooks to the repo as usage examples. Please do not hesitate to ask anything about this tool; I welcome and appreciate any feedback, questions, and ideas to improve it.
Since I’m not as fast a learner as fastai’s Learner, by the time I got it working with fastai v1, fastai v2 was already being rolled out. Now Ddip has to catch up to v2, an exciting target.
I haven’t investigated fastai v2’s distributed training capability yet; can anyone shed some light on it?
Thank you, fastai team and users!