After reading this article, https://towardsdatascience.com/sharded-a-new-technique-to-double-the-size-of-pytorch-models-3af057466dba, I think it would be interesting to port FairScale's multi-GPU sharding to fastai.
What have your multi-GPU experiences with fastai been like?