Yeah, I'm open to adding more benchmarks if needed, and when PyTorch support comes I'll probably update the report.
My take is that the M1 Pro (non-Max) is a very decent GPU for light ML prototyping, the Max is just too expensive. Keep in mind that the M1 benchmarks are without any tuning like mixed precision, XLA, etc… just straight conda. Meanwhile, the Nvidia benchmarks were run in a highly optimised Docker container from Google.
Another takeaway is that both Apple CPUs are way faster than Colab and even Colab Pro.
Also, for 40 watts you get 1/10th the performance of a 400-watt RTX 3090, so it's pretty good.
Thanks for your reply, and thanks for doing that comparison!
I’m really amazed by this, because I have scrounged together an old Xeon 8c/16t T3600 with 64 GB RAM and a 1070 Ti, and it is (barely) on par with Colab (with GPU enabled).
What this tells me is that I can replace that two-ton machine with a mid-tier M1 Pro or entry-level M1 Max MBP! … Except for any fastai-related stuff; we probably need to wait for PyTorch to catch up.
And re: the power-to-performance ratio, it seems to me it’s giving decent performance when looked at from that perspective (perf/watt) … I’m very interested to know how Apple’s desktop/server-level CPUs will fare.
Has anyone run fastai with the new PyTorch nightly that supports M1 GPUs? I see this funny thing where if I pip-install it I can confirm it works (torch.backends.mps), but only if I uninstall fastai first. If I install fastai on top of it, perhaps because of a torchvision requirement, PyTorch loses the ability to detect M1 GPUs.
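For anyone wanting to reproduce the check, this is a sketch of how to probe the MPS backend; the getattr guard keeps it from crashing on older PyTorch builds that don’t have torch.backends.mps at all:

```python
# Probe whether this PyTorch build can see the M1 GPU.
import torch

mps = getattr(torch.backends, "mps", None)       # None on builds without the backend
built = mps is not None and torch.backends.mps.is_built()          # compiled with MPS?
available = mps is not None and torch.backends.mps.is_available()  # usable right now?
print(f"MPS built: {built}, available: {available}")
```

If `built` is True but `available` is False, you are probably on an Intel Mac or an older macOS.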
I wanted to do some quick patching as a proof of concept, and it seems M1 is working, with performance around 2x what it was. The code should probably be rejigged to let people select MPS (M1) rather than CUDA if they want, but by default it can choose CUDA before MPS, since generally speaking you’d get better performance on a CUDA card. Here is my quick and dirty patch to see the performance increase: Comparing fastai:master...pellet:master · fastai/fastai · GitHub
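The selection order described above (CUDA first, then MPS, then CPU) could be sketched like this; `pick_device` is a hypothetical helper, not fastai’s actual API:

```python
# Hypothetical device picker: prefer CUDA, then MPS, then CPU.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)   # guard for older torch builds
    if mps is not None and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```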
Yeah, uninstall fastai and PyTorch.
Use the updated environment.yml to install the new conda environment and make sure it installs PyTorch 1.13. After that, you can build the patched fastai module by running pip install -e fastai
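After the reinstall, a quick sanity check like this (just a sketch) confirms the nightly is actually the one being imported and that the MPS backend is visible:

```python
# Sanity check after reinstalling: is the right torch in place?
import torch

print(torch.__version__)  # the post expects a 1.13 nightly here
mps = getattr(torch.backends, "mps", None)
print("MPS backend present:", mps is not None)
```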
I also had to add the highlighted line as line 29 of load.py.
But now this is the problem: I guess we cannot use any of the existing model architectures until PyTorch implements them. NotImplementedError: The operator 'aten::adaptive_max_pool2d.out' is not currently implemented for the MPS device.
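One way to work around a missing kernel is to catch the NotImplementedError and run just that op on the CPU; `adaptive_max_pool2d_safe` below is a hypothetical wrapper, not something fastai or PyTorch provides:

```python
# Sketch: fall back to CPU for an op that the MPS backend hasn't implemented.
import torch
import torch.nn.functional as F

def adaptive_max_pool2d_safe(x: torch.Tensor, output_size):
    try:
        return F.adaptive_max_pool2d(x, output_size)
    except NotImplementedError:
        # Run the op on CPU, then move the result back to the original device.
        return F.adaptive_max_pool2d(x.cpu(), output_size).to(x.device)

x = torch.randn(1, 3, 8, 8)                       # CPU tensor, so this path always works
print(adaptive_max_pool2d_safe(x, (2, 2)).shape)  # torch.Size([1, 3, 2, 2])
```

Doing this per-op is tedious, which is what the PYTORCH_ENABLE_MPS_FALLBACK environment variable automates.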
Ok, I thought I had merged the fix from fastai master which sets pin_memory_device to an empty string. I didn’t have any trouble running the first part of the first fastbook notebook, but I haven’t tested it much… I didn’t see that error pop up for me; you might need to fall back to CPU for that operation.
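For context, pin_memory_device is a DataLoader argument added in newer PyTorch releases, and passing an empty string keeps the default behaviour; a minimal sketch:

```python
# Sketch: the pin_memory_device argument mentioned above, set to "".
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(8).float())
dl = DataLoader(ds, batch_size=4, pin_memory=False, pin_memory_device="")
print(len(list(dl)))  # two batches of four
```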
Yeah, I got it running with some MPS support; you can see in this screenshot it took 4:30 to fine-tune the resnet34 model in the first notebook, 01_intro.ipynb:
But with os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1", fastai would run all the built-in architectures (e.g. resnet34) on the CPU, so it’s as if MPS does not exist, and neither the nightly build of PyTorch nor the changes that you have made to fastai are needed.
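One gotcha worth noting: the flag has to be set before torch is imported, or it has no effect. A minimal sketch of enabling it:

```python
# The fallback flag must be set before `import torch` to take effect.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # ops without MPS kernels can now fall back to the CPU
print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```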
I think we just need to wait for the PyTorch team to implement some of the layers that are used in all of the built-in architectures.