Fastai on Apple M1

Yeah, I am open to adding more benchmarks if needed, and when PyTorch support for the M1 GPU arrives I will probably update the report.
My take is that the M1 Pro (non-Max) is a very decent GPU for light ML prototyping; the Max is just too expensive. Keep in mind that the M1 benchmarks are without any tune-up like mixed precision, XLA, etc… just straight conda. Meanwhile, the Nvidia benchmarks are run in a highly optimised Docker container from Google.
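For context, mixed precision in fastai is a one-line change; a minimal sketch (not part of the benchmarks above, and assuming a standard vision learner with path pointing at an image dataset) would be:

from fastai.vision.all import *

# Sketch only: enable fp16 mixed precision on an otherwise default learner
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(192))
learn = vision_learner(dls, resnet18, metrics=error_rate).to_fp16()
learn.fine_tune(1)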

Another takeaway is that both Apple CPUs are way faster than Colab and even Colab Pro.

Also, for 40 watts you get 1/10th the performance of a 400-watt RTX 3090 (roughly the same performance per watt), so it’s pretty good.


Thanks for your reply, and thanks for doing that comparison!

I’m really amazed at this, because I have scrounged together an old Xeon 8c/16t T3600 with 64 GB RAM and a 1070 Ti, and it is (barely) on par with Colab (with GPU enabled).

What this tells me is that I can replace that 2-ton machine with a mid-tier M1 Pro or entry-level M1 Max MBP! :smiley: … except that for any fastai-related stuff we probably need to wait for PyTorch to catch up.

And re: the power-to-performance ratio, it seems to me it’s giving decent performance when looked at from that perspective (perf/watt)… I’m very interested in knowing how Apple’s desktop/server-level CPUs will fare.

PyTorch released support for the MacBook M1 GPU on 18 May. You can try it out now.


Has anyone run fastai with the new PyTorch nightly that supports M1 GPUs? I see this funny thing where if I pip-install it I can confirm it works (torch.backends.mps), but only if I uninstall fastai first. If I install fastai on top of it, PyTorch loses the ability to check for the M1 GPU, maybe because of a torchvision requirement.
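A minimal check (assuming a PyTorch nightly build with MPS support) is just:

import torch

# True only on Apple Silicon with an MPS-enabled PyTorch build
print(torch.backends.mps.is_available())
# True if this PyTorch build was compiled with MPS support at all
print(torch.backends.mps.is_built())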

I just submitted an issue to the github repository.

from fastai.vision.all import *

# path is assumed to point at the image folder defined earlier in the notebook
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),              # images in, categories out
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,                              # label = parent folder name
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)

dls.show_batch(max_n=6)


You can check some preliminary results @tcapelle has published today.

I ran that code on a Xeon+1070ti and have published my comparison in that thread as well.

PyTorch posted this; maybe it would be useful here: Introducing Accelerated PyTorch Training on Mac | PyTorch

I wanted to do some quick patching as a proof of concept, and it seems that M1 is working, with performance around 2x what it was. The code should probably be rejigged to allow people to select MPS (M1) rather than CUDA if they want, but by default it can choose CUDA before MPS, since generally speaking you’d get better performance on a CUDA card. Here is my quick and dirty patch to see the performance increase: Comparing fastai:master...pellet:master · fastai/fastai · GitHub
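The selection logic I have in mind is roughly this (just a sketch, not the actual patch):

import torch

def pick_device():
    # Prefer CUDA when present, then MPS (Apple Silicon), then CPU
    if torch.cuda.is_available():
        return torch.device("cuda")
    if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")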


Some more numbers by Reddit user u/seraschka (is that the Sebastian Raschka?)

Thank you for this patch. Can you please specify how I can apply this patch? Do I need to uninstall fastai (which I’ve installed using conda) first?

Yeah, uninstall fastai and PyTorch.
Use the updated environment.yml to create the new conda environment and make sure it installs PyTorch 1.13. After that you can install the patched fastai module in editable mode by running pip install -e fastai

Thanks. I still get the 'fake_loader' error. Do I need to specify any new options related to MPS in the dataloader?

I also had to add the highlighted line as line 29 to load.py.

[screenshot showing the line added to load.py]

But now this is the problem. I guess we cannot use any of the existing model architectures until PyTorch implements these operators for MPS.
NotImplementedError: The operator 'aten::adaptive_max_pool2d.out' is not currently implemented for the MPS device.


Ok, I thought I had merged the fix from fastai master which sets pin_memory_device to an empty string. I didn’t have any trouble running the first part of the first fastbook notebook, but I haven’t tested it much… I didn’t see that error pop up for me; it might need to fall back to CPU for that operation.


torch.backends.mps.is_available() is the equivalent of torch.cuda.is_available()

on this forum post @sgugger mentioned.

What is the equivalent when you have an Apple M1 processor?

learn = vision_learner(dls, resnet18, metrics=error_rate)
does not have a device argument to assign.
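What I would expect to need is something like this (just a sketch, assuming an MPS-enabled PyTorch build and the dls built earlier; fastai’s DataLoaders.to and the model’s .to do the moving):

import torch
from fastai.vision.all import *

device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.dls.to(device)    # move the batches produced by the DataLoaders
learn.model.to(device)  # move the model parameters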


Honestly, has anyone successfully run a test with fastai on an Apple M1?
Can you share a notebook?

I found this notebook, but it does not include any fastai.
You can see the GPU usage in Activity Monitor (Window → GPU History).

Actually I fixed this error by chucking this at the top of the notebook I was using (before importing torch):

# Crashes unless it can fall back to CPU when an operation is missing on MPS
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

Yeah, I got it running with some MPS support; you can see in this screenshot that it took 4:30 to fine-tune the resnet34 model in the first notebook, 01_intro.ipynb:


Here is a link to my notebook: deep-learning-for-coders/01_intro.ipynb at master · pellet/deep-learning-for-coders · GitHub


Hi Ben, what are the specs of the M1 machine on which you got these results?

But with os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1", fastai would run all the built-in architectures (e.g. resnet34) on the CPU, so it’s as if MPS does not exist, and neither the nightly build of PyTorch nor the changes that you have made to fastai are needed.

I think we just need to wait for the PyTorch team to implement some of the layers that are used in all of the built-in architectures.