Fastai on Apple M1

Hi Everyone,

Has anyone successfully used fastai on an Apple M1 chip? Can you please share your experiences?

Sincerely,
Vishal

1 Like

I’m using components of it to build dataloaders, etc., but haven’t actually tried training models or running inference with it. So I can only say that, from a barebones standpoint, fastai installs fine via pip and its components work.

OK, interesting. Can you please tell me whether you are using an Air or a Pro? And, if possible, could you train a simple fastai model and see if it works?

Air. Training models works. (It’s using the CPU.)

Hi vahuja4, hope all is well!

Below are a few links talking about the Apple Silicon M1.



mrfabulous1 :smiley: :smiley:

1 Like

@jamesp, @gunturhakim - thank you for your replies! Good to know that we can train fastai models on an M1 without any issues. Could you also comment on the training speed? Is it very slow compared to Google Colab? Thank you!

It doesn’t use the GPU. So you can design models locally, but I wouldn’t try to train them locally because of how slow it would be.

@ducha-aiki is playing with the M1 MacBook and PyTorch, and he has made some progress!

The M1 GPU does not work with CUDA code, so your fastai / PyTorch training will run on the CPU. If you want to use the GPU, you’ll have to use one of Apple’s tools, like Core ML or Create ML.

The following is run from the M1 I’m typing on now:

[ins] In [1]: import torch
[ins] In [2]: torch.cuda.is_available()
Out[2]: False
[ins] In [5]: torch.cuda.get_device_name()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-7c4a4a7ea8b8> in <module>
----> 1 torch.cuda.get_device_name()

~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in get_device_name(device)
    274         str: the name of the device
    275     """
--> 276     return get_device_properties(device).name
    277
    278

~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in get_device_properties(device)
    304         _CudaDeviceProperties: the properties of the device
    305     """
--> 306     _lazy_init()  # will define _get_device_properties
    307     device = _get_device_index(device, optional=True)
    308     if device < 0 or device >= device_count():

~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
    162                 "multiprocessing, you must use the 'spawn' start method")
    163         if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 164             raise AssertionError("Torch not compiled with CUDA enabled")
    165         if _cudart is None:
    166             raise AssertionError(

AssertionError: Torch not compiled with CUDA enabled

Yes, but no training and no GPU usage - inference and tutorial writing only.
Here is a speed test:

I have benchmarked the CPU speed of patch extraction + HardNet (3206 descriptors) with kornia, on x86 (Rosetta) and arm64 (native):

native: 3.46 s
x86: 7.78 s

Baselines:
Colab CPU: 6.43 s
Colab GPU: 141 ms

So, while it is nice and fast for CPU, it is nowhere near any GPU performance.

1 Like

I understand this post is a bit old, but has anyone got any numbers for running fast.ai notebooks on an “M1 Pro” or “M1 Pro Max” machine? I’ve seen a YouTube video claiming that the Pro Max was only ~30% slower than a 3090 in a desktop with 256 GB of RAM (but with TensorFlow, not PyTorch). I’m not sure if PyTorch GPU support is even available for the M1 Pro Max chips yet?

EDIT: Never mind. PyTorch doesn’t support Apple M1 GPUs yet (TensorFlow does), so we have to wait for that, and then we’ll see some benchmarks.

Thomas Capelle benchmarked MobileNetV2 and ResNet50 training on the M1 and Nvidia GPUs, and the 3090 is significantly faster than any M1 - an order of magnitude faster in some benchmarks.

However, the M1 Max is competitive with an RTX 5000 in these two benchmarks, which is pretty good.

2 Likes

Yeah, I am open to adding more benchmarks if needed, and when PyTorch support comes I will probably update the report.
My take is that the M1 Pro (non-Max) is a very decent GPU for light ML prototyping; the Max is just too expensive. Keep in mind that the M1 benchmarks are without any tuning like mixed precision, XLA, etc. - just a straight conda install. Meanwhile, the Nvidia benchmarks were run in a highly optimised Docker container from Google.

Another takeaway is that both Apple CPUs are way faster than Colab, and even Colab Pro.

Also, for 40 watts you get 1/10th the performance of a 400-watt RTX 3090, so it’s pretty good.
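A quick sanity check of that power-efficiency claim, using only the rounded figures quoted in this thread (these are the approximations above, not new measurements):

```python
# Rounded figures from the posts above: ~40 W for the M1 at ~1/10th
# the throughput of a ~400 W RTX 3090.
m1_watts, m1_relative_perf = 40, 0.1
rtx_watts, rtx_relative_perf = 400, 1.0

m1_perf_per_watt = m1_relative_perf / m1_watts
rtx_perf_per_watt = rtx_relative_perf / rtx_watts

# On these rough numbers, performance per watt comes out about the same.
print(f"M1: {m1_perf_per_watt:.4f}, RTX 3090: {rtx_perf_per_watt:.4f}")
```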

2 Likes

Thanks for your reply, and thanks for doing that comparison!

I’m really amazed at this, because I have scrounged together an old Xeon 8c/16t T3600 with 64 GB of RAM and a 1070 Ti, and it is (barely) on par with Colab (with GPU enabled).

What this tells me is that I can replace that 2-ton machine with a mid-tier M1 Pro or entry-level M1 Max MBP! :smiley: … Except for any fastai-related stuff, we probably need to wait for PyTorch to catch up.

And re: the power-to-perf ratio, it seems to me it’s giving decent performance when looked at from that perspective (perf/watt) … I’m very interested in knowing how Apple’s desktop/server-level CPUs will fare.

PyTorch released support for the MacBook M1 GPU on 18 May. You can try it out now.

1 Like

Has anyone run fastai with the new PyTorch nightly that supports M1 GPUs? I see this funny thing where, if I pip-install it, I can confirm it works (torch.backends.mps), but only if I uninstall fastai first. If I install fastai on top of it, maybe because of a torchvision requirement, PyTorch loses the ability to check for M1 GPUs.

I just submitted an issue to the GitHub repository.

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)

dls.show_batch(max_n=6)

1 Like

You can check some preliminary results @tcapelle has published today.

I ran that code on a Xeon + 1070 Ti and have published my comparison in that thread as well.

PyTorch posted this; maybe it would be useful here: Introducing Accelerated PyTorch Training on Mac | PyTorch

I wanted to do some quick patching as a proof of concept, and it seems that M1 is working, with performance around 2x what it was. The code should probably be rejigged to let people select MPS (M1) rather than CUDA if they want, but by default it can choose CUDA before MPS, since generally speaking you’d get better performance on a CUDA card. Here is my quick and dirty patch to see the performance increase: Comparing fastai:master...pellet:master · fastai/fastai · GitHub
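The ordering described above (prefer CUDA, then MPS, then CPU) can be sketched as a plain preference function. This is just an illustration of that policy, not the actual patch linked:

```python
def default_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS (Apple Silicon), then CPU, since a CUDA
    card generally gives better performance than MPS."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# In real code the flags would come from torch, e.g.:
#   default_device_name(torch.cuda.is_available(),
#                       torch.backends.mps.is_available())
```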

1 Like