Hi Everyone,
Has anyone successfully used fastai on an Apple M1 chip? Can you please share your experiences?
Sincerely,
Vishal
I'm using components of it to build out dataloaders, etc., but haven't actually tried training models or running inference with it. So I can only say that, from a barebones standpoint: fastai installs fine via pip, and its components work.
Ok, interesting. Can you please tell me whether you are using an Air or a Pro? And, if possible, would you please train a simple fastai model and see if it works?
Air. Training models works. (It's using the CPU.)
@jamesp, @gunturhakim: thank you for your replies! Good to know that we can train fastai models on an M1 without any issues. Could you also comment on the training speed? Is it very slow compared to Google Colab? Thank you!
It doesn't use the GPU. So you can design models locally, but I wouldn't try to train them locally because of how slow it would be.
@ducha-aiki is playing with the M1 macbook and pytorch, he has made some progress!
The M1 GPU does not work with CUDA code. Therefore your training code for fastai / PyTorch will run on the CPU. If you want to use the GPU you'll have to use one of Apple's tools, like Core ML or Create ML.
The following is run from the M1 I'm typing on now:
[ins] In [1]: import torch
[ins] In [2]: torch.cuda.is_available()
Out[2]: False
[ins] In [5]: torch.cuda.get_device_name()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-7c4a4a7ea8b8> in <module>
----> 1 torch.cuda.get_device_name()
~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in get_device_name(device)
274 str: the name of the device
275 """
--> 276 return get_device_properties(device).name
277
278
~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in get_device_properties(device)
304 _CudaDeviceProperties: the properties of the device
305 """
--> 306 _lazy_init() # will define _get_device_properties
307 device = _get_device_index(device, optional=True)
308 if device < 0 or device >= device_count():
~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
162 "multiprocessing, you must use the 'spawn' start method")
163 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 164 raise AssertionError("Torch not compiled with CUDA enabled")
165 if _cudart is None:
166 raise AssertionError(
AssertionError: Torch not compiled with CUDA enabled
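A defensive sketch of device selection that sidesteps that AssertionError by falling back to the CPU when CUDA is absent (which is always the case on an M1, since its GPU doesn't speak CUDA). The helper name is mine, not part of fastai or PyTorch:

```python
# Sketch: pick a torch device string, degrading gracefully when CUDA
# (or even torch itself) is missing. pick_device() is a hypothetical helper.
def pick_device():
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch installed: nothing but the CPU
    if torch.cuda.is_available():
        return "cuda"  # never true on an M1 -- torch isn't built with CUDA there
    return "cpu"

print(pick_device())
```

On an M1 this prints `cpu`, matching the traceback above: the build simply isn't compiled with CUDA support.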
Yes, but no training and no GPU usage; inference and tutorial writing only.
Here is a speed test. I have benchmarked the CPU speed of (patch extraction + HardNet, 3206 descriptors) on x86 (Rosetta) and arm64 (native) with kornia:
native: 3.46 s
x86: 7.78 s
Baselines:
Colab CPU: 6.43 s
Colab GPU: 141 ms
So, while it is nice and fast for CPU, it is nowhere near any GPU performance.
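For anyone wanting to reproduce this kind of CPU comparison, a minimal timing sketch using only the standard library (the workload below is a stand-in for the real patch-extraction + HardNet pipeline, not the actual benchmark code):

```python
# Sketch: best-of-N timing with the stdlib, the usual way to reduce noise
# when comparing CPU runs across machines (Rosetta vs native, Colab, etc.).
import timeit

def workload():
    # placeholder compute kernel; swap in the real descriptor extraction
    return sum(i * i for i in range(100_000))

# take the best of 5 single runs; the minimum is the least noisy estimate
best = min(timeit.repeat(workload, number=1, repeat=5))
print(f"best of 5: {best:.4f} s")
```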
I understand this post is a bit old, but has anyone got any numbers for running fast.ai notebooks on an "M1 Pro" or "M1 Max" machine? I've seen a YouTube video claiming that the M1 Max was only ~30% slower than a 3090 in a desktop with 256 GB RAM (but with TensorFlow, not PyTorch). I'm not sure if PyTorch GPU support is even available for the M1 Max chips yet?
EDIT: Never mind. PyTorch doesn't support Apple M1 GPUs yet (TensorFlow does), so we have to wait for them, and then we'll see some benchmarks.
Thomas Capelle benchmarked MobileNetV2 and ResNet50 training on the M1 and Nvidia GPUs, and the 3090 is significantly faster than any M1. An order of magnitude faster in some benchmarks.
However, the M1 Max is competitive with an RTX 5000 in these two benchmarks, which is pretty good.
Yeah, I am open to adding more benchmarks if needed, and when PyTorch support comes I will probably update the report.
My take is that the M1 Pro (non-Max) is a very decent GPU for light ML prototyping; the Max is just too expensive. Keep in mind that the M1 benchmarks are without any tuning like mixed precision, XLA, etc., just straight conda. Meanwhile, the Nvidia benchmarks are run in a highly optimised Docker container from Google.
Another takeaway is that both Apple CPUs are way faster than Colab, and even Colab Pro.
Also, for 40 watts you get 1/10th the performance of a 400-watt RTX 3090, so it's pretty good.
Thanks for your reply, and thanks for doing that comparison!
I'm really amazed at this, because I have scrounged together an old Xeon 8c/16t T3600 with 64 GB RAM and a 1070 Ti, and it is (barely) on par with Colab (with GPU enabled).
What this tells me is that I can replace that two-ton machine with a mid-tier M1 Pro or entry-level M1 Max MBP… except for any fastai-related stuff, where we probably need to wait for PyTorch to catch up.
And re: the power-to-performance ratio, it seems to me it's giving decent performance when looked at from that perspective (perf/watt)… I'm very interested in knowing how Apple's desktop/server-level CPUs will fare.
PyTorch released support for the MacBook M1 GPU on May 18. You can try it out now.
Has anyone run fastai with the new PyTorch nightly that supports M1 GPUs? I see this funny thing where, if I pip-install it, I can confirm it works (torch.backends.mps), but only if I uninstall fastai first. If I install fastai on top of it, perhaps because of a torchvision requirement, PyTorch loses the ability to check for M1 GPUs.
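A defensive way to probe for the MPS backend that won't crash on older PyTorch builds (this helper is my own sketch; the `getattr` guards are there because builds predating the MPS backend lack `torch.backends.mps` entirely):

```python
# Sketch: report MPS (Apple-silicon GPU) support status without assuming
# a particular torch version is installed. mps_status() is a hypothetical helper.
def mps_status():
    try:
        import torch
    except ImportError:
        return "torch not installed"
    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        return "torch build predates the mps backend"
    if not mps.is_available():
        return "mps backend present but unavailable"
    return "mps available"

print(mps_status())
```

Running this before and after `pip install fastai` would show whether the install downgraded torch to a nightly-incompatible version.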
I just submitted an issue to the github repository.
from fastai.vision.all import *  # imports needed to run this snippet

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)
dls.show_batch(max_n=6)
You can check some preliminary results @tcapelle has published today.
I ran that code on a Xeon + 1070 Ti and have published my comparison in that thread as well.
PyTorch posted this. Maybe it would be useful here: Introducing Accelerated PyTorch Training on Mac | PyTorch
I wanted to do some quick patching as a proof of concept, and it seems the M1 is working, with performance around 2x what it was. The code should probably be rejigged to let people select MPS (M1) rather than CUDA if they want, but by default it can choose CUDA before MPS, since generally speaking you'd get better performance on a CUDA card. Here is my quick and dirty patch to see the performance increase: Comparing fastai:master...pellet:master · fastai/fastai · GitHub
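The CUDA-before-MPS preference described above can be sketched roughly like this (a minimal illustration, not the actual patch; `select_device` and its `prefer` knob are hypothetical names of mine):

```python
# Sketch: prefer CUDA over MPS by default, but let the user opt in to the
# M1 GPU explicitly, mirroring the device-selection idea in the patch above.
def select_device(prefer=None):
    try:
        import torch
    except ImportError:
        return "cpu"
    has_cuda = torch.cuda.is_available()
    mps = getattr(torch.backends, "mps", None)
    has_mps = mps is not None and mps.is_available()
    if prefer == "mps" and has_mps:
        return "mps"   # user explicitly asked for the M1 GPU
    if has_cuda:
        return "cuda"  # default: CUDA first, since it's generally faster
    if has_mps:
        return "mps"
    return "cpu"

print(select_device())       # CUDA before MPS by default
print(select_device("mps"))  # opt in to MPS when present
```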