Hi Everyone,
Has anyone successfully used fastai on an Apple M1 chip? Can you please share your experiences?
Sincerely,
Vishal
I'm using components of it to build out dataloaders, etc., but haven't actually tried training models or running inference with it. So I can only say that, from a barebones standpoint: fastai installs fine via pip, and its components work.
Ok, interesting. Can you please tell me whether you are using an Air or a Pro? And, if possible, would you please train a simple fastai model and see if it works?
Air. Training models works. (It's using the CPU.)
@jamesp, @gunturhakim: thank you for your replies! Good to know that we can train fastai models on an M1 without any issues. Could you also comment on the training speed? Is it very slow compared to Google Colab? Thank you!
It doesn't use the GPU. So you can design models locally, but I wouldn't try to train them locally because of how slow it would be.
@ducha-aiki is playing with the M1 macbook and pytorch, he has made some progress!
The M1 GPU does not work with CUDA code. Therefore your training code for fastai / PyTorch will run on the CPU. If you want to use the GPU you'll have to use one of Apple's tools, like Core ML or Create ML.
The following is run from the M1 I'm typing on now:
[ins] In [1]: import torch
[ins] In [2]: torch.cuda.is_available()
Out[2]: False
[ins] In [5]: torch.cuda.get_device_name()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-7c4a4a7ea8b8> in <module>
----> 1 torch.cuda.get_device_name()
~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in get_device_name(device)
274 str: the name of the device
275 """
--> 276 return get_device_properties(device).name
277
278
~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in get_device_properties(device)
304 _CudaDeviceProperties: the properties of the device
305 """
--> 306 _lazy_init() # will define _get_device_properties
307 device = _get_device_index(device, optional=True)
308 if device < 0 or device >= device_count():
~/nbs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
162 "multiprocessing, you must use the 'spawn' start method")
163 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 164 raise AssertionError("Torch not compiled with CUDA enabled")
165 if _cudart is None:
166 raise AssertionError(
AssertionError: Torch not compiled with CUDA enabled
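A defensive sketch of device selection that sidesteps that AssertionError by falling back to the CPU when CUDA is absent (which is always the case on an M1, since its GPU doesn't speak CUDA). The helper name is mine, not part of fastai or PyTorch:

```python
# Sketch: pick a torch device string, degrading gracefully when CUDA
# (or even torch itself) is missing. pick_device() is a hypothetical helper.
def pick_device():
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch installed: nothing but the CPU
    if torch.cuda.is_available():
        return "cuda"  # never true on an M1 -- torch isn't built with CUDA there
    return "cpu"

print(pick_device())
```

On an M1 this prints `cpu`, matching the traceback above: the build simply isn't compiled with CUDA support.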
Yes, but no training and no GPU usage; inference and tutorial writing only.
Here is a speed test. I have benchmarked the CPU speed of (patch extraction + HardNet, 3206 descriptors) on x86 (Rosetta) and arm64 (native) with kornia:
native: 3.46 s
x86: 7.78 s
Baselines:
Colab CPU: 6.43 s
Colab GPU: 141 ms
So, while it is nice and fast for CPU, it is nowhere near any GPU performance.
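For anyone wanting to reproduce this kind of CPU comparison, a minimal timing sketch using only the standard library (the workload below is a stand-in for the real patch-extraction + HardNet pipeline, not the actual benchmark code):

```python
# Sketch: best-of-N timing with the stdlib, the usual way to reduce noise
# when comparing CPU runs across machines (Rosetta vs native, Colab, etc.).
import timeit

def workload():
    # placeholder compute kernel; swap in the real descriptor extraction
    return sum(i * i for i in range(100_000))

# take the best of 5 single runs; the minimum is the least noisy estimate
best = min(timeit.repeat(workload, number=1, repeat=5))
print(f"best of 5: {best:.4f} s")
```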
I understand this post is a bit old, but has anyone got any numbers for running fast.ai notebooks on an "M1 Pro" or "M1 Max" machine? I've seen a YouTube video claiming that the M1 Max was only ~30% slower than a 3090 in a desktop with 256 GB RAM (but with TensorFlow, not PyTorch). I'm not sure if PyTorch GPU support is even available for the M1 Max chips yet?
EDIT: Never mind. PyTorch doesn't support Apple M1 GPUs yet (TensorFlow does), so we have to wait for them, and then we'll see some benchmarks.
Thomas Capelle benchmarked MobileNetV2 and ResNet50 training on the M1 and Nvidia GPUs, and the 3090 is significantly faster than any M1. An order of magnitude faster in some benchmarks.
However, the M1 Max is competitive with an RTX 5000 in these two benchmarks, which is pretty good.
Yeah, I am open to adding more benchmarks if needed, and when PyTorch support comes I will probably update the report.
My take is that the M1 Pro (non-Max) is a very decent GPU for light ML prototyping; the Max is just too expensive. Keep in mind that the M1 benchmarks are without any tuning like mixed precision, XLA, etc., just straight conda. Meanwhile, the Nvidia benchmarks are run in a highly optimised Docker container from Google.
Another takeaway is that both Apple CPUs are way faster than Colab, and even Colab Pro.
Also, for 40 watts you get 1/10th the performance of a 400-watt RTX 3090, so it's pretty good.
Thanks for your reply, and thanks for doing that comparison!
I'm really amazed at this, because I have scrounged together an old Xeon 8c/16t T3600 with 64 GB RAM and a 1070 Ti, and it is (barely) on par with Colab (with GPU enabled).
What this tells me is that I can replace that two-ton machine with a mid-tier M1 Pro or entry-level M1 Max MBP… except for any fastai-related stuff, where we probably need to wait for PyTorch to catch up.
And re: the power-to-performance ratio, it seems to me it's giving decent performance when looked at from that perspective (perf/watt)… I'm very interested in knowing how Apple's desktop/server-level CPUs will fare.
PyTorch released support for the MacBook M1 GPU on May 18. You can try it out now.
Has anyone run fastai with the new PyTorch nightly that supports M1 GPUs? I see this funny thing where, if I pip-install it, I can confirm it works (torch.backends.mps), but only if I uninstall fastai first. If I install fastai on top of it, perhaps because of a torchvision requirement, PyTorch loses the ability to check for M1 GPUs.
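A defensive way to probe for the MPS backend that won't crash on older PyTorch builds (this helper is my own sketch; the `getattr` guards are there because builds predating the MPS backend lack `torch.backends.mps` entirely):

```python
# Sketch: report MPS (Apple-silicon GPU) support status without assuming
# a particular torch version is installed. mps_status() is a hypothetical helper.
def mps_status():
    try:
        import torch
    except ImportError:
        return "torch not installed"
    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        return "torch build predates the mps backend"
    if not mps.is_available():
        return "mps backend present but unavailable"
    return "mps available"

print(mps_status())
```

Running this before and after `pip install fastai` would show whether the install downgraded torch to a nightly-incompatible version.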
I just submitted an issue to the github repository.
from fastai.vision.all import *  # imports needed to run this snippet

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)
dls.show_batch(max_n=6)
You can check some preliminary results @tcapelle has published today.
I ran that code on a Xeon + 1070 Ti and have published my comparison in that thread as well.
PyTorch posted this. Maybe it would be useful here: Introducing Accelerated PyTorch Training on Mac | PyTorch
I wanted to do some quick patching as a proof of concept, and it seems the M1 is working, with performance around 2x what it was. The code should probably be rejigged to let people select MPS (M1) rather than CUDA if they want, but by default it can choose CUDA before MPS, since generally speaking you'd get better performance on a CUDA card. Here is my quick and dirty patch to see the performance increase: Comparing fastai:master...pellet:master · fastai/fastai · GitHub
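The CUDA-before-MPS preference described above can be sketched roughly like this (a minimal illustration, not the actual patch; `select_device` and its `prefer` knob are hypothetical names of mine):

```python
# Sketch: prefer CUDA over MPS by default, but let the user opt in to the
# M1 GPU explicitly, mirroring the device-selection idea in the patch above.
def select_device(prefer=None):
    try:
        import torch
    except ImportError:
        return "cpu"
    has_cuda = torch.cuda.is_available()
    mps = getattr(torch.backends, "mps", None)
    has_mps = mps is not None and mps.is_available()
    if prefer == "mps" and has_mps:
        return "mps"   # user explicitly asked for the M1 GPU
    if has_cuda:
        return "cuda"  # default: CUDA first, since it's generally faster
    if has_mps:
        return "mps"
    return "cpu"

print(select_device())       # CUDA before MPS by default
print(select_device("mps"))  # opt in to MPS when present
```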