What makes fastai unstable on an M1 Mac?

From the README, I saw that fastai doesn't natively support installation on M1 Macs.

But I learned that PyTorch now supports M1 Macs, so I attempted to install fastai on my M1 Mac. FYI, I use miniconda, and installed PyTorch and fastai separately via pip.

The installation was successful without any error messages, and I verified that my installed PyTorch has access to the M1 GPU, i.e.

In [5]: torch.backends.mps.is_available()
Out[5]: True
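
For completeness, a quick extra sanity check I'd suggest (the is_built() call and the small tensor allocation are just my own checks, nothing fastai requires):

import torch

print(torch.backends.mps.is_available())  # the MPS device can be used at runtime
print(torch.backends.mps.is_built())      # this PyTorch build was compiled with MPS support
x = torch.ones(2, device="mps")           # allocate a small tensor on the Apple GPU
print(x.device)                           # mps:0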

However, when I run this sample notebook for testing, I keep getting a kernel-died error on the following line:

dls = ImageDataLoaders.from_name_re(
    path, fnames, pat=r'(.+)_\d+.jpg$', item_tfms=Resize(460), bs=bs,
    batch_tfms=[*aug_transforms(size=224, min_scale=0.75), Normalize.from_stats(*imagenet_stats)])

The same error persisted despite reducing the batch size or image size, so I believe the issue is not related to memory but to something more fundamental.

Has anyone with experience using fastai on an M1 Mac encountered similar situations? I have no idea what causes the issue at the moment.

1 Like

Let me check

I found that commenting out this argument to ImageDataLoaders helps:

batch_tfms=[*aug_transforms(size=224, min_scale=0.75), Normalize.from_stats(*imagenet_stats)]

So I guess it has to do with the transformation pipeline.
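
For reference, this is roughly what the call looks like with that argument dropped (path, fnames and bs are assumed from the earlier notebook cells):

# Same call as before, but without batch_tfms, so no GPU-side augmentation or normalization is applied.
dls = ImageDataLoaders.from_name_re(
    path, fnames, pat=r'(.+)_\d+.jpg$', item_tfms=Resize(460), bs=bs)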

1 Like

But then it got stuck at learn.fit_one_cycle(1): the training progress bar sits at 0%, with the following warning messages:

/Users/alex/opt/miniconda3/envs/fastai/lib/python3.9/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
/Users/alex/opt/miniconda3/envs/fastai/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py:115: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.")
[W ParallelNative.cpp:229] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:229] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:229] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:229] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:229] Warning: Cannot set number of intraop threads after para

In my opinion, the library should support mps.

A lot of notebooks I have seen online only check if CUDA is available though, so I had to add checks for mps support.
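
Something like this is the check I mean, as a sketch (pick_device is just my own helper name, not anything from fastai):

import torch

def pick_device():
    # Prefer Apple's Metal backend, then CUDA, then fall back to the CPU.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()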

2 Likes

Thanks @melonkernel, and good to see that the library takes the mps device into account.
I am still trying to figure out what went wrong with my setup.

Did you personally use fastai on an M1 Mac? (If so, did it work fine for you when running the course notebooks?)

FYI, I noticed others have encountered the same issue on their M1 Macs (see Beginner: Setup ✅ - #159 by knivets).

I have not run the fastai course notebooks, but I have used it for some other purposes.

My kernel also died

I tried running Jupyter with the --debug flag on, but it doesn't seem to give any info. I will try a custom Python script outside Jupyter to detect any error messages.

If I run this from the command line (with the same conda env I used for the notebook), it complains about:

-:27:11: error: invalid input tensor shapes, indices shape and updates shape must be equal
-:27:11: note: see current operation: %25 = "mps.scatter_along_axis"(%23, %arg0, %24, %1) {mode = 6 : i32} : (tensor<634800xf32>, tensor<460xf32>, tensor<211600xi32>, tensor) -> tensor<634800xf32>
/AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphExecutable.mm:1267: failed assertion `Error: MLIR pass manager failed'
[1]

Autocast is not supported on M1, I think.
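
If your learner was created with mixed precision (e.g. a .to_fp16() call, as in some course notebooks), it may be worth switching it off before fitting on mps. A minimal sketch, assuming a learner called learn already exists:

# to_fp32() removes the MixedPrecision callback, so the CUDA-only autocast/GradScaler paths are not used.
learn = learn.to_fp32()
learn.fit_one_cycle(1)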

I can't explain why, but I couldn't reproduce your error: I managed to run your script successfully.
FYI, I am using PyTorch 1.12.1 (M1 arm64 build), and my fastai version is 2.7.10.

1 Like

Ok great. I had made some changes to my conda env. What Python version are you using, btw?

Python 3.9.13, arm64 build

Last time I tried using the M1, not all PyTorch operations were supported yet for most of the more complicated libraries. Not sure if that's still the issue right now.

You still have that issue. I believe more operations are supported now, but some are still missing, and every once in a while you'll get either a warning that a certain operation will switch over to the CPU, an error asking you to set an environment variable, or the code will simply crash. It's getting better though…
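
For what it's worth, the environment variable those errors point at is, I believe, PYTORCH_ENABLE_MPS_FALLBACK; it has to be set before torch is imported:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # let ops unsupported on MPS fall back to the CPU

import torch  # import torch only after the variable is set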

2 Likes

This is so strange.
Based on your script, I made another script and ran it from the command line.
The script did work and I could do the training (fit_one_cycle), though at a pretty slow speed.

But when I ran the lesson 1 Jupyter notebook again, it either crashed at

dls = ImageDataLoaders.from_name_re(
    path, fnames, pat=r'(.+)_\d+.jpg$', item_tfms=Resize(460), bs=bs,
    batch_tfms=[*aug_transforms(size=224, min_scale=0.75), Normalize.from_stats(*imagenet_stats)])

or crashed at

learn.fit_one_cycle(1)

I guess it is not related to unsupported PyTorch operations on the M1 Mac, but to something specific to the Jupyter notebook.
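
One difference I'd look at between the script and the notebook is the dataloader workers; forcing single-process loading with num_workers=0 is only a guess on my part, but it's cheap to try:

# num_workers=0 keeps all data loading in the notebook's main process.
dls = ImageDataLoaders.from_name_re(
    path, fnames, pat=r'(.+)_\d+.jpg$', item_tfms=Resize(460), bs=bs,
    num_workers=0)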

For those looking for solutions in 2023:

  1. Setup: mamba create -n fastai python=3.10 && mamba activate fastai && pip install fastai transformers timm jupyter
  2. Remove aug_transforms from dataloaders (this is causing the issue)
  3. You should be able to train on mps now

I haven't found the exact reason the kernel dies with aug_transforms; I haven't had a chance to look into it yet. But hopefully this unblocks people for now!
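
A minimal sketch of the kind of run I mean (resnet34, the pets data and the regex follow the lesson notebook; treat it as a sketch, not a verified benchmark):

from fastai.vision.all import *

path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')

# No aug_transforms here; only the item-level Resize, which runs on the CPU.
dls = ImageDataLoaders.from_name_re(
    path, fnames, pat=r'(.+)_\d+.jpg$', item_tfms=Resize(224), bs=32)

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(1)  # fastai's default device handling should pick up mps, per the discussion above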

1 Like

For me, on an M1 Mac, aug_transforms works with mps when limiting the transformations like this:
aug_transforms(do_flip=False, flip_vert=False, max_rotate=0.0, max_zoom=1.0, max_warp=0.0)

Alternatively, calling datablock.dataloaders(path, device='cpu') works with arbitrary arguments to aug_transforms.
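
A rough sketch of that DataBlock variant, in case it helps (path and the pets-style labelling are assumptions from the lesson notebook; the relevant bit is only the device='cpu' argument):

from fastai.vision.all import *

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
    splitter=RandomSplitter(seed=42),
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))

# Keep the batch transforms on the CPU; the model itself can still run on mps.
dls = dblock.dataloaders(path/'images', device='cpu')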

1 Like

This worked on my M1 Mac too, i.e. passing those args to aug_transforms.