I am running macOS Big Sur on an Apple MacBook Air M1,
with Python 3.8 (the Python provided by Apple on Big Sur).
learn.fine_tune(1) from the 01_intro Jupyter notebook cannot be run on Apple Silicon.
I get this log message:
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.
[W ParallelNative.cpp:206] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
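For reference, the step that fails is the cat-vs-not-cat cell from 01_intro; from memory it looks roughly like this (the exact notebook code may differ slightly):

```python
from fastai.vision.all import *

# Oxford-IIIT Pet images; cat images have an uppercase first letter in the filename
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)   # this is the call that triggers the warnings above
```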
The only solution I found is to run
export OMP_NUM_THREADS=1
just before starting
jupyter notebook
but this is not a good solution, because CPU usage then stays below 100% (only a single core is used).
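If it helps anyone, the variable can also be set from inside the notebook instead of the shell; as far as I understand it just has to happen before fastai/torch are imported (I have not verified that this avoids the warnings, so treat it as a sketch):

```python
import os

# Assumption: OMP_NUM_THREADS must be set before the OpenMP runtime inside
# PyTorch is initialized, i.e. before the first fastai/torch import.
os.environ["OMP_NUM_THREADS"] = "1"

from fastai.vision.all import *   # import only after the variable is set
```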
Yeah it's just running on the CPU. You aren't going to get the speedups you see with TensorFlow, because TensorFlow has GPU support for M1 Macs. If you were to run both PyTorch/fastai and TensorFlow on just the CPU, they would likely be comparable.
@ilovescience: that is not the reason why I wrote this post.
On my MacBook Pro (Intel version), the fast.ai script runs well without any GPU support, with 6 processes.
On my MacBook Air (M1 version), without Rosetta, the same script cannot be run with 2 or more processes; I had to force export OMP_NUM_THREADS=1.
I was able to import the whole fastbook package, but in order to run and test "part 1" of the book (up to the part where you upload a picture and it guesses whether it is a cat or not), I had to force the env variable OMP_NUM_THREADS=1 before starting jupyter notebook.
The "learning" part of the Python script runs, but very, very slowly.
As a result, I could train the model, yes.
With the same script, my Intel MacBook Pro is faster, by far.
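If anyone wants to compare the two machines more precisely, this is the kind of quick check I would run in both notebooks (assuming learn is the Learner from the 01_intro cell; the timing code is just a sketch):

```python
import time
import torch

# With OMP_NUM_THREADS=1 this should print 1 on the M1 machine
print("intra-op threads:", torch.get_num_threads())

start = time.perf_counter()
learn.fine_tune(1)   # the step that is slow on the M1 MacBook Air
print(f"fine_tune(1) took {time.perf_counter() - start:.1f} s")
```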