Fastai on Azure ML Error

marco.andreoni · February 19, 2021, 4:29pm

Hello everybody,
I’m working with fastai (v2.1.7) on Azure Machine Learning (Azure ML) and I’m running into an issue.

If I train a model directly in the notebook, everything works.
When I run exactly the same Python code as an experiment, I get the error below.

Has anyone experienced the same issue?
Do you have any idea what might cause it?
Thanks a lot

Traceback (most recent call last):
  File "train.py", line 75, in <module>
    learn.fit_one_cycle(8, 3e-3)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/callback/schedule.py", line 112, in fit_one_cycle
    self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 205, in fit
    self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 154, in _with_events
    try:       self(f'before_{event_type}')       ;f()
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 196, in _do_fit
    self._with_events(self._do_epoch, 'epoch', CancelEpochException)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 154, in _with_events
    try:       self(f'before_{event_type}')       ;f()
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 190, in _do_epoch
    self._do_epoch_train()
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 182, in _do_epoch_train
    self._with_events(self.all_batches, 'train', CancelTrainException)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 154, in _with_events
    try:       self(f'before_{event_type}')       ;f()
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/learner.py", line 160, in all_batches
    for o in enumerate(self.dl): self.one_batch(*o)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/data/load.py", line 103, in __iter__
    yield self.after_batch(b)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastcore/transform.py", line 198, in __call__
    def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastcore/transform.py", line 150, in compose_tfms
    x = f(x, **kwargs)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/vision/augment.py", line 34, in __call__
    self.before_call(b, split_idx=split_idx)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/vision/augment.py", line 377, in before_call
    self.do,self.mat = True,self._get_affine_mat(b)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/vision/augment.py", line 388, in _get_affine_mat
    aff_m = _init_mat(x)
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/fastai/vision/augment.py", line 286, in _init_mat
    mat = torch.eye(3, device=x.device).float()
AttributeError: 'list' object has no attribute 'device'

nickkb · March 3, 2021, 7:11pm

@marco.andreoni I’m having the exact same error. I’m working through the 02_production notebook and it happens for me after calling learn.fine_tune(4). It looks like it gets through the first epoch of training OK but then breaks when it moves on to validation.

At first, I thought it was a problem with my data, but I’m pretty sure it is some sort of idiosyncrasy of the Azure VM setup. I think this because 1) I don’t get the error on Paperspace (I haven’t tried other platforms) and 2) I get the same error when I run the code unchanged using the ‘bears’ example.

nickkb · March 3, 2021, 8:51pm

Ahh, figured this out: I had to change the kernel my notebook was using to one installed in the fastai2 conda env that was created by the set-up script. Now it works fine!

The clue was that I had to uncomment the !pip install -Uqq fastbook line at the beginning of the notebook … but the fastbook module should already have been installed during setup!
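
For anyone hitting the same thing, a quick way to check which interpreter and package versions a notebook kernel or an experiment run is actually using is sketched below. This is only a minimal sketch: the helper names (package_version, describe_environment) and the package list are my own choices for illustration, not anything from fastai or Azure ML. Running it both in the notebook and at the top of train.py and comparing the output should show whether the two are using different environments.

```python
import sys


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters (e.g. a Python 3.6 environment).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def describe_environment(packages=("fastai", "fastcore", "fastbook", "torch")):
    """Collect the interpreter path and the versions of the given packages.

    The default package list is an assumption; adjust it to your setup.
    """
    return {
        "executable": sys.executable,  # path of the running Python interpreter
        "prefix": sys.prefix,          # root of the active (conda) environment
        "versions": {p: package_version(p) for p in packages},
    }


if __name__ == "__main__":
    info = describe_environment()
    print("python executable:", info["executable"])
    print("environment prefix:", info["prefix"])
    for pkg, version in info["versions"].items():
        print(f"{pkg}: {version if version is not None else 'NOT INSTALLED'}")
```

If the prefix does not point at the conda env created by the setup script, or fastbook shows up as not installed, the kernel (or the experiment's environment) is probably not the one you expect.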