Recursion error; fastai v1.0.27, Windows 10

I can change the error by setting the number of workers to zero.

    data = ImageDataBunch.from_folder(PATH, num_workers=0)

But now I get:

    Traceback (most recent call last):
    File "D:/$CatsDogs/cats.py", line 19, in <module>
    main()
    File "D:/$CatsDogs/cats.py", line 16, in main
    learn.fit(1)
    File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 162, in fit
    callbacks=self.callbacks+callbacks)
    File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 94, in fit
    raise e
    File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 82, in fit
    for xb,yb in progress_bar(data.train_dl, parent=pbar):
    File "C:\ProgramData\Anaconda3\lib\site-packages\fastprogress\fastprogress.py", line 65, in __iter__
    for i,o in enumerate(self._gen):
    File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_data.py", line 47, in __iter__
    for b in self.dl:
    File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 314, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
    File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\torch_core.py", line 94, in data_collate
    return torch.utils.data.dataloader.default_collate(to_data(batch))
    File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 187, in default_collate
    return [default_collate(samples) for samples in transposed]
    File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 187, in <listcomp>
    return [default_collate(samples) for samples in transposed]
    File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 164, in default_collate
    return torch.stack(batch, 0, out=out)
    RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 249 and 274 in dimension 2 at c:\programdata\miniconda3\conda-bld\pytorch_1533096106539\work\aten\src\th\generic/THTensorMath.cpp:3616

It appears to be an image size problem now but I don’t know how to set the image size. The code from the lesson 1 notebook does not seem to work in V1, or I need to find out how the import changed.

Vettejeep

If I run the example directly from the docs, I get a different error:

    path = untar_data(URLs.MNIST_SAMPLE)  # https://docs.fast.ai/
    data = ImageDataBunch.from_folder(path,num_workers=0)

    learn = create_cnn(data, models.resnet18, metrics=accuracy)
    learn.fit(1)

The error:

    Traceback (most recent call last):
      File "D:/$CatsDogs/cats.py", line 41, in <module>
        main()
      File "D:/$CatsDogs/cats.py", line 38, in main
        learn.fit(1)
      File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 162, in fit
        callbacks=self.callbacks+callbacks)
      File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 94, in fit
        raise e
      File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 84, in fit
        loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
      File "C:\ProgramData\Anaconda3\lib\site-packages\fastai\basic_train.py", line 22, in loss_batch
        loss = loss_func(out, *yb)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1550, in cross_entropy
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1407, in nll_loss
        return torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.cuda.IntTensor for argument #2 'target'

Any ideas on how to get going with V1 in an IDE with Windows?

Thanks,
Vettejeep

I don’t know of anyone who has gotten it working on Windows yet.

I would like to add my experience of this bug.

On my machine it occurs if I run the example for preprocessing tabular data https://docs.fast.ai/tabular.html#Preprocessing-tabular-data.
The error occurs when I run (cat_x,cont_x),y = next(iter(data.train_dl)).

My system:

=== Software ===
python version : 3.7.1
fastai version : 1.0.27
torch version  : 0.4.1
torch cuda ver
torch cuda is  : **Not available**

=== Hardware ===
No GPUs available

=== Environment ===
platform       : Windows-10-10.0.17134-SP0
conda env      : ai
python         : C:\Users\Tommaso\Miniconda3\envs\ai\python.exe
sys.path       :
C:\Users\Tommaso\Miniconda3\envs\ai\python37.zip
C:\Users\Tommaso\Miniconda3\envs\ai\DLLs
C:\Users\Tommaso\Miniconda3\envs\ai\lib
C:\Users\Tommaso\Miniconda3\envs\ai
C:\Users\Tommaso\Miniconda3\envs\ai\lib\site-packages
C:\Users\Tommaso\Miniconda3\envs\ai\lib\site-packages\IPython\extensions
no supported gpus found on this system

I also got the error with fastai >= 1.0.24. I just tried with version 1.0.22 and everything works fine (of course taking account of API changes to TabularDataBunch.from_df).

Please let me know if I can be of any assistance.

Tommaso

Can you please provide a guide for installing fastai v1 on Windows 10?

As explained here, there is no official support for fastai v1 on Windows yet.

Hi,
I have managed to make it work, but since I have very limited experience with Python, I would like to share with you the problem I have found and the proposed solution. Would a pull request be better to do this?

Thanks
Tommaso

I know fastai does not support Windows 10 yet. Now that an unofficial PyTorch 1.0 is available, I used the latest dev version 1.0.29 with Python 3.7. When I run the tabular data sample, I get the recursion error too.

The line (cat_x,cont_x),y = next(iter(data.train_dl)) causes the problem:

    Traceback (most recent call last):
      File "d:\conda3\lib\site-packages\fastai\data_block.py", line 423, in __getattr__
        res = getattr(self.x, k, None)
      File "d:\conda3\lib\site-packages\fastai\data_block.py", line 423, in __getattr__
        res = getattr(self.x, k, None)
      File "d:\conda3\lib\site-packages\fastai\data_block.py", line 423, in __getattr__
        res = getattr(self.x, k, None)
      [Previous line repeated 996 more times]

I don’t think this is specific to Windows 10; it looks like a general Windows issue.

All the examples were working for me on Windows 8.1 (as long as I set num_workers=0) before the course started. As noted above, the tabular example stopped working after v1.0.22, and the vision learner started throwing

    RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target'

in the dogs_and_cats example in v1.0.27.

I hadn’t noticed until Lesson 6 because everything else was still working. When someone has a fix can they post it here?

Hello,
I had to patch data_block.py to make it work. The issue is not really related to Windows 10. It happens because the ‘self.x’ and ‘self.y’ attributes are accessed from within the __getattr__() method.

As mentioned in this Stack Overflow post, attribute access needs to be done through __dict__ when inside __getattr__().

@Tommaso, is this also what you did, or did you make a different change?

This is the new implementation of __getattr__ in the LabelList class:

    def __getattr__(self,k:str)->Any:
        res = getattr(self.__dict__.get('x'), k, None)
        return res if res is not None else getattr(self.__dict__.get('y'), k)

Similar change for __getattr__ of ItemLists class:

    def __getattr__(self, k):
        ft = getattr(self.__dict__['train'], k)
        if not isinstance(ft, Callable): return ft
        fv = getattr(self.__dict__['valid'], k)
        assert isinstance(fv, Callable)
        def _inner(*args, **kwargs):
            self.train = ft(*args, **kwargs)
            assert isinstance(self.__dict__['train'], LabelList)
            self.valid = fv(*args, **kwargs)
            self.__class__ = LabelLists
            self.process()
            return self
        return _inner
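
To illustrate why reading through __dict__ avoids the recursion, here is a minimal sketch in plain Python. It does not use fastai; the class names Wrapped, Broken and Fixed and the probe helper are hypothetical stand-ins. It assumes the failure is triggered when an attribute is looked up on an object whose __init__ has not run (pickle and copy create objects that way), which is one way this can happen and not necessarily the exact path taken inside fastai.

```python
class Wrapped:
    """Stand-in for the inner list object; `items` is a hypothetical attribute."""
    def __init__(self):
        self.items = [1, 2, 3]


class Broken:
    def __init__(self):
        self.x = Wrapped()

    def __getattr__(self, k):
        # If `x` is not in the instance dict, `self.x` re-enters __getattr__
        # with k == 'x', which evaluates `self.x` again, and so on.
        return getattr(self.x, k)


class Fixed:
    def __init__(self):
        self.x = Wrapped()

    def __getattr__(self, k):
        # Reading __dict__ directly never re-enters __getattr__.
        # If `x` is missing, this raises a plain AttributeError instead.
        return getattr(self.__dict__.get('x'), k)


def probe(cls):
    """Look up an attribute on an instance whose __init__ never ran."""
    obj = cls.__new__(cls)  # object exists, but `x` has not been set
    try:
        return obj.items
    except RecursionError:
        return "RecursionError"
    except AttributeError:
        return "AttributeError"


if __name__ == "__main__":
    print(probe(Broken))
    print(probe(Fixed))
    print(Fixed().items)
```

Both classes behave the same once __init__ has run; they only differ when the wrapped attribute is missing, where the second version fails with an ordinary AttributeError that callers such as getattr(obj, name, default) can handle.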

Hey @cudawarped,
I also ran into a bunch of those on Windows 10. One of the fastai discussions mentions that this is indeed Windows-related and that the targets need to be explicitly converted to long by calling the .long() method.

Here is the change for the target parameter. In my case it was line 177 of layers.py (I am using the latest clone of the fastai GitHub repo):

    return self.func.__call__(input, target.view(-1).long(), **kwargs)

I also had to change it in metrics.py, line 39:

    return (input==targs.long()).float().mean()

There may be more spots; I will find them as I go.

HTH

Hi @quantotto,
I tried to fix exactly the same part of the __getattr__() implementation, but I am not proficient enough in Python, so I stopped working on it.
From what I saw on various forums and online resources, infinite recursion in __getattr__() happens very often when the method does not handle missing attributes correctly.
Glad that you came up with a solution!

Tommaso

Cool, glad it helps!

Hey @quantotto

Thank you for your suggestions, that is a great help.

Regarding the .long() conversion, I proposed a small change in this pull request, which seemed to work for me.

Alternatively, to avoid hacking the library, I found that

    data.train_ds.y.items = np.int64(data.train_ds.y.items)
    data.valid_ds.y.items = np.int64(data.valid_ds.y.items)

also works.
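
One plausible explanation for why this cast is needed only on Windows (an assumption, not something confirmed in this thread): NumPy versions before 2.0 use the platform's C long as the default integer type, and a C long is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux and macOS. Labels stored in a default-integer NumPy array would then turn into an IntTensor on Windows, while the loss function expects a LongTensor. The sketch below only reports the width of a C long on the current machine; the helper names are hypothetical.

```python
import ctypes


def c_long_bits() -> int:
    """Width in bits of the platform's C long."""
    return ctypes.sizeof(ctypes.c_long) * 8


def default_int_is_32_bit() -> bool:
    """True where a C long (NumPy's pre-2.0 default integer) is narrower than 64 bits."""
    return c_long_bits() < 64


if __name__ == "__main__":
    print(f"C long is {c_long_bits()} bits wide on this platform")
    if default_int_is_32_bit():
        # Assumption: labels were built with NumPy's default integer type.
        print("Default-integer label arrays would need an explicit int64 cast here.")
```

If the check reports 32 bits, an explicit np.int64 conversion of the labels, as in the workaround above, is one way to get 64-bit targets without editing the library.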

Thanks a lot, @cudawarped! External change is even better. I’ll keep that in mind for my tests.

After applying quantotto's suggested changes, I am able to run the tabular data sample. Thank you all!

I have the same issue with this example. Have you found a solution?

All the errors mentioned here should be fixed in master now.

When I run the tabular data example from the docs with fastai version 1.0.36.post1 (master), I still get the recursion error.

I don’t see the data_block.py code patches in master. Am I looking in the right place? The patch does work for me on Windows, though, along with the

    data.train_ds.y.items = np.int64(data.train_ds.y.items)
    data.valid_ds.y.items = np.int64(data.valid_ds.y.items)

changes in the notebook. Well done, quantotto!