Fit failing with RuntimeError: Could not infer dtype of numpy.int64

I ran into the same problem. I found an issue on GitHub where both participants said it was an installation problem. A fresh installation via PyPI in a fresh environment worked for me.

Yes, since the problem is in pad_collate, it makes sense that you can’t iterate through batches. I can’t reproduce the issue, and as @tblock suggested, maybe it’s an installation problem?
Maybe create a new environment and reinstall the library (it’s super quick), then see if the problem persists?

Otherwise, we’ll try to get to the bottom of this together.

So I created a new Python 3.6 environment, set everything up with CUDA 9.2, and installed everything I could with pip instead of conda.
Unfortunately I get the same issue, but this time there is also a warning that I didn’t see before.

Is this possibly related? How would I fix the problem?

Update: the warning appears to be a spaCy issue, so I’m inclined to think it’s unrelated.
Below are the commands I ran in the Anaconda Prompt for the latest installation:

conda create -n py36 python=3.6 anaconda
conda activate py36
python -m pip install --upgrade pip
pip3 install http://download.pytorch.org/whl/cu92/torch-0.4.1-cp36-cp36m-win_amd64.whl
pip3 install torchvision
conda install -c conda-forge spacy
python -m spacy download en
pip install fastai

OK, I finally managed to get the devil running. I did have to go into line 281 of text\data.py and make some changes.
About halfway down https://github.com/pytorch/pytorch/issues/8365 there’s a discussion of how PyTorch’s tensor() doesn’t accept NumPy integers.
So I changed pad_collate() from
return res, torch.tensor([s[1] for s in samples]).squeeze()
to
return res, torch.tensor([np.long(s[1]) for s in samples]).squeeze()
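A version-agnostic variant of the same change, as a minimal sketch: convert any NumPy integer scalars to built-in Python ints before they reach torch.tensor(). The helper name labels_to_builtin_ints is hypothetical (it is not part of fastai), and the built-in int() is swapped in for np.long because relying on the np.long alias is fragile across NumPy versions.

```python
import numpy as np


def labels_to_builtin_ints(labels):
    """Hypothetical helper (not part of fastai): turn NumPy integer scalars
    such as numpy.int64 into plain Python ints, leaving other values as-is.

    torch.tensor() can build a tensor from a list of built-in ints, which
    sidesteps 'Could not infer dtype of numpy.int64'."""
    return [int(x) if isinstance(x, np.integer) else x for x in labels]


if __name__ == "__main__":
    # In the failing case, the s[1] values in pad_collate are numpy.int64.
    raw = [np.int64(3), np.int64(0), np.int64(1)]
    print(labels_to_builtin_ints(raw))  # [3, 0, 1]
```

Inside pad_collate the return line would then read return res, torch.tensor(labels_to_builtin_ints([s[1] for s in samples])).squeeze(), which behaves the same whether or not the installed PyTorch accepts NumPy scalars directly.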

This is weird: when I tried doing the same thing by converting your column to np.int64, I couldn’t trigger the error… Do you mind sending me a sample of your dataframe so I can try to replicate the bug?
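As a sketch of where numpy.int64 labels come from in the first place (toy data; the text/label column names and rows are made up): reading values out of a pandas integer column returns NumPy integer scalars rather than built-in ints, so labels taken from a dataframe can already carry the type named in the error.

```python
import numpy as np
import pandas as pd

# Toy dataframe standing in for a classification CSV (made-up rows).
df = pd.DataFrame({"text": ["great film", "dull plot", "loved it"],
                   "label": [1, 0, 1]})

# Pull labels out one at a time, roughly as a dataset would per sample.
labels = [df["label"].iloc[i] for i in range(len(df))]

# Each element is a NumPy integer scalar, not a built-in int.
print(type(labels[0]))
print(all(isinstance(x, np.integer) for x in labels))  # True
```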

I’m happy to give you my whole jupyter notebook, I just can’t attach that to the reply. Is there a way for me to send it to you?

Can you put it in a small repo on GitHub, with a small subset of the files you use to create your dataframes (but enough to still reproduce the bug)?


I see why you were asking for a sample of my dataframe now. I’ve truncated some of the data files so I could upload them to GitHub. The Jupyter notebook in the parent directory has all the details. The LM folder contains all the data used for the language model, and the class folder contains the data used for classification (which is where the error happened).

Even with your repo, I can’t reproduce the bug. This is really weird; maybe it’s a problem with your version of PyTorch?

Yeah, strange. This is literally the version I installed:

pip3 install http://download.pytorch.org/whl/cu92/torch-0.4.1-cp36-cp36m-win_amd64.whl

Note that I never installed the nightly build, because pip told me the package could not be found. From the information so far, I can only guess that this is either a difference in the nightly version, or that I’m using a newer version than you in which tensor() has stopped accepting integers.

Is it worth applying my suggested code change anyway, so that it works with all versions of PyTorch?

I also noticed that a separate post mentions a potentially similar situation, but that user fixed it in a different way. See Fastai V1 and multilabel (Ulmfit)

Ah! You have PyTorch 0.4, not 1.0. That explains everything :wink:
The fastai library only supports PyTorch v1 and was designed that way, so expect a lot more things to go wrong if you stick with PyTorch 0.4.

I know PyTorch v1 isn’t available on Windows yet, but don’t expect everything to work properly unless you switch to a Linux instance for now.
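Since fastai v1 requires PyTorch 1.0, a quick sanity check before any deeper debugging is to compare the installed version against 1.0. This is a hypothetical, stdlib-only sketch (version_tuple and supports_fastai_v1 are made-up names); in practice you would pass it torch.__version__.

```python
import re


def version_tuple(version):
    """Parse the leading numeric parts of a version string, e.g.
    '0.4.1' -> (0, 4, 1), '1.0.0.dev20181002' -> (1, 0, 0),
    '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric parts such as 'dev20181002'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0a0': keep the 0, ignore the pre-release suffix
    return tuple(parts)


def supports_fastai_v1(torch_version):
    """Hypothetical check: True if the PyTorch version is at least 1.0."""
    return version_tuple(torch_version) >= (1, 0)


if __name__ == "__main__":
    print(supports_fastai_v1("0.4.1"))              # False
    print(supports_fastai_v1("1.0.0.dev20181002"))  # True
```

Running import torch; print(supports_fastai_v1(torch.__version__)) with the 0.4.1 wheel from this thread would print False.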

Ah windows… fml

@sgugger based on this resolution and other posts I’ve seen around the forum, would it be useful if I wrote a standard “bug report / request for help” template that people can use (for example: a minimal code snippet to reproduce the error if possible, versions of major libraries, OS, etc.)?

Obviously it should be as short as possible, to avoid boilerplate and overhead and to make it as easy as possible for beginners to ask questions, but it might be a nice way to solve issues like these.

That sounds great! There’s some ideas you could steal from here:

https://docs-dev.fast.ai/troubleshoot.html#support

Nice, thanks. First pass (feedback welcome, feel free to use some / all / none of it wherever appropriate or valuable).

Request for Help

Please see troubleshooting docs here: https://docs-dev.fast.ai/troubleshoot.html

In order to allow us to help you most effectively, please keep the following in mind:

  • Please be as specific as possible. Include relevant code, error outputs, etc. “fastai isn’t working” is a lot worse than “I can’t figure out how to use a different sampler in my dataloader”, which is a lot worse than "Here’s a code snippet where I’m trying to use a WeightedRandomSampler, and it’s giving the following error: ..."
  • Please search your question here on the forums, as well as on Google (bonus points if you include links you found in your search that were helpful but didn’t fully answer your question!)
  • Please include the shortest possible code snippet that demonstrates what you’re trying to do (for more info, see https://stackoverflow.com/help/mcve). In the case of deep learning problems, consider what someone really needs to reproduce your problem. Does the person helping you need to download your entire dataset, or does a single piece of data work? Or even better, does a torch.ones tensor of the same shape as your data produce the same error?
  • Please include your system version and setup: are you developing locally on Windows? Remotely on AWS? In particular, please include the output of the following command (run from the command line):
python -c 'import fastai; fastai.show_install(1)'
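If fastai itself won’t import, a rough stdlib-only fallback can still collect a subset of that information. This is a hypothetical sketch, not a replacement for show_install: environment_report is a made-up helper, it needs Python 3.8+ for importlib.metadata, and it reads installed-package metadata instead of importing the packages.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("fastai", "torch", "numpy", "spacy")):
    """Hypothetical helper: summarize Python, OS and package versions.
    Uses package metadata rather than imports, so it works even when
    'import fastai' fails."""
    lines = [f"python   : {sys.version.split()[0]}",
             f"platform : {platform.platform()}"]
    for name in packages:
        try:
            found = version(name)
        except PackageNotFoundError:
            found = "not installed"
        lines.append(f"{name:<9}: {found}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```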

Bug Report

Bug reports are welcome at https://github.com/fastai/fastai/issues/new, but if for some reason you’d prefer to post here, that’s okay. We do ask that you follow the previous instructions as appropriate; in particular, giving us the code needed to reproduce your bug is very helpful.


And remember, we’re all here to help you, but we’re not magic (and also unpaid). We’ll try our best to get you to a solution, and showing that you’re also putting in effort is appreciated.

Thanks! Updated guide here:

The biggest issue now is that the ‘troubleshooting’ page only covers local installation, not troubleshooting for Gradient/GCP/etc. So a really helpful PR would be something at the top that points people to the setup guides on course-v3.fast.ai.

Hi,

I am running into the same problem when I create a TextClasDataBunch. I tried a fresh installation in a new environment and the problem persists. For this project I’m on Linux. The full stack trace is below:

Traceback (most recent call last):
  File "train.py", line 40, in <module>
    LSTM.train()
  File "train.py", line 33, in train
    for batch_idx, (data,target) in enumerate(self.train_dataloader):
  File "/homes/anand39/.pyenv/versions/DL_venv/lib/python3.6/site-packages/fastai/basic_data.py", line 82, in __iter__
    for b in self.dl: yield self.proc_batch(b)
  File "/homes/anand39/.pyenv/versions/DL_venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 314, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/homes/anand39/.pyenv/versions/DL_venv/lib/python3.6/site-packages/fastai/text/data.py", line 273, in pad_collate
    return res, tensor([s[1] for s in samples])
  File "/homes/anand39/.pyenv/versions/DL_venv/lib/python3.6/site-packages/fastai/torch_core.py", line 67, in tensor
    return torch.tensor(x) if is_listy(x) else as_tensor(x)
RuntimeError: Could not infer dtype of numpy.int64

Is there any fix to this problem at the current time?

Edit: Please ignore, it was an installation problem and it works now!

@sgugger I am not able to install PyTorch v1 in Google Colab. Any fix for this?

The link is not working.

This should do the trick: https://docs.fast.ai/troubleshoot.html