TypeError: unhashable type: 'list' when making DataLoader

vrjordant · November 28, 2023, 3:11am

I keep getting this error when trying to create a DataLoader. I’m training a model to tell Python files apart from JavaScript files by looking at their source code. I built my own dataset on Kaggle and imported it into my notebook, but as soon as I try to initialize the DataLoader, I get this error. I based the model on the Lesson 1 notebook from Practical Deep Learning for Coders.
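The original code isn't shown here, so this is only a generic illustration of the error in the title. Python raises it whenever a list is used where a hashable value is required, such as a dict key or a set member. A tokenizer that returns one list of tokens per file can trigger it if those whole lists are later treated as vocabulary entries. The build_vocab helper below is hypothetical (not a fastai API); it shows the failure and one fix, counting the individual string tokens instead of the lists.

```python
from collections import Counter

# Hypothetical pre-tokenized documents: one list of string tokens per file.
docs = [["def", "f", "(", ")", ":"], ["function", "f", "(", ")", "{", "}"]]

def build_vocab(tokenized_docs):
    """Hypothetical helper: count individual tokens, never whole token lists."""
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(tokens)  # each token is a str, which is hashable
    return counts

# Treating each token list as a single item reproduces the error.
error_message = ""
try:
    set(docs)  # every element is a list, so it cannot be hashed
except TypeError as e:
    error_message = str(e)
print(error_message)  # message contains: unhashable type: 'list'

vocab = build_vocab(docs)
print(vocab["f"])  # 2
```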

Here is my Kaggle workbook: Saving a basic fastai model | Kaggle.

Here is a link to the dataset if you need it: Python and Javascript Code | Kaggle

Here is the relevant code and error:

Would greatly appreciate any help.

vrjordant · November 28, 2023, 7:18pm

Update: I fixed the error above by replacing BaseTokenizer with noop. However, I then ran into a new error: AttributeError: 'list' object has no attribute 'truncate'. I can’t use show_batch() to see what my data looks like. I suspect it still has something to do with tokenization, but I’m not sure.
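A generic sketch of this second error, not fastai's actual code path: the message means a string method named truncate was called on a list. One plausible way that happens is when tokens are never joined back into text before something tries to display them. The tokenize_code regex and the truncate_words helper below are hypothetical, chosen only to show that a display helper expecting a string fails when it receives a list of tokens.

```python
import re

# Hypothetical code tokenizer: identifiers, integers, or any single non-space char.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize_code(src):
    """Split source code into a list of string tokens."""
    return TOKEN_RE.findall(src)

def truncate_words(text, n):
    """Hypothetical display helper: keep the first n space-separated words."""
    return " ".join(text.split(" ")[:n])

tokens = tokenize_code("def add(a, b): return a + b")

# A list has no truncate method, which mirrors the error in the post.
error_message = ""
try:
    tokens.truncate(3)
except AttributeError as e:
    error_message = str(e)
print(error_message)  # 'list' object has no attribute 'truncate'

# Joining the tokens back into a string first lets string-only helpers work.
preview = truncate_words(" ".join(tokens), 3)
print(preview)  # def add (
```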

vrjordant · November 28, 2023, 9:00pm

Alright, I’ve made a bit more progress. I restructured the dataset to follow an ImageNet-style layout (updated dataset here: Python and JavaScript Code Imagenet Style | Kaggle) and refactored some of the code to follow the text learner tutorial. I was able to create the DataLoader and the learner, but fine-tuning fails with a strange error: RuntimeError: transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered. Training makes some progress and then crashes partway through. Once this error has occurred, rerunning any of the earlier learner cells produces a similar error. I’m not sure what this one is about; I’ll try to reproduce it on Colab and check my PyTorch version.
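For reference, an ImageNet-style layout usually means one folder per split and one subfolder per class, with each file's label taken from its parent folder name (the same idea as fastai's parent_label). The sketch below builds a tiny made-up tree in a temporary directory and counts files per split and class, a quick sanity check before building the DataLoader. The file names, contents, and the summarize helper are all hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path

def parent_label(path):
    """Label a file by the name of the folder that contains it."""
    return path.parent.name

def summarize(root):
    """Count files per (split, class) in a root/split/class/file tree."""
    counts = Counter()
    for f in root.glob("*/*/*"):
        if f.is_file():
            split = f.parent.parent.name
            counts[(split, parent_label(f))] += 1
    return dict(counts)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical sample files laid out as split/class/file.
    samples = {
        "train/python/a.py": "print('hi')\n",
        "train/javascript/a.js": "console.log('hi');\n",
        "valid/python/b.py": "x = 1\n",
        "valid/javascript/b.js": "let x = 1;\n",
    }
    for rel, text in samples.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text)
    result = summarize(root)

print(result)
```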

vrjordant · November 28, 2023, 10:43pm

Turns out switching to Google Colab did fix it.
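Since the same code worked on Colab, the crash looks environment-related rather than a bug in the notebook itself, although the thread doesn't pin down the exact cause. An illegal memory access also leaves the CUDA context unusable for the rest of the process, which fits the earlier cells failing afterwards; the kernel has to be restarted before retrying. One way to compare environments is to print library and CUDA versions in each. Setting CUDA_LAUNCH_BLOCKING=1 before PyTorch initializes CUDA makes kernel launches synchronous, so the traceback points at the operation that actually failed (it slows training, so use it for debugging only). The describe_environment helper below is hypothetical and still runs if torch or fastai is not installed.

```python
import os

# Must be set before torch initializes CUDA: kernel launches become synchronous,
# so a CUDA error is reported at the call that caused it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

def describe_environment():
    """Hypothetical helper: gather versions worth comparing between Kaggle and Colab."""
    info = {"CUDA_LAUNCH_BLOCKING": os.environ.get("CUDA_LAUNCH_BLOCKING")}
    try:
        import torch
    except ImportError:
        info["torch"] = None
    else:
        info["torch"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["gpu"] = torch.cuda.get_device_name(0)
    try:
        import fastai
    except ImportError:
        info["fastai"] = None
    else:
        info["fastai"] = fastai.__version__
    return info

for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in both notebooks and diffing the output is a cheap first step before digging into the model code.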