That’s all I did - not sure why you’d have a different number of words. Try counting the words in each subfolder to see how many it should be.
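For anyone who wants to do that per-subfolder count from Python, here is a rough sketch. It assumes the reviews are `.txt` files sitting directly inside `pos`, `neg` and `unsup`, and it treats a "word" as any whitespace-separated token, so the totals may differ slightly from what a shell tool or the notebook reports. The function names are made up for illustration.

```python
from pathlib import Path


def count_words(folder):
    """Total whitespace-separated tokens across all .txt files directly in folder."""
    total = 0
    for path in sorted(Path(folder).glob("*.txt")):
        # Explicit encoding so the count does not depend on the system locale.
        total += len(path.read_text(encoding="utf-8").split())
    return total


def count_by_subfolder(root, names=("pos", "neg", "unsup")):
    """Per-subfolder word counts; a missing subfolder is reported as 0."""
    root = Path(root)
    return {n: count_words(root / n) if (root / n).is_dir() else 0 for n in names}


if __name__ == "__main__":
    # "train" is a placeholder path; point it at your own train folder.
    counts = count_by_subfolder("train")
    print(counts, "total:", sum(counts.values()))
```

If the per-subfolder numbers add up to more than the count for the combined folder, some files probably did not get moved.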
I think a problem might have occurred the first time I tried to move ‘unsup’. I repeated the same process just now and got the correct number of words, ty!!
Hi, I tried doing the same thing but I got a total count of 17486270 instead of 17486581. What could be the issue? I have attached a screenshot of the folder structure in my train folder. I have moved all files from pos, neg and unsup to the all folder.
I am also running into the same errors as @pnvijay. It would be great if someone could share their experience.
I think you want to update the code to do this:
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)
But I’m not at my machine to test it.
The codebase was modified so that LanguageModelData objects can be built from either text files or dataframes.
from_text_files and from_dataframes are the class methods for each, respectively.
I’ve updated the notebook based on @wgpubs’s changes now, so if you git pull, it should work fine.
Thanks for doing this. I was just getting on today and about to look at the notebooks when I saw all was good.
@jeremy @wgpubs, I have done the latest git pull and am now getting a Unicode decode error:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 796: ordinal not in range(128)"
@rob, I have the same line of code as you suggested, but I still have the issue.
Also, I keep getting “Back to the Future Imdb review” and the number of words seems to be wrong. I did exactly as suggested by moving the files, etc. See below. It would be great if you all could have a look and help.
@satheesh, regarding utf-8, check the threads titled Crestle. One of them talks about encoding.
I think you need to git pull and/or check out the imdb notebook again to get it fixed.
@rob, I have pulled the latest code. I have checked those threads, but nothing worked. I am using Amazon’s fastai AMI, which uses utf-8 by default (see https://aws.amazon.com/amazon-linux-ami/faqs/). Not sure what’s going on. Two issues: the word counts are mismatched, and there is this ascii error.
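One detail that may help narrow down the ascii error: byte 0xc2 is the lead byte of a two-byte UTF-8 sequence (a non-breaking space, for example, is 0xc2 0xa0), so the review files are most likely fine as UTF-8 and something is decoding them as ASCII. The sketch below is standalone Python, not fastai's loader; the helper names are hypothetical. It shows the failure and a read that does not depend on the machine's locale.

```python
def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the process locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def decodes_as(raw, encoding):
    """Return True if the bytes decode cleanly with the given codec."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A non-breaking space encodes to the bytes 0xc2 0xa0 in UTF-8.
sample = "5\u00a0stars".encode("utf-8")
print(decodes_as(sample, "ascii"))  # False
print(decodes_as(sample, "utf-8"))  # True
```

If passing the encoding explicitly fixes a plain `open()` on one of the review files, the remaining question is where the library call gets its default encoding from.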
Sorry I’m on my phone so can’t link the other thread. Did you find it?
I think you need to recreate the notebook, even after getting the latest pull. The Crestle author said the utf-8 fix will work for new notebooks. If you have further trouble I suggest at-ing him directly
No worries @rob. What do you mean by recreating the notebook? Copy-paste each line, or just duplicate it? I am assuming this is the thread: Crestle - Spacy installation failed, which @anurag answered, but it does not say anything about recreating.
@satheesh the other thread applies only to notebooks run on Crestle.
For the ascii issue with Amazon’s AMI, what is the output of the
@anurag, it seems to be UTF-8. Below is what I get.
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
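Since the output above shows UTF-8 names alongside "Cannot set ..." warnings, it may be worth checking what the Python process behind the notebook actually picks up, which can differ from the shell. A small sketch using only the standard library follows; `encoding_report` is a made-up helper name, and the assumption is that the notebook kernel inherits a similar environment.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encodings and locale variables visible to this Python process."""
    return {
        # Commonly the default used by open() when no encoding is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

Run it in a notebook cell rather than a terminal. If `preferred_encoding` comes back as something like `ANSI_X3.4-1968` or `US-ASCII`, the kernel is falling back to ASCII even though the shell reports UTF-8.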
Got it. The only other thing I’d try is what’s recommended in that thread, since LC_ALL seems to be unset for you:
If this works you can add it to your
I’d Google those errors at the top of the locale output and see if there’s a solution.
Are you using an Amazon image that was made specifically for this course? If so, surely you’re not the only one who will encounter this issue.