Lesson 4 - IMDB Dataset


(Surya Mohan) #1

Hi all, likely a very nooby question, but I am currently working my way through the lesson4-imdb notebook. When I download the dataset using the link provided (http://files.fast.ai/data/aclImdb.tgz), I see the train and test folders but no models folder. This results in an error while executing the following line: pickle.dump(TEXT, open(f'{PATH}models/TEXT.pkl', 'wb')).

Can you please let me know what I am doing wrong?

Thanks,
Surya


#2

I had the same problem and solved it by creating the models directory. However, later in the notebook I’m running into this:

FileNotFoundError: [Errno 2] No such file or directory: 'data/aclImdb/models\adam3_10_cyc_2.h5'

so I’m guessing that somehow we’re missing a step that creates this directory and adds some files to it.


(RobG) #3

You simply need to create that models directory inside the data {PATH} directory. For instance, on Linux:

mkdir models
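
If you are on Windows (or just want a platform-independent way), here is a minimal sketch using Python's standard library; the PATH value is an assumption based on the notebook's usual setting:

import os

PATH = 'data/aclImdb/'  # assumed dataset path, as set earlier in the notebook
os.makedirs(f'{PATH}models', exist_ok=True)  # create models/ if it does not already exist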


#4

I tried that and it worked fine (I'm working locally on Windows). However, a little later in the IMDB notebook I encountered this:

The cell:

learner.load_cycle('adam3_10', 2)

gives the error:

FileNotFoundError: [Errno 2] No such file or directory: 'data/aclImdb/models\adam3_10_cyc_2.h5'

Does this mean that an earlier step was supposed to generate this file? If so, that doesn’t seem to be happening for me.


(RobG) #5

This is simply a result of demonstration notebooks being like real notebooks, with parts repeated, skipped, or changed. The notebooks are meant to go along with the talk. Further down, in the sentiment section, you can see a fit() call saving its cycles with cycle_save_name= and subsequently loading them with load_cycle(). The load that isn't working for you fails because the notebook doesn't show a preceding save. You don't need to save and load at all; it is just there to save time. Debugging some of the wrinkles in the notebooks actually proves to be a good learning experience, so hang in there.
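
To make the pairing concrete, here is a minimal sketch using the old fastai (0.7) Learner API; the cycle_save_name 'adam3_10' matches the file in the error above, but the learning rate and cycle settings are illustrative assumptions, not the notebook's exact values:

# Training with cycle_save_name= writes a checkpoint after each cycle,
# e.g. {PATH}models/adam3_10_cyc_0.h5 ... adam3_10_cyc_2.h5
learner.fit(3e-3, 3, cycle_len=10, cycle_save_name='adam3_10')  # illustrative lr/cycle values

# Only after such a fit() has run does this load succeed:
learner.load_cycle('adam3_10', 2)  # loads the weights saved at the end of the third cycle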


#6

Thanks, RobG. I completely agree about debugging the notebooks. I watched the talk right through, then worked on the notebook. I probably need to watch the talk again while taking a first pass through the notebook.


(Surya Mohan) #7

Thank you @Roger and @digitalspecialists for your responses. I will try this today.


(gram) #8

About how the data is arranged and fed to the GPU:

Jeremy says context matters, but it looks like he arranges the text so the sentences run down the columns. Then a slice across all 64 columns is fed into the GPU?

How is this not slicing up the context 64 times?

Is it “reading” down the columns even though it is getting the sentences 64 at a time?
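
In case a concrete picture helps, here is a minimal, hypothetical sketch of the language-model batching being described (bs=64 matches the lecture; the bptt value, the fake token array, and the variable names are illustrative assumptions):

import numpy as np

bs, bptt = 64, 70                     # batch size and backprop-through-time length (illustrative)
tokens = np.arange(64 * 700)          # stand-in for the whole concatenated, numericalized corpus

# Chop the corpus into bs equal-length pieces and lay them out as columns,
# so each column is one long contiguous stream of text.
n = len(tokens) // bs
streams = tokens[:n * bs].reshape(bs, n).T    # shape (n, bs): rows are time steps, columns are streams

# Each minibatch is bptt consecutive rows taken across all 64 columns.
first_batch = streams[:bptt]                  # shape (bptt, bs)
second_batch = streams[bptt:2 * bptt]         # each column continues exactly where the previous batch stopped

# So the model does "read" down each column: the 64 columns are 64 independent
# streams processed in parallel, and context is only broken at the 63 points
# where the corpus was originally chopped into columns.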