Running IMDB notebook under 10 minutes


(MTAU) #22

About to start this exercise. A little put off by the unresolved issues flagged here.

Has anyone resolved this problem or should I start from scratch?


(Abhishek Mishra) #23

Hi Folks,

What is the difference between learner.load_cycle('adam3_10', 2) and learner.load('adam3_10')?
The parameters are just for reference.
I am referring to these three lines from the lesson4-imdb notebook:

learner.save_encoder('adam1_enc')
learner.load_encoder('adam1_enc')
learner.load_cycle('adam3_10', 2)

learner.load_cycle('adam3_10', 2) doesn’t seem to be related to the learner used in the language model.


(Sam Lloyd) #24

The difference is pretty much zilch. load_cycle just builds the file name with an f-string, roughly f'{name}_cyc_{cycle}', and then calls learner.load with it.
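
To make that concrete, here is a minimal stand-in rather than the actual fastai source. FakeLearner is a hypothetical class that only records the names it is asked to load, and the "_cyc_" naming is an assumption based on my recollection of the old fastai code, so check your installed version before relying on it.

```python
class FakeLearner:
    """Hypothetical stand-in for a fastai learner; records names instead of loading weights."""

    def __init__(self):
        self.loaded = []  # names passed to load(), in call order

    def load(self, name):
        # The real method would read weights from disk; here we only record the name.
        self.loaded.append(name)

    def load_cycle(self, name, cycle):
        # Assumed naming scheme: "<name>_cyc_<cycle>".
        self.load(f'{name}_cyc_{cycle}')


learner = FakeLearner()
learner.load_cycle('adam3_10', 2)   # goes through the f-string
learner.load('adam3_10_cyc_2')      # same name, passed directly
print(learner.loaded)               # ['adam3_10_cyc_2', 'adam3_10_cyc_2']
```

Under that assumption, learner.load('adam3_10') on its own would look for a different file than learner.load_cycle('adam3_10', 2), so the two calls match only when the full cycle name is passed to load.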


(Karl) #25

Was the vocab size difference issue ever resolved? I just tried to use the weights uploaded here and found a pretty sizable vocab difference. I get md.nt = 37392 compared to md.nt = 34945.


#26

Hello,

I’m trying to reproduce the results of the lesson 4 notebook from scratch. Could you help with the following questions?

  • When I use the dataset from the notebook’s link, the train/all folder contains 75000 files, while the test/all folder contains 25000.
    If I download the dataset from http://ai.stanford.edu/~amaas/data/sentiment/ there are 25000 (train) + 25000 (test) files.
    Why does the dataset from the fast.ai server contain 75000 files in the train directory? Is it a different version of the dataset? The notebook’s description talks about 25000 files and never mentions 75000.

  • If we use the dataset from the fast.ai server for the first part of training and then run the following code:
    splits = torchtext.datasets.IMDB.splits(TEXT, IMDB_LABEL, 'data/')
    it seems to download the original version of the dataset (25000 + 25000 files).
    If that is true, we partly train the model on fast.ai’s dataset, then do the final training on the original dataset and predict on the test data from the original dataset.
    Is this the intended training strategy? What is the reason for it?
    I’ve also checked that the state-of-the-art result of 94.1% from the research paper mentioned in the notebook was reached on the data from http://ai.stanford.edu/~amaas/data/sentiment/
    Is it fair to compare that result with the lesson 4 result if our dataset is 3x larger than the original? I haven’t looked at the contents of the files in the fast.ai dataset, but either way we are not using exactly the same set of training files.
    P.S. I’ve checked the test files from both datasets and they are identical, but the training process still confuses me.

  • When creating the language model with this code:
    md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=10)
    we implicitly set the folder for saving the model to a subdirectory of PATH (for example, when saving with learner.save_encoder).
    How can I save the model outside of the PATH directory?
    For example, in a Paperspace Gradient notebook with the fast.ai template, all datasets are in a read-only directory, and trying to save the model fails with an error. So I have to copy the whole dataset from the read-only directory to another directory, which costs both time and money. I was training on a P100 GPU, so spending that time on preparing the dataset was expensive.

Please correct me if I am wrong on any of these points.
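
On the read-only dataset question, one possible workaround is to build a writable folder that only contains symlinks to the top-level entries of the read-only dataset, and use that folder as PATH. This is a sketch under assumptions: link_dataset is a hypothetical helper, the folder names in the demo are made up, and it presumes the library only reads from the existing subfolders and writes into new ones (such as a models folder) directly under PATH. It also needs a filesystem that supports symlinks.

```python
import os
import tempfile


def link_dataset(src_dir, dst_dir):
    """Create dst_dir and symlink every top-level entry of src_dir into it."""
    os.makedirs(dst_dir, exist_ok=True)
    for entry in os.listdir(src_dir):
        link = os.path.join(dst_dir, entry)
        if not os.path.lexists(link):  # skip entries linked on an earlier run
            os.symlink(os.path.abspath(os.path.join(src_dir, entry)), link)
    return dst_dir


# Tiny demo: temporary folders stand in for the read-only dataset
# and for the writable working directory.
with tempfile.TemporaryDirectory() as tmp:
    read_only = os.path.join(tmp, 'datasets', 'aclImdb')
    os.makedirs(os.path.join(read_only, 'train', 'all'))
    with open(os.path.join(read_only, 'train', 'all', '0.txt'), 'w') as f:
        f.write('a review')

    PATH = link_dataset(read_only, os.path.join(tmp, 'work', 'aclImdb'))

    # Files are readable through the link, no copy was made.
    with open(os.path.join(PATH, 'train', 'all', '0.txt')) as f:
        linked_text = f.read()
    train_is_link = os.path.islink(os.path.join(PATH, 'train'))

    # New folders next to the links are ordinary writable directories.
    os.makedirs(os.path.join(PATH, 'models'), exist_ok=True)
    models_is_real_dir = (os.path.isdir(os.path.join(PATH, 'models'))
                          and not os.path.islink(os.path.join(PATH, 'models')))
```

Only a handful of links are created regardless of how many files the dataset holds, so setup time should be negligible compared with copying everything.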


(William Collins) #27

I’m getting a copy error even though the sizes are an exact match. I’m very confused.

learner.load(learner_finetune_path)


RuntimeError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
513 try:
--> 514 own_state[name].copy_(param)
515 except Exception:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:20

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
in ()
----> 1 learner.load(learner_finetune_path)

~/anaconda3/lib/python3.6/site-packages/fastai/learner.py in load(self, name)
94
95 def load(self, name):
---> 96 load_model(self.model, self.get_model_path(name))
97 if hasattr(self, 'swa_model'): load_model(self.swa_model, self.get_model_path(name)[:-3]+'-swa.h5')
98

~/anaconda3/lib/python3.6/site-packages/fastai/torch_imports.py in load_model(m, p)
25 def children(m): return m if isinstance(m, (list, tuple)) else list(m.children())
26 def save_model(m, p): torch.save(m.state_dict(), p)
---> 27 def load_model(m, p): m.load_state_dict(torch.load(p, map_location=lambda storage, loc: storage))
28
29 def load_pre(pre, f, fn):

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
517 'whose dimensions in the model are {} and '
518 'whose dimensions in the checkpoint are {}.'
--> 519 .format(name, own_state[name].size(), param.size()))
520 elif strict:
521 raise KeyError('unexpected key "{}" in state_dict'

RuntimeError: While copying the parameter named 0.encoder.weight, whose dimensions in the model are torch.Size([50004, 400]) and whose dimensions in the checkpoint are torch.Size([50004, 400]).
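
One possible reading of the traceback above: the first exception is the CUDA device-side assert raised inside copy_, and the except Exception: at line 515 re-raises it with the generic "dimensions" message, so that message can appear even when the sizes match. Device-side asserts are often caused by an earlier GPU operation (for example an out-of-range index), after which further CUDA calls in the same process keep failing, so restarting the kernel before loading may be worth a try. To rule out a real mismatch first, the shapes can be compared without touching the GPU. The helper below is a hypothetical sketch, and '1.decoder.weight' is a made-up parameter name used only for the demo.

```python
def diff_shapes(model_shapes, ckpt_shapes):
    """Return {name: (model_shape, ckpt_shape)} for parameters that differ or are missing."""
    problems = {}
    for name in sorted(set(model_shapes) | set(ckpt_shapes)):
        m, c = model_shapes.get(name), ckpt_shapes.get(name)
        if m != c:
            problems[name] = (m, c)
    return problems


# With PyTorch, the two mappings could be built roughly like this (not run here):
#   model_shapes = {k: tuple(v.size()) for k, v in model.state_dict().items()}
#   ckpt = torch.load(path, map_location=lambda storage, loc: storage)
#   ckpt_shapes = {k: tuple(v.size()) for k, v in ckpt.items()}

# Demo values: the first entry mirrors the error message, the second is invented.
model_shapes = {'0.encoder.weight': (50004, 400), '1.decoder.weight': (50004, 400)}
ckpt_shapes = {'0.encoder.weight': (50004, 400), '1.decoder.weight': (37392, 400)}
mismatches = diff_shapes(model_shapes, ckpt_shapes)
print(mismatches)  # {'1.decoder.weight': ((50004, 400), (37392, 400))}
```

An empty result would suggest the checkpoint itself is fine and the failure comes from the state of the CUDA session rather than from the weights.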