Getting an error while pretraining with my own base corpus (not wiki103)

faysalhossain2007 · August 15, 2018, 10:27pm

I want to test the performance of the model using my own base corpus for the pretrained model. I don't have a pretrained model that I can use in the fine-tuning process, so I want to create one using the code. However, I am getting the following error:

Traceback (most recent call last):                                                                                                       | 0/68 [00:00<?, ?it/s]
  File "pretrain_lm.py", line 53, in <module>
    if __name__ == '__main__': fire.Fire(train_lm)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "pretrain_lm.py", line 49, in train_lm
    learner.fit(lrs, 1, wds=wd, use_clr=(32,10), cycle_len=cl)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fastai/learner.py", line 287, in fit
    return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fastai/learner.py", line 234, in fit_gen
    swa_eval_freq=swa_eval_freq, **kwargs)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fastai/model.py", line 129, in fit
    loss = model_stepper.step(V(x),V(y), epoch)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/fastai/model.py", line 52, in step
    loss = raw_loss = self.crit(output, y)
  File "/zf18/fs5ve/.conda/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/localtmp/fs5ve/fastai/courses/dl2/imdb_scripts/sampled_sm.py", line 81, in forward
    if self.sampled: return self.sampled_softmax(input, target)
  File "/localtmp/fs5ve/fastai/courses/dl2/imdb_scripts/sampled_sm.py", line 69, in sampled_softmax
    idxs = V(self.get_rand_idxs())
  File "/localtmp/fs5ve/fastai/courses/dl2/imdb_scripts/sampled_sm.py", line 66, in get_rand_idxs
    def get_rand_idxs(self): return pt_sample(self.prs, self.n_neg)
  File "/localtmp/fs5ve/fastai/courses/dl2/imdb_scripts/sampled_sm.py", line 54, in pt_sample
    return torch.topk(w, ns, largest=False)[1]
RuntimeError: invalid argument 5: k not in range for dimension at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorTopK.cu:21

jairus · September 11, 2018, 2:46am

I’m having this same issue! Could somebody step in and help us out?

jairus · September 11, 2018, 3:38am

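I found that it started to work after either increasing my corpus size or editing sampled_sm.py line 52 to replace “ns” with a smaller number (10). Of the two, I would definitely recommend increasing the corpus size. I believe this has something to do with “ns” (the number of samples requested) being larger than what a small corpus provides, since the error says k is not in range.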
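
For anyone hitting the same thing: the last frame of the traceback is `torch.topk(w, ns, largest=False)[1]`, and topk fails with "k not in range" when `ns` is larger than the number of entries in `w`. That would fit a small corpus whose vocabulary has fewer entries than the number of negative samples requested. Below is a minimal plain-Python sketch of the idea, not the actual sampled_sm.py code. It assumes pt_sample is doing weighted sampling without replacement by keeping the smallest random keys, which is what the `largest=False` call suggests. The function name `sample_without_replacement` and the clamp are my own illustration.

```python
import heapq
import math
import random


def sample_without_replacement(probs, ns, rng=random):
    """Pick up to `ns` distinct indices, favouring higher-probability entries.

    Hypothetical stand-in for the topk-based sampler in the traceback.
    """
    # Guard: we cannot draw more distinct indices than there are entries.
    # Without this clamp, torch.topk would raise "k not in range".
    k = min(ns, len(probs))
    # One random key per entry; entries with larger probability tend to
    # get smaller keys. 1 - random() lies in (0, 1], so log() is defined.
    keys = [-math.log(1.0 - rng.random()) / (p + 1e-10) for p in probs]
    # Keep the indices of the k smallest keys (the role topk plays
    # with largest=False).
    return heapq.nsmallest(k, range(len(probs)), key=keys.__getitem__)


# A 3-word "vocabulary" with 10 samples requested: returns 3 indices
# instead of failing.
print(sample_without_replacement([0.5, 0.3, 0.2], 10))
```

In the PyTorch version the equivalent guard would presumably be something like `torch.topk(w, min(ns, w.size(0)), largest=False)[1]`, assuming `w` is 1-D. Whether the rest of sampled_sm.py tolerates getting fewer samples than requested is not something this thread confirms, so a larger corpus (and hence a larger vocabulary) remains the safer fix.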