I am trying to build an Arabic language model following the instructions at https://github.com/n-waves/ulmfit-multilingual/tree/master/ulmfit, running on Colab (set up with !curl https://course-v3.fast.ai/setup/colab | bash). So far, all of the steps have completed successfully. Then I tried to run pretrain_lm.py like so:
!python -m pretrain_lm 'data/wiki/ar-2-unk' 'ar' 0 True False 60000 70 70 'ar-2' 10 True 1.0
It starts running and prints a few lines of output:
Batch size: 70
Max vocab: 60000
Using QRNNs...
Saving vocabulary as data/wiki/ar-2-unk/models/itos_ar-2.pkl
Size of vocabulary: 50723
First 10 words in vocab: <unk>, <pad>, <eos>, ., ،, في, من, @.@, على, "
Cupy not found the code will work only on CPU!
true_wd: False
Starting from random weights
Then there is an error:
....
File "/usr/local/lib/python3.6/dist-packages/fastai/text/qrnn/forget_mult.py", line 184, in forward
return GPUForgetMult()(f, x, hidden_init) if use_cuda else CPUForgetMult()(f, x, hidden_init)
File "/usr/local/lib/python3.6/dist-packages/fastai/text/qrnn/forget_mult.py", line 127, in forward
self.compile()
File "/usr/local/lib/python3.6/dist-packages/fastai/text/qrnn/forget_mult.py", line 109, in compile
program = _NVRTCProgram(kernel.encode(), 'recurrent_forget_mult.cu'.encode())
NameError: name '_NVRTCProgram' is not defined
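For context, the "Cupy not found" warning above suggests (this is my guess, not something I have verified) that fastai's QRNN code still tries to compile its CUDA forget_mult kernel even though cupy is missing, which would explain the NameError. A minimal check I ran to confirm whether cupy is importable in the Colab runtime:

```python
def cupy_available():
    """Return True if cupy can be imported.

    fastai's QRNN GPU path appears to need cupy to compile its
    recurrent_forget_mult.cu kernel at runtime (my assumption based
    on the traceback above).
    """
    try:
        import cupy  # noqa: F401
        return True
    except ImportError:
        return False

print("cupy available:", cupy_available())
```

If this prints False, installing a cupy wheel matching the runtime's CUDA version (and making sure a GPU runtime is selected in Colab) might be what's missing, but I'm not sure that's the whole story.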
Am I doing something wrong in this last step? I am also watching this thread to see whether there are any standard Arabic benchmarks I could use to evaluate the model later.