Using QRNN in Language Models


#1

QRNNs were introduced in this article by James Bradbury, Stephen Merity, Caiming Xiong and Richard Socher as an alternative to LSTMs. The main advantage is that they are 2 to 4 times faster (depending on your batch size/bptt) while reaching the same state-of-the-art results.
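The speedup comes from the fact that a QRNN layer computes its gates with convolutions over the whole sequence in parallel; only a cheap element-wise recurrence (the "forget mult") remains sequential. A minimal sketch of that recurrence in plain NumPy (illustrative only, not the fastai/cupy kernel):

```python
import numpy as np

def forget_mult(f, z, h_init=None):
    # Sequential part of a QRNN layer ("f-pooling" from the paper):
    #   h_t = f_t * h_{t-1} + (1 - f_t) * z_t
    # f, z: arrays of shape (seq_len, hidden_size), with f in (0, 1).
    # Both f and z are produced in parallel by convolutions, so this
    # element-wise loop is the only sequential work per layer.
    seq_len, hidden = z.shape
    h = np.zeros(hidden) if h_init is None else h_init
    out = np.empty_like(z)
    for t in range(seq_len):
        h = f[t] * h + (1.0 - f[t]) * z[t]
        out[t] = h
    return out
```

In the actual implementation this loop runs as a fused CUDA kernel (compiled via cupy), which is why cupy is a hard requirement for the QRNN option.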

I’ve adapted their QRNN PyTorch implementation into the fastai library. To use it, you must first install the cupy package, then just add the option qrnn=True when you build a language model, for instance:

learner = md.get_model(opt_fn, em_sz, nh, nl, dropouti=drops[0], dropout=drops[1], 
                      wdrop=drops[2], dropoute=drops[3], dropouth=drops[4], qrnn=True)

To install the cupy package, just follow the instructions on their github repo. It should be as easy as pip install cupy-cudaXX (with XX being 80, 90 or 91 depending on your CUDA version). Note that on Windows, you must install the Visual C++ Build Tools first for it to work (scroll the page a bit to find them on their own, without Visual Studio 2017).

I’m currently trying to find a good set of hyper-parameters (all the dropout values have to change, for instance) and will share a notebook as soon as I have something as good as the LSTM version of the Language Model.


#2

Great work! Can’t wait to try it. Is it also multi-GPU like the original repo?


#3

It should be, though I have only tested it on one GPU for now.


(Ben Johnson) #4

Have you tried replacing the LSTM in ULMFit w/ the QRNN yet? If not, I can do it if you share the pretrained model with me.


#5

I haven’t pretrained a model with QRNNs on wt103 yet, but I will as soon as I figure out the best set of hyper-parameters for training! Then I’ll share the result alongside the regular LSTM pretrained models.


(Divyansh Jha) #6

Great Work!!


(Monique Monteiro) #7

Great!

Have you tried it with your French language model?

Best regards,
Monique


#8

Not yet. Right now, I’ve trained an English model and I’m looking at redoing the imdb notebook with it.


(Thomas Wolf) #9

That’s really cool!
Are you starting from their latest set of QRNN hyper-parameters in “An Analysis of Neural Language Modeling at Multiple Scales” (https://arxiv.org/abs/1803.08240)?


#10

Exactly. With just a few tweaks to use the 1cycle policy and try to achieve super-convergence there.
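For context, the 1cycle policy ramps the learning rate from a low value up to a maximum and then back down over the course of training. A minimal sketch with linear ramps (hypothetical helper and parameter names; the real policy also cycles momentum inversely):

```python
def one_cycle_lr(step, total_steps, lr_max, div_factor=25.0, pct_warmup=0.5):
    # Linearly increase lr from lr_max/div_factor to lr_max over the
    # first pct_warmup fraction of training, then linearly anneal back.
    lr_min = lr_max / div_factor
    warm = int(total_steps * pct_warmup)
    if step < warm:
        frac = step / warm
        return lr_min + (lr_max - lr_min) * frac
    frac = (step - warm) / max(total_steps - warm, 1)
    return lr_max - (lr_max - lr_min) * frac
```

The large learning rate in the middle of the cycle is what makes super-convergence possible on some problems.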


(Sooheon Kim) #11

@sgugger

I’ve installed cupy-cuda91 (same as my cuda version), and tried to get this working, but it errors out here:

~/fastai/fastai/torchqrnn/forget_mult.py in <module>()
      2 import torch
      3 from torch.autograd import Variable
----> 4 from cupy.cuda import function
      5 from cupy.cuda.compiler import _NVRTCProgram
      6 from collections import namedtuple

ImportError: cannot import name 'function'

Are you using a different version of cupy which has that definition?


#12

Did you install cupy inside the fastai environment?
I’m using the regular version of cupy from the github repo I mentioned in the first post.


(Sooheon Kim) #13

Yep, sorry, this was just installation wonkiness.


#14

I’m getting KeyError: 'unexpected key "0.rnns.0.module.weight_ih_l0" in state_dict' when I run learner.model.load_state_dict(wgts), though it goes away when I remove qrnn=True. Has this happened to anyone else?


#15

That line loads the pretrained model from Jeremy, which uses LSTMs, not QRNNs, so the weight names don’t match the QRNN architecture.
There is no pretrained QRNN model yet (working on this, but for now my pretrained model doesn’t get as good results on imdb).
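In general, you can diagnose this kind of mismatch before calling load_state_dict by comparing key sets; a small sketch (hypothetical helper, plain lists standing in for the two state_dicts’ keys):

```python
def check_state_dict_keys(model_keys, ckpt_keys):
    # load_state_dict raises on any key present in one dict but not the
    # other; report both directions so the mismatch is easy to inspect.
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return {
        "unexpected": sorted(ckpt_keys - model_keys),  # in checkpoint only
        "missing": sorted(model_keys - ckpt_keys),     # in model only
    }
```

With a real model you would pass learner.model.state_dict().keys() and wgts.keys(); here the LSTM key 0.rnns.0.module.weight_ih_l0 would show up as unexpected for a QRNN model.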