Change AWD_LSTM Parameter

Hi

I’m trying to use ULMFiT for text classification:

https://arxiv.org/pdf/1801.06146.pdf

We use the AWD-LSTM language model (Merity et al., 2017a) with an embedding size of 400, 3 layers, 1150 hidden activations per layer, and a BPTT batch size of 70. We apply dropout of 0.4 to layers, 0.3 to RNN layers, 0.4 to input embedding layers, 0.05 to embedding layers, and weight dropout of 0.5 to the RNN hidden-to-hidden matrix. The classifier has a hidden layer of size 50. We use Adam with β1 = 0.7 instead of the default β1 = 0.9 and β2 = 0.99, similar to (Dozat and Manning, 2017). We use a batch size of 64, a base learning rate of 0.004 and 0.01 for finetuning the LM and the classifier respectively, and tune the number of epochs on the validation set of each task.

but the fast.ai tutorial doesn’t use this paper’s parameters:
https://docs.fast.ai/text.models.html

So I changed the parameters:

learn = language_model_learner(data_clas, AWD_LSTM(vocab_sz=60000, 
            emb_sz=400, n_hid=1150, n_layers=3, hidden_p=0.4, 
            input_p=0.4, embed_p=0.05, weight_p=0.5))

and…

I got this error:

KeyError: AWD_LSTM(
  (encoder): Embedding(60000, 400, padding_idx=1)
  (encoder_dp): EmbeddingDropout(
    (emb): Embedding(60000, 400, padding_idx=1)
  )
  (rnns): ModuleList(
    (0): WeightDropout(
      (module): LSTM(400, 1150, batch_first=True)
    )
    (1): WeightDropout(
      (module): LSTM(1150, 1150, batch_first=True)
    )
    (2): WeightDropout(
      (module): LSTM(1150, 400, batch_first=True)
    )
  )
  (input_dp): RNNDropout()
  (hidden_dps): ModuleList(
    (0): RNNDropout()
    (1): RNNDropout()
    (2): RNNDropout()
  )
)

I don’t know why I’m getting this error, or how to fix it.

The KeyError is because language_model_learner expects the architecture class (it looks the class up in a dict of model metadata), not an already-instantiated model. If you want to update the params for the model, you do it like this (here just changing qrnn=True):

config = awd_lstm_lm_config.copy()
config['qrnn'] = True

Then pass it to the LM learner:

learn = language_model_learner(data_clas, AWD_LSTM, config=config)

The default params in the AWD_LSTM config are here. They appear to match the paper except for the dropout probs (is that what you are focused on?).
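
If it’s the dropouts you want to match to the paper, you can override them the same way. This is just a sketch — the key names come from the v1 awd_lstm_lm_config defaults, and the mapping of the paper’s numbers onto those keys is my own reading, so double-check it:

config = awd_lstm_lm_config.copy()
config['output_p'] = 0.4   # paper: "dropout of 0.4 to layers" (my mapping)
config['hidden_p'] = 0.3   # "0.3 to RNN layers"
config['input_p']  = 0.4   # "0.4 to input embedding layers"
config['embed_p']  = 0.05  # "0.05 to embedding layers"
config['weight_p'] = 0.5   # "weight dropout of 0.5 to the RNN hidden-to-hidden matrix"

learn = language_model_learner(data_clas, AWD_LSTM, config=config)

I believe drop_mult still scales whatever dropout values are in the config, so leave it at the default 1.0 if you set them explicitly like this.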

Note that if you change the fundamentals of the arch, you will not be able to load the pre-trained weights.
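
For example, if you changed the hidden size you’d have to train from scratch. A rough sketch (the 2000 is just an illustrative value; pretrained=False skips loading the Wikitext-103 weights):

config = awd_lstm_lm_config.copy()
config['n_hid'] = 2000   # hypothetical value, differs from the pretrained model's 1150
learn = language_model_learner(data_clas, AWD_LSTM, config=config, pretrained=False)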


Thank you!!!

I want to change my dropout probs.

Thank you, Bobak! This was not at all obvious from the documentation, and I’m very excited to try qrnn to see if I can get some training speed-up from the increased parallelism that this method offers!