Load ULMFiT model

I ran through the scripts here to build the ULMFiT model described in the recent paper.

Now I’m trying to load the classifier and run prediction on the IMDb dataset to confirm the results. Unfortunately, there is no script for loading or evaluating the model.

So I tried to build my own using relevant pieces from the scripts as well as from the imdb.ipynb notebook. However, when I do so, my accuracy comes out to ~50%. Below is the code snippet I’ve used.

import pickle
from os.path import join
import numpy as np

# this data was generated from `imdb_scripts` 
PATH = "data/nlp_clas/imdb/"

# load vocabulary lookup
itos = pickle.load(open(join(PATH, 'tmp/itos.pkl'), 'rb'))
vs = len(itos)

# load data
test_data = np.load(join(PATH, "tmp/val_ids.npy"))
test_data = np.squeeze(test_data)
test_lbls = np.load(join(PATH, "tmp/lbl_val.npy"))
test_lbls = np.squeeze(test_lbls)
c = int(test_lbls.max()) + 1  # number of classes
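# (Hypothetical sanity check, not part of the original scripts: decode a few
# ids back to tokens and look at the label balance to confirm the data loaded
# sensibly and that labels line up with examples.)
print(' '.join(itos[tok] for tok in test_data[0][:10]))  # first tokens of first review
print(np.bincount(test_lbls))                            # IMDb test set should be balanced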

# build a TextDataset
from fastai.text import TextDataset
test_dataset = TextDataset(test_data, test_lbls)

# build a SortSampler
BATCH_SIZE = 8
from fastai.text import SortSampler
test_sampler = SortSampler(test_data, key=lambda x: len(test_data[x]))
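# NB: SortSampler yields indices sorted by the given key (sequence length here,
# to minimize padding), so the loader visits examples in a different order than
# the original test_lbls ordering.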

# build a DataLoader
from fastai.dataloader import DataLoader
test_loader = DataLoader(test_dataset, BATCH_SIZE, transpose=True, num_workers=1, pad_idx=1, sampler=test_sampler)
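# pad_idx=1 here should match the pad_token=1 passed to the classifier below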

# build a TextData instance
from fastai.nlp import TextData
md = TextData(PATH, None, test_loader)

# build the classifier (exactly as it was in train_clas.py)
from functools import partial
from fastai.learner import optim
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
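# partial pre-binds Adam's betas; fastai supplies the parameters and lr later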
bptt = 70     # backpropagation through time
em_sz = 400   # size of embeddings
nh = 1150     # size of hidden
nl = 3        # number of layers

dps = np.array([0.4,0.5,0.05,0.3,0.4])
import torch
from fastai.lm_rnn import get_rnn_classifer
model = get_rnn_classifer(
    bptt=bptt, 
    max_seq=20*70, 
    n_class=c, 
    n_tok=vs, 
    emb_sz=em_sz, 
    n_hid=nh, 
    n_layers=nl, 
    pad_token=1,
    layers=[
        em_sz*3, # an input of size 1200, but then where does nh=1150 come in?
        50,      # just like an intermediate compression layer?  Why 50?
        c        # number of total labels
    ],   
    drops=[dps[4], 0.1],
    dropouti=dps[0],
    wdrop=dps[1],
    dropoute=dps[2],
    dropouth=dps[3]
)
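# Note on layers=[em_sz*3, 50, c]: the classifier head uses concat pooling --
# the LSTM's final hidden state concatenated with its max-pool and mean-pool
# over time, hence an input of 3*em_sz = 1200. nh=1150 is the hidden size of
# the intermediate LSTM layers only, and 50 is just the width of the head's
# hidden layer.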
model.eval   # just to make sure dropout isn't being applied

# build an RNN_Learner
from fastai.nlp import RNN_Learner
from fastai.core import to_gpu
from fastai.nlp import TextModel
learner = RNN_Learner(
    data=md,
    models=TextModel(to_gpu(model)),     # not sure why this is required
    opt_fn=opt_fn
)
learner.model.eval     # just to make sure dropout isn't being applied

# Can't use `TextData.get_model()` because it expects self.c but then doesn't allow me to add it when I instantiate
# learner = md.get_model(
#     opt_fn=opt_fn,
#     max_sl=20*70,
#     bptt=bptt,
#     emb_sz=em_sz,
#     n_hid=nh,
#     n_layers=nl,
#     dropout=[dps[4], 0.1],
# )


# load saved weights (torch was already imported above)
loaded_weights = torch.load(join(PATH, "models/fwd_clas_1.h5"))
learner.load("fwd_clas_1")

# confirmed that the new parameters match those of the loaded model
for k,v in loaded_weights.items():
    print(k, np.all(v == learner.model.state_dict()[k]))

# get predictions
preds_dist, preds = learner.predict_with_targs()

# prepare for accuracy measurement
preds = preds.flatten()
golds = learner.data.val_y

def get_accuracy(preds, golds):
    correct_num = np.where(preds == golds)[0].shape[-1]
    return float(correct_num) / golds.shape[-1]

print(get_accuracy(preds, golds))
# 0.49848

Any feedback on where I have made a mistake (presumably in loading) would be greatly appreciated.

Haven’t reviewed your code thoroughly, but I believe fastai.nlp is deprecated in favor of fastai.text.

Also, shouldn’t these be calls?

model.eval()
learner.model.eval()

However, I don’t think it matters, since eval() is called inside the predict method.
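For what it’s worth, without the parentheses the line is a no-op; it just references the bound method without calling it:

model.eval      # no-op: references the method, doesn't call it
model.eval()    # actually switches the module (and its children) to eval mode
print(model.training)   # False only after the call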

Also, when you are getting 50% accuracy … it’s almost a sure bet you are shuffling your validation dataset.

See: Kaggle NLP Competition - Toxic Comment Classification Challenge

@bny6613 I was just trying to make sure that dropout wasn’t happening. But I’m glad to know that eval is called inside predict(). Thanks for the info.

@wgpubs Thanks for the thoughts.

But I have two things:

  1. I don’t understand how shuffling the dataset would cause problems unless, obviously, the data and labels ended up shuffled in different orders.

  2. Regardless, I don’t see anywhere in the code where the dataset would be shuffled.

from fastai.text import TextDataset
test_dataset = TextDataset(test_data, test_lbls)

The TextDataset class doesn’t have any option to sort or shuffle that I could find.

from fastai.dataloader import DataLoader
test_loader = DataLoader(test_dataset, BATCH_SIZE, transpose=True, num_workers=1, pad_idx=1, sampler=test_sampler)

Neither does DataLoader.

@sebastianruder @jeremy Perhaps you’d be interested in looking at the code snippet and providing your expertise? Once I get it to work, I’ll happily make a pull request to the imdb_scripts README so that others can successfully load your model.

Oh wow! @wgpubs I looked more closely at the discussion you linked, and it sounds like people are saying that the torchtext DataLoader does not keep the labels and the data together when sorting to minimize padding? That’s surprising! But when I looked more closely at my own code, I realized that the Sampler I was using does not take the labels as input.

And so I took your advice and updated the DataLoader instantiation:

test_loader = DataLoader(test_dataset, BATCH_SIZE, transpose=True, num_workers=1, pad_idx=1, sampler=None, shuffle=False)

And now I’m no longer getting ~50% accuracy. Thanks!
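(In case anyone wants to keep the SortSampler for its padding efficiency: the sampler is just a permutation of indices, so a sketch like this, reusing the objects defined above, should let you realign the gold labels instead:)

order = list(test_sampler)   # the permutation SortSampler applies
golds = test_lbls[order]     # gold labels in the order batches are yielded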


Bingo!

It’s one of my golden rules for DL: If you are getting ~50% accuracy, you are shuffling your validation set.
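You can see why with a two-line check: with balanced binary labels, agreement with a shuffled copy of themselves is chance level.

import numpy as np
y = np.array([0]*12500 + [1]*12500)             # balanced binary labels, like IMDb's test set
print((y == np.random.permutation(y)).mean())   # ~0.5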

Glad you figured it out.
