ULMFiT: Understanding learn.predict output

I’ve written a script to test the classifier trained with train_clas.py (on a custom dataset).

Now I don’t have any held-out test data, so I usually just send in a single text string, tokenize it, load the classifier, run predict, and evaluate the prediction(s) myself. Something like this:

# map tokens to ids using the vocab built during training
# (tst_tok, bs, m and opt_fn are defined earlier, as in train_clas.py)
tst_sent = tok2id(dir_path, tst_tok, max_vocab=30000, min_freq=1)
tst_ds = TextDataset(tst_sent, np.zeros(len(tst_sent)))  # dummy labels
tst_dl = DataLoader(tst_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=None)
md = ModelData(dir_path, None, None, tst_dl)  # test-only ModelData
learn = RNN_Learner(md, TextModel(m), opt_fn=opt_fn)

prob = learn.predict(is_test=True)
# output: [[-11.48408  13.15376]] <-- what are these numbers?

pred = np.argmax(prob, axis=1)
# output: [1]

My question is simple: what does learn.predict() return in the ULMFiT-based classifier?


These numbers are the raw per-class scores (the unnormalized outputs of the final linear layer); the class with the higher score is the model’s prediction.

If the classes are class_a and class_b, you can get the result with something like this:

classes = ["class_a", "class_b"]
pred_class = classes[np.argmax(raw_scores)]

Thanks for your reply. What I meant to ask was: what function was used to produce these scores?

It’s a PoolingLinearClassifier. See https://github.com/fastai/fastai/blob/master/fastai/lm_rnn.py#L175 for the code.
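
Roughly, that head concat-pools the encoder outputs before the linear layers: it concatenates the last hidden state with a max-pool and an average-pool over all time steps (this is also why the classifier head takes em_sz * 3 inputs). A simplified sketch of that pooling, not the exact fastai code:

import torch

def concat_pool(output):
    # output: (seq_len, batch, hidden), the top RNN layer's outputs
    avg_pool = output.mean(0)      # average over time steps
    max_pool = output.max(0)[0]    # element-wise max over time steps
    last = output[-1]              # hidden state at the final time step
    return torch.cat([last, max_pool, avg_pool], dim=1)  # (batch, 3 * hidden)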


Ah, right. So I just softmax it and I get a proper probability distribution over the classes (values in (0, 1) that sum to 1).
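
For instance, softmaxing the raw scores from the question (plain numpy; values rounded):

import numpy as np

raw_scores = np.array([-11.48408, 13.15376])
probs = np.exp(raw_scores) / np.exp(raw_scores).sum()
# probs ≈ [2.0e-11, 1.0]: class 1 gets essentially all the probability mass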

Imagine that I already have the model and just need to test a single prediction.
What is the best way to do this?

  1. Tokenize your sentence (such as with the Tokenizer module).
  2. Map the tokens to indexes, with the same logic used at training time (see the sketch below).
    … and the rest of it is the same as the code snippet in the question.
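
A minimal sketch of those two steps (fastai 0.7 API; it assumes itos.pkl is the vocabulary saved during tokenization; a fuller version appears in a later reply):

import collections, pickle
import numpy as np
from fastai.text import Tokenizer

toks = Tokenizer().proc_text('some text to classify')   # 1. tokenize
itos = pickle.load(open('tmp/itos.pkl', 'rb'))          # id -> token list
stoi = collections.defaultdict(int, {s: i for i, s in enumerate(itos)})
ids = np.array([stoi[t] for t in toks])                 # 2. tokens -> ids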

Use the (newish) script designed for doing exactly this: https://github.com/fastai/fastai/blob/master/courses/dl2/imdb_scripts/predict_with_classifier.py


To make a single prediction:

  1. Make sure you have the network built.

def classifier_model_network(dir_path, cuda_id=0):
    '''
    Build the ULMFiT classifier network.
    :param dir_path: data directory of the saved model
    :param cuda_id: GPU id, or -1 for CPU
    :return: the classifier model
    '''
    if not hasattr(torch._C, '_cuda_setDevice'):
        print('CUDA not available. Setting device=-1.')
        cuda_id = -1
    torch.cuda.set_device(cuda_id)

    dir_path = Path(dir_path)
    # load vocabulary lookup
    itos = pickle.load(open(dir_path / 'tmp' / 'itos.pkl', 'rb'))
    n_tokens = len(itos)

    # dropout probabilities, scaled by dropmult
    dps = np.array([0.4, 0.5, 0.05, 0.3, 0.4]) * dropmult

    m = get_rnn_classifier(bptt, 20 * 70, label_class, n_tokens, emb_sz=em_sz, n_hid=nh, n_layers=nl,
                           pad_token=1,
                           layers=[em_sz * 3, 50, label_class], drops=[dps[4], 0.1],
                           dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])
    m.eval()  # eval mode disables dropout for inference
    return m
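
The hyperparameters referenced above (bptt, em_sz, nh, nl, label_class, dropmult) are globals defined elsewhere in the script. For reference, typical values from the imdb_scripts (treat these as assumptions and check your own train_clas.py):

bptt, em_sz, nh, nl = 70, 400, 1150, 3   # standard ULMFiT encoder sizes
label_class = 2                          # number of target classes
dropmult = 1.0                           # scales all the dropout probabilities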
  2. Fetch the model Learner.
def get_learner(dir_path, model_network, modelData, cuda_id=0):
    if not hasattr(torch._C, '_cuda_setDevice'):
        print('CUDA not available. Setting device=-1.')
        cuda_id = -1
    torch.cuda.set_device(cuda_id)
    if cuda_id == -1:
        map_location = 'cpu'
    else:
        map_location = None

    dir_path = Path(dir_path)

    learner = RNN_Learner(modelData, TextModel(to_gpu(model_network)))
    learner.model.eval()  # eval mode disables dropout for inference

    loaded_weights = torch.load(os.path.join(dir_path, "models/fwd_clas_1.h5"), map_location=map_location)
    learner.load("fwd_clas_1")

    # confirmed that the new parameters match those of the loaded model
    for k, v in loaded_weights.items():
        print(k, np.all(v == learner.model.state_dict()[k]))

    return learner

def predict(learner: Learner, X):
    # returns one softmaxed probability vector per example in X
    return [softmax(x) for x in learner.predict_dl(create_dl(X))]

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
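
If the raw scores are large, np.exp can overflow; a numerically safer variant shifts by the max first (softmax is invariant under this shift):

def softmax_stable(x):
    z = np.exp(x - np.max(x))
    return z / z.sum(axis=0)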

def build_learner(dir_path, cuda_id):
    # a distinct name here avoids shadowing the get_learner defined above,
    # which this wrapper calls to wrap the network in a Learner
    network = classifier_model_network(dir_path, cuda_id)
    learn = get_learner(dir_path, network,
                        ModelData(dir_path, None, None),
                        cuda_id)
    return learn

  3. Now make the prediction with the predict method.
# turn raw text into the id sequence the model expects
def get_tokens(sentence, lang='en'):
    '''
    fetch word tokens for the sentence
    :param sentence: raw text to tokenize
    :param lang: language for the tokenizer
    :return: list of tokens
    '''
    text = f'\n{BOS} {FLD} 1 ' + sentence
    return Tokenizer(lang=lang).proc_text(text)

def convert2ids(tokens: list, tok2id_model_path):
    # load the id -> token list and invert it; unknown tokens map to 0
    itos = pickle.load(open(tok2id_model_path, 'rb'))
    stoi = collections.defaultdict(lambda: 0, {v: k for k, v in enumerate(itos)})
    return np.array([stoi[p] for p in tokens])
dir_path = ''       # data directory where the model was built and saved
lm_model_path = ''  # path to the itos.pkl file dumped by the tok2id step
cuda_id = 0         # GPU id; use -1 to run on CPU

toks = get_tokens('some text sentence you want to predict')
ids = convert2ids(toks, lm_model_path)

learner = build_learner(dir_path, cuda_id)
predict(learner, ids)


What’s inside the create_dl function?

Here is the create_dl function

def create_dl(X):
    bs = 64
    # one-example dataset; the zeros are dummy labels
    tst_ds = TextDataset([X], np.zeros(1))
    tst_dl = DataLoader(tst_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=None)
    return tst_dl

I have a question about the script.
On line 100, why is the first prediction being sent to softmax?
return softmax(numpy_preds[0])[0]

I have a classifier with 300 labels, and when I feed it a 351-token string, my numpy_preds is shaped (351, 300). This confuses me, since it appears to return a prediction for each input token (the outputs of each time step of the LSTM?). If that's the case, then I suppose the last of these predictions would be the one to send to softmax:
return softmax(numpy_preds[-1])[0]

Either way, I’m getting nonsensical results for my predictions. My model was reporting .91 accuracy while training, but I can’t figure out how to actually use the model.

My evaluation code:

model.reset()
model.eval()
correct = 0
for i in range(inputs.shape[0]):
    tensor = torch.from_numpy(np.array([inputs[i]]))
    preds = model(Variable(tensor))
    preds = preds[0].data.numpy()   # first element of the returned tuple holds the scores
    pred = softmax(preds[0])        # also tried preds[-1]
    idx = np.argmax(pred)
    if targets[i] == idx:
        correct += 1
print(correct * 100 / inputs.shape[0])
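
One thing worth checking (an assumption, inferred from the transpose=True in the DataLoaders earlier in the thread): the fastai 0.7 RNN classifier expects input shaped (seq_len, batch_size). np.array([inputs[i]]) is shaped (1, 351), which the model would read as a batch of 351 one-token sequences, which matches the (351, 300) output you're seeing. Transposing feeds it as a single 351-token example:

tensor = torch.from_numpy(np.array([inputs[i]]).T)  # shape (351, 1): one sequence
preds = model(Variable(tensor))
probs = softmax(preds[0].data.numpy()[0])           # one row of 300 class scores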