ULMFiT: Understanding learn.predict output


#1

I’ve written a test script for testing the classifier trained using train_clas.py (on a custom dataset).

I don’t have any OOB test data, so I usually just send in a single text string, tokenize it, load the classifier, run predict and evaluate the prediction(s) myself. Something like this:

tst_sent = tok2id(dir_path, tst_tok, max_vocab=30000, min_freq=1)
tst_ds = TextDataset(tst_sent, np.zeros(len(tst_sent)))
tst_dl = DataLoader(tst_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=None)
md = ModelData(dir_path, None, None, tst_dl)
learn = RNN_Learner(md, TextModel(m), opt_fn=opt_fn)

prob = learn.predict(is_test=True)
# output: [[-11.48408  13.15376]] <-- what are these numbers?

pred = np.argmax(prob, axis=1)
# output: [1]

My question is simple: what does learn.predict() return in the ULMFiT-based classifier?


(Nick) #2

These numbers are raw per-class scores (logits, not probabilities). The class with the highest score is the one the model chooses.

If the classes are class_a, class_b you can get the result with something like this:

classes = ["class_a", "class_b"]
predicted_class = classes[np.argmax(raw_scores)]

#3

Thanks for your reply. What I had meant to ask was: what function was used to get these scores?


(Nick) #4

It’s a PoolingLinearClassifier. See https://github.com/fastai/fastai/blob/master/fastai/lm_rnn.py#L175 for the code.
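
For readers who don't want to open the source, here is a rough pure-Python sketch of what that classifier head does, as far as I can tell from the linked code: it takes the encoder's hidden states for every token, concatenates the last state with a max-pool and a mean-pool over the sequence (which would explain the em_sz * 3 input size in the layers argument further down), and feeds the result through linear layers to get one raw score per class. The helper names, toy hidden states and weights below are made up for illustration, and the sketch leaves out the hidden layer, dropout and batch norm that the real module appears to use.

```python
def concat_pool(hidden_states):
    """Concatenate [last state, max over time, mean over time].

    hidden_states: list of per-token feature vectors, shape (seq_len, n_features).
    """
    last = list(hidden_states[-1])
    columns = list(zip(*hidden_states))          # one tuple per feature
    max_pool = [max(col) for col in columns]
    avg_pool = [sum(col) / len(col) for col in columns]
    return last + max_pool + avg_pool


def linear(x, weights, bias):
    """Plain fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# Toy encoder output: 3 tokens, 2 features each (made-up numbers).
hidden = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.6]]
pooled = concat_pool(hidden)                     # length 3 * n_features = 6

# Made-up weights for a 6 -> 2 layer, i.e. two classes.
weights = [[1.0, 0.0, 0.5, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 0.5, 1.0, 0.0]]
bias = [0.0, 0.1]
raw_scores = linear(pooled, weights, bias)       # one unnormalised score per class

print(pooled)
print(raw_scores)
```

The point is only that the two numbers returned by predict are the outputs of a final linear layer, so they are unbounded scores rather than probabilities.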


#5

Ah, right. So I just apply softmax to it and I’d have a nice distribution, with values in (0, 1) that sum to 1.
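
As a quick check of that idea, here is a minimal sketch that applies softmax to the two scores from the question. It uses only the standard library; the softmax helper and the variable names are my own and not part of fastai. Subtracting the maximum first does not change the result but avoids overflow for large scores.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # shift by the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [-11.48408, 13.15376]               # the output shown in the question
probs = softmax(raw_scores)
predicted = max(range(len(probs)), key=probs.__getitem__)   # index of the largest value, like np.argmax

print(probs)        # class 1 gets almost all of the probability mass
print(predicted)
```

Since softmax preserves the ordering of the scores, the argmax of the probabilities is the same as the argmax of the raw scores, so the predicted class does not change.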


(Gerardo Garcia) #6

Imagine that I already have the model and I just need to test a single prediction.
What is the best way to do this?


#7
  1. Tokenize your sentence (such as with the Tokenizer module)
  2. Map the tokens to indexes. The logic will be the same as here.
    … and the rest of it is the same as the code snippet in the question.
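
Step 2 can be sketched roughly as follows, assuming a toy vocabulary list in place of the real itos.pkl and a plain whitespace split in place of the fastai Tokenizer (both are stand-ins for illustration only). Words that are not in the vocabulary fall back to index 0, which I am assuming is the unknown-token slot, with index 1 reserved for padding to match the pad_idx=1 argument in the question.

```python
import collections

# Toy vocabulary (index -> token); in practice this would be loaded from itos.pkl.
itos = ['_unk_', '_pad_', 'the', 'movie', 'was', 'great']

# Reverse lookup (token -> index); unknown tokens map to 0.
stoi = collections.defaultdict(lambda: 0, {tok: idx for idx, tok in enumerate(itos)})

tokens = 'the movie was surprisingly great'.split()   # stand-in for the real tokenizer
ids = [stoi[tok] for tok in tokens]
print(ids)   # 'surprisingly' is out of vocabulary, so it maps to index 0
```

The vocabulary must be the one the classifier was trained with; otherwise the indexes point at the wrong embedding rows.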

(Nick) #8

Use the (newish) script designed for doing exactly this: https://github.com/fastai/fastai/blob/master/courses/dl2/imdb_scripts/predict_with_classifier.py


(Uttam) #9

To make a single prediction:

  1. Make sure you have the network built.

def classifier_model_network(dir_path, cuda_id=0):
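    '''
    Build the classifier network. Relies on module-level hyperparameters
    (bptt, em_sz, nh, nl, dropmult, label_class) matching those used in training.
    :param dir_path: model data directory containing tmp/itos.pkl
    :param cuda_id: GPU index; set to -1 when CUDA is unavailable
    :return: the RNN classifier module
    '''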
    if not hasattr(torch._C, '_cuda_setDevice'):
        print('CUDA not available. Setting device=-1.')
        cuda_id = -1
    torch.cuda.set_device(cuda_id)

    dir_path = Path(dir_path)
    # load vocabulary lookup
    itos = pickle.load(open(dir_path / 'tmp' / 'itos.pkl', 'rb'))
    n_tokens = len(itos)

    dps = np.array([0.4, 0.5, 0.05, 0.3, 0.4]) * dropmult

    m = get_rnn_classifier(bptt, 20 * 70, label_class, n_tokens, emb_sz=em_sz, n_hid=nh, n_layers=nl,
                           pad_token=1,
                           layers=[em_sz * 3, 50, label_class], drops=[dps[4], 0.1],
                           dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3])
    m.eval()  # switch to evaluation mode so that dropout is disabled
    return m
  2. Fetch the model Learner.
def load_learner(dir_path, model_network, modelData, cuda_id=0):
    if not hasattr(torch._C, '_cuda_setDevice'):
        print('CUDA not available. Setting device=-1.')
        cuda_id = -1
    torch.cuda.set_device(cuda_id)
    # map GPU-saved weights onto the CPU when CUDA is unavailable
    map_location = 'cpu' if cuda_id == -1 else None

    dir_path = Path(dir_path)

    learner = RNN_Learner(modelData, TextModel(to_gpu(model_network)))
    learner.model.eval()  # evaluation mode: dropout is disabled

    loaded_weights = torch.load(os.path.join(dir_path, "models/fwd_clas_1.h5"), map_location=map_location)
    learner.load("fwd_clas_1")

    # sanity check: the learner's parameters should match the weights on disk
    for k, v in loaded_weights.items():
        print(k, np.all(v == learner.model.state_dict()[k]))

    return learner

def softmax(x):
    """Compute softmax values for a 1-D array of scores x."""
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / np.sum(e, axis=0)

def predict(learner: Learner, X):
    # predict_dl returns one row of raw scores per sample; softmax turns each row into probabilities
    return [softmax(x) for x in learner.predict_dl(create_dl(X))]

def get_learner(dir_path, cuda_id):
    # build the network, then wrap it in a Learner with the trained weights loaded
    network = classifier_model_network(dir_path, cuda_id)
    learn = load_learner(dir_path, network,
                         ModelData(dir_path, None, None),
                         cuda_id)
    return learn

  3. Now make the prediction with the predict method.
# fetch the ids of the text 
def get_tokens(sentence, lang='en'):
    '''
    fetch word tokens for the sentence
    :param sentence: raw text to classify
    :param lang: language code passed to the Tokenizer
    :return: list of tokens
    '''
    text = f'\n{BOS} {FLD} 1 ' + sentence
    return Tokenizer(lang=lang).proc_text(text)

def convert2ids(tokens: list, tok2id_model_path):
    itos = pickle.load(open(tok2id_model_path,'rb'))
    stoi = collections.defaultdict(lambda: 0, {v: k for k, v in enumerate(itos)})

    predict_lm = np.array([stoi[p]  for p in tokens])
    return predict_lm
dir_path = ' '  # data directory of the trained and saved model
lm_model_path = ' '  # path to the itos.pkl file (the vocabulary dumped by tok2id) in that data directory
cuda_id = 0  # GPU index (an assumed value; adjust to your setup)
toks = get_tokens('some text sentence you want to predict')
ids = convert2ids(toks, lm_model_path)

learner = get_learner(dir_path, cuda_id)
predict(learner, ids)


#10

What’s inside the create_dl function?


(Sandeep Panem) #12

Here is the create_dl function

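def create_dl(X):
    # wrap a single sequence of token ids in a DataLoader so that predict_dl can consume it
    bs = 64
    tst_ds = TextDataset([X], np.zeros(1))  # one dummy label for the single sample
    tst_dl = DataLoader(tst_ds, bs//2, transpose=True, num_workers=1, pad_idx=1, sampler=None)
    return tst_dl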