Can't replicate ULMFiT validation predictions

I’ve been trying to do (true/false) classification with ULMFiT and I’ve had some trouble working out how to predict on plain text (not loaded from a dataset). Or rather - I can get it to do something, but the predictions seem wrong.

So I thought I’d take the first row of the validation set and try to get it to predict the same result manually.

Here’s what I’m aiming for:

preds = learn.predict()
preds[0]

array([ 0.88123, -0.47445], dtype=float32)

Here’s my code:

# get the first row of the validation set
print("validation dataset row:", val_dl.dataset.x[0])

# convert it to string tokens
pred_toks = [itos[i] for i in val_dl.dataset.x[0]]

# convert it back to vocab indexes
pred_idxs = [stoi[p] for p in pred_toks]

# print these indexes out
print("converted row:", pred_idxs)

# they appear to be the same, so the prediction should be the same

# get the model
m = learn.model

# set the batch size
m[0].bs = 1

# put into evaluation mode (disables dropout)
m.eval()

# reset the RNN's hidden state
m.reset()

# convert the (list of) array(s) of indexes to a PyTorch tensor
tensor = T([pred_idxs])

# we need a PyTorch variable
variable = V(tensor)

# pass to the model to make the prediction
result, *_  = m(variable)

result

Which outputs

validation dataset row: [4, 5, 6, 3, 4162, 313, 10, 256, 5251, 2]
converted row: [4, 5, 6, 3, 4162, 313, 10, 256, 5251, 2]


Variable containing:
  7.8798 -10.7048
  8.7254 -11.7704
  6.8029  -8.6208
  8.4807 -11.5354
  8.1717  -9.7377
 12.7380 -12.5275
  6.6641  -9.7890
 16.6947 -17.6726
  6.9700  -8.1962
  6.6253  -7.7002
[torch.cuda.FloatTensor of size 10x2 (GPU 0)]

I’ve seen code take the maximum score from this result to get the predicted class. But I can’t see how that is going to match the 0.88123, -0.47445 that .predict() gave me.

What am I doing wrong?

Don’t forget that examples are getting shuffled by the sampler you specified in your DataLoader.

So first, make sure you are looking at the right example. You can do this by pulling the data from your data loader rather than the underlying dataset.
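
For example (a sketch, assuming val_dl is your validation DataLoader):

x, y = next(iter(val_dl))  # one batch, in the order the sampler actually yields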

Interesting. How do I get a single row?

I tried:

val_dl.batch_size = 1
row = next(iter(val_dl))

but that still returns 48 rows (which I think was the initial batch size). I can’t work out what it is returning either.

Once I have that working, is this the right code to do a prediction?

result, *_ = m(variable)

I don’t think you need to change the batch size … calling next(iter(val_dl)) should be sufficient. It’s going to return two things: your inputs and your targets.

Review the dimensions of your inputs and see if you can understand what they represent. From there, extracting the first example should be straightforward.

pdb.set_trace() is your friend in this 🙂
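
For example, something like this (just a sketch; assumes val_dl is in scope):

rx, ry = next(iter(val_dl))
print(rx.size())  # what does each axis represent?
print(ry.size())
import pdb; pdb.set_trace()  # now poke around interactively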

Yeah, I’ve got that and I’ve read https://medium.com/@tomgrek/wtf-is-going-on-with-fast-ai-db59741b5da2

rx, ry = next(iter(val_dl))

rx is:

4 1 1 … 1 1 1
5 1 1 … 1 1 1
6 1 1 … 1 1 1
… ⋱ …
506 75 0 … 370 358 18
13 355 231 … 212 7046 105
7734 0 2 … 2 2 2
[torch.cuda.LongTensor of size 160x48 (GPU 0)]

So it looks like 48 is the batch size (which isn’t what I want, but ok…)

The values in it are confusing. The ones after the “506” value look as though they could be encoded vocab indices.

But what are the incrementing numbers in the first column (4, 5, 6) with rows of 1s? (itos[1] is ‘pad’, though I don’t know if that is significant.)

So each column (not row) is an example (there are 48 in each batch per your batch size hyperparameter).

Thus the first example is: rx[:,0]

What are those numbers? You are correct … they are the corresponding indices in your vocab for the tokens in each example. Padding is applied as needed before the text so you can expect to see a bunch of 1’s depending on the size of your text.

I have no idea what the incrementing numbers are … maybe they’re just part of how the tensor is displayed rather than actual values in your matrix. I dunno.
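
If you want to convince yourself, decode one column and drop the padding (a sketch; assumes itos is your index-to-token list and 1 is the pad index, per your itos[1]):

example = rx[:, 0]  # the first example is the first *column*
tokens = [itos[int(i)] for i in example if int(i) != 1]  # strip the 'pad' tokens
print(' '.join(tokens))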

Ok, got it, thanks!

So now I change how I get the data to this:

# get the first row of the validation set
print("validation dataset row:", rx[:,0])

# convert it to string tokens
pred_toks = [itos[i] for i in rx[:,0]]

# convert it back to vocab indexes
pred_idxs = [stoi[p] for p in pred_toks]

# print these indexes out
print("converted row:", pred_idxs)

The array pred_idxs is the same as the original, so that is good.

Now I do this:

# pass to the model to make the prediction
result, *_  = m(variable)

result

and get:

Variable containing:
  7.8798 -10.7048
  8.7254 -11.7704
  .. truncated...
  5.5648  -7.5580
 18.9520 -19.4077
[torch.cuda.FloatTensor of size 160x2 (GPU 0)]

How do I get from that to:

array([ 0.88123, -0.47445], dtype=float32)

(assuming they are the per-class probabilities)

What is “variable”?

Hard for me to interpret what your results reference, but it looks like class predictions (0 and 1) for a batch of 160 examples. I think the dimensions of variable are wrong, whatever it is … for example, in your first post you show 10 predictions for a single example that happens to contain 10 tokens. What you should see is a single prediction.

Look at the dimensions of rx and make sure variable is the same. I think you may simply have the dimensions mixed up.
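
Something like this should make the mismatch obvious (a sketch using your names; the .contiguous() is my assumption about what the RNN will want after a transpose):

print(variable.size())   # (1, 160): batch first, which the model doesn't expect
print(rx[:, :1].size())  # (160, 1): sequence first, batch size of 1
variable = variable.t().contiguous()  # transpose into the (160, 1) layout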

My understanding is that learn.predict() generates predictions against the validation set. preds.shape is (185, 2), which is (num_rows_in_validation_set x num_classes), so I think this is right.

The ultimate aim is to make a prediction on some text, but I figured if I use a row from the validation set then I’ll be able to tell I’m doing it correctly.

So I guess the question is:

If I have an array of correctly encoded text (ie, have done stoi() correctly on each word and have an array of them) how do I make a prediction?

This seems like it should be easier than I’ve found it, so I assume I’m missing something.

variable.shape, rx[:,0].shape gives me (torch.Size([1, 160]), torch.Size([160]))

So you see that they aren’t the same. Assuming your text has 160 tokens in it, you want something like 160x1.

It’s hard for me to tell what you are doing wrong because I can’t see your full source … just the bits and pieces of your code you want to share.

Make sure you understand the dimensions of your batches; what each axis represents. From there, put your numericalized data into a similar format where the batch size dimension (columns) is equal to 1. Pass that into your model and you should be golden.

That code above is literally my code. It’s straight out of the IMDB sentiment classification notebook with a different dataset.

To recap:

# get predictions against the validation set
preds = learn.predict() 
# preds[0] is what I'm trying to replicate. 
# preds[0] is: array([ 0.88123, -0.47445], dtype=float32)

# get the first row of the validation set
rx, ry = next(iter(val_dl))
print("validation dataset row:", rx[:,0])

# convert it to string tokens
pred_toks = [itos[i] for i in rx[:,0]]

# convert it back to vocab indexes
pred_idxs = [stoi[p] for p in pred_toks]

# print these indexes out
print("converted row:", pred_idxs)

# they appear to be the same, so the prediction should be the same

# get the model
m = learn.model

# set the batch size
m[0].bs = 1

# put into evaluation mode (disables dropout)
m.eval()

# reset the RNN's hidden state
m.reset()

# convert the (list of) array(s) of indexes to a PyTorch tensor
tensor = T([pred_idxs])

# we need a PyTorch variable
variable = V(tensor)

# pass to the model to make the prediction
result, *_  = m(variable)

result

Well this sucked.

If anyone is interested, here is how to do it:

rx, ry = next(iter(val_dl))

# convert it to string tokens
pred_toks = [itos[i] for i in rx[:,0]]

# convert it back to vocab indexes
pred_idxs = [stoi[p] for p in pred_toks]

print(rx[:, :1].shape) # is torch.Size([160, 1])

# make an array the same shape
test_input = np.swapaxes(np.array([pred_idxs]), 0, 1)
print(test_input.shape) # is (160, 1)

# these two should be the same

# predict on the original data
print(learn.predict_array(test_input)[0])

# predict on the reconstructed data
print(learn.predict_array(rx[:, :1])[0])
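
One follow-up note: I believe those two numbers are raw scores from the final linear layer, not probabilities. If you want probabilities, softmax them (sketch):

scores = learn.predict_array(test_input)[0]
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over the two classes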

Only saw this thread now (feel free to tag me in future ULMFiT related questions).

Sorry that you’ve found this so hard to use, @nickl. I was meaning to add an evaluation script but didn’t get around to it yet.

Would you consider submitting a PR to add your script to the imdb_scripts repo to make it easier for others to use in the future?

@sebastianruder Thanks, for sure.

What do you want for a PR? Just an example of how to use a pretrained model on text (which was basically what I was trying to do)? Or is the reconstruction of the validation set useful too?

I think the best thing would be to have an example of predicting on a sentence or text. That should make it easier for people to play around with the model on the command line.

Besides that, I think having an example where we evaluate on a separate test set would also be useful. I can also add this once I have time.

I can do the prediction example for sure.

Cool! That’d be awesome! Feel free to submit a PR with an initial version or post here if you want feedback.

Hey @sebastianruder I’ve done a bit of work on this. I wanted to make it as clear and simple as possible.

I think this version is simple and better than the one above.

I do have one question. Is there a way to create an rnn_classifer without all those parameters being required? A lot of them look like they should only be needed at train time.

Also, generally, does this code look right to you?

def load_model(self):

    bptt,em_sz,nh,nl = 70,400,1150,3
    dps = np.array([0.4,0.5,0.05,0.3,0.4])*0.5
    num_classes = 2 # this is the number of classes we want to predict
    vs = len(self.itos)

    self.model = get_rnn_classifer(bptt, 20*70, num_classes, vs, emb_sz=em_sz, n_hid=nh, n_layers=nl, pad_token=1,
            layers=[em_sz*3, 50, num_classes], drops=[dps[4], 0.1],
            dropouti=dps[0], wdrop=dps[1], dropoute=dps[2], dropouth=dps[3]) 

    trained_encoder_path = str((self.MODEL_PATH/'lm1_enc.h5').absolute())
    trained_classifier_path = str((self.MODEL_PATH/'clas_2.h5').absolute())

    self.model[0].load_state_dict(torch.load(trained_encoder_path, map_location=lambda storage, loc: storage))
    self.model.load_state_dict(torch.load(trained_classifier_path, map_location=lambda storage, loc: storage))

    self.model.reset()
    self.model.eval()


def predict_text(self, text):

    # prefix text with tokens:
    #   xbos: beginning of sentence
    #   xfld 1: we are using a single field here
    input_str = 'xbos xfld 1 ' + text

    # predictions are done on arrays of input. 
    # We only have a single input, so turn it into a 1x1 array
    texts = [input_str]

    # tokenize using the fastai wrapper around spacy
    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
    
    # turn into integers for each word
    encoded = [self.stoi[p] for p in tok[0]]
    
    # we want a [x,1] array where x is the number 
    #  of words inputted (including the prefix tokens)
    ary = np.reshape(np.array(encoded),(-1,1))

    # turn this array into a tensor
    tensor = torch.from_numpy(ary) 

    # wrap in a torch Variable       
    variable = Variable(tensor)
    
    # do the predictions
    predictions = self.model(variable)    

    # convert back to numpy
    numpy_preds = predictions[0].data.numpy()

    return numpy_preds[0]
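
And a quick usage example (hypothetical; assumes the two methods above live on a class instantiated as classifier, with MODEL_PATH, itos and stoi already loaded):

classifier.load_model()
scores = classifier.predict_text("what a great movie")
print(scores)             # raw score per class
print(np.argmax(scores))  # index of the predicted class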

@nickl I think it would be better to do it for a batch of texts rather than just a single text.