I am using ULMFiT for a regression task. After fine-tuning and training the classifier, I want to make predictions on new data that has been preprocessed the same way as the training data.
I am trying to figure out which function to use:
I see: learn.predict, learn.predict_array, learn.predict_dl, learn.predict_with_targs.
It seems like most of these work with the dataset used to init RNN_Learner().
However, the array returned by learn.predict is larger than the dataset used above.
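First, you need to preprocess your test data the same way as the validation set (e.g. use SortSampler, not SortishSampler).
Second, to get predictions on the test dataset you need to pass is_test=True, e.g. learn.predict(is_test=True). Without it, predict runs on the validation set.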
Thanks again @asotov, that got me the right number of predictions.
It looks like SortSampler re-orders the data. I am guessing I have to sort my labels too if I want to align my predictions with the labels (which are just IDs for me).
Glad to hear it. @leonyin, I have the same problem with predictions coming back in the wrong order, but I didn’t understand why. If your guess is right, we should either predict without sorting or map the predictions back to the original order. Thank you!
In my case I see 95.73 accuracy during training, but when I call the predict method and check the accuracy manually, it gives only 63%! Now I understand that the reason is SortSampler. I will try to deal with it and post back here later.
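For anyone landing here later, below is a minimal sketch of one way to undo the re-ordering. It is plain Python and does not call fastai. It assumes the sampler visits items sorted by length, longest first, and that you can rebuild that same index order yourself. The names sort_order and restore_order and the toy data are hypothetical, made up for illustration.

```python
# Sketch only: sort_order and restore_order are hypothetical helpers,
# not fastai functions. Assumption: the sampler yields indices sorted
# by item length, longest first.

def sort_order(lengths):
    """Return dataset indices ordered from longest to shortest item."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

def restore_order(sorted_preds, order):
    """Map predictions made in sampler order back to dataset order."""
    restored = [None] * len(order)
    for pos, original_idx in enumerate(order):
        # The prediction at position `pos` belongs to item `original_idx`.
        restored[original_idx] = sorted_preds[pos]
    return restored

# Toy data: four documents with IDs and token counts.
ids = ["a", "b", "c", "d"]
lengths = [3, 10, 1, 7]

order = sort_order(lengths)

# Stand-in for the model: "predict" each item's length, in sampler order.
sorted_preds = [float(lengths[i]) for i in order]

restored = restore_order(sorted_preds, order)
aligned = list(zip(ids, restored))  # each ID paired with its own prediction
```

If the predictions are a NumPy array, indexing them with np.argsort(order) gives the same result. The alternative is to sort the labels or IDs with the same index order before comparing, which is equally valid as long as both sides use one consistent order.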
I did the same and everything looks good! This might be something we want to incorporate into the imdb notebook (cc @jeremy). Again, thanks for working through this with me @asotov!!