I’m trying to make predictions on a test set for a text classification problem, but when I call `x, y = next(iter(learn.data.test_dl))` I find that the data comes out in a different order: the sentences in the first batch don’t correspond to the sentences at the top of the CSV file.
This is how I’m creating the learner:
```python
from fastai.text import *  # fastai v1

# `path` and the `train` DataFrame are defined earlier
test = TextList.from_csv(path, 'test.csv', cols='text')
data = (TextList.from_df(train, path, cols='text')
        .random_split_by_pct(0.2)
        .label_from_df(cols=2)
        .add_test(test)
        .databunch(path='.'))
learn = text_classifier_learner(data, drop_mult=0.5)
```
Looking at the source code, unless I’m missing something, `SortSampler` is applied to the validation and test sets, sorting the data by length.
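To see why the order changes, here is a plain-Python sketch of what a length-based sampler does. It yields dataset indices ordered by text length, longest first, so each batch needs less padding. The helper names and the whitespace tokenization are assumptions for illustration; fastai’s actual `SortSampler` and its key function may differ in detail.

```python
# Sketch of length-based sorting (longest first), modeled on the behaviour
# described above. `sort_indices_by_length` and `padding_tokens` are
# hypothetical helpers, not fastai functions; tokens = whitespace-split words.

def sort_indices_by_length(texts):
    """Return dataset indices ordered from longest to shortest text."""
    return sorted(range(len(texts)), key=lambda i: len(texts[i].split()), reverse=True)


def padding_tokens(texts, order, batch_size):
    """Count pad tokens needed when batching `texts` in `order`,
    padding each batch to its own longest sequence."""
    total = 0
    for start in range(0, len(order), batch_size):
        lengths = [len(texts[i].split()) for i in order[start:start + batch_size]]
        total += sum(max(lengths) - n for n in lengths)
    return total


texts = [
    "short one",
    "this sentence is quite a bit longer than the others",
    "medium length text here",
    "tiny",
]

order = sort_indices_by_length(texts)
print(order)  # [1, 2, 0, 3]
print(padding_tokens(texts, list(range(len(texts))), batch_size=2))  # 11
print(padding_tokens(texts, order, batch_size=2))  # 7
```

Batching in sorted order needs 7 pad tokens instead of 11 here, which is the usual motivation for sorting at inference time; the cost is that rows no longer come out in CSV order.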
So my question is: how do I disable the sorting on the test set, or how do I recover the sort indices so I can put the predictions back in the original order of the CSV? I suspect there is an easy way to do this that I’m missing.
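If the order the loader visited the rows can be recovered, the predictions can be put back by inverting that permutation. The sketch below assumes `sampler_order[k]` is the dataset index of the k-th item produced, for example `list(learn.data.test_dl.sampler)` in fastai v1 (that attribute path is an assumption; check it against your version). The helper names are hypothetical.

```python
# Sketch: undo a sampler's permutation so predictions line up with the
# original CSV rows. Assumes `sampler_order[k]` is the dataset index of the
# k-th item the loader produced. Helper names are hypothetical.

def inverse_permutation(order):
    """Return `inv` such that inv[original_index] == position in `order`."""
    inv = [0] * len(order)
    for position, original_index in enumerate(order):
        inv[original_index] = position
    return inv


def restore_original_order(preds, sampler_order):
    """Reorder `preds` (produced in sampler order) back to dataset order."""
    inv = inverse_permutation(sampler_order)
    return [preds[inv[i]] for i in range(len(preds))]


sampler_order = [1, 2, 0, 3]  # order in which the test loader visited rows
preds = ["pred_row1", "pred_row2", "pred_row0", "pred_row3"]
print(restore_original_order(preds, sampler_order))
# ['pred_row0', 'pred_row1', 'pred_row2', 'pred_row3']
```

With NumPy arrays or PyTorch tensors the same inversion can be written as `preds[np.argsort(sampler_order)]`, since the argsort of a permutation is its inverse.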
Thanks in advance!