L10 how to build a test set without labels

Celia · May 29, 2018, 4:11pm

Hello everyone,
I followed the L10 code to train and fine-tune my own NLP classifier. Everything went well, and I reached a low val_loss (~0.2) without overfitting.
Now I want to classify texts that have no labels, but I don't know how.

I followed the same process to build a TextDataset for the test set, but the problem is that TextDataset requires both texts and labels. We don't have any labels for the test data (that's why it goes into a test set).

It seems that learn.predict returns predictions for the test set when is_test=True is set.

So how do I build a TextDataset for the test set without labels, so that I can put it into ModelData and then get the results by calling learn.predict(is_test=True)?

Maybe this is a stupid question, but if I create null labels (or labels = 0) for the whole test set, does that have any bad effect on training the learner?

Thanks in advance.

frodesc · July 4, 2018, 12:44pm

Hi Celia,

I'm running into the same issue. Did you manage to solve it?

nickl · July 6, 2018, 4:01am

I found this surprisingly hard too.

In the end I skipped the TextDataset code entirely and ran predictions directly on an encoded array:

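# Assumes the L10 notebook environment (from fastai.text import *),
# a trained learner `myleaner`, and the `stoi` mapping built from the vocabulary.
input_str = 'xbos xfld 1 ' + text  # same prefix used when building the training texts
texts = [input_str]
tok = Tokenizer().proc_all_mp(partition_by_cores(texts))

encoded = [stoi[p] for p in tok[0]]  # token -> integer id

ary = np.swapaxes(np.array([encoded]), 0, 1)  # shape (seq_len, 1)

myleaner.model.reset()  # reset the RNN hidden state (predict_array itself switches to eval mode)

predictions = myleaner.predict_array(ary)[0][0]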

That works fine for me.
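To make the encoding step easier to follow outside the notebook, here is a minimal NumPy-only sketch of what happens to the tokens before and after the model call. The toy vocabulary, the `encode` and `softmax` helpers, and the made-up scores are all assumptions for illustration; they are not part of fastai. It also assumes `predictions` holds raw class scores, which is worth checking for your own model.

```python
import collections
import numpy as np

# Hypothetical toy vocabulary; in the notebook, itos comes from the saved vocabulary.
itos = ['_unk_', '_pad_', 'xbos', 'xfld', '1', 'great', 'movie']
# Unknown tokens fall back to index 0 ('_unk_') instead of raising KeyError.
stoi = collections.defaultdict(int, {tok: i for i, tok in enumerate(itos)})

def encode(tokens):
    """Map tokens to ids and shape them as (seq_len, batch=1)."""
    ids = np.array([[stoi[t] for t in tokens]])  # shape (1, seq_len)
    return np.swapaxes(ids, 0, 1)                # shape (seq_len, 1)

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

ary = encode(['xbos', 'xfld', '1', 'great', 'unseenword', 'movie'])

# Made-up scores standing in for `predictions`; assumes the model returns raw scores.
probs = softmax(np.array([0.3, 2.1]))
pred_class = int(np.argmax(probs))
```

The `defaultdict` matters here: a test text will usually contain words that never appeared in training, and a plain dict would raise a KeyError on them.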

Maybe this is a stupid question, but if I create null labels (or labels = 0) for the whole test set, does that have any bad effect on training the learner?

No. You are only evaluating, not training, so the placeholder labels never influence the model's weights.
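The dummy-label idea can be illustrated with a toy sketch. `ToyDataset`, `predict_all`, and `toy_model` below are hypothetical stand-ins, not fastai's TextDataset or learner; the point is only that a prediction loop reads the inputs and ignores whatever labels are stored next to them.

```python
import numpy as np

class ToyDataset:
    """Toy stand-in for a (texts, labels) dataset; not fastai's TextDataset."""
    def __init__(self, x, y):
        assert len(x) == len(y), "texts and labels must have the same length"
        self.x, self.y = x, y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

def predict_all(model_fn, ds):
    """Prediction loop: fetches every (x, y) pair but only uses x."""
    return [model_fn(ds[i][0]) for i in range(len(ds))]

# Already-encoded test texts (token ids) with no real labels available.
test_ids = [np.array([2, 3, 4, 5]), np.array([2, 3, 4, 6, 6])]
zeros = np.zeros(len(test_ids), dtype=int)  # placeholder labels
ones = np.ones(len(test_ids), dtype=int)    # a different set of placeholders

toy_model = lambda x: int(x.sum() % 2)      # hypothetical model, for illustration only

preds_zero = predict_all(toy_model, ToyDataset(test_ids, zeros))
preds_ones = predict_all(toy_model, ToyDataset(test_ids, ones))
```

Both runs give the same predictions. Any loss or accuracy computed against the placeholder labels is meaningless, though, so only the predictions themselves should be used.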