Should a test set be used when comparing results between different models trained on a self-prepared dataset?

Hi everyone,

I’m in the middle of writing my thesis on sentiment classification in Polish. I’ve prepared a custom dataset of 27202 reviews, of which 3085 are negative, 9436 are neutral and 6971 are positive. From it, all 3085 negative reviews and a subset of 3085 of the 6971 positive reviews are used to train a binary classifier.
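For reference, the balanced binary subset is built roughly like this (the file path and column names are just illustrative, not the actual ones from my project):

```python
import pandas as pd

# Full dataset: 27202 reviews labelled "negative", "neutral" or "positive"
df = pd.read_csv("reviews.csv")  # hypothetical path; columns: "text", "label"

neg = df[df["label"] == "negative"]                    # 3085 reviews
pos = df[df["label"] == "positive"].sample(            # 3085 of the 6971 positives
    n=len(neg), random_state=42
)

# Balanced binary dataset, shuffled
binary_df = pd.concat([neg, pos]).sample(frac=1, random_state=42)
```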

I am, however, confused about whether I need both a validation set and a test set to compare the performance of different models, which vary in the size of the token vocabulary, the max_lr chosen, etc. On the one hand, a test set should indicate whether a model has overfitted to the validation set. On the other hand, my test set is randomly generated and comes from the same distribution as the training and validation sets.
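Concretely, this is the kind of split I mean (the 80/10/10 proportions are just what I’ve been using, and this sketch uses sklearn only to show that all three splits are drawn randomly from the same balanced data, not how I build the actual DataBunch):

```python
from sklearn.model_selection import train_test_split

# 80% train, 10% validation, 10% test, stratified by label so that all
# three splits share the same class distribution
train_df, rest_df = train_test_split(
    binary_df, test_size=0.2, stratify=binary_df["label"], random_state=42
)
valid_df, test_df = train_test_split(
    rest_df, test_size=0.5, stratify=rest_df["label"], random_state=42
)
```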

Also, while I’ve figured out from fast.ai’s docs how to use the test set as a validation set to measure the model’s accuracy on it, I cannot figure out how to use TextClassificationInterpretation, for instance, to see show_top_losses() results for the test set. TextClassificationInterpretation always shows results for the old validation set, even if I do:

learn_c.data.valid_dl = data_test.valid_dl
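For context, here is roughly the full sequence I’m running, where data_test is a DataBunch I built with the test reviews in its validation slot (the names come from my notebook, and I may well be misusing the API):

```python
from fastai.text import *

# Swap only the validation DataLoader for the one holding the test reviews
learn_c.data.valid_dl = data_test.valid_dl

interp = TextClassificationInterpretation.from_learner(learn_c)
interp.show_top_losses(10)  # still shows examples from the old validation set

# What I'd try next (an assumption on my part, not something I've confirmed works):
# replace the whole DataBunch and re-create the interpretation afterwards
learn_c.data = data_test
interp = TextClassificationInterpretation.from_learner(learn_c)
interp.show_top_losses(10)
```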

I’d appreciate any feedback on this.