I was having some trouble with test sets in v2, so I decided to make a brief tutorial notebook on them. The purpose is two-fold: first, to show what test sets can be, and second, to show that we can now have labelled test sets! My notebook is available here.
A detailed walk-through is below:
Create your DataLoader using your test set. In my example notebook I use the ADULT dataset.
Thank you @muellerzr for sharing. There is a small typo:
dbuch = to.databunch() should be replaced by dbch = to.databunch() in order to be consistent with learn = Learner(dbch, model, CrossEntropyLossFlat(), opt_func=opt_func, metrics=accuracy)
Otherwise the latter line will raise the following error: NameError: name 'dbch' is not defined
Likewise, dbunch_test = to_test.databunch(shuffle_train=False) should be replaced by dbch_test = to_test.databunch(shuffle_train=False) in order to be consistent with preds = learn.get_preds(dl=dbch_test.train_dl)
Also, the markdown line should be changed accordingly (really minor typo but just to be consistent):
We can pass in our dbch_test’s dataloader (either train_dl or valid_dl) in the dl argument for both and it will operate on them!
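For reference, here is how the corrected lines fit together end-to-end. This is a minimal sketch assuming the early-v2 tabular API (package name fastai2 at the time); the column lists and the TabularModel/Adam definitions below are illustrative stand-ins for whatever the notebook actually uses as model and opt_func.

from fastai2.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df_main, df_test = df.iloc[:-1000], df.iloc[-1000:]  # hold out a labelled test set

procs = [Categorify, FillMissing, Normalize]
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']

splits = RandomSplitter()(range_of(df_main))
to = TabularPandas(df_main, procs, cat_names, cont_names, y_names='salary', splits=splits)
# note: ideally the test set would be processed with statistics fit on the training data
to_test = TabularPandas(df_test, procs, cat_names, cont_names, y_names='salary')

dbch = to.databunch()
dbch_test = to_test.databunch(shuffle_train=False)

model = TabularModel(get_emb_sz(to), len(to.cont_names), 2, [200, 100])
opt_func = Adam
learn = Learner(dbch, model, CrossEntropyLossFlat(), opt_func=opt_func, metrics=accuracy)
learn.fit_one_cycle(1)

# the labelled test set lives in dbch_test's train_dl (shuffling disabled above)
preds = learn.get_preds(dl=dbch_test.train_dl)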
Is there an easy way to use an image test set with labels as well? I am trying to use test_dl() so that the validation transforms are applied to the test set, but I cannot figure out a way to include the labels. Currently I am manually extracting them as shown below:
# assumes the usual v2 star import, e.g. from fastai2.vision.all import * (package name at the time of this thread)
# construct the test data loader (test_dl applies the validation transforms)
test_items = get_image_files(path_to_test_set)
test_dl_ = test_dl(dbunch_val, test_items)
# manually extract the labels for the test set
y_labels = L(map(parent_label, test_items))
_, o2i = uniqueify(y_labels, sort=True, bidir=True)  # map each class name to an index
y = torch.from_numpy(np.array(L(map(o2i.get, y_labels))))
# check the accuracy
preds = learn.get_preds(dl=test_dl_)
accuracy(preds[0], y)
Do you have this working with images? If so, what type of data loader do you use? I have tried TfmdDL with many variations of the following, without success.
That is what I originally tried, but the labels are missing. The label transform is removed because a test set usually won’t contain labels. If I add it again, the training augmentation transforms, rather than the validation ones, appear to be applied when I call show_batch().
Ahh yes, very true. That’s an unlabeled test set. @sgugger is there a way to go about this? I’m trying to find an easy way like there was for tabular, but I didn’t see any. Perhaps add an option to label in test_dl? Or is there a method for doing so with the DataBlock that I’m not quite seeing?
No, this is just to change the behavior of the transforms (when they are different on the training vs validation set). You can’t add new transforms with this.
Do you think it’s a good idea to move to v2 right now if I’ve only gotten to lesson 3 in the fastai course, or do you suggest sticking with v1 until v2 is more developed? Are the benefits of v2 obvious to you? Thank you for your effort!
I’d say go until lesson 4, when you’re comfortable with how it all works (tabular and images), and then you can start to move over! That way you understand as a whole how fastai v1 works, since v2 is very similar. In terms of benefits, absolutely, there are tons of reasons why I prefer the v2 library over v1! (That’s why I started the study group: to help others with migrating.)
It seems a bit weird to do it this way. You know you can have several validation sets in a DataSource/DataBunch? Just send all the items and a list of three splits instead of two.
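For example, a minimal sketch assuming the early-v2 API (package fastai2) where a DataSource accepts any number of index splits; path, the 80/10/10 split, and the transforms are placeholders, and attribute names may differ across v2 revisions:

from fastai2.vision.all import *

items = get_image_files(path)

# three splits instead of two: train, validation, and a labelled "test" set
idxs = np.random.permutation(len(items))
cut1, cut2 = int(0.8 * len(items)), int(0.9 * len(items))
splits = (list(idxs[:cut1]), list(idxs[cut1:cut2]), list(idxs[cut2:]))

dsrc = DataSource(items, tfms=[[PILImage.create], [parent_label, Categorize]], splits=splits)
dbunch = dsrc.databunch(after_item=[Resize(224), ToTensor], bs=64)

# the extra subset keeps validation-style transform behavior; evaluate with:
# preds, targs = learn.get_preds(dl=dbunch.dls[2])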