I was having some trouble with test sets in v2, so I decided to make a brief tutorial notebook on them. The goal is two-fold: first, to show what test sets can be, and second, to show that we can now have labelled test sets! My notebook is available here
A detailed walk-through is below:
Create your DataLoader using your test set. In my example notebook I use the ADULTs dataset.
dbunch_test = to_test.databunch(shuffle_train=False) should also be replaced by: dbch_test = to_test.databunch(shuffle_train=False) in order to be consistent with: preds = learn.get_preds(dl=dbch_test.train_dl)
Also, the markdown line should be changed accordingly (really minor typo but just to be consistent):
We can pass in our dbch_test’s dataloader (either train_dl or valid_dl) in the dl argument for both and it will operate on them!
Is there an easy way to use an image test set with labels as well? I am trying to use test_dl() so that the validation transforms are applied to the test set, but I cannot figure out a way to include the labels. Currently I am extracting them manually, as shown below:
# construct the test data loader
test_items = get_image_files(path_to_test_set)
test_dl_ = test_dl(dbunch_val, test_items)
# manually extract the labels for the test set
y_labels = L(map(parent_label,test_items))
_,o2i = uniqueify(y_labels, sort=True, bidir=True)
y = torch.from_numpy(np.array(L(map(o2i.get,y_labels))))
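# check the accuracy: get_preds returns a (predictions, targets) pair
preds, _ = learn.get_preds(dl=test_dl_)
acc = (preds.argmax(dim=1) == y).float().mean()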
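For anyone who wants to see the manual labelling and accuracy steps in isolation, here is a minimal sketch in plain Python with no fastai dependency. The helper names (parent_folder_label, build_vocab, accuracy), the file paths, and the probabilities are all made up for illustration; they stand in for parent_label, uniqueify and the output of learn.get_preds, and are not the library's own implementation.

```python
from pathlib import Path

def parent_folder_label(path):
    """Hypothetical stand-in for parent_label: name of the folder holding the file."""
    return Path(path).parent.name

def build_vocab(labels):
    """Return the sorted unique labels and a label -> index mapping."""
    vocab = sorted(set(labels))
    o2i = {label: i for i, label in enumerate(vocab)}
    return vocab, o2i

def accuracy(probs, targets):
    """Fraction of rows whose highest-scoring class equals the target index."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in probs]
    correct = sum(p == t for p, t in zip(predicted, targets))
    return correct / len(targets)

# Made-up test files, laid out as <class name>/<file name>
test_items = ["test/cat/1.jpg", "test/dog/2.jpg", "test/cat/3.jpg", "test/dog/4.jpg"]

# Manually extract the labels and turn them into class indices
y_labels = [parent_folder_label(p) for p in test_items]
vocab, o2i = build_vocab(y_labels)
y = [o2i[label] for label in y_labels]

# Made-up class probabilities standing in for the model's predictions
probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
test_accuracy = accuracy(probs, y)
```

One caveat with this approach: the indices only line up with the model's outputs if the sorted test labels match the class order used during training. If the test set is missing a class, building the mapping from the test labels alone would shift the indices, so reusing the training vocabulary is the safer choice.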
That is what I originally tried, but the labels are missing. The label transform is removed because a test set usually won’t contain labels. If I add it back, the training augmentation transforms, rather than the validation ones, appear to be applied when I call show_batch().
Ahh yes. Very true. That’s an unlabelled test set. @sgugger is there a way to go about this? I looked around for an easy way, like there was for tabular, but I didn’t see one. Perhaps add an option to label in test_dl? Or is there a method for doing so with the DataBlock that I’m not quite seeing?
Do you think it’s a good idea to move to v2 right now if I’ve only got to lesson 3 in the fastai course, or do you suggest sticking with v1 until v2 is more developed? Are the benefits of v2 obvious to you? Thank you for your effort.
I’d say keep going until lesson 4. Once you’re comfortable with how it all works (tabular and images), you can start to move over! That way you understand how fastai v1 works as a whole, and v2 is very similar. In terms of benefits, absolutely, there are tons of reasons why I prefer the v2 library over v1! (Hence why I started the study group, to help others out with migrating.)