How to predict on the test set


(Chris Palmer) #21

Hi @jayashree

Do you actually have a test directory so that the following returns False?

learn.data.test_dl is None

(Jayashree Sridhar) #22

Yes I have a test folder inside data folder.


(Chris Palmer) #23

OK, is it called test or test1? Note that the default test directory in lesson 1 is test1.

The Jupyter notebook for lesson 1 does not add the test data, so did you add it yourself when creating the data object, with something like this?

data = ImageClassifierData.from_paths(PATH, bs=bs, tfms=tfms, test_name='test1')

And if you issue this command, you don’t get an error?

str(data.trn_ds), str(data.val_ds), str(data.test_ds)

Getting something like this?

('<fastai.dataset.FilesIndexArrayDataset object at 0x0000023C06ABCB70>',
 '<fastai.dataset.FilesIndexArrayDataset object at 0x0000023C06ABCDA0>',
 '<fastai.dataset.FilesIndexArrayDataset object at 0x0000023C06ABCC50>')
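
A quick way to check which folder name to pass as test_name, sketched with the standard library only. The helper name find_test_dir and the candidate list are illustrative assumptions and are not part of fastai.

```python
from pathlib import Path

def find_test_dir(path, candidates=("test", "test1")):
    """Return the first candidate sub-folder of `path` that exists and is non-empty, else None."""
    root = Path(path)
    for name in candidates:
        folder = root / name
        # An empty folder would give an empty test set, so treat it as missing.
        if folder.is_dir() and any(folder.iterdir()):
            return name
    return None
```

If this returns None for your PATH, there is probably no usable test folder under either name.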

(Chris Chung) #24

Thanks @Chris_Palmer

I was getting the same error, and adding test_name='test_directory_name' solved the problem for me.


(Jayashree Sridhar) #25

Issue solved by adding test_name='test1'. Thanks for the help, @Chris_Palmer.


(Chris Palmer) #26

:smile: The use of a test set confused me at first. I was used to the idea of train and test sets, where the test set is used to validate the training. In the fastai library, though, the test set is the data you predict on for submission to Kaggle competitions. It has no labels, so you do not see the result until Kaggle scores your submission.

Instead, fastai uses the name valid (val) to refer to the data set you validate your model against.

Some machine learning processes (e.g. CalibratedClassifierCV in “prefit” mode) recommend yet another hold-out set which is often called the validation set - so even more confusing if you are aware of that concept!
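
To make the distinction concrete, here is a minimal sketch of carving a validation set out of the labelled data, while the unlabelled test set stays separate. The function name split_train_valid and the 20% fraction are illustrative assumptions, not fastai defaults.

```python
import numpy as np

def split_train_valid(n, val_frac=0.2, seed=42):
    """Shuffle indices 0..n-1 and split them into (train, valid) index arrays."""
    rng = np.random.RandomState(seed)
    idxs = rng.permutation(n)
    n_val = int(n * val_frac)
    # The first n_val shuffled indices become the validation set.
    return idxs[n_val:], idxs[:n_val]

# Labelled data is split into train and valid; the test set has no labels
# and is only used for predictions.
train_idxs, valid_idxs = split_train_valid(100, val_frac=0.2)
```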


(Daniel Hunter) #27

Hi all —

I’m running into some trouble predicting on my test set. I’m trying to submit to Kaggle for the MNIST comp. Everything up until now has made sense, but when I actually go to use the model, I’m having issues. I created my dataset:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), test_name='test')

I can run it through the fitting without issue, and then I run the prediction step:

mypreds = learn.predict(is_test=True)
mypreds[:4]

Output:

array([[ -8.34568, -15.68095, -15.91709, -17.87472, -16.13111, -13.19016,  -0.00024, -18.87988, -12.81165,
        -15.99587],
       [-12.95259,  -7.5127 ,  -0.01056,  -6.33938,  -5.02147,  -9.44032,  -8.79326,  -7.38153,  -8.1664 ,
         -7.68042],
       [ -0.00248, -13.80343,  -9.84029, -12.51427, -14.83042, -12.32397,  -9.07121, -11.57375, -12.00525,
         -6.08006],
       [-19.11721,   0.     , -20.84688, -23.33678, -14.91827, -24.31927, -19.93888, -17.24662, -22.26313,
        -17.40252]], dtype=float32)
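
These values are consistent with log-probabilities: each is at most zero, and exponentiating a row gives numbers that sum to about 1. A minimal sketch using the four rows above (the name log_preds is illustrative):

```python
import numpy as np

# The four prediction rows printed above.
log_preds = np.array([
    [ -8.34568, -15.68095, -15.91709, -17.87472, -16.13111, -13.19016,  -0.00024, -18.87988, -12.81165, -15.99587],
    [-12.95259,  -7.5127 ,  -0.01056,  -6.33938,  -5.02147,  -9.44032,  -8.79326,  -7.38153,  -8.1664 ,  -7.68042],
    [ -0.00248, -13.80343,  -9.84029, -12.51427, -14.83042, -12.32397,  -9.07121, -11.57375, -12.00525,  -6.08006],
    [-19.11721,   0.     , -20.84688, -23.33678, -14.91827, -24.31927, -19.93888, -17.24662, -22.26313, -17.40252],
], dtype=np.float32)

probs = np.exp(log_preds)            # back to probabilities
row_sums = probs.sum(axis=1)         # each row should sum to roughly 1
classes = log_preds.argmax(axis=1)   # predicted class per row
```

np.argmax picks the same class on log-probabilities as on probabilities, because exp is monotonic.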

Checking the shape:

>>> mypreds.shape
(28000, 10)

Checking the actual files:

(fastai) ubuntu@ip-172-31-37-237:~/fastai/courses/dl1/data/mnist$ ls test/ | wc -l
28000

Looks good. However, when I manually check the images, they don’t match the predictions.

>>> preds = np.argmax(mypreds, axis=1)
>>> preds[:10]
array([6, 2, 0, 1, 8, 4, 7, 5, 7, 5])

Compared to:

I also checked other files instead of just img_{1,2,3...}.jpg

(fastai) ubuntu@ip-172-31-37-237:~/fastai/courses/dl1/data/mnist$ ls test | head
img_10000.jpg
img_10001.jpg
img_10002.jpg
img_10003.jpg
img_10004.jpg
img_10005.jpg
img_10006.jpg
img_10007.jpg
img_10008.jpg
img_10009.jpg

These are the first files listed in the directory when I browse via command line, so maybe predict is grabbing those?

I tried that, but they still don’t seem to line up. I’ve checked to see if maybe everything is off by one or something, but it still doesn’t work. I feel like I’m missing something obvious, but I’ve searched for a while on the forums and watched the videos several times through, but I’ve missed how to do this. Any help is welcome!
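
A small sketch of why a row's position in the prediction array need not match the number in the file name: plain string sorting is not numeric, so img_10.jpg sorts ahead of img_2.jpg. The file names and the helper numeric_key are illustrative, and the order a particular library or shell uses may differ again.

```python
import re

def numeric_key(fname):
    """Sort key: the first run of digits in the file name, as an int."""
    match = re.search(r"\d+", fname)
    return int(match.group()) if match else -1

fnames = ["img_1.jpg", "img_2.jpg", "img_10.jpg", "img_10000.jpg", "img_9.jpg"]

lexicographic = sorted(fnames)             # character-by-character comparison
numeric = sorted(fnames, key=numeric_key)  # ordered by the embedded number
```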


(chengye liu) #28

Hey Daniel,

I have the same problem: I don’t know how the predictions and the test data files match up. Did you find a solution? Thanks!


(osnat weissberg) #29

Get the list of file names, in the correct order, from the data object.

Validation files:
data.val_ds.fnames

Test files:
data.test_ds.fnames
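
Once the file names are available in prediction order, each row can be paired with its file. A minimal sketch, assuming file names of the form img_N.jpg as in the listing above, and assuming N is the id the competition expects (check the competition's sample submission). The function build_submission_rows is hypothetical, not a fastai function.

```python
import os
import re
import numpy as np

def build_submission_rows(log_preds, fnames):
    """Pair each prediction row with the numeric id parsed from its file name."""
    labels = np.argmax(log_preds, axis=1)
    rows = []
    for fname, label in zip(fnames, labels):
        stem = os.path.basename(fname)                   # e.g. "img_10000.jpg"
        image_id = int(re.search(r"\d+", stem).group())  # e.g. 10000
        rows.append((image_id, int(label)))
    # Sort by id so the rows come out in numeric order.
    return sorted(rows)

# Toy example with made-up predictions for three files.
example_preds = np.log(np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]))
example_fnames = ["test/img_10.jpg", "test/img_2.jpg", "test/img_1.jpg"]
example_rows = build_submission_rows(example_preds, example_fnames)
```

With the real objects, the call would look like build_submission_rows(mypreds, data.test_ds.fnames).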


(Yin Huang) #30

I get an error when I predict on unlabeled test data in a folder named 'test'.
Any ideas? Thanks.

tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, tfm_y=tfm_y, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True, test_name='test')

learn.predict(is_test=True)


IndexError Traceback (most recent call last)
in <module>()
----> 1 learn.predict(is_test=True)

~/fastai/fastai/zeroshot/fastai/learner.py in predict(self, is_test)
278 def predict(self, is_test=False):
279 dl = self.data.test_dl if is_test else self.data.val_dl
--> 280 return predict(self.model, dl)
281
282 def predict_with_targs(self, is_test=False):

~/fastai/fastai/zeroshot/fastai/model.py in predict(m, dl)
135
136 def predict(m, dl):
--> 137 preda,_ = predict_with_targs_(m, dl)
138 return to_np(torch.cat(preda))
139

~/fastai/fastai/zeroshot/fastai/model.py in predict_with_targs_(m, dl)
147 if hasattr(m, 'reset'): m.reset()
148 res = []
--> 149 for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
150 return zip(*res)
151

~/fastai/fastai/zeroshot/fastai/dataloader.py in __iter__(self)
82 # avoid py3.6 issue where queue is infinite and can result in memory exhaustion
83 for c in chunk_iter(iter(self.batch_sampler), self.num_workers*10):
---> 84 for batch in e.map(self.get_batch, c): yield get_tensor(batch, self.pin_memory)
85

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.time())

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/thread.py in run(self)
54
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)

~/fastai/fastai/zeroshot/fastai/dataloader.py in get_batch(self, indices)
69
70 def get_batch(self, indices):
---> 71 res = self.np_collate([self.dataset[i] for i in indices])
72 if self.transpose: res[0] = res[0].T
73 if self.transpose_y: res[1] = res[1].T

~/fastai/fastai/zeroshot/fastai/dataloader.py in <listcomp>(.0)
69
70 def get_batch(self, indices):
---> 71 res = self.np_collate([self.dataset[i] for i in indices])
72 if self.transpose: res[0] = res[0].T
73 if self.transpose_y: res[1] = res[1].T

~/fastai/fastai/zeroshot/fastai/dataset.py in __getitem__(self, idx)
160 def __getitem__(self, idx):
161 x,y = self.get_x(idx),self.get_y(idx)
--> 162 return self.get(self.transform, x, y)
163
164 def __len__(self): return self.n

~/fastai/fastai/zeroshot/fastai/dataset.py in get(self, tfm, x, y)
165
166 def get(self, tfm, x, y):
--> 167 return (x,y) if tfm is None else tfm(x,y)
168
169 @abstractmethod

~/fastai/fastai/zeroshot/fastai/transforms.py in __call__(self, im, y)
519 crop_tfm = crop_fn_lu[crop_type](sz, tfm_y, sz_y)
520 self.tfms = tfms + [crop_tfm, normalizer, ChannelOrder(tfm_y)]
--> 521 def __call__(self, im, y=None): return compose(im, y, self.tfms)
522 def __repr__(self): return str(self.tfms)
523

~/fastai/fastai/zeroshot/fastai/transforms.py in compose(im, y, fns)
500 for fn in fns:
501 #pdb.set_trace()
--> 502 im, y =fn(im, y)
503 return im if y is None else (im, y)
504

~/fastai/fastai/zeroshot/fastai/transforms.py in __call__(self, x, y)
174 x,y = ((self.transform(x),y) if self.tfm_y==TfmType.NO
175 else self.transform(x,y) if self.tfm_y in (TfmType.PIXEL, TfmType.CLASS)
--> 176 else self.transform_coord(x,y))
177 return x, y
178

~/fastai/fastai/zeroshot/fastai/transforms.py in transform_coord(self, x, ys)
205 def transform_coord(self, x, ys):
206 yp = partition(ys, 4)
--> 207 y2 = [self.map_y(y,x) for y in yp]
208 x = self.do_transform(x, False)
209 return x, np.concatenate(y2)

~/fastai/fastai/zeroshot/fastai/transforms.py in <listcomp>(.0)
205 def transform_coord(self, x, ys):
206 yp = partition(ys, 4)
--> 207 y2 = [self.map_y(y,x) for y in yp]
208 x = self.do_transform(x, False)
209 return x, np.concatenate(y2)

~/fastai/fastai/zeroshot/fastai/transforms.py in map_y(self, y0, x)
199
200 def map_y(self, y0, x):
--> 201 y = CoordTransform.make_square(y0, x)
202 y_tr = self.do_transform(y, True)
203 return to_bb(y_tr, y)

~/fastai/fastai/zeroshot/fastai/transforms.py in make_square(y, x)
195 y1 = np.zeros((r, c))
196 y = y.astype(np.int)
--> 197 y1[y[0]:y[2], y[1]:y[3]] = 1.
198 return y1
199

IndexError: index 2 is out of bounds for axis 0 with size 1
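
The last frame indexes y[2] on an array of size 1. That suggests the target reaching the coordinate transform for a test image has one element rather than the four bounding-box coordinates it expects. Whether the test targets really are single-value placeholders is an assumption, not something confirmed in this thread. The sketch below reproduces the same message with NumPy alone; make_square_sketch is a simplified stand-in, not the library function.

```python
import numpy as np

def make_square_sketch(y, shape):
    """Simplified stand-in for the slicing done in the final traceback frame."""
    r, c = shape
    mask = np.zeros((r, c))
    y = y.astype(int)
    mask[y[0]:y[2], y[1]:y[3]] = 1.   # needs four coordinates in y
    return mask

# Four coordinates: the slice is valid and marks a 2x3 box.
ok = make_square_sketch(np.array([1., 1., 3., 4.]), (5, 5))

# A single placeholder value: y[2] does not exist.
try:
    make_square_sketch(np.zeros(1), (5, 5))
    error_message = None
except IndexError as exc:
    error_message = str(exc)
```

If that is what is happening, the traceback points at the unlabeled test set and the tfm_y coordinate transform rather than at the model.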