How to predict on the test set

Hey Daniel,

I’ve got the same problem - I don’t know how the prediction and test data files match up with each other. Just wondering if you’ve found a solution to this? Thx!

Get the list of file names, in the correct order, from the data object:

validation files:
data.valid_ds.fnames

test files:
data.test_ds.fnames
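
For example (a minimal sketch, assuming the data object was built with a test_name argument):

# these names are in the same order as the rows returned by learn.predict(is_test=True)
print(data.test_ds.fnames[:5])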

3 Likes

I got an error when predicting on test data without labels, which are in a folder named ‘test’.
Any ideas? Thanks

tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, tfm_y=tfm_y, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True, test_name='test')

learn.predict(is_test=True)


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 learn.predict(is_test=True)

~/fastai/fastai/zeroshot/fastai/learner.py in predict(self, is_test)
    278     def predict(self, is_test=False):
    279         dl = self.data.test_dl if is_test else self.data.val_dl
--> 280         return predict(self.model, dl)
    281
    282     def predict_with_targs(self, is_test=False):

~/fastai/fastai/zeroshot/fastai/model.py in predict(m, dl)
    135
    136 def predict(m, dl):
--> 137     preda,_ = predict_with_targs_(m, dl)
    138     return to_np(torch.cat(preda))
    139

~/fastai/fastai/zeroshot/fastai/model.py in predict_with_targs_(m, dl)
    147     if hasattr(m, 'reset'): m.reset()
    148     res = []
--> 149     for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
    150     return zip(*res)
    151

~/fastai/fastai/zeroshot/fastai/dataloader.py in __iter__(self)
     82         # avoid py3.6 issue where queue is infinite and can result in memory exhaustion
     83         for c in chunk_iter(iter(self.batch_sampler), self.num_workers*10):
---> 84             for batch in e.map(self.get_batch, c): yield get_tensor(batch, self.pin_memory)
     85

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in result_iterator()
    584             # Careful not to keep a reference to the popped future
    585             if timeout is None:
--> 586                 yield fs.pop().result()
    587             else:
    588                 yield fs.pop().result(end_time - time.time())

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/thread.py in run(self)
     54
     55         try:
---> 56             result = self.fn(*self.args, **self.kwargs)
     57         except BaseException as exc:
     58             self.future.set_exception(exc)

~/fastai/fastai/zeroshot/fastai/dataloader.py in get_batch(self, indices)
     69
     70     def get_batch(self, indices):
---> 71         res = self.np_collate([self.dataset[i] for i in indices])
     72         if self.transpose:   res[0] = res[0].T
     73         if self.transpose_y: res[1] = res[1].T

~/fastai/fastai/zeroshot/fastai/dataloader.py in <listcomp>(.0)
     69
     70     def get_batch(self, indices):
---> 71         res = self.np_collate([self.dataset[i] for i in indices])
     72         if self.transpose:   res[0] = res[0].T
     73         if self.transpose_y: res[1] = res[1].T

~/fastai/fastai/zeroshot/fastai/dataset.py in __getitem__(self, idx)
    160     def __getitem__(self, idx):
    161         x,y = self.get_x(idx),self.get_y(idx)
--> 162         return self.get(self.transform, x, y)
    163
    164     def __len__(self): return self.n

~/fastai/fastai/zeroshot/fastai/dataset.py in get(self, tfm, x, y)
    165
    166     def get(self, tfm, x, y):
--> 167         return (x,y) if tfm is None else tfm(x,y)
    168
    169     @abstractmethod

~/fastai/fastai/zeroshot/fastai/transforms.py in __call__(self, im, y)
    519         crop_tfm = crop_fn_lu[crop_type](sz, tfm_y, sz_y)
    520         self.tfms = tfms + [crop_tfm, normalizer, ChannelOrder(tfm_y)]
--> 521     def __call__(self, im, y=None): return compose(im, y, self.tfms)
    522     def __repr__(self): return str(self.tfms)
    523

~/fastai/fastai/zeroshot/fastai/transforms.py in compose(im, y, fns)
    500     for fn in fns:
    501         #pdb.set_trace()
--> 502         im, y = fn(im, y)
    503     return im if y is None else (im, y)
    504

~/fastai/fastai/zeroshot/fastai/transforms.py in __call__(self, x, y)
    174         x,y = ((self.transform(x),y) if self.tfm_y==TfmType.NO
    175                else self.transform(x,y) if self.tfm_y in (TfmType.PIXEL, TfmType.CLASS)
--> 176                else self.transform_coord(x,y))
    177         return x, y
    178

~/fastai/fastai/zeroshot/fastai/transforms.py in transform_coord(self, x, ys)
    205     def transform_coord(self, x, ys):
    206         yp = partition(ys, 4)
--> 207         y2 = [self.map_y(y,x) for y in yp]
    208         x = self.do_transform(x, False)
    209         return x, np.concatenate(y2)

~/fastai/fastai/zeroshot/fastai/transforms.py in <listcomp>(.0)
    205     def transform_coord(self, x, ys):
    206         yp = partition(ys, 4)
--> 207         y2 = [self.map_y(y,x) for y in yp]
    208         x = self.do_transform(x, False)
    209         return x, np.concatenate(y2)

~/fastai/fastai/zeroshot/fastai/transforms.py in map_y(self, y0, x)
    199
    200     def map_y(self, y0, x):
--> 201         y = CoordTransform.make_square(y0, x)
    202         y_tr = self.do_transform(y, True)
    203         return to_bb(y_tr, y)

~/fastai/fastai/zeroshot/fastai/transforms.py in make_square(y, x)
    195         y1 = np.zeros((r, c))
    196         y = y.astype(np.int)
--> 197         y1[y[0]:y[2], y[1]:y[3]] = 1.
    198         return y1
    199

IndexError: index 2 is out of bounds for axis 0 with size 1

There’s no documentation for the fastai package, so we have to dig into the code.

Looking at the lesson 1 code, where prediction is done on the validation data:
log_preds = learn.predict()

Position the cursor inside the brackets and press SHIFT-TAB twice (hold SHIFT, press TAB twice); this brings up a pane showing the signature and source file.

Then, jumping to the source code on GitHub:

def predict(self, is_test=False, use_swa=False):
    dl = self.data.test_dl if is_test else self.data.val_dl
    m = self.swa_model if use_swa else self.model
    return predict(m, dl)

If is_test=True, the above will use self.data.test_dl; otherwise it uses self.data.val_dl.

The conditional expression syntax used here is explained at:
https://docs.python.org/3/reference/expressions.html#conditional-expressions
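
A minimal illustration of that syntax (a standalone sketch, not fastai code):

# conditional expression: value_if_true if condition else value_if_false
is_test = True
dl = 'test loader' if is_test else 'validation loader'
print(dl)  # prints: test loader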

Recall that the test data was set by the directory structure when loading the data:
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), test_name='test')

Refer to:

def from_paths(cls, path, bs=64, tfms=(None,None), trn_name='train', val_name='valid', test_name=None, test_with_labels=False, num_workers=8):
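
So from_paths expects a directory layout along these lines (an assumed example; the folder names just have to match trn_name, val_name, and test_name):

PATH/
    train/
        cats/ ...
        dogs/ ...
    valid/
        cats/ ...
        dogs/ ...
    test/    # flat folder of unlabelled images, picked up because test_name='test'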

[Figured I’d copy my notes here, as this threw me initially. Others might find it useful.]

1 Like

Hi Guys

I may be late to the party, but hopefully someone will find this useful.

In order to test our trained ANN on the unseen ‘test’ data, the following changes / steps need to be undertaken:

1. Add the 'test_name' parameter:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), test_name='test1')

2. Add 'is_test=True':

log_testpreds = learn.predict(is_test=True)

3. Get the probabilities:

testprobs = np.exp(log_testpreds[:,1])
testprobs

Yields:
array([0.44364, 0.79223, 0.94982, 0.99699, 0.84098, 0.03484, 0.88222 …
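
If you then want hard class predictions rather than probabilities, a small follow-on sketch (assuming, as above, that column 1 of log_testpreds is the positive class):

testlabels = (testprobs > 0.5).astype(int)  # 1 wherever the class-1 probability exceeds 0.5
testlabels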

Notes:

My test data included images named 0001.jpeg, 0002.jpeg, 0003.jpeg, etc., with 0001-0030 being dolphins and 0031-0060 being sharks (sharks and dolphins as opposed to dogs and cats).

I noticed that my probabilities didn’t match my images in the test set at all. My results looked like complete guesswork; no better than 50% accuracy. A shark-fin and dolphin-steak salad! This seemed strange, since my validation accuracy was 93%. So I created a test set containing one image. It was correctly classified. I then tested on another single image. Also correct. I repeated this a few times and always got the correct classification. So I did some examination:

test_files = os.listdir(f'{PATH}/test1/')[:]
test_files

Yields a random order of images:
['0001.jpeg',
 '0037.jpeg',
 '0030.jpeg',
 '0040.jpeg',
 '0027.jpeg',
 '0043.jpeg',
 '0025.jpeg' …

I checked on this and found the following in the Python documentation for os.listdir:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.
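
So if you want a deterministic listing to eyeball, sort it yourself (a small sketch; note this only changes your printed listing, not the order fastai used internally for its predictions):

test_files = sorted(os.listdir(f'{PATH}/test1/'))  # lexicographic: 0001.jpeg, 0002.jpeg, ...
test_files[:5]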

So I matched the correct image up to the correct probability (in Excel):

0001.jpeg - 0.44364
0002.jpeg - 0.03068
0003.jpeg - 0.00579

0037.jpeg - 0.79223

And got the expected accuracy of 93%.
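
Instead of matching things up by hand in Excel, the same pairing can be done in code, since data.test_ds.fnames is in the same order as the returned predictions (a sketch reusing the names from the steps above):

# each probability lines up with the file name at the same index
for fname, p in zip(data.test_ds.fnames, testprobs):
    print(fname, p)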

3 Likes

I’m a newbie who’s been taking the course online, and I was wondering the same thing. It looks like you need to use data.classes to look up the actual class being predicted. So, assuming you’ve named your ImageClassifierData “data”, do the steps below:

log_preds = learn.predict(is_test=True)
preds = np.argmax(log_preds, axis=1)

Then, you can use the following kind of loop to get the file names, predicted classes, and prediction probabilities.

itemIndex = 0
for maxIndex in preds:
    print((data.test_ds.fnames[itemIndex],
           data.classes[maxIndex],
           np.exp(log_preds[itemIndex][maxIndex])))
    itemIndex = itemIndex + 1
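
A slightly more idiomatic version of the same loop, using enumerate instead of the manual counter (same names as above; the behaviour is unchanged):

for itemIndex, maxIndex in enumerate(preds):
    print((data.test_ds.fnames[itemIndex],
           data.classes[maxIndex],
           np.exp(log_preds[itemIndex][maxIndex])))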

Hope this helps.

5 Likes

I just want to point out that the test dataset has wrong labels:

log_preds, test_labels = learn.TTA(is_test=True)
np.mean(test_labels) # Mean is 0, labels are wrong!!!!! The test images are not all cats.
2 Likes

learn.TTA returns log_preds and labels if is_test=False; otherwise it gives zeros for the labels. You can check this by pressing SHIFT+TAB twice.

If you want to check the mean of the test set according to what your classifier predicted, you should use:
preds = np.argmax(log_preds, 1)
preds.mean()

Does that make sense? Why not return labels for the test set? It is inconsistent and renders the test set useless.

You are getting confused between labels and predictions. Labels are what the images actually are, while predictions are what we predict them to be. Labels are not provided for the test dataset.

No, I am not confused at all. Labels are the truth, predictions are what the model thinks is the truth.

I am just pointing out that making predictions on the test dataset without being able to find out the accuracy doesn’t make sense and returning a list of all zeros is misleading and wasteful.

1 Like

If your “Test” data set doesn’t have labels then all that can be returned is the predictions. And contrary to your statement, it does make sense to do this, for example, if you enter a Kaggle competition you will get a “Test” set without labels on which you will need to make predictions for submission.

Well, you are following a course here, not competing in a Kaggle competition, right?

If the intention is to (unnecessarily) hide the labels, then at least return None instead of all zeros, because that’s wrong and misleading.

This issue appears to have been fixed in later versions of the fast.ai library, but for completeness’ sake, I wanted to mention here what it was. Internally, the test set used to get labels of [0]. The library then still tried to treat these as the x and y coordinates, as instructed previously in the notebook; hence it was missing coordinates. In the newer versions of the library, the labels in the test set have the same number of (zero) entries as the number of outputs.
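
You can reproduce the failing indexing in isolation (a minimal sketch of what make_square was effectively doing with the old single-zero dummy label):

import numpy as np
y = np.array([0])              # old dummy test label: a single zero instead of four bbox coordinates
y1 = np.zeros((224, 224))
y1[y[0]:y[2], y[1]:y[3]] = 1.  # raises IndexError: index 2 is out of bounds for axis 0 with size 1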

Hello Jonathan, thanks for the clear, in-order steps. Could you tell me in detail how you managed to match each image with its correct probability? Did you print os.listdir(path) and the probabilities, and then arrange them in Excel?

@poseidon Yep, simple as that. I printed os.listdir(path), copied and pasted it into Excel, then copied and pasted the output probabilities (and transposed them in Excel).

At some stage, I will look at modifying:

data = ImageClassifierData.from_paths(PATH, tfms=tfms)

so that the data returned sorts the os.listdir(path) files. Then both the file list and the associated probabilities will already be in order.
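
In the meantime, the reordering can be done outside the library (a sketch; np.argsort over the file names gives the permutation that sorts them, and the same permutation is then applied to the probabilities):

import numpy as np
order = np.argsort(data.test_ds.fnames)           # permutation that sorts the file names
sorted_fnames = np.array(data.test_ds.fnames)[order]
sorted_probs = testprobs[order]                   # probabilities now follow the sorted names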

nice :smiley:

Regarding the dog breeds competition, I can’t submit my file to Kaggle, so could someone please provide the code needed?

You got this error for a test set with bounding boxes, right? I have the same error and was wondering if you managed to solve it. Help is much appreciated!

Hi guys, I am having a problem converting a two-unit output into a single probability output.

How should I convert a cat probability and a dog probability into one 0-1 probability?

I have no clue. Thanks!