Getting predictions with get_preds() for test sets

I am also seeing the y predictions returning all zeros, in my case using a text learner (in fastai v1.0.48):

preds, y = learn.get_preds(ds_type=DatasetType.Test)
print(y)

tensor([0, 0, 0, ..., 0, 0, 0])

Even though taking the argmax reveals that not all the highest preds are at label position 0.

@sgugger Is this a known issue?

That’s not an issue, it’s because the test set in fastai is unlabelled, so the y (your targets) are all set to zero.
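Since the targets returned for a test set are dummy zeros, the predicted classes come from taking the argmax over the returned probabilities instead. A minimal sketch with made-up numbers (plain NumPy standing in for the tensor that get_preds returns):

```python
import numpy as np

# Made-up class probabilities: one row per test item, one column per class.
# This stands in for the first tensor returned by get_preds on a test set.
preds = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
])

# The second returned tensor (y) would be all zeros here, so ignore it and
# derive the predicted class index for each item from the probabilities.
predicted = np.argmax(preds, axis=1)
print(predicted)  # [1 0 2]
```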


I see. That explains it. I thought y was the predicted y, but it’s actually the true y (which we don’t have in a test set). Thanks.


Are there plans to change this? For me it always feels kind of hacky to artificially add none labels to the test data loader in order for it to work.

Not in the midterm, no, we have a lot on our plates with the ongoing course right now.

I am facing the same issue: I am getting image class 0 using learn.get_preds(), even though my class range is 1-5. Please suggest a solution to the incorrect class prediction.

@sariabod For test set prediction we have to create a new DataBunch, replacing the valid folder with the Test folder in the argument, then set learn.data = new_data; finally we can make predictions on that using your code. This was suggested in this forum, but it’s still not working. Prediction of test class labels is not supported directly in fastai.

Hi Abhi,
Try this :

predictions, *_ = learner.get_preds(DatasetType.Test)
labels = np.argmax(predictions, 1)

You can also check this thread for additional info.

Welcome to the forums!

I tried your line of code, but I am still getting labels which are not in my class list.

@abhi891, you’re only getting zeros, right?
I came across the same problem with tabular.

Give this a try. This worked for me.

preds,_ = learn.get_preds(ds_type=DatasetType.Test)
result = preds.numpy()[:, 0]
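For intuition, the [:, 0] slice above just pulls out each row's first-column probability (the probability of class 0). A toy sketch with made-up numbers:

```python
import numpy as np

# Made-up probabilities: one row per item, one column per class.
preds = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
])

# [:, 0] selects each item's probability of class 0.
result = preds[:, 0]
print(result)  # [0.9 0.3]
```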

For my image classification problem with 5 classes, the code below worked with the help of your link.
preds,_ = learn.get_preds(ds_type=DatasetType.Test)
labels = np.argmax(preds, 1)
test_predictions_direct = [data.classes[int(x)] for x in labels]

Thanks a lot


Hi @abhi891,

I downloaded your code from the AV competition and ran the above code, but I am stuck at these lines:

all_test_preds = []
for i in range(1, 3+1):
    learn.load('stage-' + str(i))
    learn.model.eval()
    probs, y = learn.get_preds(ds_type=DatasetType.Test)
    all_test_preds.append(probs.numpy())

final = [data.classes[i] for i in np.argmax(np.mean(all_test_preds, 0), axis=1)]

This is the error I am getting:

Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
send_bytes(obj)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 230, in _feed
close()
File "/usr/lib/python3.6/multiprocessing/connection.py", line 177, in close
self._close()
File "/usr/lib/python3.6/multiprocessing/connection.py", line 361, in _close
_close(self._handle)
OSError: [Errno 9] Bad file descriptor…

Any idea why this is? Does it take a long time to run?

Thanks.

Tried this, I’m getting correct predictions but the order is somehow different. How do I fix the order of the predictions?

To get predictions in order, pass ordered=True as a parameter to get_preds().


I am getting the predictions for the validation set when I am using
preds, y = learn.get_preds(DatasetType.Test)

Hi there,
I did pass ordered=True to get_preds(), but it gives me an “Unexpected keyword - ordered” error.


@sgugger

I am using the code below for getting predictions on my test set. However, I am getting different results for the same data points if I take the entire test dataset vs just the first 1000 examples. Can anyone explain the reason for such behavior, or am I missing something?
FYI: I am using fastai v1.

def get_preds_as_nparray(ds_type) -> np.ndarray:
    """
    the get_preds method does not yield the elements in order by default
    we borrow the code from the RNNLearner to resort the elements into their correct order
    """
    preds = learner.get_preds(ds_type)[0].detach().cpu().numpy()
    sampler = [i for i in databunch.dl(ds_type).sampler]
    reverse_sampler = np.argsort(sampler)
    return preds[reverse_sampler, :]

test_preds = get_preds_as_nparray(DatasetType.Test)
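To see why the np.argsort(sampler) trick in the function above restores the original order, here is a toy illustration with made-up data: the sampler records which dataset index each visited item came from, and indexing the predictions with the argsort of that list puts them back in dataset order.

```python
import numpy as np

# Suppose the dataloader's sampler visited the dataset in this shuffled order...
sampler_order = [2, 0, 3, 1]
# ...and get_preds returned one prediction row per visited item.
preds_in_sampler_order = np.array([[0.2], [0.0], [0.3], [0.1]])

# argsort of the sampler tells us, for each dataset index, which row of the
# predictions belongs to it; indexing with it restores the dataset order.
reverse_sampler = np.argsort(sampler_order)
preds_in_dataset_order = preds_in_sampler_order[reverse_sampler, :]
print(preds_in_dataset_order.ravel())  # [0.  0.1 0.2 0.3]
```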

Hello, were you able to solve this problem?

Hi, I’m quite new to fastai but I’ve found a simple workaround for predicting on new data; it may be useful for someone.

NB: I’m using an ‘Imagenet style’ dataset (e.g. ‘data/test/class_to_pred/file.jpg’), so you may need to change some parts to fit your data structure.

import os

import numpy as np
from fastai.data.transforms import get_image_files
from sklearn.metrics import classification_report, confusion_matrix

def get_preds_on_test(learner, test_path, show_results=False):
    fnames = get_image_files(test_path)
    dl = learner.dls.test_dl(fnames)
    classes = learner.dls.vocab
    
    # extracting true target value from path
    trues = [os.path.basename(os.path.dirname(fname)) for fname in fnames]
    # make predictions and extract the class name of the highest probability
    preds = [classes[i] for i in np.argmax(learner.get_preds(dl=dl)[0], 1)]
    
    if show_results:
        print(classification_report(trues, preds))
        print(confusion_matrix(trues, preds))
    return trues, preds

trues, preds = get_preds_on_test(learner, 'path/to/your/test/folder', True)
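The label-from-path step in the function above just takes the name of each file's parent directory as the true class. A quick standalone check with made-up paths (no fastai needed):

```python
import os

# Made-up 'Imagenet style' paths: the parent directory is the class label.
fnames = [
    "data/test/cat/img1.jpg",
    "data/test/dog/img2.jpg",
]

# Same extraction as in get_preds_on_test: basename of the dirname.
trues = [os.path.basename(os.path.dirname(f)) for f in fnames]
print(trues)  # ['cat', 'dog']
```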