Making predictions in v1

I’m talking about predicting more than one item, using a loaded Learner.
Currently, data is empty (loaded with load_empty()) as in the tutorials.
All the examples and tutorials talk about predicting one item only. Is there any way to get predictions on multiple objects without calling .predict() n times or implementing get_preds() on my own?
Thanks :slight_smile:

The message I replied to is:

For now there is no script for this, but you can apply your model to a new object by calling learn.model(x) (just make sure x is on the same device as the model).

I don’t understand your question then. Is Learner.get_preds not working?

Both Learner.get_preds and basic_train.get_preds expect a DatasetType (train, validation or test) or a DataLoader, and they work with losses etc.
I’m past the training phase, and I no longer have a Learner with train, validation and test datasets. I saved the model to disk.
Now I’m loading the model from scratch, as described in the tutorials, and I have only 1 DataFrame:

df = MY_DATAFRAME_WAITING_FOR_PREDICTIONS
data = TabularDataBunch.load_empty(base_file_path)
clf = tabular_learner(data, emb_szs=emb_szs, ...)
clf.load('best')

How do I proceed from here to predict the rows in df?
I can’t create a DataBunch as I don’t have train, valid or test data. I don’t have any labels at this stage either, so none of the label_from methods will work.
The only two ways I found to work are:

  1. call Learner.predict() for each row in df
  2. Duplicate the code in get_preds, remove all the validation code and all loss related code.

Am I missing something?
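
For what it's worth, option 2 boils down to a plain batching loop: slice the rows into fixed-size chunks, run the model once per chunk, and concatenate the outputs. Below is a minimal sketch of that idea in plain Python. It is not fastai code: `batches`, `predict_all` and `toy_score_batch` are hypothetical names, and `toy_score_batch` only stands in for a real forward pass.

```python
from typing import Callable, Iterator, List, Sequence

Row = Sequence[float]


def batches(rows: Sequence[Row], batch_size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def predict_all(rows: Sequence[Row],
                score_batch: Callable[[Sequence[Row]], List[float]],
                batch_size: int = 64) -> List[float]:
    """Call `score_batch` once per batch and concatenate the outputs in order."""
    outputs: List[float] = []
    for batch in batches(rows, batch_size):
        outputs.extend(score_batch(batch))
    return outputs


def toy_score_batch(batch: Sequence[Row]) -> List[float]:
    """Hypothetical stand-in for a model forward pass: one score per row."""
    return [sum(row) for row in batch]


rows = [[float(i), 1.0] for i in range(150)]
preds = predict_all(rows, toy_score_batch, batch_size=64)
```

With 150 rows and a batch size of 64, the scoring function is called three times instead of 150.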

But why don’t you put all your data in a TabularItemList.from_df(...), with no splitting and a constant label like the ones you had in your training set, then create a DataLoader from it and put it inside your empty data.test_dl?
That way you’ll be able to call learn.get_preds.

Will try that, thanks.
Just sharing my opinion: it still feels like a workaround.
Creating fake labels means I need to save somewhere what my labels looked like and then create fake ones to match, when in reality I have no reason to create fake labels since I’m not training (Learner.predict() works without labels).
In addition, replacing the empty DataBunch’s test_dl with a new one after it was already created also feels weird, but there is no other option: loading the model requires a DataBunch, while I have no data to evaluate yet.

If you are interested in changing it, and/or have a possible plan to change this behaviour for the better, I can assist with the PR :slight_smile:

I believe it is really necessary to be able to run prediction on multiple inputs rather than just one. I’m facing the same problem (in computer vision) and found several similar topics.

https://forums.fast.ai/t/inference-on-test-images/29954
https://forums.fast.ai/t/is-there-a-way-to-get-predictions-against-future-test-datasets-not-available-during-training/30340
https://forums.fast.ai/t/how-to-add-a-test-set-to-an-existing-databunch/30410

Hope we will get a straightforward API for that in fastai rather than a workaround :smiley:

@sgugger
I tried your suggestion, but I can’t get this to work either.
There is a private member called codes that appears half-magically after the TabularProcessor is applied, and I can’t work out when or why it is applied.
As a result, codes is not defined on the TabularList when get() is called.

This is my code:

constant_label = np.zeros(clf.data.c) if is_multilabel else 1
data = TabularList.from_df(x_test, cat_vars, cont_vars).label_const(constant_label)
data_loader = DataLoader(data, batch_size=64, num_workers=defaults.cpus) # have to duplicate code from DataBunch.create because of the workaround
clf.data.test_dl = data_loader
res = clf.get_preds(DatasetType.Test)

Stacktrace:

AttributeError: Traceback (most recent call last):                           
  File "/home/ronyl/project/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ronyl/project/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ronyl/project/venv/lib/python3.6/site-packages/fastai/data_block.py", line 480, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "/home/ronyl/project/venv/lib/python3.6/site-packages/fastai/data_block.py", line 92, in __getitem__
    if isinstance(try_int(idxs), int): return self.get(idxs)
  File "/home/ronyl/project/venv/lib/python3.6/site-packages/fastai/tabular/data.py", line 120, in get
    codes = [] if self.codes is None else self.codes[o]
AttributeError: 'TabularList' object has no attribute 'codes'

I would just add that I have the same issue. I used the following code as a workaround:

preds=[]
for i in range(len(df_test)):
    preds.append(learn.predict(df_test.iloc[i]))

It would be very nice if there were an easy way to make predictions for a DataFrame in one call.

No, you need to load your empty DataBunch so that it has the state of the processors, otherwise it won’t work.
Then plug in the test_dl like you did.

How should I do that?
What do you mean by loading the DataBunch so that it has the state of the processors?
It’s really unclear; the process I talked about is at the TabularList level, not the DataBunch level.

If you have a working example for that workaround, I think it will benefit all of us :slight_smile:

Use the tutorial on inference mode to learn how to save the inner state of your DataBunch (from your training set) and then load it again with no data. Then add your test_dl to that object before running get_preds and you should be good.

@sgugger
I was using it all along; it doesn’t work the way you say it should.
My DataBunch has the processors. I even used the same DataBunch I trained with (without saving and loading, right after training) and it still doesn’t work.
The problem, from what I can see, is that the TabularList object I create has, as I said, a get() method which uses the self.codes member (line 120 @ tabular/data.py).
This member is never set because processor.process() is never called for the test TabularList I create, if I’m not mistaken.
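
To make the failure mode concrete, here is a toy reproduction in plain Python. It is only a sketch of the mechanism described above, not fastai's actual code: `ToyList` and `ToyProcessor` are hypothetical classes, and the only thing borrowed from the library is the shape of the failing line in get().

```python
class ToyList:
    """Hypothetical item list whose `codes` attribute exists only after processing."""

    def __init__(self, rows):
        self.rows = rows  # note: `codes` is deliberately NOT set here

    def process(self, processor):
        processor.process(self)

    def get(self, i):
        # Same shape as the failing line in the traceback: reads self.codes directly.
        codes = [] if self.codes is None else self.codes[i]
        return codes, self.rows[i]


class ToyProcessor:
    """Hypothetical processor: the only place where `codes` gets assigned."""

    def process(self, item_list):
        item_list.codes = None  # no categorical columns in this toy example


# Without processing, get() fails exactly like the traceback above.
unprocessed = ToyList([[0.1, 0.2]])
try:
    unprocessed.get(0)
    error = None
except AttributeError as exc:
    error = str(exc)

# After processing, the attribute exists and get() works.
processed = ToyList([[0.1, 0.2]])
processed.process(ToyProcessor())
item = processed.get(0)
```

If this is indeed what happens, any list that reaches the DataLoader without having gone through the processor would hit the same AttributeError, regardless of what state the DataBunch itself holds.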

I created some sample code that illustrates the problem, so we have something to work with. Here it goes:

import numpy as np
import pandas as pd
from fastai import Learner, DatasetType, DataLoader, defaults
from fastai.tabular import *
from fastai.metrics import *
from fastai.callbacks import EarlyStoppingCallback, SaveModelCallback

x = np.random.rand(1000,10)
y = np.random.rand(1000, 4).argmax(1)
columns = ['cont{}'.format(i) for i in range(10)]
x_df = pd.DataFrame(x, columns=columns)

# let's train a classifier
data: TabularDataBunch = (TabularList.from_df(x_df, cat_names=[], cont_names=columns)
                          .random_split_by_pct(valid_pct=0.2)
                          .label_from_list(y)
                          .databunch())
clf = tabular_learner(data=data, emb_drop=0, layers=[50], ps=[0.4], use_bn=True, metrics=[accuracy])
clf.fit_one_cycle(cyc_len=25, max_lr=0.1, div_factor=25)

# now let's get predictions for multiple rows
x_test = x_df.copy()
constant_label = 1
data = TabularList.from_df(x_test, [], columns).label_const(constant_label)
test_data_loader = DataLoader(data, batch_size=64, num_workers=defaults.cpus)
clf.data.test_dl = test_data_loader

res = clf.get_preds(DatasetType.Test) # same attribute error with self.codes here

Thanks.

This isn’t for tabular datasets, but this new post from Jeremy might solve your problem: https://forums.fast.ai/t/how-do-i-predict-a-batch-of-images-without-labels/32185/24?u=dhoa

Unfortunately I can’t access this topic for some reason.

Oh, that’s because the post is in the Part1_V3 group, which you don’t have access to. Sorry.

I quote it here:

I haven’t played with the tabular module yet in fastai v1, but I guess it has a function similar to add_test_df?

It’s going to be add_test in this case, where you can put your TabularList.from_df(...) for the test dataframe.

Edit: the inference tutorial has been updated to show this.

In the case of a tabular learner on a regression problem with a test set added during data bunch creation, what are the values in the tuples returned? The code says:

    "Tuple of predictions and targets, and optional losses (if `loss_func`) using `dl`, max batches `n_batch`."

So, using Rossmann as an example, what are the tensors returned by get_preds?

preds = learn.get_preds(ds_type=DatasetType.Test)
preds

[tensor([[ 8.3814],
         [ 8.9268],
         [ 9.1673],
         ...,
         [ 8.7692],
         [10.0370],
         [ 8.8672]]),
 tensor([2.1998, 2.1998, 2.1998,  ..., 2.1998, 2.1998, 2.1998])]

The first tensor contains your predictions, the second your targets (dummy targets since you’re using the test set, which is why you have the same thing all the time).

TabularList does not have the load_empty method. I don’t know how to call add_test on the data without creating the whole dataset from scratch.

You’re not supposed to use TabularList for load_empty, but LabelLists, as shown in the tutorial.