Inference on text data with ids from a csv


So this is a fairly common problem, for example in Kaggle competitions, yet I can’t figure out how to solve it, even after reading “How do I predict a batch of images without labels”.

I have a test.csv with two columns, ‘id’ and ‘text’, and a trained learner ‘learn’. Now I want to make predictions for every row in the .csv.
There are two problems:
1. I can’t figure out how to predict on the .csv, even after dropping the ‘id’ column.
2. I actually want to keep the ids, because I need them for the Kaggle submission file.

I created my databunch for the training data (without ids) via:

data_train = TextClasDataBunch.from_csv(path, 'train_no_id.csv', text_cols='text', label_cols=['label1', 'label2', 'label3', 'label4', 'label5', 'label6'])

That worked fine for training my learner.
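Since the test data is also a csv, one hedged alternative in fastai v1 (1.0.x) is to attach it when the DataBunch is built: `TextClasDataBunch.from_csv` accepts a `test` argument naming a second csv in `path`, which it reads with the same `text_cols`, so extra columns such as `id` are simply not fed to the model. The helper name `build_data_with_test` and the file name `test.csv` are assumptions for this sketch. Because `from_csv` draws a fresh random train/validation split, the vocabulary it builds can differ from the one the learner was trained with; passing the original `data_train.vocab` back in through `vocab=` keeps the token indices consistent with the trained weights.

```python
def build_data_with_test(path, vocab=None):
    """Rebuild the training DataBunch with test.csv attached as its test set.

    Sketch for fastai v1 (1.0.x); `build_data_with_test` is a hypothetical helper.
    """
    # Lazy import: fastai v1 is only needed when the function is actually called.
    from fastai.text import TextClasDataBunch

    labels = ["label1", "label2", "label3", "label4", "label5", "label6"]
    return TextClasDataBunch.from_csv(
        path,
        "train_no_id.csv",
        test="test.csv",   # assumed file name; read with the same text_cols
        text_cols="text",
        label_cols=labels,
        vocab=vocab,       # pass data_train.vocab to keep the trained token indices
    )
```

Assuming the learner was created from `data_train`, something like `learn.data = build_data_with_test(path, vocab=data_train.vocab)` followed by `learn.get_preds(ds_type=DatasetType.Test, ordered=True)` should then return one prediction row per line of `test.csv`.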

Then I tried following the instructions in the thread above and in https://docs.fast.ai/data_block.html#LabelLists.add_test by doing:

test_data = TextList.from_csv(path, 'test_no_ids.csv', cols="text")
data_clas.data.add_test(items = test_data)

since the documentation says “Note: Here items can be an ItemList or a collection.” and I have my test data in a csv, not a folder.

But this throws the following error:

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)
in
----> 1 data_clas.data.add_test(items = test_data)

~/anaconda3/envs/myenv/lib/python3.7/site-packages/fastai/basic_data.py in __getattr__(self, k)
120 return cls(*dls, path=path, device=device, dl_tfms=dl_tfms, collate_fn=collate_fn, no_check=no_check)
121
--> 122 def __getattr__(self,k:int)->Any: return getattr(self.train_dl, k)
123 def __setstate__(self,data:Any): self.__dict__.update(data)
124

~/anaconda3/envs/myenv/lib/python3.7/site-packages/fastai/basic_data.py in __getattr__(self, k)
36
37 def __len__(self)->int: return len(self.dl)
---> 38 def __getattr__(self,k:str)->Any: return getattr(self.dl, k)
39 def __setstate__(self,data:Any): self.__dict__.update(data)
40

~/anaconda3/envs/myenv/lib/python3.7/site-packages/fastai/basic_data.py in DataLoader___getattr__(dl, k)
18 torch.utils.data.DataLoader.__init__ = intercept_args
19
---> 20 def DataLoader___getattr__(dl, k:str)->Any: return getattr(dl.dataset, k)
21 DataLoader.__getattr__ = DataLoader___getattr__
22

~/anaconda3/envs/myenv/lib/python3.7/site-packages/fastai/data_block.py in __getattr__(self, k)
641 res = getattr(y, k, None)
642 if res is not None: return res
--> 643 raise AttributeError(k)
644
645 def __setstate__(self,data:Any): self.__dict__.update(data)

AttributeError: data
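Reading that traceback from the top down: `data_clas` is a `DataBunch`, which has no `data` attribute, so its `__getattr__` forwards the lookup to `train_dl`, from there to the underlying `DataLoader`, then to its dataset, which finally raises `AttributeError: data`. So the failure comes from the `.data` lookup, not from the test items. The linked `add_test` is documented on `LabelLists` (the data block object before `.databunch()` is called). One route that avoids touching the training DataBunch is the fastai v1 inference workflow: run `learn.export()` once, then reload with `load_learner(path, test=...)` and request test-set predictions. The sketch below assumes fastai v1 (1.0.x), that `export.pkl` already sits in `path`, and that the test csv has `text` and `id` columns; `predict_test_csv` is a hypothetical helper. Text test sets are served by a length-sorting sampler, so it passes `ordered=True` (accepted by `RNNLearner.get_preds` in recent 1.0.x releases) to get rows back in csv order.

```python
from pathlib import Path

import pandas as pd


def predict_test_csv(path, csv_name="test.csv", text_col="text", id_col="id"):
    """Predict every row of a test csv with an exported fastai v1 text learner.

    Returns (ids, preds), where row i of `preds` belongs to ids[i].
    """
    # Lazy imports: fastai v1 is only needed when the function is called.
    from fastai.basic_data import DatasetType
    from fastai.basic_train import load_learner
    from fastai.text import TextList

    # Keep the full DataFrame: only `text_col` goes to the model,
    # while `id_col` stays available for the submission file.
    test_df = pd.read_csv(Path(path) / csv_name)
    test_items = TextList.from_df(test_df, path, cols=text_col)

    # Reload the exported learner with these items attached as its test set.
    learn = load_learner(path, test=test_items)

    # ordered=True undoes the length sorting so preds line up with test_df.
    preds, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)
    return test_df[id_col], preds
```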

So how can I solve problem 1, and what do I need to do to keep the ids (problem 2)?
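For problem 2, the ids never have to pass through fastai: as long as each prediction row can be matched to its csv row, the `id` column can be joined back afterwards. The sketch below is a plain pandas/NumPy helper (the name `build_submission` and the toy inputs are hypothetical) that writes one probability column per label, reusing the six label names from the training call. The optional `order` argument covers predictions that arrive in sampler order rather than csv order: if `order[k]` is the csv row of the k-th prediction, assigning `restored[order] = preds` puts them back. In fastai v1 that order can, I believe, be read from `list(learn.data.test_dl.sampler)`.

```python
import numpy as np
import pandas as pd

# Label names taken from the training call; adjust to the competition's format.
LABELS = ("label1", "label2", "label3", "label4", "label5", "label6")


def build_submission(ids, preds, labels=LABELS, order=None):
    """Return a DataFrame with an `id` column followed by one column per label.

    `preds` is an (n_rows, n_labels) array-like, e.g. a torch tensor turned
    into NumPy with `.numpy()`. If `order` is given, `order[k]` is the csv row
    that produced `preds[k]`, and rows are moved back to csv order first.
    """
    preds = np.asarray(preds)
    if order is not None:
        restored = np.empty_like(preds)
        restored[np.asarray(order)] = preds  # scatter each row back to its csv position
        preds = restored
    if len(ids) != len(preds):
        raise ValueError("number of ids and prediction rows differ")
    sub = pd.DataFrame(preds, columns=list(labels))
    sub.insert(0, "id", list(ids))
    return sub


# Toy, hypothetical example: 3 test rows whose predictions came back
# in the order [row 2, row 0, row 1].
ids = ["a", "b", "c"]
preds_in_sampler_order = np.array([[0.9] * 6, [0.1] * 6, [0.5] * 6])
sub = build_submission(ids, preds_in_sampler_order, order=[2, 0, 1])
print(sub[["id", "label1"]])
# For Kaggle: sub.to_csv("submission.csv", index=False)
```

If the predictions are already in csv order (for example from fastai v1's `RNNLearner.get_preds(..., ordered=True)`), leave `order` as `None`.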
