[Knowledge Base] Adding test dataloader with multiple columns to learner with SentencePiece

How can I create a test dataloader to get metrics on a test dataset?

Complete Colab notebook using SentencePiece (inputs come from multiple columns):
Colab

I ran into this error when calling test_dl directly on the raw test DataFrame:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-6e94a0bc6673> in <module>()
----> 1 test_dl = learn.dls.test_dl(test_df, with_labels=True); test_dl.show_batch()

16 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5139             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5140                 return self[name]
-> 5141             return object.__getattribute__(self, name)
   5142 
   5143     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'text'

To see why the validation transforms expect a 'text' attribute, run
learn.dls.valid.tfms

(#2) [Pipeline: ColReader -- {'cols': 'text', 'pref': '', 'suff': '', 'label_delim': None} -> Tokenizer -> Numericalize,Pipeline: ColReader -- {'cols': 'label', 'pref': '', 'suff': '', 'label_delim': None} -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}]
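Since both pipelines start with a ColReader that looks up a column by name, a cheap guard is to compare the DataFrame's columns against the names the pipelines read before building the test dataloader. The sketch below is plain Python, not part of fastai; `missing_columns` is a hypothetical helper, and the column list assumes the raw test DataFrame has 'split_a', 'split_b' and 'label'.

```python
def missing_columns(available, required):
    """Return the required column names that are absent from `available`."""
    have = set(available)
    return [name for name in required if name not in have]


# Assumed columns of the raw test DataFrame (hypothetical; adjust to your data).
raw_cols = ['split_a', 'split_b', 'label']

# 'text' and 'label' are the names the two ColReader pipelines above read.
print(missing_columns(raw_cols, ['text', 'label']))  # prints ['text']
```

With a real DataFrame you would pass `test_df.columns` as the first argument; an empty list means every column the pipelines read is present.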

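So we need to tokenize the test DataFrame ourselves, merging the input columns into a single 'text' column, before passing it to test_dl: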

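# Input columns to merge into a single tokenized 'text' column
text_cols = ['split_a', 'split_b']
# Load the existing SentencePiece model at tmp/spm.model
tok = SubwordTokenizer(cache_dir='tmp', sp_model='tmp/spm.model', vocab_sz=15000)
# tokenize_df returns a tuple; the tokenized DataFrame is its first element
tokenized_df = tokenize_df(test_df, text_cols=text_cols, tok=tok, tok_text_col='text')
test_dl = learn.dls.test_dl(tokenized_df[0], with_labels=True)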
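For intuition about what merging several columns into one 'text' column means, here is a rough plain-Python sketch. It is an assumption on my part, not fastai's actual code: as far as I know, when more than one text column is passed, each field is prefixed with an `xxfld` marker and its 1-based index before tokenization. `join_fields` and the sample row are hypothetical.

```python
FLD = 'xxfld'  # assumed field-marker token; check your fastai version


def join_fields(values, mark_fields=True):
    """Join several column values into one string, optionally marking each field."""
    parts = []
    for i, value in enumerate(values, start=1):
        prefix = f'{FLD} {i} ' if mark_fields else ''
        parts.append(prefix + str(value))
    return ' '.join(parts)


# Hypothetical row with the two input columns used in this post.
row = {'split_a': 'first part', 'split_b': 'second part'}
text_cols = ['split_a', 'split_b']

print(join_fields([row[c] for c in text_cols]))
# prints: xxfld 1 first part xxfld 2 second part
```

The joined string is what would then be tokenized and stored under the 'text' column that the validation pipeline reads.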

Inspect a batch before running validate, to confirm the test dataloader looks right:

test_dl.show_batch()

Then run validate to get the loss and metrics on your test dataloader:

learn.validate(dl=test_dl)

(#3) [0.6240191459655762,0.7124999761581421,0.2874999940395355]
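The returned list is the loss followed by one value per metric, in the order the metrics were given to the learner. A small sketch for labeling the values is below; `label_results` is a hypothetical helper, and 'metric_1' and 'metric_2' are placeholders because the post does not say which metrics were used.

```python
def label_results(values, metric_names):
    """Pair validate() output with names: the loss first, then each metric."""
    names = ['loss'] + list(metric_names)
    if len(names) != len(values):
        raise ValueError(f'expected {len(names)} values, got {len(values)}')
    return dict(zip(names, values))


# Values copied from the validate output above; metric names are placeholders.
results = [0.6240191459655762, 0.7124999761581421, 0.2874999940395355]
labeled = label_results(results, ['metric_1', 'metric_2'])

for name, value in labeled.items():
    print(f'{name}: {value:.4f}')
```

In a live session the names could probably be taken from the learner's metrics (for example via each metric's name), but treat that as an assumption to verify against your fastai version.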


This will be true of anything with text: you'll need to tokenize it beforehand for it to work with test_dl.