How can I create a test DataLoader to get metrics on a test dataset?
Complete Colab using SentencePiece (inputs come from multiple columns)
Colab
Creating the test DataLoader failed with this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-6e94a0bc6673> in <module>()
----> 1 test_dl = learn.dls.test_dl(test_df, with_labels=True); test_dl.show_batch()
16 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5139 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5140 return self[name]
-> 5141 return object.__getattribute__(self, name)
5142
5143 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'text'
You can see why the validation transforms expect a 'text' attribute by running
learn.dls.valid.tfms
(#2) [Pipeline: ColReader -- {'cols': 'text', 'pref': '', 'suff': '', 'label_delim': None} -> Tokenizer -> Numericalize,Pipeline: ColReader -- {'cols': 'label', 'pref': '', 'suff': '', 'label_delim': None} -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}]
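The traceback shows pandas attribute access on a single row (a Series) failing. Here is a small pandas-only reproduction of that mismatch, outside fastai. The DataFrame values and the missing_columns helper are made up for illustration; the expected column names are the ones the two ColReader entries above read.

```python
import pandas as pd

# Made-up test DataFrame with the raw columns used in the post
test_df = pd.DataFrame({
    "split_a": ["first part of a review", "another example"],
    "split_b": ["second part of a review", "more text"],
    "label": ["pos", "neg"],
})

# Columns read by the validation pipeline's ColReaders ('text' input, 'label' target)
expected_cols = ["text", "label"]


def missing_columns(df, cols):
    """Hypothetical helper: return the expected columns that `df` lacks."""
    return [c for c in cols if c not in df.columns]


row = test_df.iloc[0]  # one row of the DataFrame is a pandas Series
try:
    getattr(row, "text")  # same kind of attribute access as in the traceback
except AttributeError as err:
    print(err)  # 'Series' object has no attribute 'text'

print(missing_columns(test_df, expected_cols))  # ['text']
```

Running a check like missing_columns before calling test_dl can surface this mismatch earlier than the 16-frame traceback does.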
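The input ColReader looks for a 'text' column, but test_df only has the raw split_a and split_b columns. So the test DataFrame first has to be tokenized the same way as the training data, which writes the tokens to a 'text' column: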
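from fastai.text.all import *
text_cols = ['split_a', 'split_b']  # same text columns the model was trained on
# Reuse the SentencePiece model trained earlier so test tokens match the training vocab
tok = SubwordTokenizer(cache_dir='tmp', sp_model='tmp/spm.model', vocab_sz=15000)
# tokenize_df returns (tokenized DataFrame, token counts); tokens go in the 'text' column
tokenized_df, _ = tokenize_df(test_df, text_cols=text_cols, tok=tok, tok_text_col='text')
test_dl = learn.dls.test_dl(tokenized_df, with_labels=True)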
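To show roughly what the tokenization step does with two text columns, here is a simplified pure-Python sketch: join the columns into one string with field markers, split it into tokens, and store them in a new 'text' column while keeping the other columns (such as 'label'). This is not fastai's implementation. The "xxfld N" marker format is an assumption about how fastai marks fields, whitespace splitting stands in for SentencePiece, and fastai's preprocessing rules (lowercasing, special tokens like xxbos) are skipped. The helper names are hypothetical.

```python
FIELD_MARKER = "xxfld"  # assumption: marker token used when fields are marked


def join_text_columns(row, text_cols, mark_fields=True):
    """Hypothetical helper: join several text columns of one row into a single string."""
    parts = []
    for i, col in enumerate(text_cols, start=1):
        prefix = f"{FIELD_MARKER} {i} " if mark_fields else ""
        parts.append(prefix + str(row[col]))
    return " ".join(parts)


def tokenize_rows(rows, text_cols, tokenize=str.split, tok_text_col="text"):
    """Return new rows with tokens in `tok_text_col` plus all non-text columns."""
    out = []
    for row in rows:
        # Keep every column that is not one of the raw text columns (e.g. the label)
        new_row = {k: v for k, v in row.items() if k not in text_cols}
        new_row[tok_text_col] = tokenize(join_text_columns(row, text_cols))
        out.append(new_row)
    return out


rows = [{"split_a": "great movie", "split_b": "would watch again", "label": "pos"}]
print(tokenize_rows(rows, ["split_a", "split_b"]))
# [{'label': 'pos', 'text': ['xxfld', '1', 'great', 'movie', 'xxfld', '2', 'would', 'watch', 'again']}]
```

The point of the sketch is the shape of the result: the raw split_a/split_b columns are replaced by a single 'text' column, which is exactly what the validation ColReader reads.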
Before running validate, check that the test DataLoader looks right:
test_dl.show_batch()
Then run validate on the test DataLoader to get the loss and metrics:
learn.validate(dl=test_dl)
(#3) [0.6240191459655762,0.7124999761581421,0.2874999940395355]
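learn.validate returns the loss followed by the Learner's metrics. The post does not say which metrics were used; the last two values add up to about 1, which is consistent with accuracy and error_rate, but that is a guess. Under that assumption (plus a cross-entropy loss), here is a minimal pure-Python sketch of how such a triple is computed from predicted class probabilities. validate_sketch is a hypothetical name, and the probabilities and targets are made up.

```python
import math


def validate_sketch(probs, targets):
    """Return [loss, accuracy, error_rate] for per-class probabilities and integer targets.

    Assumes cross-entropy loss and metrics=[accuracy, error_rate]; not fastai's code.
    """
    n = len(targets)
    # Mean negative log-probability of the true class (cross-entropy)
    loss = -sum(math.log(p[t]) for p, t in zip(probs, targets)) / n
    # Predicted class = index of the highest probability (argmax)
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    acc = sum(pr == t for pr, t in zip(preds, targets)) / n
    return [loss, acc, 1 - acc]  # error_rate is 1 - accuracy


probs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]
targets = [0, 1, 1, 1]
print(validate_sketch(probs, targets))
```

Here three of the four predictions are correct, so accuracy is 0.75 and error_rate is 0.25; in fastai the loss is averaged over batches rather than computed in one pass, but the order of the returned values is the same: loss first, then metrics.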