Hi everyone,
I’d like to implement a Char-LSTM in fastai2. I’m struggling a bit with how to wrap the CharTokenizer into the Tokenizer class for use on dataframes. The code below results in the error "Could not do one pass in your dataloader, there is something wrong in it". I’ve tried playing with the text.default settings, but I haven’t gotten it to work yet. Does anyone see where it’s going wrong?
Thanks!
from fastai.text.all import *

class CharTokenizer():
    def __init__(self, lang='en', special_tokens=None):
        self.lang = lang
        self.special_tokens = special_tokens

    # Yield one list of characters per input string
    def __call__(self, seq): return (list(s) for s in seq)

tok = CharTokenizer()
txt = ['testing', 'anothertest']

# Test tokenization
first(CharTokenizer()(txt))

# Wrap into Tokenizer class
tok = Tokenizer(CharTokenizer, rules=[])

# Make test dataframe
df1 = pd.DataFrame({'test_text': ['Some_text']*50 + ['Some_other_text']*50,
                    'test_value': ['No']*50 + ['Yes']*50})

dls1 = TextDataLoaders.from_df(df1, text_col='test_text', tok=tok)
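
For reference, here is the character-level splitting on its own in plain Python, without fastai. It is only a minimal sketch of what the tokenizer is expected to produce for each string. The function name `char_tokenize` is hypothetical and not part of fastai.

```python
from typing import Iterable, Iterator, List


def char_tokenize(texts: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of single-character tokens per input string.

    Hypothetical helper for illustration only; it mirrors the generator
    returned by CharTokenizer.__call__ in the post.
    """
    for text in texts:
        yield list(text)


# Materialize the generator so the results can be inspected more than once
tokens = list(char_tokenize(["testing", "anothertest"]))
print(tokens[0])
```

If this produces one list of characters per input string, the tokenization step itself behaves as intended, and the problem more likely lies in how the tokenizer is passed on to the dataloaders.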