Passing chunksize as a parameter to text_data_from_df results in the following error (not that we would ever need to do that):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-056e51589e97> in <module>
----> 1 data_lm = text_data_from_df(PATH, train_df=train, valid_df=test, data_func=lm_data, max_vocab=60_000, chunksize=24_000, min_freq=2, txt_cols=['name', 'item_description'], label_cols=['label'])
~/fastai/fastai/text/data.py in text_data_from_df(path, train_df, valid_df, test_df, tokenizer, data_func, vocab, **kwargs)
324 path=Path(path)
325 txt_kwargs, kwargs = extract_kwargs(['max_vocab', 'chunksize', 'min_freq', 'n_labels', 'txt_cols', 'label_cols'], kwargs)
--> 326 train_ds = TextDataset.from_df(path, train_df, tokenizer, 'train', vocab=vocab, **txt_kwargs)
327 datasets = [train_ds, TextDataset.from_df(path, valid_df, tokenizer, 'valid', vocab=train_ds.vocab, **txt_kwargs)]
328 if test_df: datasets.append(TextDataset.from_df(path, test_df, tokenizer, 'test', vocab=train_ds.vocab, **txt_kwargs))
~/fastai/fastai/text/data.py in from_df(cls, folder, df, tokenizer, name, **kwargs)
142 tokenizer = ifnone(tokenizer, Tokenizer())
143 chunksize = 1 if (type(df) == DataFrame) else df.chunksize
--> 144 return cls(folder, tokenizer, df=df, create_mtd=TextMtd.DF, name=name, chunksize=chunksize, **kwargs)
145
146 @classmethod
TypeError: type object got multiple values for keyword argument 'chunksize'
A little research indicated that the error “can happen if you pass a key word argument for which one of the keys is similar (has same string name) to a positional argument,” as given in the 2nd answer to this Stack Overflow question. The suggested fix there is: “You would have to remove the keyword argument from the kwargs before passing it to the method.” I’m not sure how to do that in this case. I also found this. Just wanted to bring this to your attention.
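For reference, here is a minimal sketch of what the quoted advice could look like in plain Python. The names build_dataset, from_df_broken and from_df_fixed are hypothetical stand-ins, not fastai's actual code. They only mimic the pattern visible in the traceback, where from_df computes chunksize itself and also forwards **kwargs that may already contain a chunksize key.

```python
def build_dataset(df, chunksize=None, **kwargs):
    """Hypothetical stand-in for the constructor that from_df calls."""
    return {"chunksize": chunksize, **kwargs}


def from_df_broken(df, **kwargs):
    # chunksize is computed internally, as in the traceback's from_df.
    chunksize = 1
    # If kwargs also holds 'chunksize', the same keyword is supplied twice
    # and Python raises TypeError.
    return build_dataset(df, chunksize=chunksize, **kwargs)


def from_df_fixed(df, **kwargs):
    # Remove any caller-supplied 'chunksize' before forwarding kwargs,
    # so the keyword reaches build_dataset only once.
    kwargs.pop("chunksize", None)
    chunksize = 1
    return build_dataset(df, chunksize=chunksize, **kwargs)


if __name__ == "__main__":
    try:
        from_df_broken(None, chunksize=24_000, min_freq=2)
    except TypeError as err:
        print("broken version:", err)
    print("fixed version:", from_df_fixed(None, chunksize=24_000, min_freq=2))
```

In this sketch the caller's value is silently discarded in favour of the internally computed one. Whether the library should instead honour the caller's chunksize is a design decision for the maintainers.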
Thanks.