TextClasDataBunch.from_df raises StopIteration exception

Hello,

I have started using the fastai library v1.0.38, and I have a problem loading a simple DataFrame with the from_df() method.
I do not know whether it is a bug or my lack of knowledge.

Minimal example that fails:

import pandas as pd
from fastai import *
from fastai.text import *

train_df = pd.DataFrame({'label': [1, 2], 'text': ['bb aa cc', 'aa bb cc dd ee ff']})
print(train_df)

valid_df = pd.DataFrame({'label': [1, 2], 'text': ['aa bb', 'bb gg ff']})
print(valid_df)

path = f'{file_dir}/tmp/'
data_bunch = TextClasDataBunch.from_df(
    path=path,
    train_df=train_df,
    valid_df=valid_df,
    text_cols='text',
    label_cols='label')

When I run this snippet, I get the following output and error:

   label               text
0      1           bb aa cc
1      2  aa bb cc dd ee ff
   label      text
0      1     aa bb
1      2  bb gg ff
Traceback (most recent call last):
  File "fastaitext_load.py", line 63, in <module>
    label_cols='label')
  File "/home/ksirg/.local/share/virtualenvs/szrek-data-7aSRMmoN/lib/python3.6/site-packages/fastai/text/data.py", line 169, in from_df
    return src.databunch(**kwargs)
  File "/home/ksirg/.local/share/virtualenvs/szrek-data-7aSRMmoN/lib/python3.6/site-packages/fastai/data_block.py", line 446, in databunch
    data = self.x._bunch.create(self.train, self.valid, test_ds=self.test, path=path, **kwargs)
  File "/home/ksirg/.local/share/virtualenvs/szrek-data-7aSRMmoN/lib/python3.6/site-packages/fastai/text/data.py", line 221, in create
    return cls(*dataloaders, path=path, collate_fn=collate_fn)
  File "/home/ksirg/.local/share/virtualenvs/szrek-data-7aSRMmoN/lib/python3.6/site-packages/fastai/basic_data.py", line 97, in __init__
    if not no_check: self.sanity_check()
  File "/home/ksirg/.local/share/virtualenvs/szrek-data-7aSRMmoN/lib/python3.6/site-packages/fastai/basic_data.py", line 198, in sanity_check
    idx = next(iter(self.train_dl.batch_sampler))
StopIteration

Could someone confirm whether this is a bug, or help me with the proper use of this method?

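Add bs=2 and it should work. The default bs (64, I think) is larger than your two training rows, so the training DataLoader cannot form even one complete batch. Its batch sampler yields nothing, and next() in sanity_check() raises StopIteration.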
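The traceback ends at next(iter(self.train_dl.batch_sampler)). Below is a plain-Python sketch (not fastai's code) of a batch sampler that discards a trailing incomplete batch, the way a PyTorch DataLoader built with drop_last=True does. It assumes the fastai training loader behaves this way, which the empty batch sampler in the traceback suggests. The function name drop_last_batches is hypothetical.

```python
from typing import Iterator, List


def drop_last_batches(n_items: int, bs: int) -> Iterator[List[int]]:
    """Yield lists of `bs` consecutive indices; a trailing partial batch is discarded."""
    # The last valid start index is n_items - bs; if bs > n_items the range is empty.
    for start in range(0, n_items - bs + 1, bs):
        yield list(range(start, start + bs))


n_train = 2  # train_df in the question has two rows

# With a batch size larger than the data, no batch is yielded,
# so next() raises StopIteration, just like in sanity_check().
try:
    next(iter(drop_last_batches(n_train, bs=64)))
except StopIteration:
    print("bs=64: no complete batch can be formed")

# With bs=2 there is exactly one full batch.
print(list(drop_last_batches(n_train, bs=2)))  # [[0, 1]]
```

Any bs up to the number of training rows gives at least one batch; rows that do not fill a final batch are simply skipped for that epoch.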


Thank you.

This is not emphasized enough in the documentation: https://docs.fast.ai/text.data.html#TextDataBunch.from_df


Although it is an edge case, it would be better if the batch size were checked against the number of items available. Maybe @sgugger and the dev team can comment.

Added a check that should raise a warning if it’s not possible to make one batch.
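A minimal sketch of what such a pre-check could look like. check_can_make_batch is a hypothetical name, not the code that was actually added to fastai, and it assumes only that the training loader drops incomplete batches.

```python
import warnings


def check_can_make_batch(n_items: int, bs: int) -> bool:
    """Hypothetical pre-check: warn when a drop_last loader could not form one batch."""
    if n_items < bs:
        warnings.warn(
            f"Only {n_items} training items but bs={bs}: no complete batch can be made. "
            "Lower bs to at most the number of training items."
        )
        return False
    return True


check_can_make_batch(2, 64)  # emits a UserWarning and returns False
check_can_make_batch(2, 2)   # returns True
```

Warning rather than raising keeps the DataBunch constructible, so the user still sees a clear message before the StopIteration from sanity_check().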
