Issue with TextBlock.from_df - dataloaders only accepting one column name

Hi guys,
I’m updating an old model to fastai v2 and I’m having some trouble with the new TextBlock.

Following the tutorial, this piece of code runs just fine:

path2 = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path2/'texts.csv')

imdb_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True),
                    get_x=ColReader('text'))

dls = imdb_lm.dataloaders(df, bs=64, seq_len=72)

But if I rename the text column to something else, I get an AttributeError when running dls = imdb_lm.dataloaders(df, bs=64, seq_len=72):

df.columns = ['label', 'blablabla', 'is_valid']
imdb_lm = DataBlock(blocks=TextBlock.from_df('blablabla', is_lm=True),
                    get_x=ColReader('blablabla'))

dls = imdb_lm.dataloaders(df, bs=64, seq_len=72)

AttributeError                            Traceback (most recent call last)
<ipython-input-170-502263c4ccb3> in <module>()
----> 1 dls = imdb_lm.dataloaders(df, bs=64, seq_len=72)
      2 dls.show_batch(max_n=2)

9 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/ in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'blablabla'

I tried with another dataset and the same thing happened. If the column containing the texts isn’t “text”, the dataloaders method returns an AttributeError.

Anyone know why this happens?


After tokenization, your text will always be in the “text” column unless res_col_name is overridden. So your get_x should always read “text”, while your TextBlock can point to blablabla.
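
To make the mechanism concrete, here is a minimal toy sketch in plain Python. It is not fastai code: toy_tokenize and toy_col_reader are hypothetical stand-ins for the tokenization step and for a get_x column reader, and it assumes the tokenization step drops the source column and writes its output to a column named by res_col_name (default "text").

```python
# Toy stand-ins (hypothetical names, not fastai's implementation).

def toy_tokenize(rows, text_col, res_col_name="text"):
    """Read `text_col` from each row and write the tokens to `res_col_name`.

    Assumption: the source column is dropped, so only the result column
    is available to whatever runs afterwards.
    """
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != text_col}
        new_row[res_col_name] = row[text_col].lower().split()
        out.append(new_row)
    return out


def toy_col_reader(col):
    """Return a getter that looks up `col` in an already tokenized row."""
    def _read(row):
        if col not in row:
            raise AttributeError(f"row has no attribute {col!r}")
        return row[col]
    return _read


rows = [{"label": "positive", "blablabla": "A great movie", "is_valid": False}]

# The tokenization step is told where the raw text lives...
tokenized = toy_tokenize(rows, text_col="blablabla")

# ...but the getter runs afterwards, so it has to ask for the result column.
get_x_ok = toy_col_reader("text")
print(get_x_ok(tokenized[0]))  # ['a', 'great', 'movie']

# Asking for the original column name fails, as in the traceback above.
get_x_bad = toy_col_reader("blablabla")
try:
    get_x_bad(tokenized[0])
except AttributeError as err:
    print("AttributeError:", err)
```

Applied to the code in the question, that would mean keeping TextBlock.from_df('blablabla', is_lm=True) and passing get_x=ColReader('text').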


Ah it worked! Thank you @muellerzr.
I was puzzled because things were easier using the mid-level API ahahah.

Maybe this is something to add to the documentation in the text tutorial :wink:

For sure. I’m writing a tutorial for people interested in drug discovery. Every bit of info is valuable :vulcan_salute:t5:

