Unable to DataBlock - 'Series' object has no attribute

kmule · September 5, 2020, 8:12pm

Hi all,
I am following the fastai v2 text classification tutorial (ULMFiT approach) with my own dataset. The language model part went fine, but then I had an issue loading my multi-label dataset for the classification part.

I am currently on fastai 2.0.8. My dataset is a pandas dataframe and looks like this.

I then created a DataBlock object, and when I called datasets on it I got an AttributeError: 'Series' object has no attribute 'proc_name' error that I cannot figure out.

Before switching to the DataBlock approach, I also tried TextDataLoaders with label_delim, but that ended up with an unhashable type: 'list' error.

The main goal is to create a multi-label dataset to feed to the classifier. Any suggestions would be appreciated. Thanks.
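On the label_delim attempt: label_delim makes fastai split each label cell on that delimiter, so the label column is expected to hold strings such as 'billing;refund'. The thread never confirms what caused the unhashable type: 'list' error; one thing worth checking (an assumption, not a diagnosis) is whether the label column already holds Python lists, since lists cannot be hashed and so cannot be set members or dict keys. The stdlib-only sketch below uses made-up rows, a hypothetical 'labels' column, and an assumed ';' delimiter to show converting list labels into delimited strings.

```python
# Hypothetical multi-label rows; in practice these come from your own DataFrame.
rows = [
    {"text": "first document", "labels": ["billing", "refund"]},
    {"text": "second document", "labels": ["shipping"]},
    {"text": "third document", "labels": []},
]

LABEL_DELIM = ";"  # assumed delimiter; pass the same value as label_delim


def join_labels(labels, delim=LABEL_DELIM):
    """Turn a list of labels into one delimited string, e.g. ['a', 'b'] -> 'a;b'."""
    return delim.join(labels)


def split_labels(cell, delim=LABEL_DELIM):
    """Split a delimited cell back into a list; an empty string gives []."""
    return cell.split(delim) if len(cell) > 0 else []


# A list cannot be hashed, so it cannot be used as a set member or dict key.
try:
    {("billing", "refund")}  # a tuple is hashable and works
    {["billing", "refund"]}  # a list is not and raises TypeError
except TypeError as err:
    print(err)  # unhashable type: 'list'

# Convert every list-valued label cell into a delimited string.
for row in rows:
    row["labels"] = join_labels(row["labels"])

print([row["labels"] for row in rows])
```

With pandas, the same conversion can be written as df['labels'] = df['labels'].str.join(';'), after which label_delim=';' has strings to split.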

ilovescience · September 5, 2020, 9:02pm

Is this a private or public dataset?

kmule · September 5, 2020, 10:36pm

It’s a private one.

kmule · September 6, 2020, 7:56am

I found the cause. The problem is the following line: get_x only accepts text as its column name unless res_col_name is specified.

get_x=ColReader('proc_name')
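As far as I understand fastai's tokenize_df (which TextBlock.from_df relies on), the tokenized output keeps the other columns but drops the source text column, writing the tokens to the column named by res_col_name ('text' by default). When get_x later reads 'proc_name' from a row, that column is gone, which would explain the AttributeError. The stdlib-only sketch below mimics that column handling with plain dicts instead of pandas rows; mock_tokenize_rows is hypothetical, the whitespace split stands in for fastai's real tokenizer, and the data is made up.

```python
def mock_tokenize_rows(rows, text_col, res_col_name="text"):
    """Hypothetical stand-in for the column handling of TextBlock.from_df.

    Keeps every non-text column, drops the source text column, and stores
    the tokens under res_col_name plus a '<res_col_name>_length' column.
    """
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != text_col}
        tokens = row[text_col].lower().split()  # naive tokenizer, illustration only
        new_row[res_col_name] = tokens
        new_row[f"{res_col_name}_length"] = len(tokens)
        out.append(new_row)
    return out


rows = [{"proc_name": "Replace the pump", "label": "maintenance"}]

# Default res_col_name: the tokens end up in 'text' and 'proc_name' is dropped.
tokenized = mock_tokenize_rows(rows, "proc_name")
print(sorted(tokenized[0]))  # ['label', 'text', 'text_length']

# Reading the original column now fails, analogous to ColReader('proc_name').
try:
    tokenized[0]["proc_name"]
except KeyError as err:
    print("missing column:", err)

# Passing res_col_name keeps the original name, so reading 'proc_name' works.
renamed = mock_tokenize_rows(rows, "proc_name", res_col_name="proc_name")
print(renamed[0]["proc_name"])  # ['replace', 'the', 'pump']
```

In the real DataBlock this means using get_x=ColReader('text') with the default, or passing res_col_name='proc_name' to TextBlock.from_df if you want to keep reading 'proc_name'.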

mabcat · September 8, 2020, 8:08am

Thanks kmule, I had the same problem and it cost me about an hour.

The example at https://docs.fast.ai/text.data#TextBlock.from_df is unhelpfully misleading. The source text column in the example file is called text, but this has nothing to do with why the argument to get_x is also text.

You can either use res_col_name to change the output column name, or just understand that TextBlock.from_df is going to put the tokenised text into a column called text no matter what your source column is called:

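from fastai.text.all import *

# df is your own DataFrame; the raw text lives in the 'why_looking' column
wl_clas = DataBlock(
    blocks=(TextBlock.from_df('why_looking', seq_len=72), CategoryBlock),
    get_x=ColReader('text'),  # from_df writes the tokenized text to 'text', not 'why_looking'
    get_y=ColReader('outcome'),
    splitter=RandomSplitter()
).dataloaders(df, bs=64)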

juanchoalric · May 14, 2021, 10:12pm

Thanks!!! It's really misleading.