I am receiving the following error when trying to set up data loaders from the toxic comments dataset using a dataframe that I setup from the .csv file.
AttributeError: ‘list’ object has no attribute ‘truncate’
def get_y(r): return r[["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]]
block = DataBlock(
blocks=(TextBlock.from_df("comment_text", is_lm=True), MultiCategoryBlock),
get_x=ColReader("text"),
get_y=get_y,
splitter=RandomSplitter(.1)
)
dls = block.dataloaders(train, bs=64)
dls.show_batch()
This error comes from the show_batch method. I have also tried using ColReader for get_y, because I suspect that the way I’m retrieving the dependent variable is the problem.
ColReader(["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"])
This is how I import my data into a dataframe :
And then I do a train-test-split :
from sklearn.model_selection import train_test_split
train, val = train_test_split(df, test_size=0.05)
train.shape, val.shape
Any idea how I can change my DataBlock to avoid this error? What am I missing?