I am facing an issue while loading data from a custom CSV file: almost all of the loaded text comes out as xxpad strings. I created a vocab from one file and am trying to reuse that vocab on another CSV file for multi-category classification.
The multi-category classification CSV file has the following format:
text, label_1, label_2, label_3, label_4, label_5, label_6
"hello this is cool", 1, 0, 0, 1, 0, 0
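To make the layout concrete, here is a minimal sketch that parses the sample row above using only Python's standard `csv` module. It is an illustration of the file format, not the loading code from my notebook: the `SAMPLE_CSV` string and the variable names are made up for this example, and the use of `skipinitialspace=True` is an assumption about how the spaces after the commas should be handled.

```python
import csv
import io

# Hypothetical in-memory copy of the header and sample row shown above.
SAMPLE_CSV = '''text, label_1, label_2, label_3, label_4, label_5, label_6
"hello this is cool", 1, 0, 0, 1, 0, 0
'''

# skipinitialspace=True drops the blank after each comma, so the header
# is read as "label_1" rather than " label_1".
reader = csv.DictReader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
rows = list(reader)

# The six label columns, in file order.
label_cols = [name for name in reader.fieldnames if name.startswith("label_")]

# Labels that are switched on (value "1") for the first row.
first = rows[0]
active = [name for name in label_cols if first[name] == "1"]

print(first["text"])
print(label_cols)
print(active)
```

Each row therefore carries one text field plus six 0/1 flags, one per label column.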
dls_lm comes from the language model that was previously trained and loaded.
dls_labels = DataBlock(
    blocks=(TextBlock.from_df('text', vocab=dls_lm.vocab), MultiCategoryBlock),
    get_x=ColReader("text"),
    get_y=ColReader([1,2,3,4,5,6]),
    splitter=RandomSplitter(0.2)
).dataloaders(train_label, bs=128, seq_len=72)
When I run dls_labels.show_batch(), the first column of the first row is a proper string from the CSV (after tokenizing), and the Y column shows the semicolon-separated values (1 or 0) from the label_x columns. After that, every row contains only repeated xxpad strings, and its Y value is the same as in the first row.
What am I doing wrong here? Please help me understand.