Bug in TextClasDataBunch in v1.0.28?

Hello, I opened my CSV file using:

data_clas = TextClasDataBunch.from_csv(PATH_DATASET, 'datatrain-en.csv', vocab=data_lm.train_ds.vocab, bs=32)

My CSV file contains 5 classes, but when I print data_clas it shows this:

TextClasDataBunch;
Train: LabelList
y: CategoryList (39586 items)
[Category 2, Category 2, Category 2, Category 2, Category 2]...
Path: lmdata/id/dataset
x: TextList (39586 items)

Why do all the y values have the same label?

Or does that just mean the first 5 records are all Category 2?
The display confuses me. Does
[class, class, class, class, class]…
simply mean [class, class, class, …]?

When I print all my data, every record has the same label:

for i in range(0,9897):
    print(i , ' ' , data_clas.train_dl.y.get(i))

0   CATEGORY_PROMOTIONS
1   CATEGORY_PROMOTIONS
2   CATEGORY_PROMOTIONS
3   CATEGORY_PROMOTIONS
4   CATEGORY_PROMOTIONS
5   CATEGORY_PROMOTIONS
6   CATEGORY_PROMOTIONS
.
.
.
9897 CATEGORY_PROMOTIONS

Can you load the CSV and print its head? You don’t need fastai for that; just load the CSV and call head() on it.

Thanks for helping.
Here is my CSV:

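import pandas as pd
# pd.DataFrame.from_csv and header=-1 no longer work in current pandas;
# read_csv with header=None reads a file that has no header row.
df = pd.read_csv(PATH_DATASET/'datatrain-en-2.csv', header=None)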
df.columns=['label','text']
df.label.value_counts()

CATEGORY_SOCIAL           18442
CATEGORY_PROMOTIONS       18346
CATEGORY_UPDATES          10294
CATEGORY_PERSONAL          1479
CATEGORY_NEED_RESPONSE      923
Name: label, dtype: int64

df.head()
| |label|text|
|0|CATEGORY_PERSONAL|file upload tuga revisi fg|
|1|CATEGORY_PERSONAL|file upload tuga pre fg|
|2|CATEGORY_PERSONAL|file upload sisfo tuga|
|3|CATEGORY_PERSONAL|secur alert link googl account|
|4|CATEGORY_PERSONAL|secur alert link googl account|

data_clas = TextClasDataBunch.from_csv(PATH_DATASET, 'datatrain-en-2.csv')
data_clas.train_ds.y.classes
['CATEGORY_PROMOTIONS',
 'CATEGORY_SOCIAL',
 'CATEGORY_UPDATES',
 'CATEGORY_NEED_RESPONSE']

Have you already fine-tuned a language model?

I checked the data before fine-tuning, and strangely it is read as only 4 classes instead of 5. And when the data is printed, every record is labeled CATEGORY_PROMOTIONS.
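
A quick way to see which label went missing is to compare the labels in the CSV with the classes the data bunch reports. This is only a minimal sketch: both lists are copied by hand from the outputs above, and in a live session they would presumably come from df.label.unique() and data_clas.train_ds.y.classes instead.

```python
# Labels present in the raw CSV (copied from df.label.value_counts() above).
csv_labels = [
    "CATEGORY_SOCIAL",
    "CATEGORY_PROMOTIONS",
    "CATEGORY_UPDATES",
    "CATEGORY_PERSONAL",
    "CATEGORY_NEED_RESPONSE",
]

# Classes reported by data_clas.train_ds.y.classes above.
bunch_classes = [
    "CATEGORY_PROMOTIONS",
    "CATEGORY_SOCIAL",
    "CATEGORY_UPDATES",
    "CATEGORY_NEED_RESPONSE",
]

# Anything in the CSV that the data bunch does not know about.
missing = sorted(set(csv_labels) - set(bunch_classes))
print(missing)  # ['CATEGORY_PERSONAL']
```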

Fine-tuning the LM worked fine, but fitting the classifier gets interrupted during validation (the second progress bar within an epoch). The error from fitting the classifier is in my other issue topic.

After loading the data for the classifier, did you run data_bunch_name.show_batch()? If not, could you run it and show the output?

The issue was caused by my data being in a single CSV: when fastai split it into training and validation sets, some classes did not exist in the validation set. Converting my data to an ImageNet-style folder layout and loading it from folders fixed the problem.
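
For anyone who would rather keep the CSV, another option might be to do the train/valid split yourself so that every class lands in both sets. The sketch below is not what I actually did (I used the folder layout); stratified_split is a hypothetical helper written in plain Python, and the toy label list is made up for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, valid_pct=0.2, seed=42):
    """Return (train_idx, valid_idx); every label with at least 2 rows appears in both."""
    rng = random.Random(seed)

    # Group row indices by their label.
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    train_idx, valid_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Take at least one validation row per class, but never the whole class.
        n_valid = max(1, int(len(idxs) * valid_pct))
        n_valid = min(n_valid, len(idxs) - 1)
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Toy, imbalanced labels (hypothetical data, not the real CSV).
labels = (["CATEGORY_SOCIAL"] * 10
          + ["CATEGORY_PERSONAL"] * 3
          + ["CATEGORY_NEED_RESPONSE"] * 2)

train_idx, valid_idx = stratified_split(labels)
train_classes = {labels[i] for i in train_idx}
valid_classes = {labels[i] for i in valid_idx}
print(train_classes == valid_classes)  # True
```

With pandas, df.iloc[train_idx] and df.iloc[valid_idx] would then give two data frames. I believe TextClasDataBunch.from_df accepts separate train and valid data frames, but check the docs for your fastai version before relying on that.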