Hello, I was opening my CSV file using:
data_clas = TextClasDataBunch.from_csv(PATH_DATASET, 'datatrain-en.csv', vocab=data_lm.train_ds.vocab, bs=32)
My CSV file contains 5 classes, but when I print data_clas it shows this:
TextClasDataBunch;
Train: LabelList
y: CategoryList (39586 items)
[Category 2, Category 2, Category 2, Category 2, Category 2]...
Path: lmdata/id/dataset
x: TextList (39586 items)
Why does y have the same label for every item?
Or does that mean the first five records are in Category 2?
That confuses me.
Does [class, class, class, class, class]… mean the whole list is [class, class, class, …]?
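A printout like the one above may be only a preview. The sketch below assumes the data bunch's printed summary shows just the first five labels followed by an ellipsis; `preview` is a hypothetical helper written for illustration, not a fastai function. It shows how five identical labels in such a preview say nothing about the rest of the list:

```python
def preview(items, n=5):
    """Hypothetical helper: render the first n items, adding '...' if more remain."""
    shown = ", ".join(str(x) for x in items[:n])
    suffix = "..." if len(items) > n else ""
    return f"[{shown}]{suffix}"


# Made-up labels: the first five match, but the full list has three classes.
labels = ["Category 2"] * 5 + ["Category 0", "Category 1"]
print(preview(labels))  # [Category 2, Category 2, Category 2, Category 2, Category 2]...
```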
When I print all of my data, every record has the same label:
for i in range(0, 9897):
    print(i, ' ', data_clas.train_dl.y.get(i))
0 CATEGORY_PROMOTIONS
1 CATEGORY_PROMOTIONS
2 CATEGORY_PROMOTIONS
3 CATEGORY_PROMOTIONS
4 CATEGORY_PROMOTIONS
5 CATEGORY_PROMOTIONS
6 CATEGORY_PROMOTIONS
...
9897 CATEGORY_PROMOTIONS
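Rather than printing thousands of rows, a tally shows the label distribution at a glance. A minimal sketch with the standard library's `collections.Counter`, assuming the labels have already been pulled out into a plain list of strings (the `labels` list here is made-up sample data):

```python
from collections import Counter

# Hypothetical sample; in practice this would hold the label of every training item.
labels = ["CATEGORY_PROMOTIONS", "CATEGORY_SOCIAL", "CATEGORY_PROMOTIONS", "CATEGORY_UPDATES"]

counts = Counter(labels)
for label, n in counts.most_common():
    print(label, n)

# A single distinct key would mean every record really carries the same label.
all_same = len(counts) == 1
print("all labels identical:", all_same)
```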
shaun1
(Sudarshan)
November 26, 2018, 2:10pm
4
Can you load the CSV and print its head? You don't need FastAI for that; just load the CSV and call head().
Thanks for helping.
This is my CSV:
import pandas as pd
df = pd.read_csv(PATH_DATASET/'datatrain-en-2.csv', header=None)  # the file has no header row
df.columns = ['label', 'text']
df.label.value_counts()
CATEGORY_SOCIAL 18442
CATEGORY_PROMOTIONS 18346
CATEGORY_UPDATES 10294
CATEGORY_PERSONAL 1479
CATEGORY_NEED_RESPONSE 923
Name: label, dtype: int64
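These counts can be cross-checked against the data bunch printout. They sum to 49,484 rows, and the 39,586 training items reported earlier are almost exactly 80% of that, which is consistent with roughly 20% (9,898 rows) being held out for validation. That split fraction is an inference from the numbers, not something stated in the thread:

```python
# Class counts copied from the value_counts() output above.
counts = {
    "CATEGORY_SOCIAL": 18442,
    "CATEGORY_PROMOTIONS": 18346,
    "CATEGORY_UPDATES": 10294,
    "CATEGORY_PERSONAL": 1479,
    "CATEGORY_NEED_RESPONSE": 923,
}
total = sum(counts.values())
train_items = 39586  # from the TextClasDataBunch printout
valid_items = total - train_items  # rows not in the training set

print(total, valid_items, round(train_items / total, 3))  # 49484 9898 0.8
```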
df.head()
| |label|text|
|---|---|---|
|0|CATEGORY_PERSONAL|file upload tuga revisi fg|
|1|CATEGORY_PERSONAL|file upload tuga pre fg|
|2|CATEGORY_PERSONAL|file upload sisfo tuga|
|3|CATEGORY_PERSONAL|secur alert link googl account|
|4|CATEGORY_PERSONAL|secur alert link googl account|
data_clas = TextClasDataBunch.from_csv(PATH_DATASET, 'datatrain-en-2.csv')
data_clas.train_ds.y.classes
['CATEGORY_PROMOTIONS',
'CATEGORY_SOCIAL',
'CATEGORY_UPDATES',
'CATEGORY_NEED_RESPONSE']
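Comparing the two lists programmatically pinpoints the class that was dropped. A small sketch with plain Python sets, using the values copied from the `value_counts()` and `classes` outputs above:

```python
# Labels present in the raw CSV (from value_counts above).
csv_labels = {"CATEGORY_SOCIAL", "CATEGORY_PROMOTIONS", "CATEGORY_UPDATES",
              "CATEGORY_PERSONAL", "CATEGORY_NEED_RESPONSE"}
# Classes reported by data_clas.train_ds.y.classes.
bunch_classes = ["CATEGORY_PROMOTIONS", "CATEGORY_SOCIAL",
                 "CATEGORY_UPDATES", "CATEGORY_NEED_RESPONSE"]

missing = sorted(csv_labels - set(bunch_classes))
print(missing)  # ['CATEGORY_PERSONAL']
```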
shaun1
(Sudarshan)
November 26, 2018, 2:17pm
6
Have you already fine-tuned a language model?
I checked the data before fine-tuning. It's strange: only 4 classes are read instead of 5, and when the data is printed, every record is CATEGORY_PROMOTIONS.
Fine-tuning the LM worked fine, but fitting the classifier gets interrupted during validation (the second progress bar in each epoch). The error from fitting the classifier is in my other topic.
shaun1
(Sudarshan)
November 26, 2018, 2:38pm
8
After you loaded the data for the classifier, did you run data_bunch_name.show_batch()? If not, could you run it and show the output?
The issue was caused by my data being in a CSV: when fastai split the data into training and validation sets, some classes didn't exist in the validation set. Converting my data to an ImageNet-style folder layout and loading it from folders fixed the problem.
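The fix comes down to making sure every class appears in both the training and validation sets. The sketch below is one way to produce such a layout. It splits each class separately (a stratified split, which is a swapped-in technique and not necessarily what was done here) and writes one text file per row into `train/<label>/` and `valid/<label>/` folders, the folder-per-class ("ImageNet-style") structure. The helper name `csv_to_folders`, the 20% validation fraction, the file naming, and the assumption that the CSV is headerless with two columns (label, text) are all illustrative assumptions:

```python
import csv
import random
from collections import defaultdict
from pathlib import Path


def csv_to_folders(csv_path, out_dir, valid_frac=0.2, seed=42):
    """Hypothetical helper: write rows of a headerless (label, text) CSV into
    out_dir/train/<label>/ and out_dir/valid/<label>/ as .txt files, splitting
    each class separately so every class with at least two rows appears in both."""
    by_label = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            by_label[row[0]].append(row[1])

    rng = random.Random(seed)
    out_dir = Path(out_dir)
    for label, texts in by_label.items():
        rng.shuffle(texts)
        # At least one validation row per class, but never the entire class.
        n_valid = max(1, round(len(texts) * valid_frac))
        n_valid = min(n_valid, len(texts) - 1)
        splits = {"valid": texts[:n_valid], "train": texts[n_valid:]}
        for split, items in splits.items():
            folder = out_dir / split / label
            folder.mkdir(parents=True, exist_ok=True)
            for i, text in enumerate(items):
                (folder / f"{i:06d}.txt").write_text(text, encoding="utf-8")
    # Number of rows found per class, for a quick sanity check.
    return {label: len(texts) for label, texts in by_label.items()}
```

The resulting directory can then be handed to fastai's folder-based text loader; the exact call and its arguments depend on the fastai version, so check its documentation. Splitting per class, rather than taking one random 20% of the whole file, is what guarantees that small classes such as CATEGORY_NEED_RESPONSE are not missing from validation.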