Text classifier doesn't return any validation metrics?

niclow · January 31, 2019, 1:31am

I’m trying to train a model by following along with the lesson3-imdb script. Everything works up to the point where I train the classifier, but training doesn’t report any metrics for the validation set.

from fastai.text import *  # fastai v1

# data_lm and df (the DataFrame holding the val_idx column) were created earlier
data_clas = (TextList.from_csv('.', 'class_descriptions.csv', cols='text', vocab=data_lm.vocab)
            .split_by_idx(df['val_idx'])
            .label_from_df(cols='job_class')
            .databunch(bs=128))
learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_encoder')
learn.freeze()
learn.fit_one_cycle(4, 5e-2, moms=(0.8,0.7))

Total time: 03:02

epoch	train_loss	valid_loss	accuracy
1	3.755421
2	3.576474
3	3.510875
4	3.017980

I’m new to both deep learning and Python, so I’m not sure where to even start diagnosing this. Did the validation set not get formed properly, so that the metric calculations failed silently during training? Did I specify (or fail to specify) a parameter somewhere? Any help or insight is greatly appreciated!

niclow · January 31, 2019, 1:39am

Update:

data_clas.valid_ds

returns

LabelList
y: CategoryList (0 items)
[]...
Path: .
x: TextList (0 items)
[]...
Path: .

So the validation set is empty, and the issue is somewhere in how I specified the indexes to create it…
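
A quick guard like the one below would surface an empty split before any training time is spent. This is only a sketch: check_split is a hypothetical helper, not part of fastai, and it assumes the datasets support len(), as data_clas.train_ds and data_clas.valid_ds appear to.

```python
def check_split(train_ds, valid_ds):
    """Raise early if either side of a train/validation split is empty.

    Hypothetical helper (not a fastai function); works on anything with len().
    """
    n_train, n_valid = len(train_ds), len(valid_ds)
    if n_train == 0 or n_valid == 0:
        raise ValueError(
            f"Bad split: {n_train} training items, {n_valid} validation items"
        )
    return n_train, n_valid


# With fastai this would be called as (assumed usage):
#   check_split(data_clas.train_ds, data_clas.valid_ds)
# Plain lists stand in for the datasets here.
print(check_split(["a", "b", "c"], ["d"]))
```

Running it right after building the databunch turns a silent blank in the metrics table into an immediate error.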

niclow · January 31, 2019, 5:37pm

Figured it out. I don’t think it liked having the validation indexes sitting separately from the CSV with the text and labels. Building everything from the same DataFrame worked fine:

data_clas = (TextList.from_df(df, cols='text', vocab=data_lm.vocab)
            .split_from_df(col='val_idx')
            .label_from_df(cols='job_class')
            .databunch(bs=128))
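
For anyone comparing the two versions: as far as the fastai v1 docs describe it, split_by_idx expects a list of row positions, while split_from_df expects a column of True/False flags marking the validation rows. The sketch below shows the conversion between the two forms in plain Python. The flag values are made up for illustration, and whether val_idx held flags or positions is an assumption, since the column itself isn't shown here.

```python
# Made-up flags standing in for a val_idx column: True marks a validation row.
val_flags = [False, True, False, False, True, False]

# Positional indices of the validation rows, the form split_by_idx takes.
valid_positions = [i for i, flag in enumerate(val_flags) if flag]

# Every remaining row belongs to the training set.
train_positions = [i for i, flag in enumerate(val_flags) if not flag]

print(valid_positions)  # [1, 4]
print(train_positions)  # [0, 2, 3, 5]
```

On a real DataFrame, np.flatnonzero(df['val_idx']) should give the same positions, assuming the column is boolean.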

I’ll leave this here for posterity in case anyone else runs into the same confusion.