I’ve been training a classification model using fastai v2’s ULMFiT. I passed a metrics argument to the learner so it reports accuracy, precision, recall, and F-beta scores, like this:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, pretrained=True, metrics=[accuracy, Precision(), Recall(), FBeta(beta=1)]).to_fp16()
learn.load_encoder('finetuned_lm')
Watching the per-epoch output during training, I see precision around 85% and recall around 95%, which I understand are computed on the validation set, not the training set. However, when I then run the model on my test set, I only reach a precision of 17.4%, while recall stays much closer to the training-time value, at 98%.
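(If it helps, the per-epoch numbers can be reproduced after training with learn.validate(), which evaluates on the validation DataLoader by default:)
# Sanity check: reproduces the per-epoch metrics on the validation split
# Returns [valid_loss, accuracy, precision, recall, fbeta] for the learner above
print(learn.validate())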
Does anyone know what might be going on here? Both the training and test sets are preprocessed in exactly the same way. For reference, here is a bigger code snippet:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from fastai.text.all import *

df = pd.read_csv("Aggregated_Dataset_KEB_03-01_sent(corr).csv")
df = df.dropna()
df = df.reset_index(drop=True)
df = df.drop(["Unnamed: 0"], axis=1)  # drop the leftover index column from the CSV
df['Class'] = df['Class'].astype(int)
temp_df, df_test = train_test_split(df[["Filename", "Class", "Sentence"]], stratify=df['Class'], test_size=0.2, random_state=314)
### some code to rebalance the training classes to a 3:1 ratio ###
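(The rebalancing boils down to downsampling the majority class; a minimal sketch of that step, not my exact code:)
# Sketch of the rebalancing step: downsample class 0 to roughly 3x class 1
n_pos = (temp_df['Class'] == 1).sum()
df_trn = pd.concat([
    temp_df[temp_df['Class'] == 0].sample(3 * n_pos, random_state=314),  # keep ~3 negatives per positive
    temp_df[temp_df['Class'] == 1]
]).sample(frac=1, random_state=314).reset_index(drop=True)  # shuffle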
### training of the LM ###
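(The LM fine-tuning follows the usual ULMFiT recipe; in outline, with the exact schedule elided:)
# Rough outline of the LM fine-tuning; exact schedule/data elided in the original
dls_lm = DataBlock(
    blocks=TextBlock.from_df('Sentence', is_lm=True),
    get_x=ColReader('text'),
    splitter=RandomSplitter(0.1)
).dataloaders(df_trn, bs=64)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=Perplexity()).to_fp16()
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.unfreeze()
learn_lm.fit_one_cycle(5, 1e-3)
learn_lm.save_encoder('finetuned_lm')  # loaded later by the classifier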
df_trn["Class"].value_counts()
>> 0 2023
>> 1 697
>> Name: Class, dtype: int64
blocks = (TextBlock.from_df('Sentence', seq_len=dls_lm.seq_len, vocab=dls_lm.vocab),  # reuse the LM's vocab
          CategoryBlock())
dls = DataBlock(blocks=blocks,
                get_x=ColReader('text'),  # TextBlock.from_df writes the processed text to a 'text' column
                get_y=ColReader('Class'),
                splitter=RandomSplitter(0.2))
dls = dls.dataloaders(df_trn, bs=64)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, pretrained=True, metrics=[accuracy, Precision(), Recall(), FBeta(beta=1)]).to_fp16()
learn.load_encoder('finetuned_lm')
learn.fit_one_cycle(1, 1e-2)
>> epoch train_loss valid_loss accuracy precision_score recall_score fbeta_score time
>> 0 0.513772 0.376785 0.897059 0.818841 0.784722 0.801418 00:03
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-3/(2.6**4),1e-2))
>> epoch train_loss valid_loss accuracy precision_score recall_score fbeta_score time
>> 0 0.420189 0.289749 0.897059 0.752874 0.909722 0.823899 00:03
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),1e-2))
>> epoch train_loss valid_loss accuracy precision_score recall_score fbeta_score time
>> 0 0.323645 0.150523 0.943015 0.879195 0.909722 0.894198 00:04
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),3e-3))
>> epoch train_loss valid_loss accuracy precision_score recall_score fbeta_score time
>> 0 0.215019 0.131739 0.944853 0.851852 0.958333 0.901961 00:04
>> 1 0.172947 0.136240 0.944853 0.847561 0.965278 0.902597 00:04
learn.fit_one_cycle(5, slice(1e-3/(2.6**4),3e-3))
>> epoch train_loss valid_loss accuracy precision_score recall_score fbeta_score time
>> 0 0.115063 0.125721 0.957721 0.885350 0.965278 0.923588 00:04
>> 1 0.110957 0.155260 0.943015 0.846626 0.958333 0.899023 00:04
>> 2 0.090381 0.121803 0.959559 0.896104 0.958333 0.926174 00:04
>> 3 0.069215 0.123623 0.959559 0.891026 0.965278 0.926667 00:04
>> 4 0.056123 0.135880 0.952206 0.868750 0.965278 0.914474 00:04
dl = learn.dls.test_dl(df_test['Sentence'])
preds, targets = learn.get_preds(dl=dl)
df_test["Preds"] = np.argmax(preds, axis =1)
FP = 0
FN = 0
TP = 0
TN = 0
for index, row in df_test.iterrows():
    if row.Class == row.Preds:
        if row.Class == 1:
            TP += 1
        else:
            TN += 1
    else:
        if row.Class == 1:
            FN += 1
        else:
            FP += 1
print(FP)
print(FN)
print(TP)
print(TN)
>> 810
>> 3
>> 171
>> 16375
print("recall: ", TP / (TP + FN))
print("precision: ", TP / (TP + FP))
>> recall: 0.9827586206896551
>> precision: 0.1743119266055046
Any help with this is much appreciated! Thanks in advance.