Predictions for the test set: are they correct?

Hi, I apologize in advance if these questions have already been asked elsewhere. I’m using the DataBlock API to combine three columns of my dataset (with mark_fields=True) and then do binary classification. Two questions:

  • I found a way to make this work (i.e. to predict something), but it feels like a hack. Is there a better way to build the test data with the processed (tokenized) column?
  • I see different metrics on the dev set during training than when I pass dev as a test set. Could this be related to dropout?

Any input will be appreciated, and please feel free to comment on anything else beyond my specific questions. I’m still learning the new API.

Thanks

My code

import torch
import pandas as pd
from fastai.text.all import *
from sklearn.metrics import classification_report

# INPUT_COLUMNS is the list of the three text columns, defined earlier
text_block = TextBlock.from_df(
    text_cols=INPUT_COLUMNS,
    is_lm=False,
    seq_len=1_000,
    tok=None,

    # add xxfld markers between fields
    mark_fields=True,

    # name for the tokenized output column
    tok_text_col='ulmfit_text',

    # vocab saved from the language-model step
    vocab=torch.load("data_lm_vocab"),
)
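
(For context, since it happened in an earlier notebook: data_lm_vocab is the vocab of the language-model dataloaders, saved roughly like this, where data_lm is just a placeholder name.)

# earlier, in the language-model notebook (data_lm is a placeholder name for the LM dataloaders)
torch.save(data_lm.vocab, "data_lm_vocab")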

data_cls = DataBlock(
    blocks=(text_block, CategoryBlock),
    get_x=ColReader("ulmfit_text"),
    get_y=ColReader("label"),
    splitter=ColSplitter("is_dev"),
)

data_cls = data_cls.dataloaders(
    pd.concat([
        df_train.assign(is_dev=False),
        df_dev.assign(is_dev=True),
    ]),
    shuffle_train=True,
    bs=64,
)
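
Before training I do a quick sanity check that the ColSplitter picked up the is_dev flag and that the processed ulmfit_text column looks right (with mark_fields=True each of the three fields should be prefixed by an xxfld 1 / xxfld 2 / xxfld 3 marker):

# not strictly needed, just my own sanity check on the split and the processed text
print(len(data_cls.train_ds), len(data_cls.valid_ds))  # should match len(df_train), len(df_dev)
data_cls.show_batch(max_n=2)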

Then I train the model like this

cls_learner = text_classifier_learner(
    data_cls,
    AWD_LSTM,
    drop_mult=0.5,
    metrics=[Precision(), Recall(), F1Score(), RocAucBinary()],
)

# load the encoder fine-tuned on the language-model task
cls_learner.load_encoder("1epoch_encoder")

cls_learner.fine_tune(4, base_lr=0.5)
cls_learner.export("classifier_with_finetune")
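
(Also for context: 1epoch_encoder is the encoder of the language model after one epoch of fine-tuning, saved with save_encoder roughly as below; lm_learner is a placeholder name.)

# earlier, after one epoch of language-model fine-tuning (lm_learner is a placeholder name)
lm_learner.save_encoder("1epoch_encoder")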

Now if I restart the kernel and want to predict again on the dev set, I can do

# this loads the learner but not the dls
cls_learner = load_learner("classifier_with_finetune")

# reattach the dataloaders built above
cls_learner.dls = data_cls

# ds_idx=1 -> the validation (dev) split
probas, targets, preds = cls_learner.get_preds(ds_idx=1, with_decoded=True)
print(classification_report(targets, preds))

and I get the same metrics I saw during training. Good.
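
As an extra check on my dropout question above (this is just my understanding of the API, happy to be corrected): validate() should recompute the loss and the metrics on the same dev split with the model in eval mode, i.e. with dropout disabled, so it should also match the numbers of the last training epoch.

# recompute loss + metrics on the dev split in eval mode (dropout off)
print(cls_learner.validate())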

I now want to predict on a given test set. To verify that I’m doing everything right, I use the dev set as the test set, so I should get exactly the same metrics as in the two reports above. The problem is that when I do

dl = cls_learner.dls.test_dl(df_dev, with_labels=True)

it complains with

ulmfit_text column not found

so it seems test_dl is not applying the preprocessing defined in the DataBlock (the tokenization step that creates the ulmfit_text column). I wonder if there’s a better way to do this. The workaround (hack) I found is

# a second DataBlock, used only to run the tokenization on df_dev
# (no splitter, so it falls back to a random train/valid split I don't actually want)
data_cls_test = DataBlock(
    blocks=(text_block, CategoryBlock),
    get_x=ColReader("ulmfit_text"),
    get_y=ColReader("label"),
)

# TODO: is there a better way to do this? splitting just to concat again seems absurd ...
dl_test = data_cls_test.dataloaders(df_dev, bs=128)

# .items holds the processed (tokenized) dataframe of each split; glue them back together
tmp = pd.concat([dl_test[0].items, dl_test[1].items]).sort_index()

dl = cls_learner.dls.test_dl(tmp, with_labels=True)
test_probas, test_targets, test_preds = cls_learner.get_preds(dl=dl, with_decoded=True)
print(classification_report(test_targets.numpy().astype(int), test_preds.numpy().astype(int)))

Unfortunately, I don’t get the same metrics. However, test_targets is identical to targets, so I’m fairly confident the data is the same and in the same order.
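
For what it’s worth, the alternative I’ve been meaning to try instead of the split-then-concat trick is to tokenize df_dev myself and hand the already-processed column to test_dl. This is an untested sketch, assuming tokenize_df (from fastai.text.all) accepts the same mark_fields / tok_text_col arguments as TextBlock.from_df and that test_dl is happy with a pre-tokenized column:

# untested alternative to the DataBlock workaround above:
# tokenize df_dev directly so it gains the 'ulmfit_text' column that test_dl expects
df_dev_tok, _count = tokenize_df(
    df_dev,
    text_cols=INPUT_COLUMNS,
    mark_fields=True,
    tok_text_col='ulmfit_text',
)

dl = cls_learner.dls.test_dl(df_dev_tok, with_labels=True)
test_probas, test_targets, test_preds = cls_learner.get_preds(dl=dl, with_decoded=True)

If someone can confirm whether that is the intended way (and, more importantly, why the metrics differ in the first place), that would be great.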