An FBeta metric for sequence labeling?

I am trying to use an F1 score metric for sequence labeling. I’ve tried both the fbeta function and the FBeta class, but both throw errors. My guess is that neither metric is designed for the output shape of a sequence labeling model or for its labels.

My dataset is CoNLL 2003, which consists of tokens and labels (called “Tags”). The goal is to perform named entity recognition on the dataset.

My code is based loosely on the fastai NLP course’s 7-seq2seq-translation notebook (see also the course video). However, the output vocab consists of the different “Tags”, not a vocab of a target language.

The model looks like this:

NERRNN(
  (emb_enc): Embedding(9472, 400, padding_idx=1)
  (emb_enc_drop): Dropout(p=0.15)
  (gru_enc): GRU(400, 128, num_layers=28, batch_first=True, dropout=0.25)
  (out_enc): Linear(in_features=128, out_features=400, bias=False)
  (emb_dec): Embedding(24, 400, padding_idx=1)
  (gru_dec): GRU(400, 400, num_layers=28, batch_first=True, dropout=0.1)
  (out_drop): Dropout(p=0.35)
  (out): Linear(in_features=400, out_features=24, bias=True)
)

The model trains well using seq2seq_acc, but when adding either fbeta or FBeta(), I get the following errors:

For fbeta:

RuntimeError: The size of tensor a (24) must match the size of tensor b (1175) at non-singleton dimension 2

and for FBeta():

RuntimeError: The size of tensor a (384) must match the size of tensor b (1175) at non-singleton dimension 2

I also tried FBeta(average="macro") with the same result.
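
My reading of the errors is that both metrics expect a flat classification output of shape (bs, n_classes) with targets of shape (bs,), while my model produces (bs, seq_len, n_classes) scores and (bs, seq_len) targets, so the extra sequence dimension seems to be what trips them up. Something like this should make the mismatch visible (untested sketch; data and learn are the DataBunch and Learner from my setup):

```python
# Quick shape check on one validation batch (sketch; variable names from my setup).
xb, yb = next(iter(data.valid_dl))
preds = learn.model(xb)

print(preds.shape)  # something like (batch_size, seq_len, 24) -- a score per tag per token
print(yb.shape)     # something like (batch_size, seq_len)     -- one tag id per token
```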

If anyone could point me in the right direction on how to modify the metric to make it work, I would really appreciate it. I would also be happy to code up a new metric if someone can point out where the problem is with the current implementations.
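
In case it helps, this is roughly the direction I was considering for a replacement. It is a minimal, untested sketch that assumes the model output is (bs, seq_len, n_classes) and the targets are (bs, seq_len); it flattens both, drops padded positions (my tag vocab uses padding index 1, as in the embeddings above), and computes a macro F1 with scikit-learn. Since fastai averages function metrics over batches, this would only approximate a corpus-level F1.

```python
import torch
from sklearn.metrics import f1_score

PAD_IDX = 1  # padding index of the tag vocab in my setup (matches padding_idx above)

def flat_f1(preds, targs):
    "Token-level macro F1 for one batch, ignoring padded positions."
    # preds: (bs, seq_len, n_classes) scores; targs: (bs, seq_len) tag ids
    pred_ids = preds.argmax(dim=-1).view(-1).cpu().numpy()
    targ_ids = targs.view(-1).cpu().numpy()
    mask = targ_ids != PAD_IDX          # ignore positions that are just padding
    return torch.tensor(f1_score(targ_ids[mask], pred_ids[mask], average="macro"))
```

It would then be passed to the Learner alongside the accuracy, e.g. metrics=[seq2seq_acc, flat_f1].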


I am looking for something very similar. If you figure it out, please let me know!

Will do. At this point, I’m thinking I will have to roll my own implementation.
