FitLaM for sequence prediction problems

I was wondering if anyone here has experimented with using FitLaM for sequence prediction problems like part-of-speech (POS) tagging or named entity recognition (NER).

@jeremy, did you guys by any chance test it on the above-mentioned or related problems?

2 Likes

Not yet, but we want to. Help would be appreciated!

1 Like

Great! Do you have any specific datasets in mind?

Best is to pick a reasonably recent paper covering the applications you’re interested in, and use the same datasets as they do, so you can see how your results compare. For instance, the ELMo paper from AI2 might have good examples, or the CoVe paper from McCann et al.

5 Likes

Reference Work

Training Data used by CoVe

MT-Small: WMT 2016 multi-modal translation shared task [Specia et al., 2016]. The training set consists of 30,000 sentence pairs that briefly describe Flickr captions and is often referred to as Multi30k. Due to the nature of image captions, this dataset contains sentences that are, on average, shorter and simpler than those from larger counterparts.

MT-Medium: 2016 version of the machine translation task prepared for the International Workshop on Spoken Language Translation [Cettolo et al., 2015]. The training set consists of 209,772 sentence pairs from transcribed TED presentations that cover a wide variety of topics with more conversational language than in the other two machine translation datasets.

MT-Large: News translation shared task from WMT 2017. The training set consists of roughly 7 million sentence pairs that come from web crawl data, a news and commentary corpus, European Parliament proceedings, and European Union press releases.

Evaluation Tasks and Datasets used by CoVe:

| Dataset | Task | Details | Training examples |
| --- | --- | --- | --- |
| SST-2 | Sentiment classification | 2 classes, single sentences | 56.4k |
| SST-5 | Sentiment classification | 5 classes, single sentences | 94.2k |
| IMDb | Sentiment classification | 2 classes, multiple sentences | 22.5k |
| TREC-6 | Question classification | 6 classes | 4.3k |
| TREC-50 | Question classification | 50 classes | 4.3k |
| SNLI | Entailment classification | 3 classes | 549.4k |
| SQuAD | Question answering | open-ended (answer spans) | 87.6k |

Cc: @rudraksh, you might want to check the above out. @jeremy, did I interpret you correctly?

6 Likes

Thanks a lot, man! This is really helpful. Besides seq2seq, I’m also interested in tasks where an output is generated at each time step, similar to language modelling. I essentially want to be able to port this technique to the biomedical NLP domain, where labelled data is often hard to come by and which could benefit a lot from pre-trained NLP models. One of the important problems there is accurate recognition of named entities like diseases, genes, etc., which requires a prediction for each token.
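To make the per-token framing concrete, here’s a tiny illustration of what the input and output could look like for biomedical NER under the common BIO tagging scheme (the sentence, entity types, and tags below are made up for illustration, not taken from any dataset):

```python
# Hypothetical example of per-token labels for biomedical NER using
# the BIO scheme: B- begins an entity, I- continues it, O is outside.
tokens = ["Mutations", "in", "the", "BRCA1", "gene", "cause",
          "breast", "cancer", "."]
tags   = ["O", "O", "O", "B-GENE", "O", "O",
          "B-DISEASE", "I-DISEASE", "O"]

# A sequence labeller must emit exactly one tag per token,
# so the input and output sequences have the same length.
assert len(tokens) == len(tags)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```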

1 Like

Hi there, I’m also interested in related topics but am still going through the FitLaM material right now :). Just FYI, in addition to the NER dataset referenced in the ELMo paper, here’s a biomedical NLP paper I’d come across, given your interest in that area specifically: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btx815/4764002

1 Like

As the expert on the topic, @rudraksh, can you share 1-2 datasets? And a few selected examples of what the input and output to the network might look like?

1 Like

Btw, here’s a recent paper with 2 “classic” NER datasets (source: I am an NLP person :)): https://arxiv.org/pdf/1707.05928.pdf. But as Rudraksh can confirm, biomedical NLP is slightly different - this is just in case you were interested in typical NER data.

2 Likes

Thank you for sharing :)

Intuitively, their approach makes a lot of sense. Vocabulary sizes in biomedical datasets are often large, but at the same time there’s a lot of syntactic similarity between, say, the words used for genes, chemicals, etc. A character-level CNN should be able to pick up on this.
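For what it’s worth, here’s a minimal PyTorch sketch of the kind of character-level CNN word encoder I mean; all the sizes (character vocabulary, embedding dimension, filter count and width) are arbitrary assumptions for illustration, not values from the paper:

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Builds a word representation from its characters, so that
    morphologically similar tokens (e.g. gene or chemical names)
    end up with similar embeddings. All sizes are illustrative."""
    def __init__(self, n_chars=128, char_dim=16, n_filters=32, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=kernel, padding=1)

    def forward(self, char_ids):
        # char_ids: (batch, word_len) integer character indices
        x = self.char_emb(char_ids)       # (batch, word_len, char_dim)
        x = x.transpose(1, 2)             # (batch, char_dim, word_len)
        x = torch.relu(self.conv(x))      # (batch, n_filters, word_len)
        return x.max(dim=2).values        # max-pool over character positions

enc = CharCNNWordEncoder()
fake_chars = torch.randint(1, 128, (2, 10))  # 2 words, 10 characters each
print(enc(fake_chars).shape)                 # torch.Size([2, 32])
```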

Another paper that I really like makes use of a Multi-task NER objective and achieves state-of-the-art results: Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning.
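My simplified reading of the multi-task idea is a shared encoder with a separate tagging head per entity type; here’s that reading as a hedged PyTorch sketch (all names, sizes, and the three tasks are hypothetical, not the paper’s exact architecture):

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Shared BiLSTM encoder with one per-token classification head per
    task (e.g. genes, chemicals, diseases). A simplified sketch of the
    cross-type idea, not the paper's exact model."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128,
                 tasks=("gene", "chemical", "disease"), n_tags=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.heads = nn.ModuleDict(
            {task: nn.Linear(2 * hidden, n_tags) for task in tasks})

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.emb(token_ids))  # (batch, seq, 2*hidden)
        return self.heads[task](h)                # (batch, seq, n_tags)

# Each batch comes from one task's dataset; every task's loss updates
# the shared encoder, which is where the cross-type transfer happens.
model = MultiTaskTagger()
loss_fn = nn.CrossEntropyLoss()
token_ids = torch.randint(0, 10000, (4, 20))  # 4 sentences, 20 tokens
gold_tags = torch.randint(0, 3, (4, 20))      # per-token tag ids
logits = model(token_ids, task="gene")
loss = loss_fn(logits.reshape(-1, 3), gold_tags.reshape(-1))
loss.backward()
```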

1 Like


The kind of application I’m talking about requires a many-to-many formulation: the network outputs a prediction at each time step. In addition to the papers mentioned above by @anamariapopescug and me, you can also look at this blog.
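For a concrete picture of the many-to-many setup, here’s a hedged sketch of a per-timestep tagging head on top of an encoder; the plain LSTM below is just a stand-in for a pretrained language-model encoder like FitLaM’s, not an actual FitLaM API:

```python
import torch
import torch.nn as nn

class TokenTagger(nn.Module):
    """Many-to-many head: instead of predicting the next word (language
    modelling) or one label per document (classification), emit one tag
    per timestep. `encoder` stands in for a pretrained LM's RNN."""
    def __init__(self, encoder, hidden_dim, n_tags):
        super().__init__()
        self.encoder = encoder
        self.tag_head = nn.Linear(hidden_dim, n_tags)

    def forward(self, token_embs):
        h, _ = self.encoder(token_embs)  # (batch, seq_len, hidden_dim)
        return self.tag_head(h)          # one prediction per timestep

# Hypothetical shapes only; swap the LSTM for a pretrained encoder.
encoder = nn.LSTM(input_size=100, hidden_size=256, batch_first=True)
tagger = TokenTagger(encoder, hidden_dim=256, n_tags=5)
x = torch.randn(2, 12, 100)  # 2 sentences, 12 tokens, 100-dim embeddings
print(tagger(x).shape)       # torch.Size([2, 12, 5])
```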