Reverse text input as data augmentation

Google’s 2014 paper on seq2seq neural machine translation found that “… reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly”, because it “… introduced many short term dependencies that made the optimization problem much simpler (see sec. 2 and 3.3). As a result, SGD could learn LSTMs that had no trouble with long sentences.”

This seems like something that can be easily added to the fastai library as a simple data augmentation technique (for text data). Or is it already in the fastai library, and I somehow missed it?
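To make the idea concrete, here is a minimal plain-Python sketch of the transformation described in the paper. It does not use any fastai API; `reverse_source` and the toy sentence pair are hypothetical names and data I made up for illustration, and the sentences are assumed to be already tokenized into lists of words.

```python
def reverse_source(pairs):
    """Reverse the token order of each source sentence; leave targets untouched.

    `pairs` is assumed to be a list of (source_tokens, target_tokens) tuples.
    """
    return [(src[::-1], tgt) for src, tgt in pairs]


# Toy example (made-up data): one English -> French pair.
pairs = [(["the", "cat", "sat"], ["le", "chat", "s'est", "assis"])]
reversed_pairs = reverse_source(pairs)
print(reversed_pairs)
```

Slicing with `[::-1]` builds a new list, so the original `pairs` stay unchanged and both versions could be kept if the reversed copies are meant as extra training data rather than a replacement.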


We have this already. You can pass backwards=True when creating your databunch to replicate this :slight_smile:


Thanks for the clarification! For those who stumble across this, it’s documented here.

One more question: does bidirectional RNN/LSTM make reversing the input sentence unnecessary?