Lesson 4 In-Class Discussion

Data augmentation using a thesaurus: thesaurus-based approaches are all I’ve come across so far, but we’ll keep looking for others and post if we find anything interesting. The problem with thesaurus-based approaches is that for most tasks you can’t simply use an off-the-shelf thesaurus. Some results were shown in this paper: [1502.01710] Text Understanding from Scratch
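A minimal sketch of thesaurus-based augmentation (synonym replacement), assuming a small hand-built thesaurus, since an off-the-shelf one often won’t fit the task. `THESAURUS`, `synonym_augment`, and the replacement probability `p` are hypothetical names chosen for illustration; this is not the procedure from the paper above.

```python
import random

# Hypothetical, hand-built thesaurus: word -> list of acceptable synonyms.
# In practice this would be curated for the task's vocabulary.
THESAURUS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "bad": ["poor", "awful"],
}


def synonym_augment(tokens, thesaurus, p=0.5, rng=None):
    """Return a new token list where each word found in the thesaurus is
    replaced by a randomly chosen synonym with probability p."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        synonyms = thesaurus.get(tok.lower())
        if synonyms and rng.random() < p:
            out.append(rng.choice(synonyms))
        else:
            # Words without an entry (or not selected) are kept unchanged.
            out.append(tok)
    return out


if __name__ == "__main__":
    sentence = "a good movie with a bad ending".split()
    rng = random.Random(0)
    # Generate a few augmented variants of the same sentence.
    for _ in range(3):
        print(" ".join(synonym_augment(sentence, THESAURUS, p=0.5, rng=rng)))
```

Replacing each word independently is the simplest choice; a variant could instead cap how many words are swapped per sentence so augmented examples stay close to the original.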

Updated:

There’s another interesting data augmentation technique, specific to RNNs, in the ICLR 2017 paper [1703.02573] Data Noising as Smoothing in Neural Network Language Models.

“In this work, we consider noising primitives as a form of data augmentation for recurrent neural network-based language models.”
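As I understand the paper, its two simplest noising primitives are unigram noising (with probability γ, replace a token by a word sampled from the corpus unigram distribution) and blank noising (with probability γ, replace it by a placeholder token). Below is a minimal sketch, assuming pre-tokenized sentences, a fixed γ, and noise resampled each time a sequence is used; the function names and the `"_"` placeholder are hypothetical, not taken from the paper’s code. The paper also relates noising to n-gram smoothing and discusses more refined variants, which this sketch does not cover.

```python
import random
from collections import Counter


def unigram_distribution(corpus):
    """Estimate unigram counts from a tokenized corpus (list of token lists)."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    vocab = list(counts)
    weights = [counts[w] for w in vocab]
    return vocab, weights


def unigram_noise(tokens, vocab, weights, gamma, rng):
    """With probability gamma, replace each token by a word sampled in
    proportion to its corpus frequency."""
    return [
        rng.choices(vocab, weights=weights, k=1)[0] if rng.random() < gamma else tok
        for tok in tokens
    ]


def blank_noise(tokens, gamma, rng, blank="_"):
    """With probability gamma, replace each token by a placeholder token."""
    return [blank if rng.random() < gamma else tok for tok in tokens]


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
    vocab, weights = unigram_distribution(corpus)
    rng = random.Random(0)
    # Each call draws fresh noise, so every pass sees a different variant.
    for sentence in corpus:
        print(unigram_noise(sentence, vocab, weights, gamma=0.2, rng=rng))
        print(blank_noise(sentence, gamma=0.2, rng=rng))
```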
