The aim of my project is to experiment with a few published NLP data augmentation ideas (more details below), in an attempt to improve language model performance, reduce training time, or push accuracy on NLP tasks such as sentiment classification, as covered in Lesson 4 (Part 1, 2019).
Initial ideas and experiments:
- EDA (Easy Data Augmentation) paper
- Back Translation:
Translating text into another language and then back to the original, so that the “noise” introduced by the round trip serves as augmented text.
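To make the round trip concrete, here is a minimal sketch of back translation. The `word_translator` helper and the two toy lookup tables are hypothetical stand-ins for a real translation model, which this post does not specify; only the `back_translate` shape (forward, then backward) reflects the idea above.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def word_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator from a lookup table.

    Hypothetical stand-in for a real machine translation model.
    Words missing from the table are passed through unchanged.
    """
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


def back_translate(text: str, forward: Translator, backward: Translator) -> str:
    """Translate to a pivot language and back; the result is the augmented text."""
    return backward(forward(text))


# Toy tables (assumptions for illustration). The mapping is lossy on purpose:
# "movie" becomes "film" and never comes back as "movie".
en_to_fr = word_translator({"movie": "film", "great": "super"})
fr_to_en = word_translator({"film": "film", "super": "great"})

augmented = back_translate("the movie was great", en_to_fr, fr_to_en)
print(augmented)
```

Keeping the translators as plain callables means a real model could be swapped in later without touching `back_translate`.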
More ideas have been discussed in this thread. I’ll try to go over these and add them here over time.
Based on the EDA paper, we’re trying out noun and verb replacement on the IMDB dataset. A demo is available in a Kaggle kernel.
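As a rough sketch of what noun and verb replacement could look like (not the code from the kernel), the function below swaps nouns and verbs for synonyms. It assumes the tokens are already POS-tagged, and the `SYNONYMS` table, the `NOUN`/`VERB` tag names, and the `rate` and `seed` parameters are all hypothetical; a real run would more likely use a POS tagger and a thesaurus such as WordNet.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical synonym table; in practice these might come from WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "movie": ["film"],
    "loved": ["adored"],
    "plot": ["story"],
    "great": ["excellent"],
}

REPLACEABLE_TAGS = {"NOUN", "VERB"}  # assumed tag names


def replace_nouns_verbs(
    tagged: Sequence[Tuple[str, str]],
    synonyms: Dict[str, List[str]],
    rate: float = 1.0,
    seed: int = 0,
) -> str:
    """Replace nouns and verbs that have a known synonym.

    `rate` is the probability of replacing each eligible word:
    1.0 replaces every eligible word, 0.0 leaves the text unchanged.
    """
    rng = random.Random(seed)
    out = []
    for word, pos in tagged:
        candidates = synonyms.get(word.lower(), [])
        if pos in REPLACEABLE_TAGS and candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


tagged_review = [
    ("I", "PRON"), ("loved", "VERB"), ("this", "DET"),
    ("great", "ADJ"), ("movie", "NOUN"),
]
print(replace_nouns_verbs(tagged_review, SYNONYMS))
```

Note that "great" is left alone even though it has a synonym, because it is tagged as an adjective rather than a noun or verb.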
An image to show the TL;DR augmentation approach:
As an experiment, we’re creating an “augmented copy” of the IMDB dataset and then training on the original and augmented data in cycles.
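One way to picture this setup is sketched below. The post does not say how the cycles are ordered, so the strict alternation in `alternating_cycles` is an assumption, as are the helper names and the toy `augment` function; the actual training call is left as a comment.

```python
from typing import Callable, Iterator, List, Tuple

Example = Tuple[str, int]  # (review text, sentiment label)


def make_augmented_copy(
    dataset: List[Example], augment: Callable[[str], str]
) -> List[Example]:
    """Apply `augment` to every text while keeping the labels unchanged."""
    return [(augment(text), label) for text, label in dataset]


def alternating_cycles(
    original: List[Example], augmented: List[Example], n_cycles: int
) -> Iterator[Tuple[str, List[Example]]]:
    """Yield the dataset for each cycle, alternating original and augmented.

    The alternation order is an assumption made for this sketch.
    """
    for i in range(n_cycles):
        if i % 2 == 0:
            yield "original", original
        else:
            yield "augmented", augmented


original = [("I loved this movie", 1), ("the plot was dull", 0)]


def toy_augment(text: str) -> str:
    # Hypothetical stand-in for the noun and verb replacement step.
    return text.replace("movie", "film").replace("plot", "story")


augmented = make_augmented_copy(original, toy_augment)

for name, data in alternating_cycles(original, augmented, n_cycles=4):
    # A real experiment would run one training cycle on `data` here.
    print(name, len(data))
```

Because the labels are copied over untouched, this only makes sense for augmentations that are expected to preserve sentiment.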
- Checking aggressive/relaxed replacement techniques.
- Implementing more ideas from the original EDA paper on other datasets.
If you’re interested, please tag @init_27 or DM me.