Share your work here ✅

hwasiti · March 30, 2019, 10:37am

A tip for everybody who has a small dataset…

If you have a small dataset, you could try the powerful mixup technique that is already integrated in fastai. I experimented with a very limited amount of EEG data, and my accuracy went from 75% to 83% just by adding .mixup() after creating the learner, as in this example:

from the fastai docs:

Mixup data augmentation

What is Mixup?

This module contains the implementation of a data augmentation technique called Mixup. It is extremely efficient at regularizing models in computer vision (we used it to get our time to train CIFAR10 to 94% on one GPU to 6 minutes).

As the name kind of suggests, the authors of the mixup article propose to train the model on a mix of the pictures of the training set. Let’s say we’re on CIFAR10 for instance, then instead of feeding the model the raw images, we take two (which could be in the same class or not) and do a linear combination of them: in terms of tensor it’s

new_image = t * image1 + (1-t) * image2

where t is a float between 0 and 1. Then the target we assign to that image is the same combination of the original targets:

new_target = t * target1 + (1-t) * target2

assuming your targets are one-hot encoded (which isn’t the case in pytorch usually). And that’s as simple as this.
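To make the two formulas above concrete, here is a minimal pure-Python sketch. It is not the fastai implementation: the helper names (`one_hot`, `mixup_pair`, `sample_t`) are made up for illustration, images are flattened to plain lists of floats, and drawing t from a Beta(alpha, alpha) distribution with alpha=0.4 is an assumption about a reasonable default.

```python
import random

def one_hot(label, num_classes):
    """Turn a class index into a one-hot list, e.g. 1 -> [0.0, 1.0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def mixup_pair(image1, target1, image2, target2, t):
    """Apply the same linear combination to the inputs and to the targets."""
    new_image = [t * a + (1 - t) * b for a, b in zip(image1, image2)]
    new_target = [t * a + (1 - t) * b for a, b in zip(target1, target2)]
    return new_image, new_target

def sample_t(alpha=0.4):
    """Draw the mixing coefficient t in [0, 1] (assumed Beta(alpha, alpha))."""
    return random.betavariate(alpha, alpha)

# Toy "images" with two pixels each: class 0 = dog, class 1 = cat.
dog, cat = [1.0, 0.0], [0.0, 1.0]
image, target = mixup_pair(dog, one_hot(0, 2), cat, one_hot(1, 2), t=0.7)
print(image, target)  # both are roughly [0.7, 0.3]: 70% dog, 30% cat
```

In real training you would draw a fresh t with `sample_t()` for each batch (or each sample) instead of fixing it at 0.7.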

[Image: mixup]

Dog or cat? The right answer here is 70% dog and 30% cat!

As the picture above shows, it’s a bit hard for a human eye to comprehend the pictures obtained (although we do see the shapes of a dog and a cat) but somehow, it makes a lot of sense to the model which trains more efficiently. The final loss (training or validation) will be higher than when training without mixup even if the accuracy is far better, which means that a model trained like this will make predictions that are a bit less confident.
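A small sketch of why the loss stays higher with mixup, assuming cross-entropy loss and class-index targets (the usual PyTorch setup). The function names are hypothetical and this is not necessarily how fastai computes it: with index targets, the mixed loss can be written as t times the loss against the first label plus (1-t) times the loss against the second.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixup_loss(logits, label1, label2, t):
    """Cross-entropy against the mixed target t*label1 + (1-t)*label2."""
    probs = softmax(logits)
    return -(t * math.log(probs[label1]) + (1 - t) * math.log(probs[label2]))

# Best case: the model predicts exactly 70% dog (class 0), 30% cat (class 1).
matched = mixup_loss([math.log(0.7), math.log(0.3)], 0, 1, t=0.7)
# Over-confident case: the model is almost certain it sees a dog.
confident = mixup_loss([10.0, -10.0], 0, 1, t=0.7)
print(matched, confident)  # about 0.61 versus about 6.0
```

Even a perfect prediction cannot push the loss to zero on a mixed target, and over-confident predictions are penalized heavily, which fits the remark that a model trained this way ends up a bit less confident.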

Example Training

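from fastai.vision import *  # fastai v1
path = untar_data(URLs.MNIST_SAMPLE)  # assumed example dataset; substitute your own DataBunch
data = ImageDataBunch.from_folder(path)
model = simple_cnn((3,16,16,2))
learner = Learner(data, model, metrics=[accuracy]).mixup()  # enable mixup augmentation
learner.fit(8)  # train for 8 epochs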

================================
This powerful technique needs more visibility… I hope Jeremy will be kind enough to mention mixup augmentation in one of the awesome lectures that we are enjoying in the part2 v3 course…

================================

@kodzaks
How many images do you have in your dataset, and how many classes? What is your best accuracy so far? If mixup worked for you, I would love to know…

I love any project related to sounds and waves… FFT, and understanding how speech and different musical sounds are created by mixing only pure sine waves of different frequency, phase, and amplitude, has intrigued me since I was a kid… This was something that I was dying to know and nobody could help (pre-internet era)… Several years later, when I got into college, I could finally understand it; I implemented FFT in BASIC on my old MSX2 computer and did some FIR filtering on the waves… That was truly a joy for me that I still remember vividly… Kids now are lucky that anything they want to know is only a few clicks away…