Data augmentation/batch norm on a non-image dataset

randy912 · May 18, 2017, 3:04pm

Hello,

I have a dataset of about 10000 samples in 200 classes, and each sample is a 7000x4 matrix. These are not from images. Is it possible to do data augmentation and batch normalization on data of this type? If so, any insight on how to go about this would be greatly appreciated.

rteja1113 · May 18, 2017, 11:37pm

Hi @randy912, you can use batchnorm. One of its advantages is that training usually converges faster. If the data is not images, augmentation may not make sense.
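
To make the batchnorm part concrete, here is a minimal NumPy sketch of the training-time computation on non-image data. It assumes a batch shaped (batch, time, channels), i.e. (N, 7000, 4), and normalizes each of the 4 channels over the batch and time axes. The function name `batch_norm` and that axis layout are my assumptions for illustration, not something from the original question. A real layer in a deep learning framework also tracks running statistics for inference.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for x of shape (batch, time, channels).

    Assumption: statistics are computed per channel, pooled over the
    batch and time axes. gamma and beta are the learnable scale and shift.
    """
    mean = x.mean(axis=(0, 1), keepdims=True)   # shape (1, 1, channels)
    var = x.var(axis=(0, 1), keepdims=True)     # shape (1, 1, channels)
    x_hat = (x - mean) / np.sqrt(var + eps)     # zero mean, unit variance per channel
    return gamma * x_hat + beta

# Usage: a random batch of 8 samples with the 7000x4 shape from the question
rng = np.random.default_rng(0)
x = 5.0 * rng.normal(size=(8, 7000, 4)) + 3.0
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

Nothing here depends on the data being an image. The layer only needs an axis to treat as "features".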

randy912 · May 19, 2017, 1:16am

Thank you for the reply, @rteja1113!

shushi2000 · May 19, 2017, 10:58am

IMHO, you can still rotate, flip, or shear the matrix as augmentation, as long as doing so “makes sense”. It depends on what is in your matrices.

For example, if the samples are recordings of people’s voices and you are using models to recognize the words they are saying, then it is okay to speed up/slow down, change pitch, add a little noise, etc.
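
Two of those (adding a little noise and speeding up/slowing down) can be sketched in plain NumPy. This is only an illustration under assumptions: each sample is an array shaped (time, channels), and the helper names `add_noise` and `change_speed` plus the default `sigma` are made up for this example. Whether such transforms preserve the label depends on your data.

```python
import numpy as np

def add_noise(x, sigma=0.01, rng=None):
    """Add Gaussian noise scaled to a fraction (sigma) of the signal's std."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma * x.std(), size=x.shape)

def change_speed(x, rate):
    """Resample a (time, channels) array by linear interpolation.

    rate > 1 speeds the signal up (fewer time steps),
    rate < 1 slows it down (more time steps).
    """
    n_in = x.shape[0]
    n_out = max(2, int(round(n_in / rate)))
    old_t = np.arange(n_in)
    new_t = np.linspace(0, n_in - 1, n_out)
    channels = [np.interp(new_t, old_t, x[:, c]) for c in range(x.shape[1])]
    return np.stack(channels, axis=1)

# Usage on one hypothetical 7000x4 sample
rng = np.random.default_rng(0)
sample = rng.normal(size=(7000, 4))
noisy = add_noise(sample, sigma=0.05, rng=rng)
faster = change_speed(sample, rate=2.0)   # half as many time steps
```

Note that `change_speed` changes the length, so you would need to crop or pad back to a fixed size if your model expects one.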

randy912 · May 19, 2017, 11:23am

Thanks for the reply, @shushi2000. The data is from time series (each row a sample, each column a time point).

rteja1113 · May 20, 2017, 11:02pm

Yeah, totally agree with @shushi2000. In the current Quora duplicate question detection competition on Kaggle, people used augmentation.
The format of the data is:
question_1, question_2, is_duplicate

If question_1 is a duplicate (or not a duplicate) of question_2, then question_2 is also a duplicate (or not a duplicate) of question_1, so you can swap the two questions and keep the label. This kind of augmentation is analogous to image flipping.
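
A minimal sketch of that symmetry trick, assuming the rows are already loaded as (question_1, question_2, is_duplicate) tuples. The function name `augment_pairs` and the example questions are made up for illustration.

```python
def augment_pairs(rows):
    """Return the original rows plus a copy of each row with the questions swapped.

    rows: list of (question_1, question_2, is_duplicate) tuples.
    The label is unchanged because "is a duplicate of" is symmetric.
    """
    swapped = [(q2, q1, label) for q1, q2, label in rows]
    return rows + swapped

# Usage with two hypothetical rows
rows = [
    ("How do I learn Python?", "What is the best way to learn Python?", 1),
    ("What is batch norm?", "How do I bake bread?", 0),
]
augmented = augment_pairs(rows)   # twice as many rows, same labels
```

This should be applied to the training split only, otherwise swapped copies of the same pair can end up on both sides of a train/validation split.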