Data augmentation for regression?

Very often, data augmentation is discussed only for classification models, where the inputs change but the target outputs (“labels”) don’t change.

I’m primarily interested in regression models; in these cases, augmenting the input typically requires modifying the target outputs appropriately as well.

I’d love to hear more in the future about augmenting for regression problems too.

It seems like there’s an important class of ‘useful’ transformations T satisfying the following relation:
Given a (nonlinear) function f mapping inputs x to outputs y, i.e. y = f(x), a transformation T is ‘useful’ in this sense if T(y) = f(T(x)), i.e. T and f commute under composition: T(f(x)) = f(T(x)).

(This will of course depend on the form of f(x), which is usually unknown – that’s why you’re using a neural network to approximate it).

Often, translation in time or space will satisfy this property, as will inversion (sometimes), depending on the data & function…
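To make that concrete, here’s a small numerical check (a toy example of my own, not anything from the course): a circular 3-point moving average is translation-equivariant, so shifting the input in time commutes with it, and an (x, y) pair can be augmented to (T(x), T(y)) without re-evaluating the unknown f.

```python
import numpy as np

# Toy "true" function f: a circular 3-point moving average.
# It is translation-equivariant, so shifting in time commutes with f.
def f(x):
    return (np.roll(x, -1) + x + np.roll(x, 1)) / 3.0

def T(x, shift=5):
    return np.roll(x, shift)  # translation in time (circular shift)

x = np.random.default_rng(0).normal(size=100)
assert np.allclose(f(T(x)), T(f(x)))  # T(f(x)) == f(T(x)) holds here

# So for this T, a training pair (x, y) can be augmented to (T(x), T(y)).
```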

Question: Is there a “name” for this type of mathematical transformation/symmetry, and perhaps a mechanism to find such ‘allowed’ transformations?

(perhaps something akin to a ‘Calculus of Variations’ approach could work…)

Some implementations of mixing modify both the input and the output. It will be interesting to see the implementation in this course.
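For reference, here’s a minimal sketch of the usual mixup recipe applied to a regression batch, where inputs and targets are mixed with the same coefficient (the function name `mixup_batch` is mine; this is not the course’s implementation):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.4, rng=None):
    """Mix both the inputs and the regression targets with the same lam."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))    # partner examples within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix
```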


Yeah, I looked at the new Mixup documentation; in fact, my posting this question came about after reading it. Mixup seems to assume that the mapping from input to output supports linear superposition, and the function I’m learning definitely doesn’t.
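A tiny numerical illustration of that point (toy f chosen by me): for a nonlinear f, the linearly mixed target no longer matches f evaluated at the mixed input.

```python
f = lambda x: x ** 2            # a simple nonlinear "true" mapping
x1, x2, lam = 1.0, 3.0, 0.5

x_mix = lam * x1 + (1 - lam) * x2                   # 2.0
y_true_at_mix = f(x_mix)                            # 4.0
y_mixup_target = lam * f(x1) + (1 - lam) * f(x2)    # 5.0
# The mixup target (5.0) disagrees with f at the mixed input (4.0),
# so mixing targets linearly only makes sense if f is (near) linear
# over the mixed region.
```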


I’m interested in this topic as well. In the past I’ve used GaussRank, an interesting transformation that rank-orders the data and then maps it onto a Gaussian distribution. My use case was ranking, so the final values didn’t matter much, and it was very effective. But it’s not easily invertible, so it’s less useful in cases where you need the original value.
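For anyone curious, here’s a minimal sketch of the GaussRank idea as I understand it (function name and the epsilon handling are mine): rank the values, rescale the ranks to (-1, 1), and map them through the inverse error function.

```python
import numpy as np
from scipy.special import erfinv
from scipy.stats import rankdata

def gauss_rank(x, eps=1e-6):
    """Rank-order x and map the ranks onto an approximately Gaussian scale."""
    ranks = rankdata(x)                    # 1 .. n (ties get average ranks)
    scaled = (ranks - 1) / (len(x) - 1)    # 0 .. 1
    scaled = np.clip(scaled * 2 - 1, -1 + eps, 1 - eps)  # (-1, 1), avoid +/-inf
    return erfinv(scaled)
```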
