Time series/ sequential data study group

Translation invariance experiment
TL;DR: full translation invariance may not always be a good thing in time series problems. Uber’s CoordConv may be useful to help the model learn how much translation invariance is needed.

I’ve been intrigued by the same question as you, @alonso, for the last few months, so I decided to run a small experiment to test whether translation invariance is always a good thing.
The main idea is very simple: given a sequence of 100 zeros with a single 1 placed at a random position, can a neural network learn to predict the (zero-based) position of that 1?
For example:
x = [0, 0, 0, 1, 0, 0, …, 0, 0, 0, 0] means y = 3
x = [0, 0, 0, 0, 0, 0, …, 0, 0, 1, 0] means y = 98
x = [0, 1, 0, 0, 0, 0, …, 0, 0, 0, 0] means y = 1

This is the code to create the dataset:

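import numpy as np

n_samples = 1000
seq_len = 100
# Inputs start as all zeros; labels are filled in below
X_train = np.zeros((n_samples, seq_len))
y_train = np.empty(n_samples, dtype=int)
X_test = np.zeros((n_samples, seq_len))
y_test = np.empty(n_samples, dtype=int)
for i in range(n_samples):
    # Put a single 1 at a random position; that position is the label
    j = np.random.randint(0, seq_len)
    X_train[i, j] = 1
    y_train[i] = j
    k = np.random.randint(0, seq_len)
    X_test[i, k] = 1
    y_test[i] = k
# Add a channel dimension: (n_samples, seq_len) -> (n_samples, 1, seq_len)
X_train = np.expand_dims(X_train, 1)
X_test = np.expand_dims(X_test, 1)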
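
If you prefer a reproducible version, the same dataset can probably be built without the loop. This is only a sketch: the helper name `make_one_hot_dataset` and the seeds are my own assumptions, not part of the original experiment.

```python
import numpy as np

def make_one_hot_dataset(n_samples=1000, seq_len=100, seed=None):
    """Hypothetical helper: sequences of zeros with a single 1; label = position of the 1."""
    rng = np.random.default_rng(seed)
    # One random position per sample, in [0, seq_len)
    y = rng.integers(0, seq_len, size=n_samples)
    # Shape (n_samples, 1, seq_len): one channel, as in the code above
    X = np.zeros((n_samples, 1, seq_len))
    X[np.arange(n_samples), 0, y] = 1
    return X, y

# Different seeds (arbitrary choices) so train and test are drawn independently
X_train, y_train = make_one_hot_dataset(seed=0)
X_test, y_test = make_one_hot_dataset(seed=1)

# Sanity check: the position of the 1 must match the label
assert (X_train.argmax(axis=-1).squeeze(1) == y_train).all()
```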

It seems like a super simple problem, but even some state-of-the-art time series models, like ResNet or FCN (Wang, 2016), fail at this task.
For example, ResNet’s accuracy on this dataset is 77% after 100 epochs.
When I use the same model (ResNet) but replace its first convolutional layer with a CoordConv layer, the model achieves 100% accuracy.
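
For anyone not familiar with it, the core of CoordConv is to append a coordinate channel to the input before the convolution is applied. Here is a minimal NumPy sketch of that step for 1D inputs. It assumes coordinates scaled to [-1, 1], the helper name `add_coord_channel` is hypothetical, and it illustrates the idea rather than the exact layer used in the experiment above.

```python
import numpy as np

def add_coord_channel(x):
    """Append a position channel to a batch shaped (n_samples, n_channels, seq_len)."""
    n_samples, _, seq_len = x.shape
    # One coordinate per time step, linearly spaced in [-1, 1] (assumed scaling)
    coords = np.linspace(-1.0, 1.0, seq_len, dtype=x.dtype)
    # Repeat the same coordinate row for every sample: (n_samples, 1, seq_len)
    coords = np.broadcast_to(coords, (n_samples, 1, seq_len))
    return np.concatenate([x, coords], axis=1)

# Toy batch: two one-hot sequences of length 100, with the 1 at positions 3 and 98
x = np.zeros((2, 1, 100))
x[0, 0, 3] = 1
x[1, 0, 98] = 1
x_coord = add_coord_channel(x)
print(x_coord.shape)  # (2, 2, 100)
```

The first convolution then sees two input channels instead of one, so its filters can combine "what" (the original signal) with "where" (the coordinate).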
The way I interpret this (please let me know if you have a different view) is that complete translation invariance may not be useful in certain types of time series (discrete or non-continuous) where the actual position of the identified features along the time axis is important.
CoordConv may be helpful in these types of situations, since it

“allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task”
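
To make the intuition concrete, here is a toy NumPy illustration. It is not the ResNet from the experiment: it is a single hand-picked filter followed by ReLU and global average pooling, which is the kind of pooling FCN and ResNet (Wang, 2016) apply before the classifier. The helper names and the kernel weights are arbitrary assumptions.

```python
import numpy as np

def pooled_feature(channels, kernels):
    """One conv filter + ReLU + global average pooling over time.

    channels, kernels: lists of 1D arrays, one per input channel.
    """
    # Multi-channel 1D convolution: sum of the per-channel convolutions
    conv = sum(np.convolve(c, k, mode="same") for c, k in zip(channels, kernels))
    # ReLU, then average over the time axis
    return np.maximum(conv, 0).mean()

seq_len = 100

def one_hot(pos):
    x = np.zeros(seq_len)
    x[pos] = 1
    return x

kernel = np.array([0.5, 1.0, -0.3])       # arbitrary filter weights (assumption)
coords = np.linspace(-1, 1, seq_len)      # CoordConv-style position channel
coord_kernel = np.array([0.0, 1.0, 0.0])  # passes the coordinate through unchanged

# Without coordinates: the pooled feature is the same wherever the 1 sits
plain_a = pooled_feature([one_hot(3)], [kernel])
plain_b = pooled_feature([one_hot(98)], [kernel])

# With the coordinate channel: the pooled feature now depends on the position
coord_a = pooled_feature([one_hot(3), coords], [kernel, coord_kernel])
coord_b = pooled_feature([one_hot(98), coords], [kernel, coord_kernel])

print(np.isclose(plain_a, plain_b), np.isclose(coord_a, coord_b))  # True False
```

In this toy setting the plain model literally cannot tell position 3 from position 98, while the version with the coordinate channel can. If the filter put zero weight on the coordinate channel, it would fall back to the fully invariant behaviour, which is how I read the "varying degrees" part of the quote.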
