Yes, I am relying a lot on the PyTorch tutorial as well; I just need a few more tweaks to make it work for time series data.
I’m implementing this approach on the LANL earthquake competition, and there’s something that keeps bugging me…
CNNs are translation-invariant, which is a good thing normally because you can detect a dog on the left and a dog on the right equally well.
For images generated from time-series data, though, isn’t this a drawback? When our images have a clear time axis it seems like using a translation-invariant architecture would disregard that time information - a feature in the image could be recognized in the top-right or bottom-left equally well, even though those correspond to vastly different meanings (earlier vs later in the time series, for example).
Does anyone have thoughts on this? I thought that this might be the motivation for the authors of this paper using “tiled” CNNs, but as far as I can tell the tiling doesn’t seem to affect the translational invariance at all.
You raise an interesting point @alonso.
Translation invariance is helpful when you look for certain patterns that may occur at any point in an image. However, in certain cases, the location of the pattern may be important. This was studied by Uber AI Labs, who came up with a modified conv layer they called CoordConv. They basically feed the (x, y) coordinates of the image to the layer as extra input channels. This is the paper they published. This modified CoordConv layer allows the model to learn how much translation invariance is needed.
With time series, I think this may be important in some situations. There are continuous time series (like a heartbeat), where the location of certain waves on the time axis may not be important. However, in other cases (discrete time series with a predefined start and end) the location of a certain pattern along the time axis may be important (for example in a food spectrogram).
There is a PyTorch implementation of CoordConv on GitHub (link).
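To make the translation-invariance point concrete, here is a minimal NumPy sketch (not taken from the CoordConv paper or from any post in this thread). It assumes a single hand-picked filter followed by global max pooling, which is a common way for convnets to end up position-blind. The kernel values and the helper names are made up for illustration.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide the kernel over a 1D signal without padding (cross-correlation)."""
    return np.correlate(x, kernel, mode="valid")

def conv_then_global_max(x, kernel):
    """One conv filter followed by global max pooling over the time axis."""
    return conv1d_valid(x, kernel).max()

# Arbitrary spike-detector kernel (an assumption, chosen only for the demo).
kernel = np.array([-1.0, 2.0, -1.0])

# Same spike, placed early or late in the series.
early = np.zeros(100)
early[10] = 1.0
late = np.zeros(100)
late[80] = 1.0

out_early = conv_then_global_max(early, kernel)
out_late = conv_then_global_max(late, kernel)

# The pooled feature is identical, so a classifier on top cannot tell
# "early spike" from "late spike".
print(out_early, out_late)

# Before pooling, the feature maps still differ in where they peak.
peak_early = int(conv1d_valid(early, kernel).argmax())
peak_late = int(conv1d_valid(late, kernel).argmax())
print(peak_early, peak_late)
```

The position is still present in the feature map before pooling; it is the global pooling step that throws it away in this toy setup.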
I have seen exactly this problem tackled in this paper (which is well worth a read). Essentially they solve the issue by padding only on one side of the sequence. Notice that the paper uses temporal convnets to do language modeling (which would be similar to time series prediction) without converting the sequences into images. The architecture is quite simple, and I have seen a PyTorch implementation around. I have been wanting to give it a shot for weeks, but have been caught up in other projects.
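For anyone wondering what "padding only on one side" looks like, here is a rough NumPy sketch of a causal 1D convolution. It is a generic illustration, not the paper's implementation: `causal_conv1d` and the kernel values are my own assumptions.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Left-pad with len(kernel) - 1 zeros so output[t] only sees x[:t + 1]."""
    pad = len(kernel) - 1
    x_padded = np.concatenate([np.zeros(pad), x])
    # kernel[-1] multiplies the current step, kernel[0] the oldest step.
    return np.correlate(x_padded, kernel, mode="valid")

x = np.arange(1.0, 9.0)               # toy series: 1, 2, ..., 8
kernel = np.array([0.25, 0.25, 0.5])  # arbitrary weights (assumption)
y = causal_conv1d(x, kernel)

# Changing "future" values must not change earlier outputs.
x_future_changed = x.copy()
x_future_changed[5:] = 100.0
y_changed = causal_conv1d(x_future_changed, kernel)

print(np.allclose(y[:5], y_changed[:5]))
```

Because all the padding sits on the left, the output has the same length as the input, and each output step depends only on the current and earlier inputs.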
Hi @alonso - in my experiment I found that a convnet handles position sensitivity quite well - somewhat to my surprise. My positive training examples have spikes (and some dips) near the right edge of the image.
What I did have to do was to be careful with image augmentation, which is to say don’t do any at all. Given the classification task requires location sensitivity, I needed to avoid all translations, cropping/padding, flipping the image etc…
This worked out for my scenario - YMMV.
Translation invariance experiment
TL;DR: full translation invariance may not always be a good thing in time series problems. Uber’s CoordConv may be useful to help the model learn how much translation invariance is needed.
I’ve been intrigued by the same question, @alonso, for the last few months, so I decided to perform a small experiment to really test whether translation invariance is always a good thing.
The main idea is very simple: given a sequence of 100 zeros with a single 1 at a random position, can a nn learn to predict the index of that position?
x = [0, 0, 0, 1, 0, 0, …, 0, 0, 0, 0] means y = 3
x = [0, 0, 0, 0, 0, 0, …, 0, 0, 1, 0] means y = 98
x = [0, 1, 0, 0, 0, 0, …, 0, 0, 0, 0] means y = 1
This is the code to create the dataset:
import numpy as np

n_samples = 1000
seq_len = 100
X_train = np.zeros((n_samples, seq_len))
y_train = np.empty(n_samples, dtype=int)
X_test = np.zeros((n_samples, seq_len))
y_test = np.empty(n_samples, dtype=int)
for i in range(n_samples):
    # Put a single 1 at a random position; the label is that position.
    j = np.random.randint(0, seq_len)
    X_train[i, j] = 1
    y_train[i] = j
    k = np.random.randint(0, seq_len)
    X_test[i, k] = 1
    y_test[i] = k
# Add a channel axis: (n_samples, seq_len) -> (n_samples, 1, seq_len)
X_train = np.expand_dims(X_train, 1)
X_test = np.expand_dims(X_test, 1)
It seems a super simple problem, but even some of the state-of-the-art time series models, like ResNet or FCN (Wang, 2016), fail at this task.
For example ResNet’s accuracy on this dataset is 77% after 100 epochs.
When I use the same model (ResNet), but replace the first convolutional layer with a CoordConv, the model achieves 100% accuracy.
The way I interpret this (please let me know if you have a different view) is that complete translation invariance may not be useful in certain types of time series (discrete or non-continuous) where the actual position of the identified features on the time axis is important.
CoordConv may be helpful in these types of situations, since it
“allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task”
Very interesting experiment.
For experimenting it is not even necessary to modify the model with the new layer, you can simply add a channel with the calculated coords to the input data!
Using this, I reproduced/confirmed your results, but I also tested your sample dataset with regression (so instead of predicting 100 classes, I try to predict the coordinate itself as one number): Without coordconv 100 epochs lead to an MAE of >4, with coordconv this is reduced to an MAE within the range of 0.1-0.3 !!
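In case it helps anyone trying the "extra channel" shortcut, here is a minimal NumPy sketch of how a coordinate channel could be appended to data shaped like the toy dataset above. The function name `add_coord_channel` and the scaling of positions to [-1, 1] are my assumptions, not something taken from the posts.

```python
import numpy as np

def add_coord_channel(X):
    """Append one channel holding each time step's position, scaled to [-1, 1].

    X is expected to have shape (n_samples, n_channels, seq_len).
    """
    n_samples, _, seq_len = X.shape
    coords = np.linspace(-1.0, 1.0, seq_len)
    # Same coordinate vector for every sample, as one extra channel.
    coord_channel = np.broadcast_to(coords, (n_samples, 1, seq_len))
    return np.concatenate([X, coord_channel], axis=1)

# Four toy samples, each with a single 1 at a different position.
X = np.zeros((4, 1, 100))
X[np.arange(4), 0, [3, 98, 1, 50]] = 1

X_coord = add_coord_channel(X)
print(X_coord.shape)  # (4, 2, 100)
```

The model's first conv layer then only needs to accept one more input channel; the rest of the network stays unchanged.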
Good to hear from you!
Yes, you are absolutely right! I just added coord as an option I can enable/disable in a convolution layer for convenience.
Great! Very interesting! This simple idea definitely seems to add value when the position of the features identified by the conv layer has a strong temporal component. This is not the case in continuous time series.
I’ve tested the CoordConv idea on other TS and it doesn’t provide any benefit. What I also like about it is that it doesn’t seem to add any negative bias.
By the way, this same idea can be applied to image data.
Yes, I saw all the examples in the paper were from image data. Maybe we should write a paper about applying it to time series!
Hi, I find the idea of transforming time series into images very interesting! I have some practical questions:
How do you handle time series with different lengths? (Analogous to pre-padding / post-padding for RNNs.)
And what about multivariate time series? (For example x(t), y(t), z(t).)
Thanks for your insights!
It seems very interesting. Can you explain the process of preparing the data?
Would it be possible to share your notebooks, please?
Greetings everyone, I’m a novice and just beginning the fast.ai courses, but am interested in encoding time series data visually. Has anyone tried a polar coordinate approach to plotting the time series? I am working with repeat multispectral satellite imagery. The different colored facets each represent a different wavelength of light, and the magnitude of theta the reflectance. Time proceeds clockwise from the top of the plot.
My thinking in trying a polar approach is that we might be able to take advantage of rotational data augmentation to help the models generalize.
Does anyone have advice on the value of representing different dimensions of the time series as facets in a plot?
Great group! I have DrivenData’s Cold Start data, where I performed quite well. I have read the paper on encoding time series as images and I want to try it. I’m also thinking of joining the VBS Kaggle competition.
Sorry for the late reply!
I don’t have any personal experience with TS of different lengths, but the approach I’ve seen reflected more often in papers is padding TS before the transformation.
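As a rough illustration of padding before the transformation, here is a small NumPy sketch. The helper `pad_series`, the zero fill value and the choice to pad up to the longest series are all assumptions on my part; the papers may well do it differently.

```python
import numpy as np

def pad_series(series_list, side="pre", value=0.0):
    """Pad variable-length 1D series to the length of the longest one.

    side="pre" puts the padding before the data, side="post" after it.
    """
    if side not in ("pre", "post"):
        raise ValueError("side must be 'pre' or 'post'")
    max_len = max(len(s) for s in series_list)
    out = np.full((len(series_list), max_len), value, dtype=float)
    for i, s in enumerate(series_list):
        if side == "pre":
            out[i, max_len - len(s):] = s
        else:
            out[i, :len(s)] = s
    return out

series = [[1, 2, 3], [4]]
pre_padded = pad_series(series, side="pre")
post_padded = pad_series(series, side="post")
print(pre_padded)
print(post_padded)
```

After this step every series has the same length, so the image transformation produces arrays of the same size.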
There are several approaches you could follow.
In all cases you would first transform each individual TS within a MTS into an image (array).
Some alternatives are:
- Tile images to build a larger image.
- Stack all images into a multichannel (3+) array.
- Fuse the images into a single image.
For the second and third options (stacking and fusing) you would need to modify the convnet to allow 3+ channels.
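Here is a small NumPy sketch of the tiling and stacking options listed above. The `to_image` function is only a placeholder transform I made up (pairwise differences) so the example runs; swap in whatever encoding you actually use. Fusing is left out because there are many ways to do it.

```python
import numpy as np

def to_image(ts):
    """Placeholder transform (assumption): pairwise differences, shape (T, T)."""
    ts = np.asarray(ts, dtype=float)
    return ts[:, None] - ts[None, :]

def tile_images(images):
    """Place the per-variable images side by side: (T, n_vars * T)."""
    return np.concatenate(images, axis=1)

def stack_images(images):
    """Stack the per-variable images as channels: (n_vars, T, T)."""
    return np.stack(images, axis=0)

# Toy multivariate series with 4 variables and 32 time steps.
t = np.linspace(0, 1, 32)
mts = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t, t ** 2]

images = [to_image(ts) for ts in mts]
tiled = tile_images(images)
stacked = stack_images(images)
print(tiled.shape, stacked.shape)  # (32, 128) (4, 32, 32)
```

Tiling keeps a single channel but makes the image larger, while stacking keeps the image size and adds one channel per variable.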
The closest I’ve found to what you describe here is a transformation called Gramian Angular Field, which is based on polar coordinates. The output is a square matrix.
There’s a Medium article on the use of Gramian Angular Field (link) you can read if you are interested.
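For a quick feel of what a Gramian Angular Field computes, here is a minimal NumPy sketch as I understand the transform: rescale the series to [-1, 1], take the arccos of each value as an angle, and build a matrix from the pairwise angle sums (or differences). The function name and the handling of constant series are my own choices.

```python
import numpy as np

def gramian_angular_field(ts, method="summation"):
    """Encode a 1D series as a square matrix of pairwise angular relations."""
    ts = np.asarray(ts, dtype=float)
    lo, hi = ts.min(), ts.max()
    if hi == lo:
        raise ValueError("constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1] so arccos is defined; clip guards against rounding.
    x = np.clip(2 * (ts - lo) / (hi - lo) - 1, -1.0, 1.0)
    phi = np.arccos(x)  # angle of each time step in polar coordinates
    if method == "summation":
        return np.cos(phi[:, None] + phi[None, :])
    if method == "difference":
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError("method must be 'summation' or 'difference'")

ts = np.sin(np.linspace(0, 4 * np.pi, 50))
gasf = gramian_angular_field(ts, method="summation")
gadf = gramian_angular_field(ts, method="difference")
print(gasf.shape, gadf.shape)  # (50, 50) (50, 50)
```

The summation variant is symmetric, and time runs from the top-left to the bottom-right corner of the matrix, so temporal order is preserved along the diagonal.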
I’m keen to know how your approach works if you run any tests.
I guess you could modify a resnet to take in multiple channels (it seems you only use 1 channel per image).
Hi, has anybody tried any approach with multivariate time series so far? Looking for some helpful notebook in case you have!
The encoding approach and the notebook shared is only meant for univariate right? I was using that notebook as a guide to encode images and then I realized that instead of encoding values at different time steps (ie univariate), I was actually encoding 300 features of a single time step into an image (since I’m working with a multivariate dataset). And that doesn’t make sense. Thank you for the help, learned quite a bit about univariate time series classification. Please do share any resource you have, or any approaches you have tried for multivariate ts. Thank you!
Solved, March/22. Details in my other post.
=== Old post ==
Cool group and lots of fascinating stuff to discover. I just started working on a time series project with a fairly naive approach to replicate the “adult” tabular data example.
The data pre-processing and model building were straightforward but, obviously, the prediction didn’t work out.
Somebody pointed out to use LSTM or GRU instead but I just can’t find a working example in fast.ai. Can anyone help me to find a starting point on how to do a minimal time series model in fast.ai?
That is super interesting. Can you share a notebook?
Thanks @marvin. I’m currently reorganizing my notebooks, which are not in a shareable state. As soon as I get it in good shape I’ll share it.