Yes, I saw all the examples in the paper were from image data. Maybe we should write a paper about applying it to time series!
Hi, I find the idea of transforming time series into images very interesting! I have some practical questions:
How do you handle time series with different lengths? (analogous to pre-padding/post-padding for RNNs)
And what about multivariate time series (for example x(t), y(t), z(t))?
Thanks for your insights!
It seems very interesting. Can you explain the process of preparing the data?
Would it be possible to share your notebooks, please?
Greetings everyone, I’m a novice just beginning the fast.ai courses, but I am interested in encoding time series data visually. Has anyone tried a polar coordinate approach to plotting time series? I am working with repeat multispectral satellite imagery. The different colored facets each represent a different wavelength of light, and the radial distance represents the reflectance. Time proceeds clockwise from the top of the plot.
My thinking in trying a polar approach is that we might be able to take advantage of rotational data augmentation to help the models generalize.
Does anyone have advice on the value of representing different dimensions of the time series as facets in a plot?
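A minimal NumPy sketch of the polar layout described above, assuming one value per time step for a single wavelength and that the whole series spans exactly one revolution (`polar_xy` and `rotate_clockwise` are hypothetical helpers, not from any library). It also illustrates what a rotation augmentation would mean here: rotating the plot by a multiple of 2π/N is the same as cyclically shifting the series in time.

```python
import numpy as np

def polar_xy(values):
    """Map a series onto a polar plot: angle = time (clockwise from 12 o'clock),
    radius = value (e.g. reflectance). Returns an (N, 2) array of x, y points."""
    r = np.asarray(values, dtype=float)
    n = len(r)
    theta = 2 * np.pi * np.arange(n) / n  # one full turn over the series
    # Measuring clockwise from the top means x = r*sin(theta), y = r*cos(theta).
    return np.column_stack([r * np.sin(theta), r * np.cos(theta)])

def rotate_clockwise(points, angle):
    """Rotate (N, 2) Cartesian points clockwise by `angle` radians."""
    x, y = points[:, 0], points[:, 1]
    c, s = np.cos(angle), np.sin(angle)
    return np.column_stack([x * c + y * s, y * c - x * s])

# Usage: rotating by 2*pi*k/n equals rolling the series by k steps.
reflectance = np.array([0.2, 0.5, 0.9, 0.4, 0.3, 0.7])
n, k = len(reflectance), 2
shifted = polar_xy(np.roll(reflectance, k))
rotated = np.roll(rotate_clockwise(polar_xy(reflectance), 2 * np.pi * k / n), k, axis=0)
```

So a rotation treats the series as cyclic, wrapping its end onto its start; whether that is a label-preserving augmentation probably depends on the data (seasonal cycles might fit, trends less so).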
Great group! I have the DrivenData Cold Start data, where I performed quite well. I have read the paper on encoding time series as images and I want to try it. I'm also thinking of joining the VBS Kaggle competition.
Sorry for the late reply!
I don’t have any personal experience with TS of different lengths, but the approach I’ve seen reflected more often in papers is padding TS before the transformation.
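A minimal sketch of padding variable-length series to a common length before the transformation, assuming a plain list of 1-D series and a constant fill value (`pad_series` is a hypothetical helper). "pre" keeps the most recent values and pads at the start; "post" pads at the end.

```python
import numpy as np

def pad_series(series_list, maxlen=None, mode="pre", value=0.0):
    """Pad (or truncate) 1-D series to a common length; returns (n, maxlen)."""
    if maxlen is None:
        maxlen = max(len(s) for s in series_list)
    out = np.full((len(series_list), maxlen), value, dtype=float)
    for i, s in enumerate(series_list):
        s = np.asarray(s, dtype=float)
        if mode == "pre":      # pad at the start, keep the last maxlen values
            s = s[-maxlen:] if maxlen > 0 else s[:0]
            out[i, maxlen - len(s):] = s
        elif mode == "post":   # pad at the end, keep the first maxlen values
            s = s[:maxlen]
            out[i, :len(s)] = s
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out

padded = pad_series([[1, 2, 3], [4]], mode="pre")
```

Whatever fill value you choose ends up in the image too, so it may be worth checking how it interacts with the rescaling step of the transform.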
There are several approaches you could follow.
In all cases you would first transform each individual TS within a MTS into an image (array).
Some alternatives are:
- Tile images to build a larger image.
- Stack all images into a multichannel (3+) array.
- Fuse the images into a single image.
To perform the second and third options (stacking or fusing), you would need to modify the convnet to accept 3+ channels.
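The three alternatives above can be sketched in NumPy for a multivariate series of shape (n_vars, T). As an assumption, a simple |x_i - x_j| distance matrix stands in for GAF or any other per-series image transform, "tile" places images side by side, and "fuse" is done by plain averaging; `distance_image` and `encode_mts` are hypothetical names, and other tiling layouts or fusion rules are equally possible.

```python
import numpy as np

def distance_image(x):
    """Stand-in per-series encoding: |x_i - x_j| matrix of shape (T, T)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :])

def encode_mts(mts, how="stack", encode=distance_image):
    """Turn a multivariate series of shape (n_vars, T) into image data."""
    imgs = [encode(series) for series in np.asarray(mts, dtype=float)]
    if how == "tile":    # place the per-variable images side by side
        return np.concatenate(imgs, axis=1)   # (T, n_vars * T)
    if how == "stack":   # one channel per variable, channels first
        return np.stack(imgs, axis=0)         # (n_vars, T, T)
    if how == "fuse":    # collapse into a single image by averaging
        return np.mean(imgs, axis=0)          # (T, T)
    raise ValueError(f"unknown option: {how!r}")

mts = np.arange(60, dtype=float).reshape(3, 20)  # x(t), y(t), z(t) with T = 20
stacked = encode_mts(mts, how="stack")
```

The channels-first layout of the stacked array matches what PyTorch convnets expect, which is why that option needs a first layer accepting n_vars channels.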
The closest I’ve found to what you describe here is a transformation called the Gramian Angular Field, which is based on polar coordinates. The output is a square matrix.
There’s a Medium article on the use of Gramian Angular Field (link) you can read if you are interested.
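A minimal NumPy sketch of the Gramian Angular Field, assuming min-max rescaling to [-1, 1]. Each value becomes an angle phi = arccos(x) in polar coordinates (the radius, which encodes the time stamp, is not needed for the field itself), and the summation field is cos(phi_i + phi_j), giving a square T x T matrix. `gramian_angular_field` is a hypothetical name; libraries such as pyts provide tested implementations.

```python
import numpy as np

def gramian_angular_field(x, method="summation"):
    """Return the (T, T) Gramian Angular Summation/Difference Field of a 1-D series."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    # Min-max rescale into [-1, 1] so that arccos is defined (constant series -> 0).
    if hi == lo:
        scaled = np.zeros_like(x)
    else:
        scaled = 2 * (x - lo) / (hi - lo) - 1
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))  # angular coordinate of each point
    if method == "summation":    # GASF: cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    if method == "difference":   # GADF: sin(phi_i - phi_j)
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError(f"unknown method: {method!r}")

series = np.sin(np.linspace(0, 4 * np.pi, 50))
gasf = gramian_angular_field(series)
```

The GASF is symmetric, and its diagonal, cos(2 phi_i) = 2x_i^2 - 1, still carries the rescaled values themselves.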
I’m keen to know how your approach works if you run any tests.
I guess you could modify a resnet to take in multiple channels (it seems you only use 1 channel per image).
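A sketch of one common way to do that in PyTorch, assuming you want to reuse pretrained RGB weights: build a new first convolution with `n_channels` inputs, fill it with the channel-averaged pretrained kernels, and rescale so the activations keep roughly the same magnitude. `adapt_first_conv` is a hypothetical helper, and the averaging scheme is an assumption, not the only option.

```python
import torch
import torch.nn as nn

def adapt_first_conv(conv: nn.Conv2d, n_channels: int) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `n_channels` input channels."""
    new_conv = nn.Conv2d(
        n_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        groups=conv.groups,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # Average the pretrained kernels over the input channels ...
        mean_w = conv.weight.mean(dim=1, keepdim=True)  # (out, 1, kh, kw)
        # ... repeat for every new channel, rescaled by in_ch / n_channels.
        scale = conv.in_channels / n_channels
        new_conv.weight.copy_(mean_w.repeat(1, n_channels, 1, 1) * scale)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# Same shape as the first layer of torchvision ResNets.
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv5 = adapt_first_conv(rgb_conv, 5)
```

For a torchvision ResNet the first layer is exposed as `model.conv1`, so the swap would be `model.conv1 = adapt_first_conv(model.conv1, n_channels)`.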
Hi, has anybody tried any approach with multivariate time series so far? If you have a helpful notebook, please share it!
The encoding approach and the shared notebook are only meant for univariate series, right? I was using that notebook as a guide to encode images and then I realized that instead of encoding values at different time steps (i.e., univariate), I was actually encoding 300 features of a single time step into an image (since I’m working with a multivariate dataset). And that doesn’t make sense. Thank you for the help, I learned quite a bit about univariate time series classification. Please do share any resources you have, or any approaches you have tried for multivariate TS. Thank you!
Solved, March/22. Details in my other post.
== Old post ==
Cool group, with lots of fascinating stuff to discover. I just started working on a time series project with a fairly naive approach: replicating the “adult” tabular data example.
The data pre-processing and model building were straightforward, but obviously the prediction didn’t work out.
Somebody pointed out that I should use an LSTM or GRU instead, but I just can’t find a working example in fast.ai. Can anyone help me find a starting point for a minimal time series model in fast.ai?
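As a hedged starting point, here is a minimal plain-PyTorch GRU classifier rather than any fast.ai-specific API (`GRUClassifier` and `train_step` are hypothetical names). It assumes batches shaped (batch, seq_len, n_features) and classifies from the final hidden state of the last GRU layer. fastai models are ordinary `nn.Module`s, so a module like this should be usable as a Learner's model, but that wiring is not shown here.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    """Minimal GRU-based time series classifier."""

    def __init__(self, n_features, n_classes, hidden_size=64, num_layers=1):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, num_layers=num_layers,
                          batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, seq_len, n_features); h_n: (num_layers, batch, hidden_size)
        _, h_n = self.gru(x)
        return self.head(h_n[-1])  # logits from the last layer's final state

def train_step(model, optimizer, x, y):
    """One optimisation step with cross-entropy loss; returns the loss value."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

model = GRUClassifier(n_features=3, n_classes=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Swapping `nn.GRU` for `nn.LSTM` works the same way, except that the LSTM returns `(h_n, c_n)` as its second output.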
That is super interesting. Can you share a notebook?
Thanks @marvin. I’m currently reorganizing my notebooks, which are not in a shareable state. As soon as I get it in good shape I’ll share it.
Deep Neural Network Ensembles for Time Series Classification: SOTA
Today I read a new paper, just published (15 Mar 2019), showing that an ensemble of neural networks (NNs) achieves the same performance as the current univariate time series state of the art (an ensemble called HIVE-COTE).
This is further proof that neural networks are very useful in time series classification (TSC). Some key points from the paper are:
- An ensemble of NN models (several ResNet models, or several FCN models, each with a different random weight initialization) improves TSC.
- An ensemble of a mixture of NN models (several ResNets + several FCNs + several Encoders) is even better, and matches the performance of the best non-DL time series models.
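A minimal sketch of combining several trained models, assuming each one outputs class probabilities for the same samples and that they are combined by simple averaging (my assumption of the simplest scheme; check the paper for its exact combination rule). `ensemble_predict` is a hypothetical helper.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average (n_samples, n_classes) probability arrays and take the argmax."""
    probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return probs, probs.argmax(axis=1)

# e.g. softmax outputs of a ResNet and an FCN on the same two samples
resnet_probs = np.array([[0.9, 0.1], [0.4, 0.6]])
fcn_probs = np.array([[0.6, 0.4], [0.2, 0.8]])
avg_probs, labels = ensemble_predict([resnet_probs, fcn_probs])
```

Because it only needs each model's predicted probabilities, the same function works whether the members are re-initialized copies of one architecture or a mixture of ResNets, FCNs and Encoders.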
This trick may not be very applicable to day to day problems, but may be useful in cases where you really need to get the absolute best performance (for example in Kaggle competitions).
The mentioned article isn’t terribly useful because all the utils, the important code, and the RL models are missing, there is no data, and the author hasn't responded to any reported issue.
== Original post ==
It doesn’t surprise me at all. About two months ago, I read a long blog post about a time series prediction system that, according to the author, uses the following combined architecture:
Generative Adversarial Network (GAN) with LSTM , a type of Recurrent Neural Network, as generator, and a Convolutional Neural Network, CNN , as a discriminator. […] use Bayesian optimisation (along with Gaussian processes) and Reinforcement learning (RL) for deciding when and how to change the GAN’s hyperparameters (the exploration vs. exploitation dilemma). In creating the reinforcement learning we will use the most recent advancements in the field, such as Rainbow and PPO . [Src]
The author claims a near-perfect approximation of the time series after 200 epochs but unfortunately has published neither the source code nor the underlying dataset. Just a bunch of plots and some code snippets. From a domain knowledge perspective, it all makes perfect sense, especially w.r.t. feature engineering. However, when looking at the deep learning architecture, it is really hard for me to tell whether the claimed results are credible or just a blunder on an extraordinarily high level.
Therefore, I am currently trying to replicate the data and feature set so that I can run them through my base model. In a nutshell, the question is: how does the much simpler FCN compare to the bonkers-sophisticated model mentioned above when using about the same data and features?
If the difference is negligible, it supports the view of the unreasonable effectiveness of FCNs, which often outperform RNNs/LSTMs. If not, well, I guess I have to explore stacking/ensemble methods.
My base model, that replicates the current Rossmann example, delivers out of the box about 80% accuracy, which is pretty damn good because a tweaked LSTM model on a similar dataset stagnates at about 70% at most. However, there is almost no feature engineering done yet, and thus, I believe I have to run a batch of experiments on a growing feature set before tweaking the model.
That said, feature engineering usually leads to diminishing returns at some point and that is exactly when a mixture of nn models usually pushes accuracy further ahead. And yes, it is very useful in cases where you really need to get the absolute best performance.
Thanks for sharing this post! It’s interesting. I had not seen anything of this complexity before.
I have the same feeling.
The model is a combination of GANs, LSTMs, CNNs, deep reinforcement learning, BERT, stacked autoencoders, ARIMA, etc. It may work, but there are so many components that it’s very difficult to understand why it works, or how to tune it.
I’d be interested in knowing more about your learnings in this area!
In a nutshell, I am working on handling the risk of derivatives such as options, futures, FOPs, etc. with deep learning. Risk models and pricing models exist, but their complexity is mind-blowing, and I am wondering whether AI can do a similar job with less modeling complexity.
As for the article, when I started rebuilding the dataset today, I noticed that the 2265 days in the dataset imply a very ugly bias that is widespread in financial modeling: excluding the last financial crisis. Including only the 10-year bull market that started in 2010 really raises concerns about overly optimistic assumptions that may lead to poor performance during the next market downturn. Usually, I try to get at least 20 years of data (~5k days), depending on the IPO date of the included companies.
However, the remaining feature engineering is absolutely spot on b/c asset correlation is omnipresent, option/equity interrelation is very real, and reverse correlation to Bonds is as real as it gets. Technical/Fundamental analysis is bread and butter, and so is ARIMA.
However, my current thinking gravitates toward applying transfer learning. The idea goes back to Jeremy’s use of transfer learning in NLP, which turned out to be a hit, just as it was for CNNs/images.
The core idea is to apply the above feature engineering to the S&P index, because that gives about 50 years* of reliable data to learn various patterns from, while also leaving plenty of data for testing and validation. Once the model is good enough, just export it and apply it to a set of given stocks to see how that goes. I guess the main idea is the same: put the majority of the engineering work into the master model to make life easier.
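A hedged PyTorch sketch of the export-and-reuse step, assuming a small fully connected regressor on the engineered features (the architecture and the names `make_model` and `prepare_for_finetuning` are hypothetical). The master model trained on the index is copied, its body is frozen, and only the last layer stays trainable for fine-tuning on an individual stock; exporting could be done with `torch.save(master.state_dict(), path)`.

```python
import copy
import torch.nn as nn

def make_model(n_features):
    """Hypothetical master model: a small MLP on engineered features."""
    return nn.Sequential(
        nn.Linear(n_features, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

def prepare_for_finetuning(master, freeze_body=True):
    """Copy the master model and freeze everything except its last layer."""
    model = copy.deepcopy(master)  # leave the master model untouched
    if freeze_body:
        for layer in list(model.children())[:-1]:
            for p in layer.parameters():
                p.requires_grad = False
    return model

master = make_model(n_features=10)      # would be trained on the S&P data first
stock_model = prepare_for_finetuning(master)
```

Freezing the body is only one option; unfreezing gradually with a lower learning rate, as in the fast.ai course, is another.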
That said, for modeling derivates risk, predicting equity prices is about a quarter of the equation with the rest being linked to other factors such as volatility. Thus simplifying equity price prediction is on the very top of my list.
I post an update once I know how that works out.
[*] Most electronic stock data records begin with Jan/1970, although the S&P started back in 1957 and the Dow Jones in 1896.
Thanks a lot Marvin for your detailed post.
I think the idea of applying transfer learning with dynamic features to predict SP500 is really an interesting one.
I’ve performed a few tests transforming univariate time series into images and the results were promising.
Please, let me know if I can help in any way.
I have read that post before; that GitHub repository, together with the Medium blog, looks very suspicious, especially after reading his resume. The code snippets he shows are all snippets you can copy online. Some parts use MXNet, the next part uses Keras.
Either it is a really sophisticated system or it is just a scam.
Thank you Ignacio,
There is indeed something you can help with.
But let me briefly summarize the most recent results:
Meanwhile, I acquired a complete S&P 500 dataset covering 92 years and did a lot of feature engineering today. I was actually stunned by the feature ranking: many well-known stock indicators (MACD, APX, etc.) are totally useless b/c they correlate only about 50%-60% with the closing price, while the technical indicators that correlate the most are pretty obscure combinations rarely used in practice. The correlation heatmap of the final feature set indicates a promising start to train the model.
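A minimal pandas sketch of this kind of feature ranking, assuming a DataFrame with a `Close` column plus numeric feature columns and ranking by absolute Pearson correlation with the close (`rank_features_by_correlation` is a hypothetical helper; `df.corr(method="spearman")` would be a rank-based alternative). The synthetic data is purely illustrative.

```python
import numpy as np
import pandas as pd

def rank_features_by_correlation(df, target="Close"):
    """Rank feature columns by absolute Pearson correlation with `target`."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

rng = np.random.default_rng(0)
close = np.linspace(100, 200, 250) + rng.normal(0, 2, 250)
df = pd.DataFrame({
    "Close": close,
    "trend_feature": close * 0.5 + rng.normal(0, 1, 250),
    "noise_feature": rng.normal(0, 1, 250),
})
ranking = rank_features_by_correlation(df)
```

The same `df.corr()` matrix is what a correlation heatmap of the feature set would be drawn from.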
In case you are interested, you can run your algorithms over the dataset. I don’t mind sharing the data & feature set through email or DM, but for obvious reasons, I cannot share a download link in a public forum.
In case you really want to dig deeper into financial forecasting, you can start here: Financial forecasting with probabilistic programming and Pyro
Probabilistic programming with Pyro is perhaps the most underrated trend I know atm, and combining your image-net with a Pyro Bayesian neural network might be a first of its kind and may lead to an entirely new category of time series prediction approaches.
I am super interested to see how your image-net compares to the transfer learning I am working on, and how both stack up against a Bayesian hybrid (C)NN.
Would you be interested?
Thanks nok for looking into that guy,
the entire approach is just not working, no matter how you call it.
I spent some time rebuilding the data, features, and some of the code, but ultimately most of the features had no correlation, and the parts of the code I could get working delivered noticeably different plots… I assume that one was a dead end and got dumped. Total waste of time.
Hi Marvin, very interesting stuff you are working on; I wish I could participate. Some time ago I had the idea of using triggers when certain features line up, like RSI, MACD and moving-average conditions, and backtesting them. But after some testing I found out this is an unbalanced class problem, as these conditions don't occur very often, so I stopped there after bad results with LSTMs. I had the same issue (unbalanced classes) with divergence conditions, which also looked promising at first sight. I would like to test with tick data, but it’s very hard to find.
So I am very interested and will monitor this thread for new insights.