[Update Mar/25]
The article mentioned below isn’t terribly useful: all the utils, the important code, and the RL models are missing, there is no data, and the author has not responded to any of the reported issues.
Notebook
Open Issues
== Original post ==
It doesn’t surprise me at all. About two months ago, I read a long blog post about a time-series prediction system that, according to the author, uses a combined architecture as follows:
Generative Adversarial Network (GAN) with LSTM, a type of Recurrent Neural Network, as generator, and a Convolutional Neural Network, CNN, as a discriminator. […] use Bayesian optimisation (along with Gaussian processes) and Reinforcement learning (RL) for deciding when and how to change the GAN’s hyperparameters (the exploration vs. exploitation dilemma). In creating the reinforcement learning we will use the most recent advancements in the field, such as Rainbow and PPO. [Src]
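For reference, here is a minimal PyTorch sketch of the generator/discriminator pairing the quote describes, i.e. an LSTM generator feeding a 1-D CNN discriminator. The layer sizes, sequence length, and feature count are my own placeholder assumptions and are not taken from the article:

```python
import torch
import torch.nn as nn

SEQ_LEN, N_FEATURES, HIDDEN = 30, 1, 64  # placeholder shapes, not from the article

class LSTMGenerator(nn.Module):
    """Maps a noise sequence to a synthetic time-series window."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_FEATURES, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, N_FEATURES)

    def forward(self, z):                    # z: (batch, SEQ_LEN, N_FEATURES)
        h, _ = self.lstm(z)
        return self.out(h)                   # (batch, SEQ_LEN, N_FEATURES)

class CNNDiscriminator(nn.Module):
    """Scores a window as real vs. generated using 1-D convolutions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_FEATURES, 32, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):                    # x: (batch, SEQ_LEN, N_FEATURES)
        return self.net(x.transpose(1, 2))   # Conv1d expects (batch, channels, length)

G, D = LSTMGenerator(), CNNDiscriminator()
z = torch.randn(8, SEQ_LEN, N_FEATURES)
print(G(z).shape, D(G(z)).shape)             # torch.Size([8, 30, 1]) torch.Size([8, 1])
```

This only illustrates the two networks; the Bayesian optimisation and RL parts for hyperparameter scheduling are a separate layer on top and are not shown.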
The author claims a near-perfect approximation of the time series after 200 epochs but, unfortunately, has published neither the source code nor the underlying dataset, just a bunch of plots and some code snippets. From a domain-knowledge perspective it all makes perfect sense, especially w.r.t. feature engineering. However, looking at the deep learning architecture, it is really hard for me to discern whether the claimed results are credible or just a blunder on an extraordinarily high level.
Therefore, I am currently trying to replicate the data and feature set so that I can run them through my base model. In a nutshell, the question is: how does the much simpler FCN compare to the bonkers-sophisticated model mentioned above when given roughly the same data and features?
If the difference is negligible, it supports the view of the unreasonable effectiveness of FCNs, which often outperform RNNs/LSTMs. If not, well, I guess I have to explore stacking/ensemble methods.
My base model, which replicates the current Rossmann example, delivers about 80% accuracy out of the box, which is pretty damn good given that a tweaked LSTM model on a similar dataset stagnates at about 70% at most. However, there is almost no feature engineering done yet, so I believe I have to run a batch of experiments on a growing feature set before tweaking the model.
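For context, this is roughly the kind of model I mean by FCN: categorical embeddings concatenated with continuous features, fed through a plain fully connected stack, in the spirit of the Rossmann tabular example. The cardinalities, embedding rule of thumb, and layer widths below are illustrative assumptions, not my actual configuration:

```python
import torch
import torch.nn as nn

class TabularFCN(nn.Module):
    """Fully connected net over categorical embeddings + continuous features."""
    def __init__(self, cardinalities, n_cont, layers=(1000, 500)):
        super().__init__()
        # rule-of-thumb embedding sizes; cardinalities = number of levels per categorical column
        emb_sizes = [(c, min(50, (c + 1) // 2)) for c in cardinalities]
        self.embeds = nn.ModuleList([nn.Embedding(c, s) for c, s in emb_sizes])
        n_emb = sum(s for _, s in emb_sizes)
        dims, blocks = [n_emb + n_cont, *layers], []
        for n_in, n_out in zip(dims, dims[1:]):
            blocks += [nn.Linear(n_in, n_out), nn.ReLU(), nn.BatchNorm1d(n_out), nn.Dropout(0.1)]
        self.body = nn.Sequential(*blocks)
        self.head = nn.Linear(dims[-1], 1)    # single regression target (e.g. sales)

    def forward(self, x_cat, x_cont):         # x_cat: (batch, n_cat) ints, x_cont: (batch, n_cont)
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)] + [x_cont], dim=1)
        return self.head(self.body(x))

# e.g. store, day-of-week, month as categoricals plus five continuous features
model = TabularFCN(cardinalities=[1115, 7, 12], n_cont=5)
x_cat = torch.stack([torch.randint(0, c, (32,)) for c in [1115, 7, 12]], dim=1)
print(model(x_cat, torch.randn(32, 5)).shape)  # torch.Size([32, 1])
```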
That said, feature engineering usually hits diminishing returns at some point, and that is exactly when a mixture of NN models usually pushes accuracy further ahead. And yes, that is very useful in cases where you really need the absolute best performance.
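The simplest form of such a mixture is just averaging the predictions of independently trained models; a stacked variant would feed those predictions into a small meta-model instead. A minimal sketch, assuming the models share the (x_cat, x_cont) call signature from the example above and that the weights are chosen by hand or by validation:

```python
import torch

def ensemble_predict(models, x_cat, x_cont, weights=None):
    """Weighted average of predictions from several trained tabular models."""
    preds = torch.stack([m(x_cat, x_cont) for m in models])   # (n_models, batch, 1)
    if weights is None:
        return preds.mean(dim=0)                               # plain average
    w = torch.tensor(weights, dtype=preds.dtype).view(-1, 1, 1)
    return (w * preds).sum(dim=0) / w.sum()
```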