Time series/ sequential data study group

That is super interesting. Can you share a notebook?

Thanks @marvin. I’m currently reorganizing my notebooks, which are not in a shareable state. As soon as I get it in good shape I’ll share it.

1 Like

Deep Neural Network Ensembles for Time Series Classification: SOTA
Today I read a paper just published (15 Mar 2019) showing that an ensemble of neural networks achieves the same performance as the current univariate time series state-of-the-art (an ensemble called HIVE-COTE).
This is another proof that neural networks are very useful in time series classification (TSC). Some key points from the paper are:

  • An ensemble of NN models (several ResNet models, or several FCN models, with different random weight initializations) improves TSC.
  • An ensemble of a mixture of NN models (several ResNets + several FCNs + several Encoders) is even better, and matches the performance of the best non-DL time series models.
    This trick may not be very applicable to day-to-day problems, but may be useful in cases where you really need to get the absolute best performance (for example in Kaggle competitions); a minimal sketch of this kind of ensembling is shown below.
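
Here is a minimal PyTorch sketch of that kind of ensembling, assuming you already have a list of independently trained classifiers (the model names are placeholders, not the paper's code):

```python
import torch

def ensemble_predict(models, x):
    """Average class probabilities over several independently trained models."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0)  # shape: (batch_size, n_classes)

# e.g. models = [resnet_1, resnet_2, fcn_1, fcn_2, encoder_1], each trained
# from a different random weight initialization
```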
8 Likes

[Update Mar/25]
The article mentioned below isn't terribly useful: all utils, the important code, and the RL models are missing, there is no data, and the author hasn't responded to any reported issue.

Notebook
Open Issues

== Original post ==
It doesn’t surprise me at all. About two months ago, I read a long blog post about a Time Series Prediction system that uses a combined architecture, described by the author as follows:

Generative Adversarial Network (GAN) with LSTM , a type of Recurrent Neural Network, as generator, and a Convolutional Neural Network, CNN , as a discriminator. […] use Bayesian optimisation (along with Gaussian processes) and Reinforcement learning (RL) for deciding when and how to change the GAN’s hyperparameters (the exploration vs. exploitation dilemma). In creating the reinforcement learning we will use the most recent advancements in the field, such as Rainbow and PPO . [Src]

The author claims a near-perfect approximation of the time series after 200 epochs but unfortunately has published neither the source code nor the underlying dataset. Just a bunch of plots and some code snippets. From a domain knowledge perspective, it all makes perfect sense, especially w.r.t. feature engineering. However, when looking at the deep learning architecture, it is really hard for me to discern whether the claimed results are credible or just a blunder at an extraordinarily high level.

Therefore, I am currently trying to replicate the data and feature set so that I can run them through my base model. In a nutshell, the question is how the much simpler FCN compares to the bonkers-sophisticated model mentioned above when using roughly the same data and features.
In case the difference is negligible, it supports the view of the unreasonable effectiveness of FCNs, which often outperform RNNs/LSTMs. If not, well, I guess I have to explore stacking / ensemble methods.

My base model, which replicates the current Rossmann example, delivers about 80% accuracy out of the box, which is pretty damn good because a tweaked LSTM model on a similar dataset stagnates at about 70% at most. However, there is almost no feature engineering done yet, and thus I believe I have to run a batch of experiments on a growing feature set before tweaking the model.

That said, feature engineering usually leads to diminishing returns at some point, and that is exactly when a mixture of NN models usually pushes accuracy further ahead. And yes, it is very useful in cases where you really need to get the absolute best performance.

3 Likes

Thanks for sharing this post! It’s interesting. I had not seen anything of this complexity before.

I have the same feeling.
The model is a combination of GANs, LSTMs, CNNs, deep reinforcement learning, BERT, stacked autoencoders, ARIMA, etc. It may work, but there are so many components that it's very difficult to understand why it works, or how to tune it.

I’d be interested in knowing more about your learnings in this area!

In a nutshell, I am working on handling the risk of derivatives such as Options, Futures, FOPs, etc. with deep learning. Risk models exist, pricing models exist, but the complexity of these models is mind-blowing, and I am wondering whether AI can do a similar job with less modeling complexity.

As for the article, when I started rebuilding the dataset today, I noticed that the 2265 days in the dataset imply a very ugly bias that is widespread in financial modeling: excluding the last financial crisis. Only including the 10-year bull market that started in 2010 really raises concerns of overly optimistic assumptions that may lead to poor performance during the next market downturn. Usually, I try to get at least 20 years of data (~5k days), depending on the IPO date of the included companies.

However, the remaining feature engineering is absolutely spot on b/c asset correlation is omnipresent, option/equity interrelation is very real, and the inverse correlation to bonds is as real as it gets. Technical/fundamental analysis is bread and butter, and so is ARIMA.

However, my current thinking gravitates toward applying transfer learning. The idea goes back to Jeremy's idea to use transfer learning in NLP, which turned out to be a hit, just as it was for CNNs/images.

The core idea is to use the above feature engineering on the S&P index, because that gives about 50 years* of reliable data to learn various patterns, while also leaving plenty of data for testing and validation. Once the model is good enough, just export it and fit it to a given set of stocks to see how that goes. I guess the main idea is the same in the sense of putting the majority of the engineering work into the master model to make life easier.
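
Roughly, the export-then-fit idea I have in mind looks like this (a minimal plain-PyTorch sketch; `PriceNet` and all the details are just placeholders, not my actual model):

```python
import torch
import torch.nn as nn

class PriceNet(nn.Module):
    """Hypothetical model: a feature extractor ('backbone') plus a small head."""
    def __init__(self, n_features=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                      nn.Linear(64, 64), nn.ReLU())
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        return self.head(self.backbone(x))

# 1) Train on ~50 years of S&P index features (training loop omitted),
#    then export the learned weights.
master = PriceNet()
torch.save(master.state_dict(), "sp500_master.pth")

# 2) Re-load the weights for an individual stock, freeze the backbone,
#    and fine-tune only the head on that stock's (much smaller) dataset.
stock_model = PriceNet()
stock_model.load_state_dict(torch.load("sp500_master.pth"))
for p in stock_model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(stock_model.head.parameters(), lr=1e-4)
```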

That said, for modeling derivatives risk, predicting equity prices is about a quarter of the equation, with the rest being linked to other factors such as volatility. Thus, simplifying equity price prediction is at the very top of my list.

I'll post an update once I know how that works out.

M

[*] Most electronic stock data records begin in Jan 1970, although the S&P started back in 1957 and the Dow Jones in 1896.

4 Likes

Thanks a lot Marvin for your detailed post.
I think the idea of applying transfer learning with dynamic features to predict SP500 is really an interesting one.
I’ve performed a few tests transforming univariate time series into images and the results were promising.
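
(For reference, one common way to encode a univariate series as an image is a Gramian Angular Field; a minimal numpy sketch, not necessarily the exact encoding I used:)

```python
import numpy as np

def gramian_angular_field(x):
    """Encode a 1-D series as a 2-D image (Gramian Angular Summation Field)."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))            # map values to angles
    return np.cos(phi[:, None] + phi[None, :])        # (len(x), len(x)) image
```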
Please, let me know if I can help in any way.

2 Likes

I have read that post before; that GitHub repository together with the Medium blog looks very suspicious, especially after reading his resume. The code snippets he shows are all snippets you can copy online. You see some parts using MXNet, and in the next part it is Keras.

Chances are it is either a really sophisticated system, or it is just a scam.

Thank you Ignacio,

There is indeed something you can help with.

But let me briefly summarize the most recent results:

Meanwhile, I acquired a complete dataset of the S&P 500 covering 92 years and did a lot of feature engineering today. I was actually stunned by the feature ranking, since many well-known stock indicators (MACD, APX, etc.) are totally useless b/c they only correlate about 50%-60% with the closing price, while the technical indicators that correlate the most are pretty obscure combinations rarely used in practice. The correlation heatmap of the final feature set indicates a promising start for training the model.
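
The ranking itself is nothing fancy; a minimal pandas sketch of what I mean, assuming a DataFrame `df` with one column per indicator plus the closing price:

```python
import pandas as pd
import matplotlib.pyplot as plt

# df: one row per trading day, one column per engineered feature plus 'Close' (assumed)
corr_with_close = df.corr()["Close"].drop("Close")
ranking = corr_with_close.abs().sort_values(ascending=False)
print(ranking.head(20))                      # feature ranking by |correlation|

# quick correlation heatmap of the final feature set
plt.imshow(df.corr(), cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar()
plt.show()
```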

In case you are interested, you can run your algorithms over the dataset. I don’t mind sharing the data & feature set through email or DM, but for obvious reasons, I cannot share a download link in a public forum.

In case you really want to dig deeper into financial forecasting, you can start here: Financial forecasting with probabilistic programming and Pyro

Probabilistic programming with Pyro is perhaps the most underrated trend I know of atm, and combining your image-net with a Pyro Bayesian neural network might be a first of its kind and may lead to an entirely new category of time series prediction approaches.

I am super interested to see how your image-net compares to the transfer learning I am working on, and how both stack up against a hybrid Bayesian (C)NN.

Would you be interested?
Marvin

S&P500 Feature Ranking

Correlation-Matrix-Heatmap

4 Likes

Thanks nok for looking into that guy,

the entire approach is just not working, no matter what you call it.

I spent some time rebuilding the data, the features, and some of the code, but ultimately most of the features had no correlation, and the code I could get working delivered noticeably different plots… I assume that one was a dead end and got dumped. Total waste of time.

Hi Marvin, very interesting stuff you are working on, I wish I could participate. I had a thought some time ago about using triggers when some of the features come together, like RSI, MACD, and moving average conditions, and backtesting them. But after some testing I found out this is an unbalanced-class problem, as these conditions do not occur very often, so I stopped there after bad results with LSTMs. I had the same unbalanced-class issue with divergence conditions, which also looked promising at first sight. I would want to test with tick data, but it's very hard to find tick data.
So I am very interested and will monitor this thread for new insights.

I would be also very interested in the pytorch + pyro combination.

This sounds very similar to the approach outlined in this paper:


(However, so far I haven't found a Pyro implementation of it and I'm still working on my Pyro skills.)

I’m happy if you can recommend stuff in that direction. :slight_smile:

@MicPie
I don’t know more than you do but here is a starting point:

Financial forecasting with probabilistic programming

https://medium.com/@alexrachnog/financial-forecasting-with-probabilistic-programming-and-pyro-db68ab1a1dba

1 Like

@gevezex

Yes, you can participate soon. I'm preparing to share data & code within the next few days. @oguiza suggested launching a competition to predict the S&P 500, and I think that's the way to go. I'm preparing the release of my related work as a baseline to get things started.

You can get 1-minute tick data for free from AWS:

https://registry.opendata.aws/deutsche-boerse-pds/

And a sample prediction system done in TensorFlow:

2 Likes

I have been following this thread silently, but I am very interested in TS forecasting as well.
I have tried XGBoost and the Tabular model to forecast time series, but with pretty poor results, so the LSTM approach looks interesting.
My data is composed of various TS (power consumption, meteorological data, calendar data), and I need to forecast the power consumption for the next hour using the other TSs and the previous power data. I can update my model in real time with the actual data every hour.

The super meta model used to predict Goldman Sachs looks very suspicious, but not completely wrong. Was anyone able to reproduce the results?
Any advice appreciated.

Hi Thomas,
It’s great that you joined this group.
It’s a bit difficult to give any hint without seeing the data, but if they are equally-spaced (1h), I think I’d start with either a ResNet or an FCN model on the raw data. These models work pretty well on multivariate time series data. You may want to take a look at this link where a review article is summarized.

Maybe I am not getting something, but using ResNets and CNNs is for classification purposes, no?
I have equally spaced data (5min) for each TS.

Not necessarily. MLPs, RNNs and CNNs can all be used for forecasting problems.
The ResNet model I talk about is a model with a resnet-like structure, but adapted to time series (this is described in the link I shared before). Instead of having 2d convolutions it has 1d convolutions. The rest is similar to a traditional resnet model.
This TS ResNet model takes a 2d array as an input (not an image), but the filters are only convolved in one dimension, along the time axis.
In this case, instead of predicting classes, what we want is to predict an amount. So the output of the last fully connected layer will be a single float instead of multiple floats (one per class), i.e. the number of classes should be set to 1. The other thing you need to do is to use an adequate loss (for example MSELoss) and choose the metrics you want to optimize.
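
To make this concrete, here is a minimal sketch of an FCN-style 1d-conv model with a single regression output and MSELoss (just an illustration, not the exact model from the review):

```python
import torch
import torch.nn as nn

class FCN1d(nn.Module):
    """Minimal FCN-style model for multivariate time series regression.
    Input shape: (batch, n_features, seq_len); output: one float per sample."""
    def __init__(self, n_features):
        super().__init__()
        def block(c_in, c_out, k):
            return nn.Sequential(nn.Conv1d(c_in, c_out, k, padding=k // 2),
                                 nn.BatchNorm1d(c_out),
                                 nn.ReLU())
        self.convs = nn.Sequential(block(n_features, 128, 7),
                                   block(128, 256, 5),
                                   block(256, 128, 3))
        self.head = nn.Linear(128, 1)        # "classes" set to 1 -> regression

    def forward(self, x):
        x = self.convs(x)
        x = x.mean(dim=-1)                   # global average pooling over time
        return self.head(x)

model = FCN1d(n_features=2)                  # e.g. TS and TOUT
loss_fn = nn.MSELoss()
```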

My data looks like this:


I am trying to forecast P as a function of TS, TOUT and the Date. I am currently working with the Tabular model and XGBoost, but it is not working that well.
Also, my model will act as a controller, so I want to use it online and update it with the real values of P that I will be getting.

I get your idea of the 1D ResNet, but how would you set up the data to train such a model?
Would you cut the TS into smaller pieces, for instance into n+1 values, where you would use the n preceding values to predict the (n+1)th? How would you incorporate the datetime data (categorical)? Would you join a tabular model to the output of the feature map of the resnet?

1 Like

One option would be to split data first between train and test (or train, val and test), so that there is no overlap between them.
This example just includes 1 val and 1 test fold, but you could have more using a walk-forward approach.
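
In code, the split is simply chronological (a minimal sketch, assuming a DataFrame `df` sorted by date):

```python
# chronological split: no shuffling, so val/test always come after train in time
n = len(df)
train_df = df.iloc[: int(0.7 * n)]
val_df   = df.iloc[int(0.7 * n): int(0.85 * n)]
test_df  = df.iloc[int(0.85 * n):]
# for a walk-forward approach you would repeat this with the split points
# moving forward in time, producing several consecutive (train, val) folds
```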

Now, within each of the datasets, you will need to use a sliding window to create the X_train, y_train, etc. as you describe. You would have 2 features (TS and TOUT). So if your sliding window is of size 20, you would create a 3d array of size (n_samples, 2, 20). In this way, none of the train samples would overlap with val or test samples.

As to time, it really depends whether you think it has some predictive power or not, and how long your sliding window is. You could extract features from time like minute, hour, day/night, day of week, month, etc. if you think those could have an impact on the prediction. You could treat those time features like the other 2 features. If you extract 1 time feature, you would then feed an array of shape (n_samples, 3, 20) into the model.
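
A minimal sketch of the sliding-window step (the 'hour' column stands for whatever time feature you decide to extract, and `train_df` is the train split from above):

```python
import numpy as np

def make_windows(df, feature_cols, target_col, window=20):
    """Build (n_samples, n_features, window) inputs and next-step targets."""
    values = df[feature_cols].to_numpy().T            # (n_features, n_steps)
    target = df[target_col].to_numpy()
    X, y = [], []
    for i in range(values.shape[1] - window):
        X.append(values[:, i:i + window])              # the past `window` steps
        y.append(target[i + window])                    # value right after the window
    return np.stack(X), np.array(y)

# e.g. two raw features plus one extracted time feature -> (n_samples, 3, 20)
X_train, y_train = make_windows(train_df, ["TS", "TOUT", "hour"], "P", window=20)
```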

I mentioned ResNet and FCN because they've been shown to be effective multivariate time series models, and may give you a good starting point. But of course you may also use many other models: non-DL models, LSTMs, GRUs, etc., or hybrid models.

4 Likes