[Update Mar/25]
The article mentioned below isn’t terribly useful: all the utils, the important code, and the RL models are missing, there is no data, and the author has not responded to any of the reported issues.
Notebook
Open Issues
== Original post ==
It doesn’t surprise me at all. About two months ago, I read a long blog post about a time-series prediction system that, according to the author, uses a combined architecture as follows:
Generative Adversarial Network (GAN) with LSTM, a type of Recurrent Neural Network, as generator, and a Convolutional Neural Network, CNN, as a discriminator. […] use Bayesian optimisation (along with Gaussian processes) and Reinforcement learning (RL) for deciding when and how to change the GAN’s hyperparameters (the exploration vs. exploitation dilemma). In creating the reinforcement learning we will use the most recent advancements in the field, such as Rainbow and PPO. [Src]
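For reference, here is a minimal PyTorch sketch of the generator/discriminator pairing the quote describes, i.e. an LSTM generator feeding a 1-D CNN discriminator. The layer sizes, sequence length, and feature count are my own placeholder assumptions and are not taken from the article:

```python
import torch
import torch.nn as nn

SEQ_LEN, N_FEATURES, HIDDEN = 30, 1, 64  # placeholder shapes, not from the article

class LSTMGenerator(nn.Module):
    """Maps a noise sequence to a synthetic time-series window."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_FEATURES, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, N_FEATURES)

    def forward(self, z):                    # z: (batch, SEQ_LEN, N_FEATURES)
        h, _ = self.lstm(z)
        return self.out(h)                   # (batch, SEQ_LEN, N_FEATURES)

class CNNDiscriminator(nn.Module):
    """Scores a window as real vs. generated using 1-D convolutions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_FEATURES, 32, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):                    # x: (batch, SEQ_LEN, N_FEATURES)
        return self.net(x.transpose(1, 2))   # Conv1d expects (batch, channels, length)

G, D = LSTMGenerator(), CNNDiscriminator()
z = torch.randn(8, SEQ_LEN, N_FEATURES)
print(G(z).shape, D(G(z)).shape)             # torch.Size([8, 30, 1]) torch.Size([8, 1])
```

This only illustrates the two networks; the Bayesian optimisation and RL parts for hyperparameter scheduling are a separate layer on top and are not shown.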
The author claims a near-perfect approximation of the time series after 200 epochs but, unfortunately, has published neither the source code nor the underlying dataset, just a bunch of plots and some code snippets. From a domain-knowledge perspective it all makes perfect sense, especially w.r.t. feature engineering. However, looking at the deep learning architecture, it is really hard for me to discern whether the claimed results are credible or just a blunder on an extraordinarily high level.
Therefore, I am currently trying to replicate the data and feature set so that I can run them through my base model. In a nutshell, the question is: how does the much simpler FCN compare to the bonkers-sophisticated model mentioned above when given roughly the same data and features?
If the difference is negligible, it supports the view of the unreasonable effectiveness of FCNs, which often outperform RNNs/LSTMs. If not, well, I guess I have to explore stacking/ensemble methods.
My base model, which replicates the current Rossmann example, delivers about 80% accuracy out of the box, which is pretty damn good given that a tweaked LSTM model on a similar dataset stagnates at about 70% at most. However, there is almost no feature engineering done yet, so I believe I have to run a batch of experiments on a growing feature set before tweaking the model.
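For context, this is roughly the kind of model I mean by FCN: categorical embeddings concatenated with continuous features, fed through a plain fully connected stack, in the spirit of the Rossmann tabular example. The cardinalities, embedding rule of thumb, and layer widths below are illustrative assumptions, not my actual configuration:

```python
import torch
import torch.nn as nn

class TabularFCN(nn.Module):
    """Fully connected net over categorical embeddings + continuous features."""
    def __init__(self, cardinalities, n_cont, layers=(1000, 500)):
        super().__init__()
        # rule-of-thumb embedding sizes; cardinalities = number of levels per categorical column
        emb_sizes = [(c, min(50, (c + 1) // 2)) for c in cardinalities]
        self.embeds = nn.ModuleList([nn.Embedding(c, s) for c, s in emb_sizes])
        n_emb = sum(s for _, s in emb_sizes)
        dims, blocks = [n_emb + n_cont, *layers], []
        for n_in, n_out in zip(dims, dims[1:]):
            blocks += [nn.Linear(n_in, n_out), nn.ReLU(), nn.BatchNorm1d(n_out), nn.Dropout(0.1)]
        self.body = nn.Sequential(*blocks)
        self.head = nn.Linear(dims[-1], 1)    # single regression target (e.g. sales)

    def forward(self, x_cat, x_cont):         # x_cat: (batch, n_cat) ints, x_cont: (batch, n_cont)
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)] + [x_cont], dim=1)
        return self.head(self.body(x))

# e.g. store, day-of-week, month as categoricals plus five continuous features
model = TabularFCN(cardinalities=[1115, 7, 12], n_cont=5)
x_cat = torch.stack([torch.randint(0, c, (32,)) for c in [1115, 7, 12]], dim=1)
print(model(x_cat, torch.randn(32, 5)).shape)  # torch.Size([32, 1])
```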
That said, feature engineering usually hits diminishing returns at some point, and that is exactly when a mixture of NN models usually pushes accuracy further ahead. And yes, that is very useful in cases where you really need the absolute best performance.
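The simplest form of such a mixture is just averaging the predictions of independently trained models; a stacked variant would feed those predictions into a small meta-model instead. A minimal sketch, assuming the models share the (x_cat, x_cont) call signature from the example above and that the weights are chosen by hand or by validation:

```python
import torch

def ensemble_predict(models, x_cat, x_cont, weights=None):
    """Weighted average of predictions from several trained tabular models."""
    preds = torch.stack([m(x_cat, x_cont) for m in models])   # (n_models, batch, 1)
    if weights is None:
        return preds.mean(dim=0)                               # plain average
    w = torch.tensor(weights, dtype=preds.dtype).view(-1, 1, 1)
    return (w * preds).sum(dim=0) / w.sum()
```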