[Solved] Auto-regression with TabularModel - Zero accuracy Why?

marvin · March 21, 2019, 11:57am

[Solution]

1. Add drop=False to add_datepart:
   add_datepart(train_df, "Date", drop=False)
   add_datepart(test_df, "Date", drop=False)
2. Apply all feature engineering to both train and test.
3. Remove NaNs from both train and test:
   test_df = test_df.fillna(0)
   train_df = train_df.fillna(0)

The last one was a bit of a surprise, but I found the solution here in the forum:

The classifier works now. Accuracy is 80%, which is ~10% better than the best LSTM model I built with Keras and the first baseline with the TabularModel before any optimization was done.
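The preprocessing steps in the solution can be sketched with plain pandas, for readers who do not have fastai installed. This is a stand-in, not fastai's add_datepart: the helper name add_date_parts, the generated column names, and the toy frames are all made up for illustration.

```python
import numpy as np
import pandas as pd


def add_date_parts(df: pd.DataFrame, col: str, drop: bool = False) -> pd.DataFrame:
    """Hypothetical stand-in for fastai's add_datepart: expand a date column into parts."""
    out = df.copy()
    dates = pd.to_datetime(out[col])
    out[col + "Year"] = dates.dt.year
    out[col + "Month"] = dates.dt.month
    out[col + "Day"] = dates.dt.day
    out[col + "Dayofweek"] = dates.dt.dayofweek
    if drop:
        out = out.drop(columns=[col])
    return out


# Toy data (made up); the NaN values mimic gaps left by feature engineering.
train_df = pd.DataFrame({"Date": ["2019-03-18", "2019-03-19"], "x": [1.0, np.nan]})
test_df = pd.DataFrame({"Date": ["2019-03-20"], "x": [np.nan]})

# Apply the same steps to both frames, keep the original Date column, then fill NaNs.
train_df = add_date_parts(train_df, "Date", drop=False).fillna(0)
test_df = add_date_parts(test_df, "Date", drop=False).fillna(0)
```

The point of the sketch is only that train and test go through identical steps and end up with the same columns and no missing values.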

[Update: Mar/22/2019]
Most of the hassle seems to come from incorrect data pre-processing; it looks like I was working from an outdated example notebook from the 2018 course. Meanwhile, I found the latest (2019) Rossmann example in Lesson 6 and I am now working on applying that pre-processing. I will update the post later on.

Hi,
Last week, I started working through the part-1 course and so far, lessons 1-4 have been quite helpful for getting my mini-project started. Most of the documentation and code examples were good, but after setting up a model I am getting zero accuracy, so there must be something wrong with the way I do things.

I am dealing with an “auto-regression” problem, that is, I want to use earlier time periods of the dependent variable (and its features) as predictors for values at future times. My dataset has about 5k of data with 168 features from which I want to predict 4 dependent variables.
For now, predicting just one target variable with a subset of 18 features is sufficient to get something working.
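One common way to feed an auto-regression problem to a row-wise tabular model is to turn earlier time steps into extra columns. The following is a minimal sketch with pandas; the column names, the toy values, and the choice of two lags are assumptions for illustration, not taken from the gist.

```python
import pandas as pd

# Toy series (made up): one target and one feature over six time steps.
df = pd.DataFrame({"target": [10, 11, 13, 12, 15, 16], "feat": [1, 2, 3, 4, 5, 6]})

n_lags = 2  # illustrative choice
for lag in range(1, n_lags + 1):
    # shift(lag) moves values down, so each row sees the value from `lag` steps earlier.
    df[f"target_lag{lag}"] = df["target"].shift(lag)
    df[f"feat_lag{lag}"] = df["feat"].shift(lag)

# The first n_lags rows have no history, so drop them.
supervised = df.dropna().reset_index(drop=True)
```

Each remaining row now carries its own history, so a model that looks at one row at a time can still use past values.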

The Rossmann example used “ColumnarModelData”, which seems to create a model for a specified column. That makes a lot of sense to me, since auto-regression essentially works on a column of time-series data. Unfortunately, “ColumnarModelData” is gone in fast.ai v1.

Next, I looked at the tabular data model, and by following the “adult” example I got some code working, but no usable model. Specifically, I did the following:

Separated categorical from continuous columns
Converted data to categories
Added data processors: [FillMissing, Categorify, Normalize]
Split data by index
Created a test data set as TabularList
Created a databunch
Created a tabular learner

The last step is where I am unsure whether it’s the right thing, because the learner should be an RNN or column learner or whatever gets auto-regression done in fast.ai.

code gist:

https://gist.github.com/marvin-hansen/a65d1f38201ab8572fb6065a3db92d27

Auto-Regression.py

#Lib versions

#Python Version: 3.6.7
#Pandas Version: 0.24.2
#Numpy Version: 1.16.2
#FastAI Version: 1.0.50.post1
#PyTorch Version: 1.0.1.post2

#GPU Acceleration
#GPU: NVIDIA K80

Note, the underlying sample data is just 5k of numerical data (~500kb), so a fast runtime is expected. A simple LSTM model with Keras gets an accuracy of about 70%, so I thought fast.ai could beat this with a bit of tweaking.

However, when I run the code shown in the gist, I get an accuracy of zero and a pretty abnormal validation loss. From reading the API docs, I simply cannot figure out whether the tabular learner operates row-wise or column-wise. The results indicate row-wise, but that’s just a guess.

My questions:

Is the tabular_learner the right learner for auto-regression on time-series data?
I have read a few times that RNN / LSTM isn’t the best choice anymore for auto-regression because a fully connected layer can perform better. How can I do that in fast.ai to predict future values in a time-series dataset?
Can I predict more than one dependent variable or do I have to create four different models when I want to predict four dependent variables?

I am thinking increasingly about writing up and contributing a time-series tutorial, so any help to make things work is greatly appreciated.

gevezex · March 21, 2019, 1:46pm

I am not able to answer all your questions but maybe an answer for q1:

Tabular learner is just a fully connected network where the inputs are the cont_names and the embeddings of the cat_names. Jeremy uses 2 hidden layers in his video (1000 and 500 hidden units, I think, for the Rossmann data) with one output for Sales. I don’t think this is the best model for time-series data. There is actually a time dependence in sales data (more sales on Saturdays, for example), but that is handled by the embeddings of the date column, split into different categories (day of year, weekend or not, etc.).
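To make that description concrete, here is a minimal numpy sketch of such a forward pass: look up an embedding for a categorical column, concatenate it with the continuous columns, and pass the result through two hidden layers to one output. It is not fastai's TabularModel code; the sizes, the random weights, and the function names are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one categorical column with 7 levels (e.g. day of week)
# embedded into 4 dimensions, plus 3 continuous columns.
n_levels, emb_dim, n_cont = 7, 4, 3
embedding = rng.normal(size=(n_levels, emb_dim))

# Two hidden layers (shrunk from 1000/500 to 16/8) and a single output.
sizes = [emb_dim + n_cont, 16, 8, 1]
weights = [rng.normal(scale=0.1, size=(i, o)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]


def forward(cat_idx: np.ndarray, cont: np.ndarray) -> np.ndarray:
    """Embed the categorical index, concatenate with continuous inputs, run the MLP."""
    x = np.concatenate([embedding[cat_idx], cont], axis=1)
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)  # linear layer followed by ReLU
    return x @ weights[-1] + biases[-1]  # linear output, e.g. Sales


cat_idx = np.array([0, 5, 6])           # three rows, made-up category codes
cont = rng.normal(size=(3, n_cont))     # three rows of continuous features
preds = forward(cat_idx, cont)
```

Every row is processed on its own here, so any time dependence has to arrive through the columns of that row.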

You probably need an LSTM or GRU for your problem.

As Jeremy only showed RNNs for NLP-type problems, it is hard to come up with a solution for a non-NLP problem, as we need part 2 of the course to build our own models. I am also waiting for that part so I can create a model for stock data prediction.

marvin · March 21, 2019, 2:35pm

Thank you,

Indeed, adapting an RNN from NLP seems pretty challenging at this stage.

For your stock prediction, you can do something in Pytorch in the meantime.
Take a look at this article & code:

http://chandlerzuo.github.io/blog/2017/11/darnn

Looking further ahead, there is a ton of IoT sensor data out there and usually, these are time series. Predicting them in real-time is going to become increasingly relevant over time. Anyway, thank you.

pnema · July 20, 2020, 9:59pm

Hi, did you finally adapt the NLP RNN for stock prediction or any other time-series problem? I am keen to learn that and am facing similar issues. My MAPE is 20% with the Rossmann approach, while my naive predictions (taking the previous value as the prediction) have a MAPE of only 10%.
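For reference, the naive baseline mentioned here can be computed in a few lines. This is a minimal sketch with numpy; the values are made up, and the mape helper is written out for illustration rather than taken from any library.

```python
import numpy as np


def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent. Assumes no zeros in `actual`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


y = np.array([100.0, 110.0, 99.0, 121.0])  # made-up series

# Naive forecast: the prediction for step t is the observed value at step t-1.
naive_pred = y[:-1]
naive_mape = mape(y[1:], naive_pred)
```

Any model worth keeping should score below naive_mape on the same evaluation window.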

pnema · July 22, 2020, 3:51am

Hi Marvin,

Did you finally find a solution for the auto-regression target variable problem? I am kind of in the same boat. Also, did you end up using an LSTM/RNN for tabular data? Or did you create any lag variables (last week’s sales, etc.), and if so, did you normalize them?

Thanks

marvin · July 22, 2020, 5:22am

Hi @pnema

Yes and no.

No, I don’t use LSTM/RNN for any financial time-series data, although it can be done with more effort and other libraries. The results, however, aren’t worth the effort.

Yes, time series can be done extremely well with deep learning. I would say gluon-ts is the only contender in the field and constantly establishes SOTA results. I think I got the mean error down to 5% when I tweaked my gluon-ts model, but for my application the target was an error below 2.5%, and that wasn’t feasible back then. I don’t use it at the moment, but if I had a use case with less stringent error requirements, gluon is the only kid on the block. And quite frankly, it just works out of the box.

https://gluon-ts.mxnet.io/

For stock market data, I found deep learning largely insufficient, but I can say that deep learning is actually used for pattern matching on binary encoded data because, after all the trial and error, it’s the only thing that actually works reliably. Currently, I study automated trading systems under the four-time world champion while collaborating with a quantitative investment fund.

However, I cannot share any technology details of that work for obvious reasons.

There you have it: go with gluon-ts for time-series data. If you want to build automated trading systems, forget about deep learning, because you have to learn trading from scratch first; that will save you many very hard years.

That said, I have not actively used fast.ai for over a year now, so I cannot comment any further.

Best

pnema · July 22, 2020, 4:31pm

Thanks Marvin. My problem is more like the Rossmann one, where I have a time-series target variable like sales which also depends on store, city, day of week, etc., and I do not think I can use gluon-ts to predict that. It is more like a regression problem with an auto-regression component to it. I also read another post of yours in which you mentioned that “adding another feature that measures the distance from the moving average immediately improves accuracy.” Can you please elaborate on how you calculated the distance from the moving average? I appreciate any feedback or guidance.

Thanks

marvin · July 23, 2020, 2:16am

@pnema

have you tried gluon-ts yet?

gluon-ts and some custom models are used in production at Amazon to forecast monthly sales and revenue, and they use that to adjust marketing expenditures to proactively offset a predicted drop.
Just saying…

From memory, my time series data were

non-stationary
semi-seasonal
heteroskedastic

Non-stationarity ruled out several mean-adjustment techniques, hence the exploration of moving averages.

Semi-seasonal means there was no clear seasonality i.e. June/July drop in sales of winter items etc. but there was some time regularity i.e. dip every third Friday of a month due to market structures.

Tackling heteroskedasticity is a bit tricky, so the best I came up with back then was smoothing things out through moving averages, ratio calculations, and binary encoding. Working on absolute numbers proved to be futile.

I did roughly speaking this:

1. A simple regression model with few features -> poor results
2. Feature ranking -> Few of the given features were actually useful
3. Feature generation -> moving averages, Bollinger bands, pivot points, Fibonacci levels, etc.
4. Feature ranking -> Some generated features were really useful out of the box
5. Higher-order feature generation (features of features)
6. Ranking & tuning
7. Repeat

Feature generation:

used TALib, a Python lib for quant finance
wrote a generator function that produced moving averages for 1 to 250 days to rank their significance
From memory, the 20-day and 200-day moving averages were most useful on daily data
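A generator of this kind can be sketched as follows. The post used TALib; this sketch swaps in plain pandas rolling means instead, and the function name, column naming scheme, and toy prices are assumptions for illustration.

```python
import pandas as pd


def add_moving_averages(df: pd.DataFrame, col: str, windows) -> pd.DataFrame:
    """Add one simple-moving-average column per window length (hypothetical helper)."""
    out = df.copy()
    for w in windows:
        out[f"{col}_sma{w}"] = out[col].rolling(window=w).mean()
    return out


prices = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})  # made-up data

# Two small windows keep the toy example readable; range(1, 251) would
# cover the 1 to 250 day sweep described above.
with_sma = add_moving_averages(prices, "Close", windows=range(2, 4))
```

The resulting columns can then be passed to whatever feature-ranking method is in use; the first w-1 rows of each column are NaN because the window is not yet full.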

Temporal features

Date decomposition, i.e. day, month, quarter, year, etc.
Simple moving average (SMA)
Change in momentum

Historical features (usually, 3 to 5 days back helped, but nothing beyond 5 days)

Previous n-day percentage change
Previous n-day moving average
Previous n-day momentum / change in momentum
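A minimal pandas sketch of these historical features follows. The exact definitions are assumptions: momentum is taken here as the n-day price difference and "previous" moving average as the average over the n days before the current one; the post does not spell them out.

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0, 108.0], name="Close")  # made-up
n = 3  # the post reports that 3 to 5 days back was the useful range

feats = pd.DataFrame({"Close": close})
# Percentage change versus n days ago.
feats["pct_change_n"] = close.pct_change(periods=n)
# Moving average of the n days before today (shifted so today's price is excluded).
feats["prev_sma_n"] = close.rolling(n).mean().shift(1)
# Momentum as the absolute change over n days, and its day-to-day change.
feats["momentum_n"] = close.diff(n)
feats["momentum_change"] = feats["momentum_n"].diff()
```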

Higher-order feature generation, for example:

Current price divided by n-day moving average => Distance from mean ratio
Current price divided by upper / lower Bollinger band => Distance from 2 std. dev. range
Current price divided by high / low
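The ratio features above, including the distance from the moving average, can be sketched with pandas as below. The window length, the column names, and the use of a population standard deviation (ddof=0) for the Bollinger bands are assumptions, and the OHLC values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "High": [11.0, 12.0, 13.0, 12.5, 14.0],
    "Low": [9.0, 10.0, 11.0, 11.0, 12.0],
    "Close": [10.0, 11.0, 12.0, 12.0, 13.0],
})
n = 3  # illustrative window

sma = df["Close"].rolling(n).mean()
std = df["Close"].rolling(n).std(ddof=0)
upper, lower = sma + 2 * std, sma - 2 * std  # Bollinger bands at 2 std. dev.

# Ratios rather than absolute prices: 1.0 means "exactly at the reference level".
df["close_to_sma"] = df["Close"] / sma
df["close_to_upper"] = df["Close"] / upper
df["close_to_lower"] = df["Close"] / lower
df["close_to_high"] = df["Close"] / df["High"]
df["close_to_low"] = df["Close"] / df["Low"]
```

A value of close_to_sma above 1 means the price sits above its n-day average, and the size of the gap is comparable across price levels because it is a ratio.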

Binary encoded features:

is high/low
above / below support / resistance
above / below moving average
above / below upper / lower Bollinger band
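Binary features of this kind reduce to a comparison cast to 0/1. A minimal pandas sketch follows; reading "is high/low" as "equals the rolling n-day high or low" is an assumption, and the series is made up. The Bollinger band and support/resistance flags follow the same comparison pattern with a different reference level.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.0, 15.0, 9.0])  # made-up prices
n = 3  # illustrative window

sma = close.rolling(n).mean()

flags = pd.DataFrame({
    # Comparisons against NaN (window not yet full) evaluate to False, i.e. 0.
    "above_sma": (close > sma).astype(int),
    "is_n_day_high": (close == close.rolling(n).max()).astype(int),
    "is_n_day_low": (close == close.rolling(n).min()).astype(int),
})
```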

Trend detection features

Price today higher / lower than Price yesterday
Price today higher / lower than yesterday, and the day before yesterday
Price today higher / lower than 5 days ago (again, 3 to 5 days max were useful)
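The trend features can be sketched with shifted comparisons in pandas. The column names and prices are made up, and only the "higher" direction is shown; the "lower" variants just flip the comparison.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 15.0])  # made-up prices

trend = pd.DataFrame({
    # 1 if today's price is above yesterday's.
    "up_vs_1d": (close > close.shift(1)).astype(int),
    # 1 if today's price is above both yesterday's and the day before.
    "up_vs_1d_and_2d": ((close > close.shift(1)) & (close > close.shift(2))).astype(int),
    # 1 if today's price is above the price 5 days ago.
    "up_vs_5d": (close > close.shift(5)).astype(int),
})
```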

I wrote a custom pre-processor library to do all that in an automated way. It’s still on GitHub:

If you implement a new data loader for your CSV data, you can roughly re-use the remaining codebase, provided the data structure matches the assumed OHLC format. Volume is optional at most and can be left out, as I was unable to find any volume study that improves the model.

You can replace “Price” with revenue, COGS, or whatever measures your financials.

Also, instead of working on total numbers, I used to convert these into daily percentage changes as these were much easier to predict with a much lower total error.
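Converting totals into daily percentage changes, and mapping them back to levels, takes only a few lines in pandas. This is a minimal sketch with made-up revenue figures.

```python
import pandas as pd

revenue = pd.Series([200.0, 210.0, 189.0, 207.9])  # made-up daily totals

# Daily percentage change; the first day has no predecessor, so drop it.
daily_change = revenue.pct_change().dropna()

# Changes can be turned back into levels from the first observed total.
rebuilt = revenue.iloc[0] * (1 + daily_change).cumprod()
```

The same round trip applies to predicted changes: multiply them up from the last known total to get a forecast in the original units.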

A much smaller subset of these features applied to a novel model in gluon-ts immediately outperformed fast.ai by an order of magnitude. I cannot remember the name of the model anymore, but it was built and published by some smart cookies at MIT, and Amazon was porting it to gluon-mx. Although the implementation was pre-release back then, the actual results were just staggering.
MAPE was hovering around 3% before advanced feature engineering. If that’s good enough for you, give gluon a go and try all their models, because these are pretty damn good.

Again, that’s all from my limited memory. You may want to look at the util code for pre-processing and feature generation.

However, I cannot support this kind of work, as the project was closed a long time ago and I don’t maintain the code anymore.

Utils for feature generation

Documented results:

marvin-hansen/StockUtils/blob/master/src/procs/ProcFlow.py

from src.procs import Procs as p

"""
Pre-processor-workflows (ProcFlows) simplify data pre-processing as each applies a well
specified formula of how to prepare the data. 

Usage example: 

     # Create a ProcFlow
     pf = ProcFlow(DBG)
     # Call the selected ProcFlow with ID=3 
     data = pf.proc_switch(data=df_all, stock=stock, y_col="Close", nr_n=5, proc_id=3)
     # Parameters:
     # data - data from the default data-loader. By convention, the DataLoader does column renaming
     # stock - Stock ticker. Required to pull technical indicators that match the data for the ticker
     # y_col - The "prediction" field, or the main attribute. Often the "Close" price, but really can be anything
     # nr_n - A parameter to certain procs. For example, next_N takes y and n as a parameter and adds the next n instances of y
     # proc_id= the id of pre-defined ProcFlows. Currently, only 1 - 3 procs are there, but custom procs can be added

pnema · July 23, 2020, 10:31pm

Super helpful Marvin. Appreciate the note. I will study gluon-ts in more details and give it a try; and thanks for sharing your pre-processing code.