Time series/sequential data study group

Dear Thomas,

Thank you for your valuable help!
I was able to create a databunch with timeseriesAI (my data are time series of length 500, with 11 targets):

    df = pd.read_csv('D:\\Projects\\PQD_classification\\NILM_Classific\\TS_AVG.csv')
    db = (TimeSeriesList.from_df(df, '.', cols=df.columns.values[:500], feat=None)
          .split_by_rand_pct(valid_pct=0.2, seed=seed)
          .label_from_df(cols=['AVG_1','AVG_2','AVG_3','AVG_4','AVG_5','AVG_6','AVG_7','AVG_8','AVG_9','AVG_10','AVG_11'], label_cls=FloatList)
          .databunch(bs=bs, val_bs=bs * 2, num_workers=0, device=torch.device('cuda'))
          .scale(scale_type=scale_type, scale_by_channel=scale_by_channel,
                 scale_by_sample=scale_by_sample, scale_range=scale_range)
         )
    db



Still, once that's done, I run into two problems:
- First, the learning rate finder: the validation loss during the search is always #nan.

    arch = InceptionTime  # ✳
    arch_kwargs = dict()  # ✳
    opt_func = Ranger
    model = arch(db.features, db.c, **arch_kwargs).to(device)  # db.c = 11
    learn = Learner(db, model, metrics=[mean_absolute_error, r2_score], opt_func=opt_func, loss_func=nn.MSELoss())
    learn.lr_find()
    learn.recorder.plot(suggestion=True)


(Also, the loss is very high… I wonder whether regression is even doable for this problem!)

- Second, once I try to train the model, I get: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
This is the code I used for training:

    from sklearn.model_selection import StratifiedKFold
    skf = StratifiedKFold(n_splits=2, random_state=1, shuffle=True)
    acc_val = []
    acc2_val = []
    np.random.seed(42)

    import time
    start_time = time.time()

    for train_index, val_index in skf.split(df.index, df['AVG_1']):
        src = (TimeSeriesList.from_df(df, base_dir, cols=df.columns.values[:500], feat=None)
               .split_by_idxs(train_index, val_index)
               .label_from_df(cols=['AVG_1','AVG_2','AVG_3','AVG_4','AVG_5','AVG_6','AVG_7','AVG_8','AVG_9','AVG_10','AVG_11'], label_cls=FloatList))
        data_fold = (src.databunch(bs=bs, val_bs=bs * 2, num_workers=1, device=torch.device('cuda'))
                     .scale(scale_type=scale_type, scale_by_channel=scale_by_channel, scale_by_sample=scale_by_sample, scale_range=scale_range))
        model = arch(db.features, db.c, **arch_kwargs).to(device)
        learn = Learner(data_fold, model, loss_func=nn.MSELoss(), metrics=[mean_absolute_error, r2_score], opt_func=opt_func, callback_fns=ShowGraph)
        learn.fit_one_cycle(2, slice(lr2))
        loss, acc, acc2 = learn.validate()
        acc_val.append(acc.numpy())
        acc2_val.append(acc2.numpy())

    print("--- %s seconds ---" % (time.time() - start_time))

Do you have any input on these problems? Thank you again for everything!

I can give you some hints that worked for me:

  • For regression, the data have to be prepared for the task. I normalize both my input time series and my output variables.
  • Regressing on 11 variables is a hard problem; try with 1 first.
  • Play with loss functions. I found that MAE (mean absolute error, L1) worked better for me than MSE (mean squared error, L2).
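As a rough illustration of the first and third hints, here is a minimal NumPy sketch. The helper names and the random stand-in targets are made up for this example, and it assumes plain per-column standardization fitted on the training targets only; it is not what timeseriesAI does internally.

```python
import numpy as np

def fit_standardizer(y_train):
    """Per-column mean and std, computed on the training targets only."""
    mean = y_train.mean(axis=0)
    std = y_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant columns
    return mean, std

def standardize(y, mean, std):
    return (y - mean) / std

def unstandardize(y_scaled, mean, std):
    # Map model outputs back to the original units before reporting metrics.
    return y_scaled * std + mean

def mae(pred, target):
    return np.abs(pred - target).mean()

def mse(pred, target):
    return ((pred - target) ** 2).mean()

rng = np.random.default_rng(0)
# Hypothetical targets: 100 samples, 11 columns on very different scales.
y_train = rng.normal(loc=50.0, scale=np.linspace(1, 1000, 11), size=(100, 11))
mean, std = fit_standardizer(y_train)
y_scaled = standardize(y_train, mean, std)

# With a constant "predict zero" baseline, every scaled column contributes equally.
baseline = np.zeros_like(y_scaled)
print("MAE:", mae(baseline, y_scaled), "MSE:", mse(baseline, y_scaled))
```

In the Learner call from the question, switching to MAE would mean passing `loss_func=nn.L1Loss()` instead of `nn.MSELoss()`, with the targets scaled beforehand.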

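Your second bug comes from sklearn: you are using the split method incorrectly. StratifiedKFold needs class labels to stratify on, and AVG_1 is continuous.
I got pretty stable results for my problem after following the hints above.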
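A small sketch of the split issue, using a random stand-in for the continuous AVG_1 column (the variable names are only for illustration). Plain `KFold` ignores the target, so it works for regression. If stratified folds are still wanted, one option (an assumption here, not something suggested in the thread) is to stratify on quantile bins of the target.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

rng = np.random.default_rng(42)
y_continuous = rng.normal(size=20)       # stand-in for a continuous column such as AVG_1
indices = np.arange(len(y_continuous))

# StratifiedKFold refuses a continuous target.
stratified_error = None
try:
    list(StratifiedKFold(n_splits=2, shuffle=True, random_state=1).split(indices, y_continuous))
except ValueError as err:
    stratified_error = str(err)
print("StratifiedKFold error:", stratified_error)

# KFold only needs the number of samples, so it is fine for regression targets.
kf = KFold(n_splits=2, shuffle=True, random_state=1)
folds = list(kf.split(indices))

# Optional: stratify on quartile bins of the target instead of the raw values.
bins = np.digitize(y_continuous, np.quantile(y_continuous, [0.25, 0.5, 0.75]))
strat_folds = list(StratifiedKFold(n_splits=2, shuffle=True, random_state=1).split(indices, bins))
```

Each `(train_index, val_index)` pair from `folds` can then be passed to `split_by_idxs` as in the loop from the question.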

Here is a recent paper using transfer learning with time series
https://www.researchgate.net/publication/337095012_A_Novel_Approach_to_Short-Term_Stock_Price_Movement_Prediction_using_Transfer_Learning

Amazing! Thank you for setting up this repo. I followed the 01_intro notebook to build a weather forecasting model. It trains well, but now I'm stuck on inference and can't quite get it working. I'd like to try it in the real world, which means I don't have the test data when I define the initial databunch; it only arrives afterwards, as individual sequences.

What’s the best way to do inference on individual sequences on-demand and use the same scaling as in training? The new data sequence is a dataframe like this:

feat   0   1   2   3   4   5   6   7  target
   0  12  11  13  12  10   7   9  10  None
   1   7   6   6   8   9   5   6   6  None
   2   9  10   9   8   8   6   7   6  None

Thanks a lot for your work!
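One library-independent way to think about this is to compute the scaling statistics once on the training data, keep them, and apply them to each new sequence. The sketch below assumes per-channel standardization and treats the three `feat` rows as the channels of a single multivariate sample; `TrainScaler` is a hypothetical helper, not part of timeseriesAI, and the training array is random stand-in data.

```python
import numpy as np
import pandas as pd

class TrainScaler:
    """Hypothetical helper: standardizes sequences with statistics fixed at training time."""

    def fit(self, x_train):
        # x_train has shape (n_samples, n_features, seq_len); one mean/std per feature.
        self.mean = x_train.mean(axis=(0, 2), keepdims=True)
        self.std = x_train.std(axis=(0, 2), keepdims=True)
        self.std[self.std == 0] = 1.0  # guard against constant channels
        return self

    def transform(self, x):
        return (x - self.mean) / self.std

rng = np.random.default_rng(0)
x_train = rng.normal(10.0, 3.0, size=(50, 3, 8))  # stand-in training data
scaler = TrainScaler().fit(x_train)

# The new sequence from the post: 3 features, 8 time steps, no target yet.
new_df = pd.DataFrame(
    [[0, 12, 11, 13, 12, 10, 7, 9, 10, None],
     [1, 7, 6, 6, 8, 9, 5, 6, 6, None],
     [2, 9, 10, 9, 8, 8, 6, 7, 6, None]],
    columns=["feat", 0, 1, 2, 3, 4, 5, 6, 7, "target"],
)

# Order rows by feature id and keep the 8 time-step columns: shape (1, 3, 8).
x_new = new_df.sort_values("feat")[list(range(8))].to_numpy(dtype=float)[None]
x_new_scaled = scaler.transform(x_new)
```

The scaled array can then be turned into a tensor and passed through the trained model in eval mode. If a different `scale_type` was used in training, the same idea applies with that transform's statistics.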

Hi @angusde, how does Rocket handle seasonality trends?

Hi,

I built a library for univariate forecasting, based on N-BEATS. I adapted it a lot because it was initially quite hard to train. Now it trains very fast. I also added some features to interpret the model.

Please let me know what you guys think. I'm thinking of writing up my changes in a blog post and/or expanding the library, but I would love your feedback first.

github: https://github.com/takotab/fastseq
docs: https://takotab.github.io/fastseq//index.html

Wow, that’s great! I was starting to read the paper just a few days ago.

How are the results with fastai compared to the results from the paper?

The full M4 dataset does not fit in my memory, but the results are at least up there. I have not done a full training run that switches between datasets. Help (or better ideas to circumvent this issue) in this direction would be appreciated.

I saw that Google has a new transformer-based model for time series forecasting; maybe someone is interested in it.


I am currently trying that challenge, really cool seeing it here. I am new to ML/DL so I am struggling with the approach.

Did you find an approach using time series regression for this challenge? I tried using Tabular but the results are meh. It only sees the individual CDMs, not the time series they form.

The timeseriesAI repo has been helpful, but I am struggling to get the Kelvins challenge data into the right format for a DataBunch. Maybe you figured out a way to do it?

Hi, welcome to the community!

We tried different approaches, not all of them focused on DL. Our best attempt used LSTMs on only the last values of the target time series. However, the leaderboard results showed that ML was not playing a huge role, probably because of the differences between the training and test sets.

To use timeseriesAI and DataBunches, each time series has to be in its own row, and if you go multivariate, the order of the rows matters and must be the same across the different variables.
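As a rough Python sketch of that reshaping (not the R function mentioned in this thread), the snippet below turns long-format data into one row per series and variable with pandas. The column names `series_id`, `feat`, `step` and `value` are hypothetical, and the exact row grouping the loader expects should be checked against the timeseriesAI notebooks.

```python
import pandas as pd

# Hypothetical long-format data: one row per (series id, variable, time step).
long_df = pd.DataFrame({
    "series_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "feat":      [0, 0, 1, 1, 0, 0, 1, 1],
    "step":      [0, 1, 0, 1, 0, 1, 0, 1],
    "value":     [1.0, 2.0, 10.0, 20.0, 3.0, 4.0, 30.0, 40.0],
})

# One row per (series, variable); time steps become columns 0, 1, ...
wide_df = (
    long_df.set_index(["series_id", "feat", "step"])["value"]
    .unstack("step")
    .sort_index()      # same series order within every variable
    .reset_index()
)
print(wide_df)
```

Sorting on the index keeps the series in the same order for every variable; sorting by `feat` first instead is a one-line change if the loader expects the rows grouped by variable.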

I think I wrote a function to transform the challenge's input dataset into a format for timeseriesAI. It is written in R, but I can share it with you if you are interested.

Best!

Dear community,
I would love to hear from you what you currently consider “best practice” for working with time series data in fastai.
Do people stick to tabular transformations or use functionalities originally intended for text?

Thanks for the response! I would be interested in the R code. Maybe I can use the underlying idea to make it work with Python.

For those who are interested in fastai v2, I shared my timeseries module for fastai v2 in this post. You will find more information there, as I'm avoiding duplicating it here.

Here is a crazy idea that I want to bounce off you.
I have a time series classification problem.
My users want to know whether we are going to have a maintenance issue in 6 hours.
I have 120+ sensors feeding data to me every minute.

Here is my approach.

  1. I generate an independent chart for each of the 120 sensors, covering 6 hours at 5-minute intervals, and label it with whether my target value 6 hours in the future is Normal or Error.
  2. Run the charts through a CNN to classify them.
  3. Use CAM to highlight the charts with errors, to find out which variables will affect the outcome in 6 hours based on today's readings.
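The windowing in step 1 can be sketched directly on arrays, before any chart is drawn. This is only an illustration under assumptions: readings arrive once per minute, each window keeps one reading every 5 minutes, windows start every 5 minutes, and the label is read 6 hours after the last minute of the window. `make_windows` and its parameters are hypothetical names.

```python
import numpy as np

def make_windows(readings, labels, window_min=360, sample_every=5, stride_min=5, horizon_min=360):
    """readings: (n_minutes, n_sensors); labels: (n_minutes,) with 0 = Normal, 1 = Error.

    Returns X of shape (n_windows, n_sensors, window_min // sample_every)
    and y of shape (n_windows,), the label horizon_min after each window ends.
    """
    last_start = len(readings) - window_min - horizon_min
    if last_start < 0:
        raise ValueError("not enough data for one window plus the forecast horizon")
    xs, ys = [], []
    for start in range(0, last_start + 1, stride_min):
        end = start + window_min
        window = readings[start:end:sample_every]   # one reading every 5 minutes
        xs.append(window.T)                         # (n_sensors, n_steps)
        ys.append(labels[end - 1 + horizon_min])    # label 6 hours after the window
    return np.stack(xs), np.array(ys)

# Small stand-in data set: 1000 minutes of 4 sensors.
readings = np.arange(1000 * 4, dtype=float).reshape(1000, 4)
labels = np.zeros(1000, dtype=int)
labels[719] = 1   # an Error exactly 6 hours after the first window ends
X, y = make_windows(readings, labels)
print(X.shape, y[:3])
```

Each `X[i]` is one multivariate sample that could be fed to a time series classifier as is, or rendered as per-sensor charts for the CNN described above.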

Have any of you tried something like this?
Any pointers that I need to be aware of?

Do you mind sharing the data so we can try it? I hope it has a train and a test set…

My only question is: do we know whether CAM will highlight the variables that were important? (Has this been done successfully before?) If so, then it checks out in my book at least.

I don't know. That's what I'm trying to figure out.
If this works out, CAM will show me the variables that I need to observe, and maybe the ranges of values, so the operator can correct the problem and prevent an outage.

How does everyone in the group keep up to date with latest research/work (particularly with time series, but may be interested with other work too)? What do you guys follow besides this group?

@gerardo, IMHO your use case falls more under probabilistic forecasting. That's a fancy way of saying that instead of predicting one value per time step, you predict a range of values within given percentile intervals. An image, please:

As you can see, our predictions fall in different percentile intervals. You can then decide that if a value falls outside the 90% interval (for example), it is considered an anomaly and you trigger an alert. Bear in mind, this is a simplified explanation, and only one way to do anomaly detection 😉. Another way would be to use time series classification. However, in your case you are also forecasting your time series.
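To make the percentile rule concrete, here is a minimal NumPy sketch. It assumes the forecaster can return many sampled trajectories per time step (simulated below with random draws) and reads the rule as "flag an observed value that falls outside the central 90% interval"; the function names are hypothetical.

```python
import numpy as np

def interval_from_samples(samples, coverage=0.9):
    """samples: (n_samples, n_steps) draws from a probabilistic forecaster.

    Returns the lower and upper bound of the central `coverage` interval per step.
    """
    tail = (1.0 - coverage) / 2.0 * 100.0
    lower = np.percentile(samples, tail, axis=0)
    upper = np.percentile(samples, 100.0 - tail, axis=0)
    return lower, upper

def flag_anomalies(observed, lower, upper):
    """True where the observed value lies outside the predicted interval."""
    return (observed < lower) | (observed > upper)

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))  # stand-in forecast draws
lower, upper = interval_from_samples(samples, coverage=0.9)

observed = np.array([0.1, -0.3, 5.0, 0.4, -6.0])
alerts = flag_anomalies(observed, lower, upper)
print(alerts)
```

Widening `coverage` makes alerts rarer; which value is appropriate depends on how costly a missed problem is compared with a false alarm.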

You may want to check @takotab's time series forecasting module fastseq (see above), which he implemented in fastai v2 (it handles univariate time series, i.e. one variable), or Amazon Labs' GluonTS tutorial. I talked about it here.

You can also search on Google for time series forecasting anomaly. Be prepared, there is a lot of information.
