Time series / sequential data study group

Hello,

I want to compare the performance of an EfficientNet-B0 model trained on images with the performance of the same model trained on time series data. The problem is multi-label classification with 9 classes.

For now, I was able to create an appropriate TimeSeriesList using fastai_timeseries (timeseriesAI):
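Roughly like this; df holds my series, and the exact columns, label setup and batch size below are illustrative rather than my real ones:

data = (TimeSeriesList.from_df(df, '.', cols=df.columns.values[:100], feat=None)
        .split_by_rand_pct(valid_pct=0.2, seed=42)
        .label_from_df(cols='target', label_cls=MultiCategoryList)  # multi-label target
        .databunch(bs=64))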


I know that I have to define the number of classes the EfficientNet-B0 model should have:
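Something like this, assuming the efficientnet_pytorch package (older versions of it take the keyword via override_params instead):

from efficientnet_pytorch import EfficientNet

model = EfficientNet.from_name('efficientnet-b0', num_classes=9)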

Yet it seems that I should also change the model's input layer to match the data, since EfficientNet-B0 expects 4-dimensional image batches while my time series batches are 3-dimensional. When trying:

print(learn.summary())

I get this error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 3 3, but got 3-dimensional input of size [1, 2, 100] instead.

Can you help?

Dear Thomas,

Can you please share the regression code you have used? I am stuck using InceptionTime for regression…
Thank you!

For regression I am using ItemLists very similar to the ones in @oguiza's timeseries library.
The only thing you need to do is change the loss_func to an L2/MSE loss.
So you need a LabelList where the x's are time series and the y's are floats, then a Learner with a compatible loss_func; the network is exactly the same. My code is something like this:

def curves_from_arrays(X_train, y_train, X_valid, y_valid, label_cls=FloatList):
    # build train/valid ItemLists from numpy arrays, then attach the labels
    src = ItemLists('.', IVCurveList(X_train), IVCurveList(X_valid))
    return src.label_from_lists(y_train, y_valid, label_cls=label_cls)

# reading from numpy arrays
data = curves_from_arrays(X_train, y_train, X_valid, y_valid, label_cls=FloatList)

In this data, each x is a time series (an IV curve with 2 channels, Voltage and Current) and each y is 3 floats (I am regressing on 3 values).

>>data
LabelLists;

Train: LabelList (162176 items)
x: IVCurveList
IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200)
y: FloatList
[ 0.596429 -0.778986  2.887455],[ 0.191421 -0.728332 -0.999715],[ 0.766844 -0.152139 -0.545443],[-1.48883   0.756254 -0.743682],[ 0.776515 -0.732029 -0.526256]
Path: .;

Valid: LabelList (18020 items)
x: IVCurveList
IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200),IVCurve(ch=2, seq_len=200)
y: FloatList
[ 0.530938 -0.308305 -0.074202],[-2.674041  0.203413 -0.858775],[ 0.75456   0.636721 -0.159548],[0.762624 0.798379 0.160289],[ 0.74852  -0.539873 -0.589922]
Path: .;

Test: None

Then we put everything in a DataBunch:

# on a DataBunch
db = data.databunch(bs=1024, val_bs=2048, num_workers=10)
# the network
model = create_inception(2, 3)  # 2 input channels, 3 regression outputs
# the learner
learn = Learner(db, model, loss_func=nn.MSELoss(), metrics=[mean_absolute_error, r2_score])

et voilà!


Dear Thomas,

Thank you for your valuable help!
I was able to create time series data using timeseriesAI (my data are time series of length 500 with 11 targets):

df = pd.read_csv('D:\\Projects\\PQD_classification\\NILM_Classific\\TS_AVG.csv')
db = (TimeSeriesList.from_df(df, '.', cols=df.columns.values[:500], feat=None)
      .split_by_rand_pct(valid_pct=0.2, seed=seed)
      .label_from_df(cols=['AVG_1','AVG_2','AVG_3','AVG_4','AVG_5','AVG_6','AVG_7','AVG_8','AVG_9','AVG_10','AVG_11'], label_cls=FloatList)
      .databunch(bs=bs, val_bs=bs * 2, num_workers=0, device=torch.device('cuda'))
      .scale(scale_type=scale_type, scale_by_channel=scale_by_channel,
             scale_by_sample=scale_by_sample, scale_range=scale_range))
db



Still, once that’s done, I find two problems:
-First, the learning rate finder: the validation loss during the search is always #nan.

arch = InceptionTime
arch_kwargs = dict()
opt_func = Ranger
model = arch(db.features, db.c, **arch_kwargs).to(device)  # db.c = 11
learn = Learner(db, model, metrics=[mean_absolute_error, r2_score], opt_func=opt_func, loss_func=nn.MSELoss())
learn.lr_find()
learn.recorder.plot(suggestion=True)


(Also, the loss is very high… I wonder if regression for this problem is doable!)

-Second, once I try to train the model, I get: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
This is the code I used for the training:

from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=2, random_state=1, shuffle=True)
acc_val = []   # per-fold mean_absolute_error
acc2_val = []  # per-fold r2_score
np.random.seed(42)

import time
start_time = time.time()

for train_index, val_index in skf.split(df.index, df['AVG_1']):
    src = (TimeSeriesList.from_df(df, base_dir, cols=df.columns.values[:500], feat=None)
           .split_by_idxs(train_index, val_index)
           .label_from_df(cols=['AVG_1','AVG_2','AVG_3','AVG_4','AVG_5','AVG_6','AVG_7','AVG_8','AVG_9','AVG_10','AVG_11'], label_cls=FloatList))
    data_fold = (src.databunch(bs=bs, val_bs=bs * 2, num_workers=1, device=torch.device('cuda'))
                 .scale(scale_type=scale_type, scale_by_channel=scale_by_channel, scale_by_sample=scale_by_sample, scale_range=scale_range))
    model = arch(db.features, db.c, **arch_kwargs).to(device)
    learn = Learner(data_fold, model, loss_func=nn.MSELoss(), metrics=[mean_absolute_error, r2_score], opt_func=opt_func, callback_fns=ShowGraph)
    learn.fit_one_cycle(2, slice(lr2))
    loss, acc, acc2 = learn.validate()
    acc_val.append(acc.numpy())
    acc2_val.append(acc2.numpy())

print("--- %s seconds ---" % (time.time() - start_time))

Do you have any input on the problems? Thank you again for all!

I can give you some hints that worked for me:

  • Regression needs the data prepared for the task: I normalize my input time series and also my output variables.
  • Regressing on 11 variables is a hard problem; try with 1 first.
  • Play with loss functions: I found that MAE (mean absolute error) worked better for me than MSE (the L2 norm).

Your second bug is from sklearn: you are using the split method wrong. StratifiedKFold only supports binary/multiclass targets, so it cannot stratify on a continuous column like AVG_1; for regression you can use plain KFold (see the sketch below).
I got pretty stable results for my problem after doing this.
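For reference, a minimal sketch of that fold loop with plain KFold and an L1 loss, reusing the names from your snippet (df, base_dir, bs, scale_type, arch, arch_kwargs, opt_func, lr2, etc. are assumed to be defined as above):

from sklearn.model_selection import KFold
import torch.nn as nn

kf = KFold(n_splits=2, shuffle=True, random_state=1)
for train_index, val_index in kf.split(df.index):  # no label needed, nothing to stratify
    src = (TimeSeriesList.from_df(df, base_dir, cols=df.columns.values[:500], feat=None)
           .split_by_idxs(train_index, val_index)
           .label_from_df(cols=['AVG_%d' % i for i in range(1, 12)], label_cls=FloatList))
    data_fold = (src.databunch(bs=bs, val_bs=bs * 2, num_workers=1)
                 .scale(scale_type=scale_type, scale_by_channel=scale_by_channel,
                        scale_by_sample=scale_by_sample, scale_range=scale_range))
    model = arch(data_fold.features, data_fold.c, **arch_kwargs).to(device)
    learn = Learner(data_fold, model, loss_func=nn.L1Loss(),  # MAE, per the hint above
                    metrics=[mean_absolute_error, r2_score], opt_func=opt_func)
    learn.fit_one_cycle(2, slice(lr2))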


Here is a recent paper using transfer learning with time series:
https://www.researchgate.net/publication/337095012_A_Novel_Approach_to_Short-Term_Stock_Price_Movement_Prediction_using_Transfer_Learning


Amazing! Thank you for setting up this repo. I followed the 01_intro notebook to build a weather forecasting model. It trains well, but now I’m stuck on inference and can’t quite get it working. I’d like to try it in the real world, which means I don’t have the test data at the time of defining the initial DataBunch, only afterwards and as individual sequences.

What’s the best way to do inference on individual sequences on-demand and use the same scaling as in training? The new data sequence is a dataframe like this:

feat   0   1   2   3   4   5   6   7  target
0     12  11  13  12  10   7   9  10  None
1      7   6   6   8   9   5   6   6  None
2      9  10   9   8   8   6   7   6  None

Thanks a lot for your work!

Hi @angusde, how does Rocket handle seasonality and trends?

Hi,

I built a library for univariate forecasting based on N-BEATS. I adapted a lot because initially it was quite hard to train; now it trains very fast. I also added some features to interpret the model.

Please let me know what you guys think. I’m thinking of writing up my changes in a blog post and/or expanding the library, but I would love your feedback first.

github: https://github.com/takotab/fastseq
docs: https://takotab.github.io/fastseq//index.html


Wow, that’s great! I was starting to read the paper just a few days ago.

How are the results with fastai compared to the results from the paper?

The full M4 dataset does not fit in my memory, but the results are at least up there. I have not done a full training run with switching datasets. Help in this direction (or better ideas to circumvent this issue) would be appreciated.

Saw that Google has a new model for time series forecasting using a Transformer; maybe someone is interested in it.



I am currently trying that challenge; really cool seeing it here. I am new to ML/DL, so I am struggling with the approach.

Did you find an approach using time series regression for this challenge? I tried the tabular approach, but the results are meh: it only sees the individual CDMs, not the whole time series.

The timeseriesAI repo has been helpful, but I am struggling to get the Kelvins challenge data into the right format for a DataBunch. Maybe you figured out a way to do it?

Hi, welcome to the community!

We tried different approaches, not all of them focused on DL. Our best attempt used LSTMs on only the last values of the target time series. However, the leaderboard results showed that ML was not playing a huge role, probably because of the differences between the training and test sets.

To use timeseriesAI and DataBunches, you have to put each time series in a different row, and if you go multivariate, the order of the rows is important and must be preserved across the different variables.
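As an illustration of that layout (hypothetical column names, not the actual challenge schema), reshaping a long table with one observation per row into the wide one-series-per-row format could look like this in pandas:

import pandas as pd

# hypothetical long format: one row per (event, time step) observation
long_df = pd.DataFrame({
    'event_id':  [0, 0, 0, 1, 1, 1],
    'time_step': [0, 1, 2, 0, 1, 2],
    'risk':      [-6.1, -5.8, -5.2, -7.0, -6.9, -6.5],
})

# wide format: one time series per row, columns ordered by time step
wide_df = long_df.pivot(index='event_id', columns='time_step', values='risk')
print(wide_df)

For the multivariate case you would build one such block per variable and stack them, keeping the event order identical in every block.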

I think I wrote a function to transform the input dataset from the challenge into a format for timeseriesAI. It is programmed in R, but I can share it with you if you are interested.

Best!


Dear community,
I would love to hear from you what you currently consider “best practice” for working with time series data in fastai.
Do people stick to tabular transformations or use functionalities originally intended for text?


Thanks for the response! I would be interested in the R code. Maybe I can use the underlying idea to make it work in Python.

For those who are interested in fastai v2, I shared my timeseries module for fastai v2 in this post. You will find more information there, as I’m avoiding duplicating the same information over here.

Here is a crazy idea that I want to bounce off you.
I have a time series classification problem.
My users want to see whether we are going to have a maintenance issue in 6 hours.
I have 120+ sensors feeding data to me every minute.

Here is my approach.

  1. I’m generating an independent chart for each of the 120 sensors, covering 6 hours at 5-minute intervals, and labelling it with whether my target value 6 hours in the future is Normal or Error (a sketch of this step follows below).
  2. Run them through a CNN to classify them.
  3. Use CAM to highlight the charts classified as Error, to find out which variables affect the outcome in 6 hours based on today’s readings.
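To make step 1 concrete, here is a minimal sketch of rendering one sensor window to an image (the window length, figure size and file name are made-up choices, not requirements):

import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless rendering, no display needed
import matplotlib.pyplot as plt

def chart_window(values, out_path):
    # render one sensor's 6-hour window (72 points at 5-minute steps) to a PNG
    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)  # ~224x224 px for the CNN
    ax.plot(values, linewidth=1)
    ax.axis('off')  # the CNN does not need ticks or labels
    fig.savefig(out_path, bbox_inches='tight')
    plt.close(fig)

window = np.random.randn(72)  # dummy data for one sensor
chart_window(window, 'sensor_000_window_0.png')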

Have any of you tried something like this?
Any pointers I should be aware of?

Do you mind sharing the data so we can try it? I hope it has train and test sets…

My only question is: do we know if CAM will highlight the variables that were important? (Has this been successfully shown before?) If so, then it checks out in my book, at least.