Time series / sequential data study group

fastai_geek · December 1, 2021, 9:31am

Hello Folks,

I appreciate this brilliant initiative to make state-of-the-art time series models accessible to everyone (especially non-experts).

I have the following challenge, and I hope to have your support with it. I want to run a time series analysis of network traffic stored as a CSV file. Each row in the file holds the metadata of one captured packet (several features such as IP address, port ID, protocol, timestamp…), and the last column has a label indicating the category of the traffic. Here is the dataset structure:

Some concepts related to time series, like sequences and time steps, are not clear to me. I used the preprocess_df and df2Xy functions to format the dataset and split it into X and y. The model is providing good results. However, I am not sure whether this is the correct procedure, or whether I have to apply any special preprocessing to the dataset to make it suitable for time series analysis (sliding window, time steps).

Could anyone please point me to a reference with a reasonable explanation of these concepts?

oguiza · December 3, 2021, 9:21am

Hi @fastai_geek ,
It looks to me like your data fits into the tabular data category (where the rows are independent of each other). If that is the case, you don’t need to create a numpy array (so you don’t need df2Xy). You may need to preprocess your data (for example, add some new features through feature engineering). Once you have the dataframe ready, you can use fastai tabular or tsai tabular (which is very similar, but includes at least one additional tabular transformer model).
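To make the tabular vs. time series distinction concrete, here is a minimal NumPy sketch of what a sliding window does to a table whose rows are ordered in time. It is not tsai's own implementation; the toy values, the window length `window_len`, and the choice of labelling each window with the label of its last row are assumptions for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy table: 6 time-ordered rows, 2 features (values are made up).
table = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([0, 0, 1, 1, 0, 1])  # one label per row

window_len = 3  # number of time steps per sample (assumed)

# Tabular view: every row is an independent sample.
X_tab, y_tab = table, labels  # shapes (6, 2) and (6,)

# Time series view: every sample is a window of consecutive rows.
# sliding_window_view appends the window axis last, which gives
# [samples x features x time steps].
X_ts = sliding_window_view(table, window_len, axis=0)  # shape (4, 2, 3)
y_ts = labels[window_len - 1:]  # label of the last row in each window

print(X_tab.shape, X_ts.shape, y_ts.shape)
```

In the tabular view the model sees one packet at a time; in the windowed view each sample carries the previous `window_len - 1` rows as context, which only makes sense if consecutive rows are actually related.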

oguiza · December 3, 2021, 9:41am

Hi all,

I just wanted to share with you that timeseriesAI’s tsai library, which started as a project out of this thread, has reached 1k+ on GitHub this week!

Many people have made this possible. I’d like to thank all those who have somehow contributed with their ideas, discussions, PRs, issues, etc. This wouldn’t have been possible without you. And I mean it! THANK YOU!!

fastai_geek · December 4, 2021, 5:55am

Hi @oguiza ,
Appreciate your feedback.
Each row has metadata about one packet, which belongs to a session between the source and destination devices. I will implement the approach that you proposed and compare the results.
Br

fastai_geek · December 23, 2021, 3:02pm

Hello Everyone,
I want to train a model using MiniRocket (PyTorch implementation). I am having an issue when I use SlidingWindow to split the data and introduce time steps in the dataset. However, it works when I use df2Xy to split the data. SlidingWindow also works with other architectures such as XCM and TST. Here is the code (I can provide additional information if required):

from tsai.all import *  # assumed import; `metrics` and `config` are defined elsewhere in the notebook

# X and new_y are the arrays produced by the windowing step (not shown)
splits = get_splits(new_y, valid_size=.5, balance=True, stratify=True, random_state=23, shuffle=True)
tfms = [None, [TSClassification()]]
batch_tfms = [TSStandardize(by_sample=True)]
dsets = get_ts_dls(X, new_y, tfms=tfms, splits=splits, batch_tfms=batch_tfms, inplace=True)  # not used below
dls = get_ts_dls(X, new_y, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
model = build_ts_model(MiniRocket, dls=dls)
learn = Learner(dls, model, metrics=metrics, cbs=config["cbs"])
learn.save('stage0')

After a while, I get this error:

More logs about the error:


epoch	train_loss	valid_loss	balanced_accuracy_score	precision_score	recall_score	fbeta_score	roc_auc_score	time
100.00% [5419/5419 00:24<00:00 nan] 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_7952/3124617560.py in <module>
      1 start = time.time()
      2 #with ContextManagers([learn.no_logging()]):
----> 3 learn.fit_one_cycle(config["n_epoch"], lr_max=config["lr"])
      4 print(time.time() - start)
      5 learn.save('stage1')

~/.local/lib/python3.8/site-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    114     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    115               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 116     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    117 
    118 # Cell

~/.local/lib/python3.8/site-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    219             self.opt.set_hypers(lr=self.lr if lr is None else lr)
    220             self.n_epoch = n_epoch
--> 221             self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
    222 
    223     def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

~/.local/lib/python3.8/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    161 
    162     def _with_events(self, f, event_type, ex, final=noop):
--> 163         try: self(f'before_{event_type}');  f()
    164         except ex: self(f'after_cancel_{event_type}')
    165         self(f'after_{event_type}');  final()

~/.local/lib/python3.8/site-packages/fastai/learner.py in _do_fit(self)
    210         for epoch in range(self.n_epoch):
    211             self.epoch=epoch
--> 212             self._with_events(self._do_epoch, 'epoch', CancelEpochException)
    213 
    214     def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):

~/.local/lib/python3.8/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    161 
    162     def _with_events(self, f, event_type, ex, final=noop):
--> 163         try: self(f'before_{event_type}');  f()
    164         except ex: self(f'after_cancel_{event_type}')
    165         self(f'after_{event_type}');  final()

~/.local/lib/python3.8/site-packages/fastai/learner.py in _do_epoch(self)
    205     def _do_epoch(self):
    206         self._do_epoch_train()
--> 207         self._do_epoch_validate()
    208 
    209     def _do_fit(self):

~/.local/lib/python3.8/site-packages/fastai/learner.py in _do_epoch_validate(self, ds_idx, dl)
    201         if dl is None: dl = self.dls[ds_idx]
    202         self.dl = dl
--> 203         with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)
    204 
    205     def _do_epoch(self):

~/.local/lib/python3.8/site-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
    163         try: self(f'before_{event_type}');  f()
    164         except ex: self(f'after_cancel_{event_type}')
--> 165         self(f'after_{event_type}');  final()
    166 
    167     def all_batches(self):

~/.local/lib/python3.8/site-packages/fastai/learner.py in __call__(self, event_name)
    139 
    140     def ordered_cbs(self, event): return [cb for cb in self.cbs.sorted('order') if hasattr(cb, event)]
--> 141     def __call__(self, event_name): L(event_name).map(self._call_one)
    142 
    143     def _call_one(self, event_name):

~/miniconda3/lib/python3.8/site-packages/fastcore/foundation.py in map(self, f, gen, *args, **kwargs)
    152     def range(cls, a, b=None, step=None): return cls(range_of(a, b=b, step=step))
    153 
--> 154     def map(self, f, *args, gen=False, **kwargs): return self._new(map_ex(self, f, *args, gen=gen, **kwargs))
    155     def argwhere(self, f, negate=False, **kwargs): return self._new(argwhere(self, f, negate, **kwargs))
    156     def argfirst(self, f, negate=False): return first(i for i,o in self.enumerate() if f(o))

~/miniconda3/lib/python3.8/site-packages/fastcore/basics.py in map_ex(iterable, f, gen, *args, **kwargs)
    664     res = map(g, iterable)
    665     if gen: return res
--> 666     return list(res)
    667 
    668 # Cell

~/miniconda3/lib/python3.8/site-packages/fastcore/basics.py in __call__(self, *args, **kwargs)
    649             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    650         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 651         return self.func(*fargs, **kwargs)
    652 
    653 # Cell

~/.local/lib/python3.8/site-packages/fastai/learner.py in _call_one(self, event_name)
    143     def _call_one(self, event_name):
    144         if not hasattr(event, event_name): raise Exception(f'missing {event_name}')
--> 145         for cb in self.cbs.sorted('order'): cb(event_name)
    146 
    147     def _bn_bias_state(self, with_bias): return norm_bias_params(self.model, with_bias).map(self.opt.state)

~/.local/lib/python3.8/site-packages/fastai/callback/core.py in __call__(self, event_name)
     43                (self.run_valid and not getattr(self, 'training', False)))
     44         res = None
---> 45         if self.run and _run: res = getattr(self, event_name, noop)()
     46         if event_name=='after_fit': self.run=True #Reset self.run to True at each end of fit
     47         return res

~/.local/lib/python3.8/site-packages/fastai/learner.py in after_validate(self)
    517     def before_validate(self): self._valid_mets.map(Self.reset())
    518     def after_train   (self): self.log += self._train_mets.map(_maybe_item)
--> 519     def after_validate(self): self.log += self._valid_mets.map(_maybe_item)
    520     def after_cancel_train(self):    self.cancel_train = True
    521     def after_cancel_validate(self): self.cancel_valid = True

~/miniconda3/lib/python3.8/site-packages/fastcore/foundation.py in map(self, f, gen, *args, **kwargs)
    152     def range(cls, a, b=None, step=None): return cls(range_of(a, b=b, step=step))
    153 
--> 154     def map(self, f, *args, gen=False, **kwargs): return self._new(map_ex(self, f, *args, gen=gen, **kwargs))
    155     def argwhere(self, f, negate=False, **kwargs): return self._new(argwhere(self, f, negate, **kwargs))
    156     def argfirst(self, f, negate=False): return first(i for i,o in self.enumerate() if f(o))

~/miniconda3/lib/python3.8/site-packages/fastcore/basics.py in map_ex(iterable, f, gen, *args, **kwargs)
    664     res = map(g, iterable)
    665     if gen: return res
--> 666     return list(res)
    667 
    668 # Cell

~/miniconda3/lib/python3.8/site-packages/fastcore/basics.py in __call__(self, *args, **kwargs)
    649             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    650         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 651         return self.func(*fargs, **kwargs)
    652 
    653 # Cell

~/.local/lib/python3.8/site-packages/fastai/learner.py in _maybe_item(t)
    471 # Cell
    472 def _maybe_item(t):
--> 473     t = t.value
    474     try: return t.item()
    475     except: return t

~/.local/lib/python3.8/site-packages/fastai/metrics.py in value(self)
     65         preds,targs = torch.cat(self.preds),torch.cat(self.targs)
     66         if self.to_np: preds,targs = preds.numpy(),targs.numpy()
---> 67         return self.func(targs, preds, **self.kwargs) if self.invert_args else self.func(preds, targs, **self.kwargs)
     68 
     69     @property

~/miniconda3/lib/python3.8/site-packages/sklearn/metrics/_ranking.py in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
    544     y_type = type_of_target(y_true)
    545     y_true = check_array(y_true, ensure_2d=False, dtype=None)
--> 546     y_score = check_array(y_score, ensure_2d=False)
    547 
    548     if y_type == "multiclass" or (

~/miniconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    790 
    791         if force_all_finite:
--> 792             _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    793 
    794     if ensure_min_samples > 0:

~/miniconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    112         ):
    113             type_err = "infinity" if allow_nan else "NaN, infinity"
--> 114             raise ValueError(
    115                 msg_err.format(
    116                     type_err, msg_dtype if msg_dtype is not None else X.dtype

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
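Reading the traceback: the exception is raised inside scikit-learn's roc_auc_score, which rejects a y_score containing NaN or infinity, and the progress bar line above appears to already show nan for the loss. That suggests the non-finite values arise during training rather than in the metric itself. One thing that may be worth checking (an assumption, not a confirmed diagnosis) is whether the windowed input contains non-finite values, or windows in which a feature is constant, since per-sample standardization relies on a per-sample standard deviation. A minimal NumPy sketch with a hypothetical helper `check_windows` on made-up data:

```python
import numpy as np

def check_windows(X):
    """Hypothetical helper: count suspicious entries in X of shape
    [samples x features x time steps]."""
    X = np.asarray(X, dtype=float)
    n_nonfinite = int((~np.isfinite(X)).sum())
    # Standard deviation over the time axis, per sample and feature,
    # ignoring any non-finite entries.
    finite_only = np.where(np.isfinite(X), X, np.nan)
    std = np.nanstd(finite_only, axis=-1)
    n_constant = int((std == 0).sum())
    return {"non_finite_values": n_nonfinite, "constant_series": n_constant}

# Made-up data with one constant series and one NaN injected.
X_demo = np.random.default_rng(0).normal(size=(4, 2, 5))
X_demo[1, 0, :] = 3.0
X_demo[2, 1, 4] = np.nan

report = check_windows(X_demo)
print(report)
```

If both counts are zero on the real data, the input is probably not the culprit and the learning rate or the model itself would be the next things to look at.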

oguiza · December 23, 2021, 5:26pm

Could you please open an issue on the tsai repo?

s.s.o · February 5, 2022, 7:28pm

ETSformer: Exponential Smoothing Transformers for Time-series Forecasting

Transformers have been actively studied for time-series forecasting in recent years. While often showing promising results in various scenarios, traditional Transformers are not designed to fully exploit the characteristics of time-series data and thus suffer some fundamental limitations, e.g., they generally lack of decomposition capability and interpretability, and are neither effective nor efficient for long-term forecasting. In this paper, we propose ETSFormer, a novel time-series Transformer architecture, which exploits the principle of exponential smoothing in improving Transformers for time-series forecasting. In particular, inspired by the classical exponential smoothing methods in time-series forecasting, we propose the novel exponential smoothing attention (ESA) and frequency attention (FA) to replace the self-attention mechanism in vanilla Transformers, thus improving both accuracy and efficiency. Based on these, we redesign the Transformer architecture with modular decomposition blocks such that it can learn to decompose the time-series data into interpretable time-series components such as level, growth and seasonality. Extensive experiments on various time-series benchmarks validate the efficacy and advantages of the proposed method. The code and models of our implementations will be released.
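As background for the abstract above, here is a minimal sketch of classical simple exponential smoothing, the textbook method whose principle the paper says it builds on. It is not ETSformer's attention mechanism. The function name, the initialisation with the first observation, and the sample values are illustrative assumptions.

```python
def simple_exp_smoothing(series, alpha):
    """Smoothed level: l_t = alpha * y_t + (1 - alpha) * l_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Initialise the level with the first observation (one common choice).
    levels = [float(series[0])]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels

smoothed = simple_exp_smoothing([1, 2, 3], alpha=0.5)
print(smoothed)  # [1.0, 1.5, 2.25]
```

Unrolling the recursion shows that older observations receive exponentially decaying weights, which is the property the name refers to.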

@oguiza, you might find it interesting.

oguiza · February 6, 2022, 3:42pm

Thanks for sharing @s.s.o! Agree it looks interesting.

Pomo · February 22, 2022, 4:44am

Hola @oguiza et al.,

Can you advise me about this time series problem? I decided to post publicly so that someone else may someday benefit from my confusion.

I built a model that processes a time series in parallel, taking the entire training ts at once as one batch. At one point the model has tensor X of shape [4000,11000]. The ts is of length 4000, with 11000 features (float) at each time point. I want to use Dropout on the features to avoid overfitting to the many features.

nn.Dropout()(X) will zero features randomly in X. For example, it may zero feature 1002 at time point 121, and zero feature 5001 at time point 321. It bothers me that different features are zeroed at different time points.

First question: is this how Dropout is supposed to be used with time series?

It seems to me that you would prefer to zero features entirely at every time point. For example, zero features 1002 and 5001 at every time point in the series, before sending to the next layer.

2nd question: does this idea make sense?

And third question: do you know how to do this in PyTorch? I have experimented with nn.Dropout and F.dropout2d without success.

I apologize for not using your fine tsai package. I am working in pure PyTorch with a little bit of fastai.

Malcolm

oguiza · February 22, 2022, 9:49am

Hi @Pomo, great to hear from you!

If you use plain Dropout, your description is accurate. That’s how it works.
As to your question, I think it depends on where you want to use the Dropout layer. If you use it between Conv layers or even Transformer layers, it probably won’t work well due to feature autocorrelation.
If you use it at the end of the model, right before an MLP layer, where there are only 2 dimensions, it’s probably ok.

I actually think it’s a great idea!!!
It reminds me of DropBlock: A regularization method for convolutional networks:

Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout. Thus a structured form of dropout is needed to regularize convolutional networks. In this paper, we introduce DropBlock, a form of structured dropout, where units in a contiguous region of a feature map are dropped together. (Ghiasi, 2018)

What you are proposing is similar. It makes sense in your case where there are many features. You can zero out some of the features based on a probability p.

Here’s an implementation you can try. It works with inputs of shape:
[batch size x features x seq_len]. If you have a different shape you will need to adapt the input.

It drops a % of features from each sample (all time steps):

import torch
from torch import nn, Tensor

class FeatureDropout(nn.Module):
    "During training, randomly zeroes some of the features per sample with probability `p`"
    def __init__(self, 
        p:float = 0.5 # probability of a feature to be zeroed. Default: 0.5
        ):
        super().__init__()
        if p < 0 or p > 1:
            raise ValueError(f"p has to be between 0. and 1., but got {p}")
        self.p = p

    def forward(self, 
        X: Tensor) -> Tensor: # X: [batch size x features x seq_len]
        if self.training and self.p:
            # one random draw per (sample, feature); True marks a feature to drop
            mask = torch.rand_like(X[..., 0]) > (1 - self.p)
            # broadcast the mask over the last (time) dimension
            X = X.masked_fill(mask.unsqueeze(-1), 0)
            keep = 1 - mask.float().mean() # fraction of features actually kept in this batch
            if keep > 0: X = X / keep # scales remaining features during training (inverted dropout)
        return X

Is this what you are looking for?

No need to apologize! Use whatever fits your needs.

Pomo · February 22, 2022, 10:33pm

Thanks for your encouraging reply! I will give it a try and let everyone know if there is any promise.

I think I see a “tricky” way to do this operation directly in PyTorch. I will post here if successful.

Thanks again, and good to hear from you too.

Malcolm

Pomo · February 23, 2022, 3:05am

Hi again. Your code does exactly what I am looking for.

I wrote a layer that I think does the same thing as the code you posted above. The key is to use F.dropout2d(). It is intended to zero an entire image belonging to a channel. If instead you misuse F.dropout2d() with an “image” that is [time-point, feature], it seems to do exactly what I want.

from torch import nn
import torch.nn.functional as F

class FeatureDropout2(nn.Module):
    "During training, randomly zeroes some of the features per sample with probability `p`"
    def __init__(self, p:float = 0.5, inplace = False): # p: probability of a feature to be zeroed. Default: 0.5
        super().__init__()
        if p < 0 or p > 1:
            raise ValueError(f"p has to be between 0. and 1., but got {p}")
        self.p = p
        self.inplace = inplace

    def forward(self, X): # [batch size x features x seq_len] (X must include a batch size.)
        # each feature (channel) is zeroed as a whole, i.e. across all time steps
        return F.dropout2d(X, self.p, self.training, self.inplace)

Caveats:

The original code scales using the actual fraction of zeroed features. I think that F.dropout2d scales by 1/(1-p) as a shortcut.
It ought to run faster on GPU, but I have not tested this.
You could use nn.Dropout2d() directly if you do not mind your code being completely incomprehensible to any future reader. FeatureDropout2=nn.Dropout2d (with explanatory docs) would be better.
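A quick way to check the first caveat without leaning on dropout2d's handling of 3D inputs: recent PyTorch releases (1.12 and later, to my knowledge) ship F.dropout1d / nn.Dropout1d, which drop whole channels of a [batch size x features x seq_len] tensor. The sketch below uses made-up all-ones data so the scaling is easy to read off; it assumes such a PyTorch version is installed.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p = 0.5
X = torch.ones(2, 6, 5)  # [batch size x features x seq_len], toy data

out = F.dropout1d(X, p=p, training=True)

# Each (sample, feature) row is either dropped at every time step
# or kept at every time step and scaled by 1 / (1 - p).
dropped = (out == 0).all(dim=-1)
kept = (out == 1 / (1 - p)).all(dim=-1)
all_rows_consistent = bool((dropped | kept).all())

# A single series stored as [time points x features] (no batch dimension)
# can be transposed to [features x time points] before the call.
ts = torch.ones(8, 4)
ts_out = F.dropout1d(ts.T, p=p, training=True).T  # back to [time points x features]
col_dropped = (ts_out == 0).all(dim=0)
col_kept = (ts_out == 1 / (1 - p)).all(dim=0)
columns_consistent = bool((col_dropped | col_kept).all())

print(all_rows_consistent, columns_consistent)
```

The scale factor here is the fixed 1/(1-p), not the realized fraction of kept features, so the two implementations can differ slightly on any given batch.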

I have not yet tested whether feature dropout improves overfitting.

Thanks so much for your help and for your contributions to the community!

Malcolm

Pomo · February 28, 2022, 11:56pm

Hi everyone,

Here’s what I found out after some very informal experiments.

Feature dropout does help with overfitting, yet not enough to make my model predictive. But the surprise is that regular Dropout (timepoints x feature zeroed) works better than dropping out entire features (across all timepoints). The training loss was similar, but the validation loss (generalization) was much better with standard nn.Dropout on the activations.

Some details…

11191 features
4400 timepoints
50% of activations zeroed
Simplified model based on nn.Linear and MSE loss
Stop training when validation loss stops improving

In hindsight, I am less surprised because zeroing a (timepoint x feature) will force the model to select a more robust set of features that approximates the target at every timepoint. There is probably some mathematical argument that confirms this. Still, I had hoped that feature dropout would prove to work better than regular dropout.

The conclusion is based on only 20 runs or so, and on one particular model and time series dataset. So YMMV. I do not have the interest to investigate further. But the code for FeatureDropout can be found above in case anyone wants to research further.

I hope this observation helps someone, someday!

Malcolm

oguiza · March 1, 2022, 7:08am

Thanks for sharing Malcolm!

Interneuron · March 5, 2022, 2:24am

Anyone have a good implementation of HDC-minirocket working with tsai already?

folgertk · May 3, 2022, 7:17am

This study group truly is an amazing resource! I was hoping someone could help me out with the following problem:

I’m working on a problem where sets of time series are linked to one label. The size of the sets varies, and the series from one set cannot be easily matched or aligned with the series from another set. The problem is, in other words, not translatable to a common multi-variate classification problem, but rather to a “bag of time series” classification problem.

Does anyone have any experience with such a setting? Or does anyone have any suggestions on how to address this problem? Obviously, I can consider the series in each set as independent, but I expect that information can be exploited by taking their co-occurrences into consideration.

oguiza · May 3, 2022, 7:55am

Hi @folgertk,
I’ll just share a quick thought.
You might consider adding an extra channel with the set # information. This would be a categorical variable that would need to be passed through an embedding layer and then concatenated to the other channels you have. In this way, the model would be able to learn from the time series taking into account the set number.
As a bonus, you get the embeddings for the sets, which will allow you to see how close/ different they are (in case you are interested in this).

folgertk · May 3, 2022, 8:09am

That’s actually pretty clever! Yes, ultimately, I need a representation to compare sets of series, which will then form the input for another step. This might be worth checking out. Thanks!

folgertk · May 3, 2022, 8:19am

One question, though: this wouldn’t work for unseen, test data, would it?

oguiza · May 3, 2022, 9:49am

I know nothing about your dataset, so it’s difficult to propose a solution.
The solution I proposed would be useful if you have, for example, certain cities and hourly temperatures between some dates that may differ by city. In this case, the set codes would be repeated in the future.
If you have new cities, you normally assign them a code 0 to pass through the embedding layer. Or have a different model for unseen sets (which wouldn’t benefit from the set embedding).
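The set-embedding idea from this exchange could be sketched in plain PyTorch roughly as follows. The module name `SetEmbeddingConcat`, the embedding size, and the toy shapes are hypothetical; the only parts taken from the discussion are the categorical set code passed through an embedding layer, its concatenation to the existing channels, and code 0 for unseen sets.

```python
import torch
from torch import nn

class SetEmbeddingConcat(nn.Module):
    """Hypothetical module: append a learned set embedding as extra channels."""
    def __init__(self, n_sets: int, emb_dim: int = 4):
        super().__init__()
        # Index 0 is reserved for unseen sets; known sets use codes 1..n_sets.
        self.emb = nn.Embedding(n_sets + 1, emb_dim)

    def forward(self, x: torch.Tensor, set_id: torch.Tensor) -> torch.Tensor:
        # x: [batch size x channels x seq_len], set_id: [batch size] (long)
        e = self.emb(set_id)                              # [batch, emb_dim]
        e = e.unsqueeze(-1).expand(-1, -1, x.shape[-1])   # repeat over time steps
        return torch.cat([x, e], dim=1)                   # [batch, channels + emb_dim, seq_len]

layer = SetEmbeddingConcat(n_sets=3, emb_dim=4)
x = torch.randn(5, 2, 10)
set_id = torch.tensor([1, 2, 3, 1, 0])  # the last sample comes from an unseen set (code 0)
out = layer(x, set_id)
print(out.shape)
```

Note that the vector for code 0 only becomes meaningful if some training samples are mapped to it as well, for example by randomly replacing a small share of known codes with 0 during training (a common trick, stated here as an assumption rather than something from the thread).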