Time series/sequential data study group

@oguiza, I think that after some recent changes, some things are not working with tsai any more. You can check the 04_Intro_to_Time_Series_Regression notebook: dls.show_batch() gives the error below…

tsai : 0.2.14
fastai : 2.2.5
fastcore : 1.3.20
torch : 1.7.1

IndexError Traceback (most recent call last)
in
----> 1 dls.show_batch()

d:\conda3\lib\site-packages\tsai\data\core.py in show_batch(self, b, ctxs, max_n, nrows, ncols, figsize, unique, sharex, sharey, **kwargs)
395 sharex, sharey = True, True
396 elif b is None: b = self.one_batch()
--> 397 db = self.decode_batch(b, max_n=max_n)
398 ncols = min(ncols, math.ceil(len(db) / ncols))
399 nrows = min(nrows, math.ceil(len(db) / ncols))

d:\codes\fastai_dev\fastai\fastai\data\core.py in decode_batch(self, b, max_n, full)
78
79 def decode(self, b): return to_cpu(self.after_batch.decode(self._retain_dl(b)))
---> 80 def decode_batch(self, b, max_n=9, full=True): return self._decode_batch(self.decode(b), max_n, full)
81
82 def _decode_batch(self, b, max_n=9, full=True):

d:\codes\fastai_dev\fastai\fastai\data\core.py in decode(self, b)
77 if isinstance(f,Pipeline): f.split_idx=split_idx
78
---> 79 def decode(self, b): return to_cpu(self.after_batch.decode(self._retain_dl(b)))
80 def decode_batch(self, b, max_n=9, full=True): return self._decode_batch(self.decode(b), max_n, full)
81

d:\codes\fastai_dev\fastai\fastai\data\core.py in _retain_dl(self, b)
56
57 def _retain_dl(self,b):
---> 58 if not getattr(self, '_types', None): self._one_pass()
59 return retain_types(b, typs=self._types)
60

d:\codes\fastai_dev\fastai\fastai\data\core.py in _one_pass(self)
49
50 def _one_pass(self):
---> 51 b = self.do_batch([self.do_item(None)])
52 if self.device is not None: b = to_device(b, self.device)
53 its = self.after_batch(b)

d:\codes\fastai_dev\fastai\fastai\data\load.py in do_batch(self, b)
142 else: raise IndexError("Cannot index an iterable dataset numerically - must use None.")
143 def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
--> 144 def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
145 def to(self, device): self.device = device
146 def one_batch(self):

d:\conda3\lib\site-packages\tsai\data\core.py in create_batch(self, b)
358 self.idxs = L(b)
359 if hasattr(self, "split_idxs"): self.input_idxs = self.split_idxs[it]
--> 360 return self.dataset[it]
361
362 def create_item(self, s): return s

d:\conda3\lib\site-packages\tsai\data\core.py in __getitem__(self, it)
273
274 def __getitem__(self, it):
--> 275 return tuple([typ(ptl[it])[...,self.sel_vars, self.sel_steps] if i==0 else typ(ptl[it]) for i,(ptl,typ) in enumerate(zip(self.ptls,self.types))])
276
277 def subset(self, i): return type(self)(tls=L(tl.subset(i) for tl in self.tls), n_inp=self.n_inp, inplace=self.inplace, tfms=self.tfms,

d:\conda3\lib\site-packages\tsai\data\core.py in <listcomp>(.0)
273
274 def __getitem__(self, it):
--> 275 return tuple([typ(ptl[it])[...,self.sel_vars, self.sel_steps] if i==0 else typ(ptl[it]) for i,(ptl,typ) in enumerate(zip(self.ptls,self.types))])
276
277 def subset(self, i): return type(self)(tls=L(tl.subset(i) for tl in self.tls), n_inp=self.n_inp, inplace=self.inplace, tfms=self.tfms,

d:\conda3\lib\site-packages\tsai\data\core.py in __new__(cls, o, **kwargs)
22
23 def __new__(cls, o, **kwargs):
---> 24 if isinstance(o, (list, L)): o = stack(o)
25 res = cast(tensor(o), cls)
26 for k,v in kwargs.items(): setattr(res, k, v)

d:\conda3\lib\site-packages\tsai\utils.py in stack(o, axis, retain)
262 # Cell
263 def stack(o, axis=0, retain=True):
--> 264 if isinstance(o[0], torch.Tensor):
265 return retain_type(torch.stack(tuple(o), dim=axis), o[0]) if retain else torch.stack(tuple(o), dim=axis)
266 else:

d:\codes\fastai_dev\fastcore\fastcore\foundation.py in __getitem__(self, idx)
109 def _xtra(self): return None
110 def _new(self, items, *args, **kwargs): return type(self)(items, *args, use_list=None, **kwargs)
--> 111 def __getitem__(self, idx): return self._get(idx) if is_indexer(idx) else L(self._get(idx), use_list=None)
112 def copy(self): return self._new(self.items.copy())
113

d:\codes\fastai_dev\fastcore\fastcore\foundation.py in _get(self, i)
113
114 def _get(self, i):
--> 115 if is_indexer(i) or isinstance(i,slice): return getattr(self.items,'iloc',self.items)[i]
116 i = mask2idxs(i)
117 return (self.items.iloc[list(i)] if hasattr(self.items,'iloc')

IndexError: list index out of range

Hi, I am interested in TS from rocket

Hi @saurk, welcome to the time series thread!
I’m not sure I understand what you mean. Do you mean you are interested in using Rocket in time series classification? Regression? Could you please clarify?

Hello oguiza and all,

I am having trouble installing tsai on my local system. I have tried two ways:
pip install git+https://github.com/timeseriesAI/tsai.git@master
and

git clone https://github.com/timeseriesAI/tsai.git
pip install -e .

Both ways give the error:

Attempting uninstall: llvmlite
    Found existing installation: llvmlite 0.31.0
ERROR: Cannot uninstall 'llvmlite'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

llvmlite is not installed by pip but is in the fastai conda environment:

(fastai2) malcolm@PC-GPU:~/fastaiActive/repos-ML/tsai$ conda list llvmlite
# packages in environment at /home/malcolm/anaconda3/envs/fastai2:
#
# Name                    Version                   Build  Channel
llvmlite                  0.31.0           py37hd408876_0  

I next tried
conda uninstall llvmlite

This gives errors and hangs:

Collecting package metadata (repodata.json): done
Solving environment: | 
The environment is inconsistent, please check the package plan carefully

The following packages are causing the inconsistency:

  - anaconda/linux-64::bkcharts==0.2=py37_0
  - conda-forge/linux-64::pyarrow==0.11.1=py37hbbcf98d_1002
  - anaconda/noarch::dask==2.30.0=py_0
  - pytorch/noarch::torchtext==0.6.0=py_1
  - anaconda/noarch::seaborn==0.11.0=py_0
/ failed

CondaError: KeyboardInterrupt

Can anyone advise me on how to proceed?

Thank you!

P.S. Though I have tried to study and understand Linux packages, pip, and conda, it seems I spend 1/3 of my time on problems like these. :frowning_face:

Hi @ModdingLeo,
I’m not 100% clear on what you want to achieve.
There are 2 ways to create splits in tsai: the get_splits function (which allows you to use kfold; optionally, you can set stratified=True to use stratified kfold), or TimeSplitter (which simply splits the data in the order you passed it). So TimeSplitter assumes the data is already time ordered.
I'd say you can order your data using any available field, and then split it in a way that leaves each group where you want it. In TimeSplitter you can pass as valid_size either a percentage (.2 by default) or an int with the exact number of samples you want in the validation set.
I don’t know if this helps.
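For intuition, the chronological split that TimeSplitter performs can be sketched in a few lines of plain Python. This is a hypothetical re-implementation for illustration only, not the tsai code; the function name and exact defaults are assumptions:

```python
# Minimal sketch of a chronological train/valid split in the spirit of
# tsai's TimeSplitter (illustrative only, not the library's implementation).
def time_splitter(n_samples, valid_size=0.2):
    """Return (train_idxs, valid_idxs) preserving time order.

    valid_size may be a fraction (0 < valid_size < 1) or an int giving
    the exact number of validation samples, as described above.
    """
    n_valid = valid_size if isinstance(valid_size, int) else int(round(n_samples * valid_size))
    cut = n_samples - n_valid            # everything before `cut` is train
    return list(range(cut)), list(range(cut, n_samples))

train, valid = time_splitter(100, valid_size=0.2)
# All training indices precede all validation indices, so there is no
# leakage of future information into training.
```

Because the split is a plain cut, make sure your samples are sorted by time before calling it.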


Hi @s.s.o,
Thanks for sharing this.
I’m currently preparing a new release in pip (0.2.15) that will fix this. I’ll release it sometime this week.

Hi @Pomo,
I’m not an expert on this so this may or may not help.
You have not shown your full trace, but it looks to me like the issue is that one of tsai's dependencies is numba (which is required by the Rocket family of models), and your llvmlite version is too old. You can check how you installed numba (pip or conda) and update it; numba has a dependency on llvmlite and updating numba may update llvmlite as well. The version I use (and that works well) is llvmlite 0.35.0.
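For reference, these are generic pip/conda commands to inspect and update the two packages; adapt them to however numba was installed in your environment:

```shell
# See which versions are installed (and by which tool)
pip show numba llvmlite
conda list numba
conda list llvmlite

# If numba came from pip, upgrading it should pull a compatible llvmlite
pip install -U numba

# If it came from conda, update it there instead
conda update numba
```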

I’ve now created a new pip release of the tsai library (0.2.15) that fixes the issue you raised before.


Thank you very much. Also, the meeting you organized was great, and it was nice to meet people who are interested in tsai and time series.


You can try installing without dependencies (they are installed before the point where it fails): pip install git+https://github.com/timeseriesAI/tsai.git@master --no-deps

@s.s.o, @oguiza, I’d be very interested to do more meetings, to hear from others about/discuss what they are working on in the “real world”, and I’d be especially interested to hear from Ignacio about the process of developing the tsai library, the practical motivations behind it, and some of the things in there like TSBERT.


Hi oguiza hope you are having a jolly day!

I am not sure what the issue was, but while I found the sound quality of everybody else who spoke very clear, your microphone sounded “windy and slightly distorted”. It might be worth checking.

Cheers mrfabulous1


Thanks a lot, @angusde for your interest in the tsai library. I’m flattered :relaxed:

Based on the feedback I’ve received from our first meeting, we can organize more.

I’ll be more than happy to have a meeting to talk about tsai (or anything else related to time series). I’ll follow up with you Angus, and will make a proposal to the rest.


Thanks @mrfabulous1! It’s good to know. I’ll look into that for our next meeting.

Looking forward!

Hi @oguiza, @s.s.o,

I made some progress on the installation problem, though I really do not know what I am doing!

Numba was uninstalled successfully, but conda uninstall llvmlite lists resampy and librosa as “inconsistencies” and then hangs forever.

I did conda uninstall ... on both of the inconsistent packages.

conda uninstall llvmlite still hangs forever, but without first issuing the error messages.

Next, conda uninstall llvmlite --force seemed to remove llvmlite, with the warning that it was creating more inconsistencies.

But now pip install -e . from the tsai directory runs without an error.

Tomorrow I will check whether tsai can be imported or not!

Welcome to the DLL Hell of the open source world. :exploding_head:

Goodnight!

I suggest always trashing the whole environment and starting a fresh one from scratch for each project; then it usually works pretty well. Also, conda is sometimes pretty slow to resolve conflicts, so mamba (https://github.com/mamba-org/mamba) may be useful for you.

I ran into this issue as well. You should be able to get around it with pip install llvmlite --ignore-installed

Then installing tsai with pip should work.

Hi, I have the following data:

  1. 15 min interval data collected over a year and some features for each customer
  2. 2 month interval data collected over a year and the same features as above for each customer

Can I predict the 15 min interval time series for customers that only have 2 month interval data?

Any thoughts on how to approach this problem or helpful resources you have come across?

Hi @avatar,
That’s an interesting question.
The answer depends on many things.

  • Is it possible to create predictions using shorter time series than the ones used to train the model?
    The answer is: it depends on the model. If the model uses a Global Adaptive Pooling layer, then yes, you can. If it doesn't, you won't be able to create a prediction.
  • Will the performance of the prediction drop?
    It also depends on what part of the time series is used when training the model. If the most important part is the last one, and that's the one you happen to have, then it shouldn't hurt too much. In any case, you'll likely see a drop in performance.
  • Is there a better option?
    I'd recommend you try at least 2 options:
    • Train the model using the long time series. Get predictions for the short time series (if you can use this option, as explained above).
    • Train a model using all the data you have, but limited to the section of the time series available for all samples. In your case, use all data for the last 3 months only. Then get predictions.
    If you use both approaches, it's likely that the 2nd one works better.
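To build intuition for the first point above: an adaptive pooling layer maps a sequence of any length to a fixed-size output, which is what makes such a model length-agnostic. Here is a minimal numpy sketch of adaptive average pooling (my own illustration of the idea, not code from tsai or PyTorch):

```python
import numpy as np

def adaptive_avg_pool1d(x, output_size):
    """Average-pool a 1D sequence down to `output_size` values,
    whatever the input length (this mirrors what an adaptive/global
    pooling layer does inside a model)."""
    n = len(x)
    out = []
    for i in range(output_size):
        start = (i * n) // output_size          # floor of the window start
        end = -(-(i + 1) * n // output_size)    # ceil of the window end
        out.append(np.mean(x[start:end]))
    return np.array(out)

# Sequences of different lengths yield outputs of the same shape,
# so a model ending in such a layer accepts shorter series at inference.
short = adaptive_avg_pool1d(np.arange(6, dtype=float), 3)   # length-6 input
long_ = adaptive_avg_pool1d(np.arange(60, dtype=float), 3)  # length-60 input
assert short.shape == long_.shape == (3,)

# output_size=1 is Global Average Pooling: one value per channel.
gap = adaptive_avg_pool1d(np.arange(10, dtype=float), 1)
```

Models without such a layer (e.g. ones that flatten into a fixed-size linear layer) bake the training length into their weights, which is why they cannot accept shorter inputs.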

I’ve created a small gist to demonstrate how you could do this using the tsai library.

Useful resources in tsai for this particular task are:
