TimeSeries

You will find the N-BEATS model here. You will notice there are almost always two files (_network.py and _estimator.py) per model. These are the ones for N-BEATS:

_network.py: used to train the model
_estimator.py: used for forecasting a sequence

In the N-BEATS case, you also have _ensemble.py, and this is because it uses an ensembling technique.

1 Like

Got it, thank you!

1 Like

It rang a bell somewhere, but I never knew they were this far along. It looks very extensive. It would be awesome to work on, quite the challenge. I'm certainly willing to collaborate on porting it to fastai2.

Maybe start with N-BEATS (plus 1 or 2 more) and wait for the results of M5 to spot the best models before adding them all.

2 Likes

Sounds great!

I just found a nice Python matrix profile implementation.

It's a time series preprocessing technique that might be of interest to people doing:

  • pattern/motif (approximately repeated subsequences within a longer time series) discovery
  • anomaly/novelty (discord) discovery
  • shapelet discovery
  • semantic segmentation
  • density estimation
  • time series chains (temporally ordered set of subsequence patterns)
  • and more …
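To illustrate the idea behind the matrix profile (real libraries such as stumpy use far faster STOMP-family algorithms; this is just a naive sketch with a made-up toy series), each length-m subsequence is z-normalized and compared to every other one, keeping the distance to its nearest non-trivial match. Low values mark repeated motifs; high values mark discords (anomalies):

```python
import numpy as np

def naive_matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial match."""
    n = len(ts) - m + 1
    # z-normalize every subsequence so matches are offset/scale invariant
    subs = np.array([ts[i:i + m] for i in range(n)], dtype=float)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-8)
    profile = np.full(n, np.inf)
    excl = m // 2  # exclusion zone: skip trivial self-matches
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl): i + excl + 1] = np.inf
        profile[i] = d.min()
    return profile

# toy series with a repeated motif and one anomalous spike at index 9
ts = np.array([0, 1, 2, 1, 0, 1, 2, 1, 0, 9, 0, 1, 2, 1, 0], dtype=float)
mp = naive_matrix_profile(ts, m=4)
```

The subsequences overlapping the spike get the highest profile values (discord discovery), while the repeated [0, 1, 2, 1] motif gets values near zero (motif discovery).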
6 Likes

I pushed a univariate time series classification example notebook. It uses the Yoga Univariate Dataset.

I also added all the UCR univariate dataset URLs in timeseries.data. Now we have URLs for all the UCR univariate and multivariate datasets. All the univariate URLs have the UNI_ prefix, which makes them easy to find.

Example:

path = unzip_data(URLs_TS.UNI_YOGA)
4 Likes

The documentation of the timeseries library is now live. It is pretty extensive and easy to navigate thanks to the nbdev built-in features.

@takotab, I think you expressed interest in checking out the doc.

8 Likes

Mimicking fastai2.vision, I created ts_learner with the following defaults:

model = inception_time
opt_func = Ranger
loss_func = LabelSmoothingCrossEntropy()
metrics = accuracy

So now, we can train any time series dataset end-to-end with 4 lines of code (Raise your hand if this looks familiar to you :wink:):

path = unzip_data(URLs_TS.NATOPS)
dls = TSDataLoaders.from_files(bs=32,fnames=[path/'NATOPS_TRAIN.arff', path/'NATOPS_TEST.arff'], batch_tfms=[Normalize()]) 
learn = ts_learner(dls)
learn.fit_one_cycle(25, lr_max=1e-3) 

While checking the accuracy for the NATOPS dataset, I was amazed that using just the fastai2 default settings you can achieve above 97% accuracy (even 98.6% now and then) in only 18 epochs. Awesome! … and this is just another normal day using :muscle: fastai2

I added a notebook called training_using_default_settings where you can play with it.

If you like the timeseries library and/or find it useful, please star it on GitHub and share it. Any constructive feedback is very welcome.

9 Likes

Super!
How do we proceed? Let's make a unified repo to work together.
I would think that maybe it is better to fork fastai v2 and develop a timeseries branch. Doing so would enable us to propose a merge to master in the future. We can open issues and track the progress of various features. This will also enable us to request advice from the fastai pros on how to do things in the most fastai-ish way possible.
I made a fork with a branch called timeseries here; both of you are admins of the repo. Notebooks starting from 100 are my simple classification example. Can you move your core functionalities here @takotab and @farid?

1 Like

@tcapelle I fully share your enthusiasm to move fast and merge our contributions in one common repo. I think we all also share the same goal of seeing one common time series library emerge from our current projects, and we hope it will make it into the official fastai repo as a first-class module like vision, text, etc.

Having said that, I also share both @MadeUpMasters's and @jeremy's points of view expressed below.

Please, no one should view my statements as those of someone who doesn't want to collaborate and/or who is trying to keep his project separate. I'm sharing all my work here, and I welcome anybody to use it. I think the latter statement is shared by most contributors, as long as we credit the work of the original contributors where it fits.

One of the main takeaways from the quote above is that, counter-intuitively, using one central repo (project) seems more efficient when in fact it slows down the pace of the group as a whole and limits the creativity of the involved members.

By keeping the projects separate, each one of us will be able to iterate at her/his own pace, without the fear of stepping on each other's toes by merging contributions too quickly or too slowly.

As of today, we have 4 different implementations of time series data handling. IMHO, it would be good to discuss how we can:
1- Come up with a simple hierarchy that can federate time series data used in both classification/regression and forecasting
2- Create common classes that deal with importing different datasets (classification/regression/forecasting)
3- Mimic fastai2 classes by using/creating the equivalent of Datasets, DataLoaders, DataBlock classes in the common time series library

I'm browsing our 4 different repos as well as the Amazon gluonts repo to better understand each one's structure and give some feedback. Unfortunately, this week is going to be quite busy both at work and finishing the time series blog post that I promised to share soon.

Basically, what I am trying to say is that I share the idea of merging our projects, but I would like to take some time to understand how we can merge them efficiently, and meanwhile keep iterating and experimenting with the current project.

I encourage each one of us, and any experienced fastai members following this thread who have advice on this subject, to share their opinions here, so that we create a common time series library built on a solid foundation and avoid potential misunderstandings down the road.

Thank you in advance for sharing your opinions and your experiences.

1 Like

My view on this is very practical, closely aligned with the proposal Robert Bracco (MadeUpMasters) made, and very close to what @farid proposes.
I think it'd be good to meet and discuss:
1. Develop high level scope based on most common use cases
By answering which of the most common TS use cases we'd like to cover. For example:

  • TS data download
  • Univariate/ Multivariate TS classification
  • Univariate/ Multivariate TS regression
  • Univariate/ Multivariate Forecasting
  • Fastai code
  • TS models
  • …

2. Define high level requirements
It's important to choose the right approach so that we don't miss important requirements in the future. To give you an example, I cannot use implementations that don't allow me to use large datasets that don't fit in memory.

3. Decide which notebooks we should develop to cover the main use cases
We can start with just a few (2-3) and add more in the future, pretty much like Jeremy does in the course.

4. Have several of us working in parallel on the nbs where we can add value. For example, I'm interested in TS classification/regression, but not much in forecasting, while it may be different for others.

5. Review the proposals made by others and agree on a common approach for each of the use cases.
By working in parallel we'll use different approaches that can be beneficial to others. I'm already benefiting from the work @tcapelle and @farid have shared, as I'm sure they'll benefit from some of the ideas I'm developing.

6. Set up a new GitHub organization where we could have multiple co-owners, and 1 or several repos. That's the approach I used with timeseriesAI. We may fork fastai v2 to ensure there's good integration with it, as Thomas suggested.

If we build something great and get Jeremy and the rest interested in adding it to fastai, excellent, but I think that shouldn't be the primary driver. At least for me, personally, the primary driver is to create a state-of-the-art framework for TS classification/regression that I can use in my work, and share with the rest in case they can leverage it. But I understand each one of us will have a different driver.

There are some very capable people who have participated in this thread, and I do believe we can build something really exceptional if we work in parallel, but in a coordinated way.

2 Likes

BTW, I also agree with the idea of moving forward quickly.
I'm also working on a fastai v2 version with a different approach from yours, which I hope to share later this week.

2 Likes

This is awesome! Will we have your timeseriesAI on v2 soon?
Regarding @oguiza's post:
2. This is something I am facing: my v2 implementation is slower than my v1 implementation, probably due to how I build the batches (dataframe → list → tensor is not great). Right now the whole dataset is stored in memory, which is not possible for large datasets.
3. That's why I think a central repo is a good idea, maybe not a fork of v2.
4. Same
5. This! I am also waiting for the course, because there are many ways of building the same thing.
6. How do we do that? We could use this timeseriesAI then?

@oguiza, I agree with the core proposal that you laid out. Likewise, my main driver is to build the best time-series library, as I expressed in my first post in this thread. If it makes it into the official fastai2, it will be the cherry on the cake, and it will give the library more exposure. For me, the more we democratize these kinds of tools the better.

Before nbdev was released, while fastai2 was still in dev mode, I used a fastai_dev fork. By using a fastai fork, we oblige users to pip install the whole forked repo in order to use (test) the time-series library, overriding their existing fastai2 package. I think that's not desirable, especially if the forked repo is lagging behind (and that's easily done given the rapid pace of changes introduced in both fastcore and fastai2).

Since the fastai2 and nbdev repos were released, I stopped forking and started using fastai2, fastcore, and nbdev as editable packages, trying to keep up with their fast releases.

By keeping the project independent, it will remain both light and agile, which will make it easier for users to test the library and give feedback.

Ignacio, I'm looking forward to seeing your fastai2 implementation.

1 Like

@oguiza I would like to know the performance you see on v2.
How do we debug performance issues?
I would think it is the way I am building batches.

%lprun -f func full_func() (from the line_profiler extension) always helps me. Don't forget to set num_workers=0
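If line_profiler isn't installed, the stdlib cProfile gives a coarser but still useful picture of where a loop spends its time. A minimal sketch (the function names here are placeholders, not anything from the actual dataloader code):

```python
import cProfile
import io
import pstats

def build_batch(rows):
    # placeholder for whatever per-batch work the dataloader does
    return [r * 2 for r in rows]

def cycle(n_batches=100, bs=32):
    # placeholder loop standing in for iterating a dataloader
    for _ in range(n_batches):
        build_batch(list(range(bs)))

# profile the loop and capture the functions sorted by cumulative time
pr = cProfile.Profile()
pr.enable()
cycle()
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The report lists per-function call counts and cumulative times, which is usually enough to spot whether batch construction or something else dominates.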

5 Likes

I have been playing with the code the whole day.
My first takeaways are:

  • For univariate time series a pandas approach is the way to go; it is way faster. Look at this notebook. The idea is that we grab the full batch at once.
  • We can integrate a rapids.ai pipeline and do most of the work on the GPU.
  • For big datasets we could probably integrate dask.
    There is probably a cleaner way to do this than re-writing the whole tabular notebook again.
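A sketch of the "grab the full batch at once" idea (numpy stands in here for the dataframe/tensor conversion; shapes and sizes are made up): instead of pulling rows one by one through a Python list, one vectorized fancy-indexing call fetches the whole batch:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 128))  # e.g. 10k univariate series of length 128
bs = 64
idxs = rng.permutation(len(data))[:bs]  # indices of one shuffled batch

# slow pattern: row-by-row through a Python list, then re-stacked
batch_slow = np.array([data[i] for i in idxs])

# fast pattern: a single fancy-indexing call grabs the whole batch at once
batch_fast = data[idxs]
```

Both produce the same (bs, seq_len) batch, but the second avoids the per-row Python loop and intermediate list, which is exactly the dataframe → list → tensor overhead mentioned above.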
2 Likes

I'm sorry, but I can't help much. I'm not an expert in performance.
I'll be able to measure performance with the setup I'm creating once I'm done. I still can't compare it to v1.

Could you please share what kind of machine you ran your notebook on, and also test it on Google Colab and share those results, so we can compare the different implementations on comparable hardware?

I put together a small comparison and basic Datasets/DataLoaders constructors in this notebook. I also benchmarked the speed of cycling through the dataloaders using a very basic technique:

def cycle_dl(dl):
    # exhaust the dataloader, doing no work per batch
    for x, y in dl:
        pass

%time cycle_dl(dls.valid)
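For reference, the same benchmark works outside a notebook with the stdlib timer instead of the %time magic. This self-contained sketch repeats cycle_dl and uses a dummy list of (x, y) pairs as a stand-in for a real DataLoader:

```python
import time

def cycle_dl(dl):
    # exhaust the dataloader, doing no work per batch
    for x, y in dl:
        pass

# dummy stand-in for a real DataLoader: 200 batches of (inputs, labels)
dummy_dl = [([0.0] * 32, [0] * 32) for _ in range(200)]

start = time.perf_counter()
cycle_dl(dummy_dl)
elapsed = time.perf_counter() - start
print(f"cycled {len(dummy_dl)} batches in {elapsed:.4f}s")
```

Timing a pure pass-through loop like this isolates the batch-construction cost from any model/GPU work, which is what we want when comparing dataloader implementations.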

Notebook here

1 Like