Time series/ sequential data study group

oguiza · November 21, 2018, 6:29am

Great @tukun! Thanks! Looking forward to reading your posts!

oguiza · November 21, 2018, 6:42am

Hi group,

I was thinking that, to make this a more interesting experience, we should shorten the timeline of this competition. We could relabel these competitions and call them ‘fast learning competitions’, like they do in chess with ‘fast chess’. Are you OK if we reduce it to just one week from the announcement? Otherwise I believe participants will relax a bit, and it won’t be as good a learning experience as I think it could be.
In this way we could have a new competition every week, and will thus be able to learn and share more about time series.
Agree? If so please click like, otherwise share your thoughts.

I’d also like to ask you to please send your first posts to the learning competition to make this really interesting. I’m sure we all have something to learn and contribute.

oguiza · November 21, 2018, 10:51am

I guess you had an issue and couldn’t create the notebook. Since I’m a few hours ahead of you, I’ve created one and uploaded it to a repo I have created for this competition.
I have taken inspiration from the CNN notebook you created, although I didn’t include anything related to images, as this is just one of the multiple approaches you could take to this problem.
Thanks in any case!

oguiza · November 25, 2018, 9:44am

Proposed TSSD study group goals

Hi all,

Based on some of the insights we are getting from the Earthquakes competition, I think it’s a good time to start a discussion on what the ultimate goals of these TS projects should be.
I can share what I’d personally like to achieve, but please feel free to challenge, add your goals, agree, etc.

Goals:

Identify a series of proven, reproducible, state of the art techniques with TS data that can be used on other datasets in the real world (this is what we can achieve with competitions)
Generate and test new ideas to improve the current state of the art.
Create a notebook/ collection of notebooks containing identified best practices.
Develop the TSSD study group members’ expertise in this area.

Stretch goals:

Maintain/ increase community of people interested in TSSD beyond this course.
If we can generate a solid notebook (or notebooks) with best practices in this area, we could propose an expansion of the current fastai library to add time series. Right now there are 4 applications: vision, text, tabular and collaborative filtering. I don’t know if Jeremy would be interested in adding time series to the list. That way, TS best practices and state-of-the-art techniques could be applied with fastai out of the box.

Please, let me know your thoughts.

oguiza · November 26, 2018, 5:45pm

Large benchmark Time Series Classification studies: summary

After reading @henripal’s post I have performed a review of that paper and 4 other recent Time Series Classification papers that are very interesting:

The Great Time Series Classification Bake Off: An Experimental Evaluation of Recently Proposed Algorithms. Extended Version (Bagnall, 2016)
Deep Learning for Time-Series Analysis (Gamboa, 2017)
Deep learning for time series classification: a review (Fawaz, 2018)
Proximity Forest: An effective and scalable distance-based classifier for time series (Lucas, 2018)
Is rotation forest the best classifier for problems with continuous features? (Bagnall, 2018)

I prepared a summary for myself and thought I would share it.

Summary:

1-NN dynamic time warping with a warping window set through cross-validation (DTW) has been extremely difficult to beat for over a decade, but it’s no longer considered state of the art.
SOTA TSC algorithms for univariate problems:
1.1. HIVE-COTE: current state of the art, but hugely computationally intensive. It combines predictions of 35 individual classifiers built on four representations of the data. Impractical in many problems. The HIVE version uses a hierarchical vote.
1.2. ResNet: same performance as COTE but much faster (python code)
Other interesting TSC algorithms in these studies:
2.1. Shapelet Transform (ST): extracts discriminative subsequences (shapelets) and builds a new representation of the time series that is fed to an ensemble of 8 classifiers. While it is considered a state-of-the-art classifier, it has little potential to scale to large datasets given its training complexity (python code).
2.2. BOSS (Bag-of-SFA-Symbols): forms a discriminative bag of words by discretizing the TS using a Discrete Fourier Transform and then building a nearest neighbor classifier with a bespoke distance measure. It is of limited use on large datasets as it has a high training complexity. The authors produced a similar approach with improved scalability, BOSS in Vector Space (BOSS-VS). The same authors recently proposed WEASEL, which improves on the computation time of BOSS and on the accuracy of BOSS-VS, but has a very high memory complexity: it doesn’t scale beyond 10k TS (python code).
2.3. Proximity Forest (PF): new algorithm presented in Aug 2018. It is similar to Random Forest but replaces the attribute-based splitting criteria with a random similarity measure (java code). I don’t think there is any python code yet.
2.4. Rotation Forest (RotF): an algorithm that has recently been used with very good results in TSC. I’ve contacted Dr. Bagnall, and he’s sent me this python code. They are still trying to optimize the code.
2.5. Fully Convolutional Network (FCN): (python code)
2.6. Encoder: an architecture inspired by FCN; the main difference is that the GAP layer is replaced with an attention layer (python code)
The studies are inconclusive as to the best algorithms for multivariate TS, due to the small number of datasets used. However, FCN, Encoder, and ResNet also seem to work well.
Most non-DL state-of-the-art algorithms do not scale to large time series datasets. This still needs to be confirmed with Proximity Forest and Rotation Forest.
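
As a rough illustration of the 1-NN DTW baseline mentioned at the top of this summary, here is a minimal pure-Python sketch. It is not taken from any of the papers above: the function names (`dtw_distance`, `predict_1nn`, `select_window`), the squared-difference local cost, the leave-one-out selection of the window, and the toy data are all assumptions made for illustration only. A real benchmark would use an optimized implementation.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two univariate sequences.

    `window` is a Sakoe-Chiba band half-width; None means no constraint.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or no path reaches the corner.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def predict_1nn(train_x, train_y, query, window=None):
    """Return the label of the training series closest to `query` under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in zip(train_x, train_y):
        dist = dtw_distance(series, query, window)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(train_x, train_y, window):
    """Leave-one-out accuracy of 1-NN DTW on the training set."""
    hits = 0
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        hits += predict_1nn(rest_x, rest_y, train_x[i], window) == train_y[i]
    return hits / len(train_x)


def select_window(train_x, train_y, candidates):
    """Pick the candidate window with the best leave-one-out accuracy."""
    return max(candidates, key=lambda w: loo_accuracy(train_x, train_y, w))


# Toy data (made up): two "peak" series and two "dip" series.
train_x = [
    [0, 0, 1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0, 0, 0],
    [2, 2, 1, 0, 1, 2, 2],
    [2, 1, 0, 1, 2, 2, 2],
]
train_y = ["peak", "peak", "dip", "dip"]
query = [0, 0, 0, 1, 2, 1, 0]

best_window = select_window(train_x, train_y, candidates=[0, 1, 2])
print(best_window, predict_1nn(train_x, train_y, query, best_window))
```

With `window=0` the warping path is forced onto the diagonal, so the distance reduces to the Euclidean distance for equal-length series. Note that the double loop is quadratic in series length for every pair compared, which gives a feel for why distance-based methods struggle on large datasets.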

czechcheck · November 26, 2018, 10:52pm

Hi group,

These sound like great goals. I am quite new to time series forecasting and still surprised how difficult and varied this seemingly simple task can be (univariate vs. multivariate, short vs. long, highly predictable vs. mostly noise, seasonal, with change points, …).

Another great dataset and source of information not mentioned above (correct me if I am wrong) is Kaggle’s Wikipedia Traffic Forecasting Challenge. There are two reasons why it is exceptional: 7 out of the 8 best solutions use deep learning (the remaining one is a very smart application of Kalman filtering), and if you look into the discussion, many of them are well described there. I wonder how much work it would be to create a simple pytorch/fastai solution like this one.

Another interest of mine (as somebody coming originally from statistics & probability) is when it is actually a good idea to use deep learning for time series forecasting. It is not a simple question, and many ML applications here are questionable, as is shown in this highly accessed PLOS One paper:

Statistical and Machine Learning forecasting methods: Concerns and ways forward

sam2 · November 26, 2018, 11:44pm

A question for @mb4310,

I have attempted to play with https://github.com/mb4310/ts/blob/master/CNN.ipynb on my lilliput GPU of 4 GB
It runs into OOM for most image sizes + batch sizes.
What is the size of GPU for your playground?
I apologize if this was mentioned elsewhere.
Thanks

whamp · November 27, 2018, 12:20am

If I remember correctly, he was using a 1080Ti.

sam2 · November 27, 2018, 12:25am

Thanks @whamp

marcmuc · November 27, 2018, 8:05am

But the same applies here as in Jeremy’s notebooks: you can always reduce the batch size if you have memory problems, and if that is not enough, reduce the image size (with potential effects on the achievable accuracy in ts-2-image use cases). You also have to keep in mind that @mb4310’s approach stacks 3 resnets, which is pretty “expensive”, so on a 1050 that is maybe not the ideal arch. But I still think it is an interesting idea to play around with…

marcmuc · November 27, 2018, 8:10am

Thanks for linking the Kaggle dataset and the paper. Just wanted to point out that 90% of what is currently being discussed here in the channel is time series classification, not time series forecasting, which in most cases is something very different. But I would also be interested in looking at ts forecasting more closely, especially multistep forecasting.

oguiza · November 27, 2018, 8:52am

Hi @marcmuc, @czechcheck and rest of the group,
I’d also love to learn more about ts multistep forecasting.
Would you like to propose a dataset for a learning competition?
Do you have any other ideas?

oguiza · November 27, 2018, 9:04am

BTW, I’ve seen some people define time series forecasting as the process of “developing and using a predictive model on data where there is an ordered relationship between observations” (link), and then differentiate between regression and classification time series forecasting problems, while others use the term time series forecasting just for regression problems.
I don’t think there’s a correct way, but it’d be good to take this into account to avoid confusion in the future.

czechcheck · November 27, 2018, 11:05am

Marc, Ignacio - thanks for the clarification. I should not have been posting late at night. I have noticed above mentions of Rob Hyndman’s TS data library, Uber’s blog and M4 competition that are all my favorite sources and added the remaining one. I agree that t.s. classification and forecasting might be very different problems.

oguiza · November 27, 2018, 11:23am

No worries! We all post when we find some time to do it!
And it’s great to have somebody with a statistical background. I’m sure you’ll bring a different and useful perspective.

marcmuc · November 27, 2018, 11:26am

Just as a clarification of what I meant:
This may not be what the literature says, so not an academic but rather my personal “definition”, but I think the huge difference lies in the meanings of prediction and forecasting.

“Prediction” in normal English can mean predicting the future. But in our context, we usually predict classes in a classification problem or predict values in a regression problem. So prediction means inferring something from given data.

“Forecasting” for me always involves the future, so I will not “forecast” a class in a classification problem, although I can predict it. So if you want to predict future values of a time series, that, for me, is “forecasting”.

And for forecasting maybe there are ways of using classification models, I don’t know. But if the output of the model is supposed to be a value or values for multiple time-steps in the future, I think this is more of a regression problem.

So what we have done here so far and what the UCR dataset has, is lots of time series, and we try to predict what class / type / event… whatever it is. We are not trying to infer anything about the future there but just something about characteristics of the given timeseries -> predicting classes, while the kaggle challenge and the M4 competition mentioned by @czechcheck are forecasting related.

marcmuc · November 27, 2018, 11:28am

Speaking of Kaggle: has anyone here tried their hand at the PLAsTiCC challenge? It is all about time series classification. (Although the astronomical domain seems quite complex/complicated I think…)

oguiza · November 27, 2018, 11:56am

Ok, makes sense. I’m happy to adopt your definition @marcmuc.
I was just highlighting that there are different interpretations when people talk about time series forecasting, but I don’t have a strong preference one way or the other.
Just to make sure I understand, what do we call a problem where you want to, for example, predict if the stock exchange will go up or down tomorrow? I guess it’s a time series prediction of type classification, but not a time series forecasting problem, correct?

marcmuc · November 27, 2018, 12:34pm

Oh, well, that’s why I said it is not an academic definition… But predicting tomorrow’s opening and closing stock price would be a forecasting problem, predicting the energy usage per hour for tomorrow would be a forecasting problem, but saying up/down (even with respect to tomorrow) for me is only a prediction by classification into 2 classes (binary), not forecasting. My “definition” does not seem to be very sound…

Maybe this is better: If the input is a timeseries and the output of the model is also a timeseries with at least one timestamp, it is forecasting. So if your “labels” for the training data are also timeseries (at least 1 timestamp ahead), it probably is a forecasting problem. If your labels are class labels, it would not be forecasting.
(But let’s not continue this, you will find holes in all of my definitions (what is forecasting the up/down for the next 5 days?) My objective was just to say that the problems of forecasting and label classification of time series may require different approaches and are not interchangeable.)
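
To make the distinction above concrete, here is a small sketch of how the same series could be turned into training pairs under each framing. The helper names (`make_forecast_windows`, `make_direction_labels`), the window sizes and the price values are hypothetical, chosen only for illustration; treating a flat move as "down" is also an arbitrary assumption.

```python
def make_forecast_windows(series, n_in, horizon):
    """Forecasting framing: the label is itself a slice of future values."""
    pairs = []
    for start in range(len(series) - n_in - horizon + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + horizon]
        pairs.append((x, y))
    return pairs


def make_direction_labels(series, n_in):
    """Classification framing: the label is a class ("up" or "down")."""
    pairs = []
    for start in range(len(series) - n_in):
        x = series[start:start + n_in]
        # Assumption: a flat move counts as "down" (i.e. "not up").
        label = "up" if series[start + n_in] > x[-1] else "down"
        pairs.append((x, label))
    return pairs


prices = [10, 11, 13, 12, 12, 15, 14]  # made-up values

forecast_pairs = make_forecast_windows(prices, n_in=3, horizon=2)
direction_pairs = make_direction_labels(prices, n_in=3)

for x, y in forecast_pairs:
    print("forecast:", x, "->", y)
for x, label in direction_pairs:
    print("classify:", x, "->", label)
```

In the first case the model has to output a sequence of values (a multistep regression target); in the second it only has to output one of two classes, even though the class refers to a future time step.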

oguiza · November 27, 2018, 12:43pm

Agree

Sorry if that was the impression I gave. It certainly wasn’t my intent. I just wanted to understand the definition.