Time series / sequential data study group

Same for me, but real-life projects leave little room for substantial extra commitments. I’ll try to replicate what was posted here in the meantime and might join for round 2…

Thanks for pointing that out. I don’t have a lot of Kaggle experience, have never been on or created a team, and was apparently misled by some forum statements.

So, everyone interested, please let me know via PM or Kaggle message if you want to join (same username as here).


I think this is an exciting choice for our project. I’d like to join the team!

Marc: On your Kaggle profile I see: “You cannot contact users until you reach the Contributor tier”. I guess I’ll have to make one submission first, so I’ll go with the null hypothesis.

Excellent initiative. But, considering my current commitments, I don’t think I will be able to do justice to this Kaggle competition (again, I am neither a deep learning expert nor a Kaggle expert). Please let us know here (or on a different thread?) about the public kernels you post as you make progress. Of course, we will wait to hear about your findings once the competition is closed.

Thank you all for making this course/thread awesome!


TIL about a new time series library called cesium.
cesium is an open source library that allows users to:

  • extract features from raw time series data (see list),
  • build machine learning models from these features, and
  • generate predictions for new data.

This is an example - Epilepsy Detection Using EEG Data - that I think illustrates the power of this library.
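For anyone who wants to try it, here is a minimal sketch of cesium’s featurization step (assumes cesium is installed; the synthetic sine series is a placeholder, and the feature names come from cesium’s built-in feature list):

```python
import numpy as np
from cesium import featurize

# Synthetic single-channel time series: a noisy sine wave.
rng = np.random.default_rng(0)
times = np.linspace(0, 10, 200)
values = np.sin(times) + 0.1 * rng.standard_normal(200)

# Extract a handful of cesium's built-in features into a DataFrame.
fset = featurize.featurize_time_series(
    times=times,
    values=values,
    errors=None,
    features_to_use=["amplitude", "maximum", "minimum", "median", "std"],
)
print(fset)  # one row of features, ready for any sklearn-style model
```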


Hi, I’m looking to catch up with this group.

Are there any main repos that collect all the different time series transform types (like vector-to-image, Fourier transforms, etc.) that people here have experimented with?

Thanks in advance for any pointers!

Hi @keijik, you might be interested in checking out the pyts library, which some of us have used for time-series-to-image transformations.
You may also be interested in this notebook, which demonstrates how to apply this technique in a practical example.
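As a quick illustration, here is a minimal sketch of a Gramian Angular Field transform with pyts (assumes pyts >= 0.9, where the class lives in pyts.image; the random input is a placeholder):

```python
import numpy as np
from pyts.image import GramianAngularField

X = np.random.randn(8, 128)               # 8 series, 128 time steps each
gaf = GramianAngularField(image_size=32)   # each series -> a 32x32 image
X_img = gaf.fit_transform(X)               # shape: (8, 32, 32)
# These "images" can then be fed to a standard CNN classifier.
```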


Awesome, I’ll check it out (time permitting :D)


When working with non-DL algorithms that follow the sklearn API (instantiate a classifier or regressor, fit it, then predict), you can use an AutoML-style library called TPOT.
It is built on genetic programming: it searches through a set of algorithms and their hyper-parameters and arrives at the best combination. Here is the repo.
I found it to be very handy.
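A minimal sketch of how TPOT is typically driven (assumes the classic TPOT API; the tiny generations/population settings are just to keep the demo run short):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Genetic search over sklearn pipelines and their hyper-parameters.
tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=42)
tpot.fit(X_tr, y_tr)
print(tpot.score(X_te, y_te))       # accuracy of the best pipeline found
tpot.export("best_pipeline.py")     # dump the winning pipeline as plain code
```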


It seems the libraries h2o4gpu and scikits.cuda provide CUDA computational support for the non-DL algorithms of scikit-learn. I plan to experiment with those this weekend.


Update

  1. scikits.cuda is a collection of math-based solver operations on CUDA which could potentially be used to build a CUDA-based scikit-learn.

  2. h2o4gpu is a work in progress. It is also inadequately supported/staffed: some of the demo notebooks fail, and they have not been updated for 7 months to a year. The issue log shows that the fixes are targeted for release 4.0.
    The following tab-completion listing of h2o4gpu’s top-level attributes shows which algorithms are supported:

```text
h2o4gpu.DAAL_SUPPORTED                h2o4gpu.get_config(
h2o4gpu.ElasticNet(                   h2o4gpu.h2o4gpu_exceptions
h2o4gpu.ElasticNetH2O(                h2o4gpu.import_data
h2o4gpu.FunctionVector(               h2o4gpu.libs
h2o4gpu.GradientBoostingClassifier(   h2o4gpu.linear_model
h2o4gpu.GradientBoostingRegressor(    h2o4gpu.logger
h2o4gpu.KMeans(                       h2o4gpu.logging
h2o4gpu.KMeansH2O(                    h2o4gpu.metrics
h2o4gpu.Lasso(                        h2o4gpu.model_selection
h2o4gpu.LinearRegression(             h2o4gpu.neighbors
h2o4gpu.LogisticRegression(           h2o4gpu.os
h2o4gpu.PCA(                          h2o4gpu.preprocessing
h2o4gpu.PCAH2O(                       h2o4gpu.random_projection
h2o4gpu.Pogs(                         h2o4gpu.re
h2o4gpu.RandomForestClassifier(       h2o4gpu.set_config(
h2o4gpu.RandomForestRegressor(        h2o4gpu.setup_module(
h2o4gpu.Ridge(                        h2o4gpu.solvers
h2o4gpu.TruncatedSVD(                 h2o4gpu.svm
h2o4gpu.TruncatedSVDH2O(              h2o4gpu.sys
h2o4gpu.base                          h2o4gpu.typecheck
h2o4gpu.clone(                        h2o4gpu.typechecks
h2o4gpu.compatibility                 h2o4gpu.types
h2o4gpu.config_context(               h2o4gpu.util
h2o4gpu.exceptions                    h2o4gpu.utils
h2o4gpu.externals                     h2o4gpu.warnings
h2o4gpu.feature_selection
```
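Since the listing above mirrors scikit-learn’s class names, h2o4gpu is meant to work as a drop-in replacement. A sketch under the assumption of a working h2o4gpu install with a CUDA GPU:

```python
import numpy as np
import h2o4gpu

X = np.random.rand(1000, 10).astype(np.float32)

# Same constructor/fit interface as sklearn's KMeans, but GPU-backed.
model = h2o4gpu.KMeans(n_clusters=4, random_state=1234)
model.fit(X)
print(model.cluster_centers_)
```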

Now that the competition is over, I would be interested to hear about your learnings.

Hi Time Series study group! I wrote a summary of my learnings while participating in the PLAsTiCC astronomical classification Kaggle competition. I briefly explain what the competition was about, the winning approaches and some general Kaggle tips. Check it out here: “Learnings from my first Kaggle competition: PLAsTiCC” by Francisco Ingham https://link.medium.com/egGyoj4UcT


Thanks a lot!

I was reading your blog. It mentions that “Many winning participants used 5-fold cross-validation and this is a very Kaggle thing.” From the data description, the data seems highly temporal in nature. So, should I understand that the CV folds were not random, but based on a time split?

No, we didn’t use a time-based split, because this was a classification problem where you needed to assign a class to each sample based on its entire time series; there was no forecasting involved.
We randomized the samples, which were independent from each other.
Does this answer your question?
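For anyone following along, here is a small sketch contrasting the two CV schemes in plain scikit-learn (random K-fold is fine when each sample is an independent whole series; TimeSeriesSplit is what you would reach for in a forecasting setup):

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # ten independent samples

for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print("random fold train:", tr, "valid:", va)

for tr, va in TimeSeriesSplit(n_splits=5).split(X):
    print("time split  train:", tr, "valid:", va)  # valid is always later
```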


Hi @mayank4, I’m not sure this post belongs here.

I am working on a problem somewhat similar to predicting the stock prices of a handful of top-picked stocks; let’s name them A, B, C, D, E, F. I have collected and combined the data from all the available sources and dumped it into training and test sets of 82 million and 35 million rows respectively, with 277 different features (consider them handcrafted features). Right now I am using xgboost for model building. My concern is that training takes a very long time because of the large amount of data and features. Is there any way I can embed these 277 features into some 50-odd features, and then use those 50 dense features for model building to reduce my training time?
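A minimal sketch of the kind of compression being asked about, using TruncatedSVD to project 277 features down to 50 before xgboost (synthetic stand-in data; with 82M rows you would fit the SVD on a sample first):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
import xgboost as xgb

X = np.random.randn(10_000, 277)   # stand-in for the real feature matrix
y = np.random.randn(10_000)

svd = TruncatedSVD(n_components=50, random_state=0)
X_small = svd.fit_transform(X)     # (n_samples, 50) dense features

model = xgb.XGBRegressor(n_estimators=200, tree_method="hist")
model.fit(X_small, y)              # train on the compressed features
```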

And thanks to all the people posting interesting stuff.