Time series/sequential data study group

jacksth22 · March 14, 2023, 6:35pm

Hello, thanks @oguiza for maintaining this study group with so many interesting resources.

I have a couple of questions about how to use my custom dataset with Rocket/MiniRocket(Plus) for regression. I’m following the examples here, but I’m struggling to figure out how to structure my data so that the model accepts it.

Perhaps a naive question: even though my dataset itself is multivariate, I am only interested in predicting future values of one feature. Does that mean I should actually use the examples for a univariate regression model? Here’s the structure of my data:

from tsai.all import *
check_data(X, y)

X      - shape: [21936 samples x 11 features x 1 timesteps]  type: ndarray  dtype:float64  isnan: 0
y      - shape: (21936,)  type: ndarray  dtype:float64  isnan: 0

Despite the labels in this output, X actually holds 21936 time steps (~2.5 years of hourly data, no gaps), each with 11 features; a 12th feature is the target (y).

But here’s the structure of the data used in the multivariate example (“Multivariate regression with sklearn-type API”):

dsid = 'AppliancesEnergy'
_test_X_train, _test_y_train, _test_X_valid, _test_y_valid = get_Monash_regression_data(dsid, verbose=True)

Dataset: AppliancesEnergy
X_train: (95, 24, 144)
y_train: (95,)
X_valid: (42, 24, 144)
y_valid: (42,)

My understanding is that above, X_train has 95 samples x 24 features x 144 time steps. I’m a bit confused about how there are multiple samples that each have multiple time steps. Do I need to set up a SlidingWindow dataloader to achieve this data format? In other words, are the 95 “samples” comprised of random non-overlapping (or overlapping?) regions of 144 time steps {t-144, t-143, ..., t-1}, each having 24 features, in some original dataset that is much longer (like mine)? If so, can I assume the corresponding value in y is just the target value at time t-0?

Thanks in advance for any help.
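
A minimal sketch of the windowing asked about above, using plain NumPy rather than tsai's SlidingWindow so that no tsai signature is assumed. The random data, the 144-step lookback, and the one-step-ahead target are all assumptions for illustration. The thread does not say whether the 95 AppliancesEnergy samples were built this way. Note that having 11 input features makes the inputs multivariate even though only one value is predicted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical stand-in for the data in the post:
# 21936 hourly steps, 11 input features, 1 target series.
n_steps, n_features = 21936, 11
rng = np.random.default_rng(0)
features = rng.normal(size=(n_steps, n_features))  # [time steps x features]
target = rng.normal(size=n_steps)                  # the 12th feature

window_len = 144  # assumed lookback, borrowed from the AppliancesEnergy shape
horizon = 1       # assumed: predict the step right after each window

# Overlapping windows along the time axis (stride 1).
# Result is a view of shape [n_windows x features x window_len].
windows = sliding_window_view(features, window_len, axis=0)

# Keep only windows that have a target `horizon` steps after their last step.
n_samples = n_steps - window_len - horizon + 1
X = windows[:n_samples]
y = target[window_len + horizon - 1:]

print(X.shape, y.shape)
```

With these assumptions, sample i covers steps i to i+143 and y[i] is the target at step i+144, which gives the [samples x features x time steps] layout that check_data reports.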

george23 · March 26, 2023, 1:05pm

Hey everyone, I hope you are doing well.
How should I set the horizon parameter when applying a sliding window function to my multivariate, multi-label data? I don’t really understand what it does.

Additionally, what steps should I take to avoid leakage in my data? Should I split the training and test data and apply the sliding window function separately to each of them?

Thanks for any help.
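
One common way to avoid leakage is to split chronologically first and then window each split separately, so that no training window or target touches a test time step. The sketch below shows this with plain NumPy. The helper `make_windows` is hypothetical, and its `horizon` is a single offset (steps between the end of the window and the target), which is an assumption and may not match the exact semantics of the tsai parameter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(series, window_len, horizon):
    """Hypothetical helper: series is [n_steps x n_vars].

    Returns X as [n_samples x n_vars x window_len] and y as
    [n_samples x n_vars], where y is the row `horizon` steps
    after the last step of each window.
    """
    n_samples = series.shape[0] - window_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window_len and horizon")
    X = sliding_window_view(series, window_len, axis=0)[:n_samples]
    y = series[window_len + horizon - 1:]
    return X, y


# Toy data: 100 time steps, 2 variables, values increase with time.
series = np.arange(200, dtype=float).reshape(100, 2)

# 1) Split by time first (no shuffling).
split = 80
train, test = series[:split], series[split:]

# 2) Window each split on its own.
X_train, y_train = make_windows(train, window_len=10, horizon=3)
X_test, y_test = make_windows(test, window_len=10, horizon=3)

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Because the windows are built after the split, the last training target comes from step 79 and the first test window starts at step 80. The cost is that the first window_len + horizon - 1 steps of the test split only ever appear as inputs, not as targets.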

oguiza · March 27, 2023, 6:18pm

I have noticed that several questions, requests, and even issues about the tsai library have been posted here lately. If you want these to be tracked properly, please submit them to GitHub. Otherwise, you may not receive a response.

cmackenzie · April 18, 2023, 6:30pm

Hi, back at this forum after a couple of years (redoing the course!) and no, didn’t really get anywhere.

How about you?

Eshta · February 29, 2024, 3:29pm

Hello, I have a time series prediction problem. How should I arrange input data of the following format?
Input = {mix1, mix2, mix3, hourly temperature values for a month, hourly weight load values for a month}
Output = {anomaly1, anomaly2} (the output varies on a monthly basis)
The input for the same mix1, mix2, mix3 covers 12 months, giving 12 rows of data.
Then the same data for mix3, mix4, mix5 gives another 12 rows of data, and so on.
My first question: should I list the temperature values sequentially as temp1, temp2, … temp(24x30), and similarly for load? That implies 720 data items each for temperature and load.
My second question: if I have to perform the analysis for the next 20 years, do I have to repeat the same rows of data for each of those years, assuming the temperature and load data stay the same throughout? Is there a way to iterate over the 20 years without literally adding the rows, given that only the output values change from month to month over the 20 years?
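
One possible arrangement, sketched with NumPy under several assumptions: the random values are placeholders, two mix combinations are used for illustration, and every mix combination is assumed to see the same monthly temperature and load curves. The hourly series go in a [samples x channels x time steps] array, and the mix values go in a separate static array.

```python
import numpy as np

hours = 24 * 30   # 720 hourly readings per month, as in the post
n_months = 12
n_mixes = 2       # hypothetical: two mix combinations

rng = np.random.default_rng(0)
temp = rng.normal(size=(n_months, hours))   # placeholder temperature curves
load = rng.normal(size=(n_months, hours))   # placeholder load curves
mixes = rng.uniform(size=(n_mixes, 3))      # placeholder mix values per combination

# Time-varying inputs: temperature and load become two channels.
monthly = np.stack([temp, load], axis=1)        # [12 x 2 x 720]
X_ts = np.tile(monthly, (n_mixes, 1, 1))        # [24 x 2 x 720], sample = mix * 12 + month

# Static inputs: each mix combination repeated once per month.
X_static = np.repeat(mixes, n_months, axis=0)   # [24 x 3], same sample order as X_ts

# 20 years without copying rows: a read-only broadcast view plus a year index.
n_years = 20
X_ts_years = np.broadcast_to(X_ts, (n_years,) + X_ts.shape)  # [20 x 24 x 2 x 720]
year_index = np.arange(n_years)

print(X_ts.shape, X_static.shape, X_ts_years.shape)
```

The matching targets would then have shape [20 x 24 x 2] for anomaly1 and anomaly2. If the inputs really are identical from year to year, something like year_index would probably need to be fed to the model as an extra feature, since otherwise nothing in the input distinguishes one year's output from another's.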

noisyearth · April 10, 2024, 12:58pm

Hey everyone! I’m new here but excited to be part of the community. I’ve got a unique challenge and I’m hoping to get some advice on which fastai model might be the best fit. I’m working with structured spreadsheet data to explore the relationship between food intake (with details like macros) and bowel movements, trying to identify trigger foods. The twist is the delayed reaction: up to 48 hours can pass between eating and the effects. So, it’s not straightforward.

Here’s the lowdown:

Data: Two tables. One with food and nutrient intake, and another describing bowel movements (e.g., thickness, fragmentation).
Objective: Predict ‘good’ vs ‘bad’ bowel movements based on food intake, accounting for the delay of up to 48 hours. Ideally this would be a regression model that scores the degree of fragmentation, thickness, etc., but to start, a simple binary “good” or “bad” would suffice.
Data Size: 6 months’ worth.

Given the delay in reaction and the structured nature of my data, I’m scratching my head on how to model this. Any suggestions on models or a particular approach within fastai that could handle this delayed effect scenario effectively?

Appreciate all your insights and recommendations!
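
One way to handle the delay, sketched here as an assumption rather than a recommendation from the thread, is to turn the two tables into one row per bowel movement by summing everything eaten in the preceding 48 hours. The timestamps, the three macro columns, and the helper `lagged_intake` are all made up for illustration.

```python
import numpy as np

# Hypothetical food table: time of each meal (hours since the start of
# logging) and three made-up macro columns (fat, protein, carbs in grams).
meal_hours = np.array([1.0, 9.0, 30.0, 52.0, 70.0])
meal_macros = np.array([
    [10.0, 5.0, 30.0],
    [20.0, 10.0, 40.0],
    [5.0, 2.0, 10.0],
    [15.0, 8.0, 25.0],
    [30.0, 12.0, 50.0],
])

# Hypothetical bowel-movement table: event time and a binary label.
event_hours = np.array([40.0, 75.0, 100.0])
event_is_bad = np.array([0, 1, 0])


def lagged_intake(event_hours, meal_hours, meal_macros, lookback=48.0):
    """Sum the macros of all meals in [t - lookback, t) for each event time t."""
    feats = np.zeros((len(event_hours), meal_macros.shape[1]))
    for i, t in enumerate(event_hours):
        in_window = (meal_hours >= t - lookback) & (meal_hours < t)
        feats[i] = meal_macros[in_window].sum(axis=0)
    return feats


X = lagged_intake(event_hours, meal_hours, meal_macros)
y = event_is_bad
print(X)
```

Each row of X then pairs with one label in y, so the result is an ordinary tabular problem. Splitting the 48 hours into several sub-windows (for example 0-12h, 12-24h, 24-48h) would give a model some chance of learning how long the delay is.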