Hello, thanks @oguiza for maintaining this study group with so many interesting resources.
I have a couple of questions about how to use my custom dataset with Rocket/MiniRocket(Plus) for regression. I’m following the examples here, but I’m struggling to figure out how to structure my data so that the model accepts it.
Perhaps a naive question: even though my dataset itself is multivariate, I am only interested in predicting future values of one feature. Does that mean I should actually use the examples for a univariate regression model? Here’s the structure of my data:
from tsai.all import *
check_data(X, y)
X - shape: [21936 samples x 11 features x 1 timesteps] type: ndarray dtype:float64 isnan: 0
y - shape: (21936,) type: ndarray dtype:float64 isnan: 0
Despite the labels printed above, X has 21936 time steps (~2.5yr of hourly data, no gaps), each of which has 11 features, and a 12th feature is the target (y).
But here’s the structure of the data used in the multivariate example (“Multivariate regression with sklearn-type API”):
dsid = 'AppliancesEnergy'
_test_X_train, _test_y_train, _test_X_valid, _test_y_valid = get_Monash_regression_data(dsid, verbose=True)
Dataset: AppliancesEnergy
X_train: (95, 24, 144)
y_train: (95,)
X_valid: (42, 24, 144)
y_valid: (42,)
My understanding is that above, X_train has 95 samples x 24 features x 144 time steps. I’m a bit confused about how there are multiple samples that each have multiple time steps. Do I need to set up a SlidingWindow dataloader to achieve this data format? In other words, are the 95 “samples” made up of random non-overlapping (or overlapping?) regions of 144 time steps {t-144, t-143, ..., t-1}, each having 24 features, taken from some original dataset that is much longer (like mine)? If so, can I assume the corresponding value in y is just the target value at time t-0?
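To make the question concrete, here is a minimal NumPy sketch of the windowing I have in mind. It is only my guess at the intended format, not something taken from the tsai examples: the helper name make_windows, the random stand-in data, and the choice of 144 steps per window are all my own assumptions. Each window holds steps t-144 ... t-1 for all 11 features and is paired with the target at time t.

```python
import numpy as np

def make_windows(data, target, window_len=144, stride=1):
    """Cut one long series into (samples, features, timesteps) windows.

    data:   array of shape (n_steps, n_features)
    target: array of shape (n_steps,)
    Each window covers steps t-window_len .. t-1 and is paired with target[t].
    (Hypothetical helper for illustration; not part of tsai.)
    """
    n_steps = data.shape[0]
    # Window start indices; the last start still leaves room for a target at t
    starts = np.arange(0, n_steps - window_len, stride)
    # Stack windows as (samples, timesteps, features), then swap the last two axes
    X = np.stack([data[s:s + window_len] for s in starts]).transpose(0, 2, 1)
    # The target for each window is the value right after the window ends
    y = target[starts + window_len]
    return X, y

# Random stand-in for my real data: 1000 hourly steps, 11 features, 1 target
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 11))
target = rng.normal(size=1000)

X, y = make_windows(data, target, window_len=144, stride=1)
print(X.shape, y.shape)
```

With stride=1 the windows overlap heavily; setting stride=window_len would give non-overlapping windows instead. If this is the right idea, I assume SlidingWindow produces the same layout, but I have not confirmed that.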
Thanks in advance for any help.