I have a time-series dataset similar to the example below, and I have been trying in vain for a couple of days to figure out how to modify .dataloaders() (assume a batch size of 1) so that each batch is a tensor of shape (3, 6) holding the rows for one ID (0, 1, etc.). Is this possible?
import pandas as pd
import numpy as np
# Generate time series data
n = 4
ts1 = np.random.randn(n)
ts2 = np.random.randn(n)
ts3 = np.random.randn(n)
cat_col = np.random.randint(n)
# Create unique ID values for each time point
ids = range(n)
# Repeat each ID three times
repeated_ids = np.repeat(ids, 3)
# Generate random binary labels
labels = np.random.randint(2, size=n*3)
cat_col = np.random.randint(n, size=n*3)
# Combine the data into a dataframe
data = {'ID': repeated_ids, 'TimeSeries1': np.tile(ts1, 3), 'TimeSeries2': np.tile(ts2, 3),
        'TimeSeries3': np.tile(ts3, 3), 'c_col': cat_col, 'y': labels}
df = pd.DataFrame(data)
# Preview the dataframe
df
Hello, @DannyK! I don’t quite understand the task, but you can try creating your own dataset and dataloader. Something like this:
from fastai.data.all import *

class MyDataset:
    def __init__(self, df, name='train'):
        self.df = df
        self.name = name

    def __len__(self):
        # One item per unique ID, not per row
        return len(self.df['ID'].unique())

    def __getitem__(self, j):
        # Select every row with ID == j (assumes the IDs are 0, 1, ..., n-1)
        XY = self.df[self.df['ID']==j]
        X = XY[['ID','TimeSeries1','TimeSeries2','TimeSeries3','c_col']]
        y = XY['y']
        return tensor(X), tensor(y)

train_ds = MyDataset(df, name='train')
dls = DataLoaders.from_dsets(train_ds, bs=1)
Here I have assumed that you need a label column, so the output is not one tensor of shape (3, 6) but two tensors: the features with shape (1, 3, 5) and the labels with shape (1, 3), since y is selected as a single column. The first dimension is the index within the batch. You can check the result by pulling one batch, for example with dls.one_batch(), and looking at the shapes of the two tensors.
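If you want to check the shapes without fastai, here is a minimal sketch of the same grouping idea in plain PyTorch. It is not the code from the reply above: the class name GroupedByID, the FEATURES list, and the toy frame are made up for illustration, and torch.utils.data.DataLoader stands in for fastai's DataLoaders.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

# Feature columns, same names as in the question's dataframe
FEATURES = ["ID", "TimeSeries1", "TimeSeries2", "TimeSeries3", "c_col"]


class GroupedByID(Dataset):
    """Hypothetical dataset: one item per unique ID, holding all rows of that ID."""

    def __init__(self, df):
        self.df = df
        # Store the actual ID values so they do not have to be 0..n-1
        self.ids = df["ID"].unique()

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, j):
        rows = self.df[self.df["ID"] == self.ids[j]]
        x = torch.tensor(rows[FEATURES].to_numpy(), dtype=torch.float32)
        y = torch.tensor(rows["y"].to_numpy(), dtype=torch.long)
        return x, y


# Toy frame shaped like the one in the question: 4 IDs, 3 rows each
n = 4
df = pd.DataFrame({
    "ID": np.repeat(np.arange(n), 3),
    "TimeSeries1": np.random.randn(n * 3),
    "TimeSeries2": np.random.randn(n * 3),
    "TimeSeries3": np.random.randn(n * 3),
    "c_col": np.random.randint(n, size=n * 3),
    "y": np.random.randint(2, size=n * 3),
})

dl = DataLoader(GroupedByID(df), batch_size=1, shuffle=False)
xb, yb = next(iter(dl))
print(xb.shape, yb.shape)  # torch.Size([1, 3, 5]) torch.Size([1, 3])
```

With shuffle=False the first batch holds the three rows for ID 0, so xb[0, :, 0] should be all zeros.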
@krasin Thank you so much! You nailed my question to a T.
The basic task is that I wanted to load my data in batches sized by how long each sequence is in my dataset. I kept trying to control that through the dataloader instead of through a Dataset.
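For anyone landing here with the same problem, a rough sketch of why the Dataset is the right place for this: when __getitem__ returns all rows for one ID, a batch size of 1 lets every batch take the length of its own sequence. This again uses plain PyTorch as a stand-in, and SequencesByID and the small frame are hypothetical names and data just for illustration.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class SequencesByID(Dataset):
    """Hypothetical dataset: each item is the full sequence of rows for one ID."""

    def __init__(self, df, feature_cols, label_col="y"):
        # Split once up front; each group keeps however many rows its ID has
        self.groups = [g for _, g in df.groupby("ID", sort=True)]
        self.feature_cols = feature_cols
        self.label_col = label_col

    def __len__(self):
        return len(self.groups)

    def __getitem__(self, j):
        g = self.groups[j]
        x = torch.tensor(g[self.feature_cols].to_numpy(), dtype=torch.float32)
        y = torch.tensor(g[self.label_col].to_numpy(), dtype=torch.long)
        return x, y


# Made-up frame: ID 0 has 2 rows, ID 1 has 4 rows
df = pd.DataFrame({
    "ID": [0, 0, 1, 1, 1, 1],
    "TimeSeries1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "y": [0, 1, 0, 0, 1, 1],
})

dl = DataLoader(SequencesByID(df, ["TimeSeries1"]), batch_size=1, shuffle=False)
shapes = [tuple(xb.shape) for xb, _ in dl]
print(shapes)  # [(1, 2, 1), (1, 4, 1)]
```

Note that this only works as-is with a batch size of 1. With larger batches the default collation stacks the items, so sequences of different lengths would need padding or a custom collate function.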