Custom Pipeline for Seq2Seq Encoder Regression

I am trying to implement a Seq2Seq autoencoder for a regression problem. My input data consists of multiple recordings of multidimensional sensor data streams. Assume I have 10k samples, 4500 time steps per sample (the length varies, but assume 4500 for simplicity), and 3 sensor measurements per time step.

I need help adjusting the right classes in the fastai pipeline. I use a modified version of the Seq2Seq model from

import numpy as np
import torch
import torch.nn as nn

# Seq2Seq autoencoder with optional teacher forcing
class AE_TF(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, y_range):
        super().__init__()  # must run before any submodule is assigned
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.num_directions = 1
        if y_range is not None:
            self.y_range = torch.from_numpy(y_range.astype('float32')).to('cuda')
        else: self.y_range = None

        self.gru_enc = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)

        # the decoder is fed its own output, so output_size must equal input_size
        self.gru_dec = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.out_dec = nn.Linear(hidden_size, output_size)
        self.pr_force = 0.  # probability of teacher forcing at each time step

    def encoder(self, inp, h):
        _, h = self.gru_enc(inp, h)
        return h

    def decoder(self, dec_inp, h):
        outp, h = self.gru_dec(dec_inp, h)
        outp = self.out_dec(outp)
        return h, outp

    def forward(self, inp, targ=None):
        B_SZ, SEQ_LEN, _ = inp.size()
        h = self.init_hidden(B_SZ, self.hidden_size)
        dec_inp = self.init_dec_inp(B_SZ)
        h = self.encoder(inp, h)
        res = []
        for i in range(SEQ_LEN):
            h, outp = self.decoder(dec_inp, h)
            res.append(outp)
            dec_inp = outp
            # with teacher forcing, replace the prediction by the true value
            if (targ is not None) and (np.random.random() < self.pr_force):
                dec_inp = targ[:, i:i+1, :]

        out = torch.cat(res, dim=1)  # (B_SZ, SEQ_LEN, output_size)
        if self.y_range is None:
            return out
        return torch.sigmoid(out) * (self.y_range[1] - self.y_range[0]) + self.y_range[0]

    def init_hidden(self, B_SZ, FEAT_SIZE):
        # zero hidden state of size (num_layers*num_directions, b_sz, feat_size)
        return torch.zeros(self.num_layers*self.num_directions, B_SZ, FEAT_SIZE, device='cuda')

    def init_dec_inp(self, B_SZ):
        # first decoder input (t0-1) is zeros (assume mean(data)=0)
        return torch.zeros(B_SZ, 1, self.input_size, device='cuda')

I reshape the data into a processable sequence length (currently 150) and use FloatList to build a data bunch.

X_long_tr = X_tr.reshape((-1,SEQ_LEN, FEAT_N))
X_long_ts = X_ts.reshape((-1,SEQ_LEN, FEAT_N))
db = FloatList(X_long_tr.astype('float32')).split_by_idx(val_idx).label_from_func(lambda x: x, label_cls = FloatList).databunch(bs = 128)
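
To make the effect of the reshape explicit, here is a small sketch of which recording and time window each row of the reshaped array comes from. It assumes a C-order reshape of an array of shape (N, T, F) to (-1, SEQ_LEN, F) with T divisible by SEQ_LEN; the helper name `reshaped_row_origin` is only for illustration and is not part of fastai or NumPy.

```python
def reshaped_row_origin(row, n_steps, seq_len):
    """Map a row of the reshaped array back to (recording, t_start, t_stop).

    Assumes a C-order reshape and that n_steps is a multiple of seq_len.
    """
    if n_steps % seq_len != 0:
        raise ValueError("n_steps must be a multiple of seq_len")
    windows_per_recording = n_steps // seq_len
    recording = row // windows_per_recording
    t_start = (row % windows_per_recording) * seq_len
    return recording, t_start, t_start + seq_len


# With 4500 steps and SEQ_LEN = 150, each recording becomes 30 consecutive rows.
first = reshaped_row_origin(0, n_steps=4500, seq_len=150)
last_of_first_recording = reshaped_row_origin(29, n_steps=4500, seq_len=150)
first_of_second_recording = reshaped_row_origin(30, n_steps=4500, seq_len=150)
```

So rows 0 to 29 are the consecutive windows of recording 0, row 30 starts recording 1, and so on. Any shuffling in the data loader breaks this order.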

This works well, but I am not sure how to pass the hidden state between batches.
Given SEQ_LEN = 150, B_SZ = 100, X_tr.shape = (10000, 4500, 3):
The DataLoader should load X[:100, :150, :] as the first batch, with the hidden state initialized with zeros.
For the next batch, the hidden state should be initialized with the last hidden state from the previous batch.
The DataLoader should then load X[:100, 150:300, :] and repeat until X[:100, 4350:4500, :]. After that, the hidden state must be reset to zeros and the next batch should hold X[100:200, :150, :], and so on.
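
The batch order described above can be sketched as a plain Python generator of index slices. This is only an illustration of the ordering, not a fastai DataLoader. The name `stateful_batch_indices` and the `reset_hidden` flag are hypothetical, and the sketch assumes every recording has the same number of time steps.

```python
def stateful_batch_indices(n_samples, n_steps, b_sz, seq_len):
    """Yield (row_slice, time_slice, reset_hidden) in the order described above."""
    for row_start in range(0, n_samples, b_sz):
        rows = slice(row_start, min(row_start + b_sz, n_samples))
        # walk through the recordings of this group window by window
        for t_start in range(0, n_steps - seq_len + 1, seq_len):
            times = slice(t_start, t_start + seq_len)
            # reset the hidden state only at the first window of a new group
            yield rows, times, t_start == 0


batches = list(stateful_batch_indices(n_samples=10000, n_steps=4500,
                                      b_sz=100, seq_len=150))
```

In a training loop, each batch would be taken as `X[rows, times]`. Whenever `reset_hidden` is true, the model would start from zeros; otherwise it would reuse the hidden state of the previous batch, typically detached from the graph so that gradients do not flow across batches.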

  1. I know most of this functionality is already in the TextPipeline, but I am not sure what to adjust, since I am not familiar with most of the subclasses. Can this be solved with transforms, or do I need to set some flag in the model and the DataLoader?

  2. Ideally, when loading the data, the start index of the first batch of a recording and the sequence length would be chosen with some randomness.
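
As a minimal sketch of the randomness in point 2, the windows of one recording could be drawn with a random start offset and a jittered length. The function `random_windows` and its parameters `max_offset` and `jitter` are assumptions made for this illustration, not existing fastai options.

```python
import random


def random_windows(n_steps, base_len, max_offset, jitter, rng):
    """Return contiguous (start, stop) windows with a random start and jittered lengths."""
    if jitter >= base_len:
        raise ValueError("jitter must be smaller than base_len")
    windows = []
    start = rng.randint(0, max_offset)  # random start index, inclusive bounds
    while start < n_steps:
        length = base_len + rng.randint(-jitter, jitter)
        stop = min(start + length, n_steps)  # the last window may be shorter
        windows.append((start, stop))
        start = stop
    return windows


rng = random.Random(0)  # seeded only to make the sketch reproducible
windows = random_windows(n_steps=4500, base_len=150, max_offset=149,
                         jitter=20, rng=rng)
```

All recordings within one batch would have to share the same windows so that they can be stacked into a single tensor. Because the windows stay contiguous, the hidden state can still be carried from one window to the next.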

It would be great if somebody had an idea where to start!

Thanks in Advance