DataParallel with variable sequence length

Hello,
I am trying to parallelize a Deep Speech model across 2 GPUs, but my batches have varying sequence lengths (within each batch the sequence length is constant, but it varies between batches). When I use DataParallel, I get the following error:

RuntimeError: Gather got an input of invalid size: got [4, 390, 42], but expected [4, 409, 42] (gather at torch/csrc/cuda/comm.cpp:183)

The second dimension is the sequence length.
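
Here is a minimal toy reproduction (not my actual model; the shapes are made up to match the error). Any module whose per-replica output length differs will hit the same gather failure:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Stand-in whose output length depends on the chunk it receives."""
    def forward(self, x):
        # Pretend the output time dimension is derived from the input,
        # the way e.g. pad_packed_sequence (without total_length) pads
        # only to the max length within the chunk each replica sees.
        t = int(x[0, 0].item())
        return torch.zeros(x.size(0), t, 42, device=x.device)

model = nn.DataParallel(ToyModel().cuda(), device_ids=[0, 1])

# A batch of 8 is scattered into two chunks of 4; the replicas then
# return [4, 390, 42] and [4, 409, 42], which gather cannot stack.
x = torch.cat([torch.full((4, 1), 390.0),
               torch.full((4, 1), 409.0)]).cuda()
out = model(x)  # RuntimeError: Gather got an input of invalid size
```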

On 1 GPU, everything runs fine. Are there any tricks? The only thing that comes to my mind is to implement Hogwild…
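
Or would padding each replica's output to a fixed maximum length inside forward work, so that gather always sees uniform shapes? A rough, untested sketch (MAX_T is a hypothetical upper bound I would have to choose, and the padding would need to be masked out in the loss):

```python
import torch.nn.functional as F

MAX_T = 500  # hypothetical upper bound on the output sequence length

def pad_time_dim(y, max_t=MAX_T):
    # y: [batch, time, features]; right-pad the time dimension so every
    # replica returns [batch, max_t, features] and gather can stack them.
    return F.pad(y, (0, 0, 0, max_t - y.size(1)))
```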

Kind regards,
Ernst