Data handling for unsupervised signal processing

Hello everyone!

I am interested in using the fastai library to reimplement the WaveNet VQ-VAE autoencoder described in this paper:

I read the fastai docs about how to use the core blocks to build your own pipeline, but I don’t know what I should use for my task. The goal is to build an autoencoder on speech data from different people.

Here is my problem:
As input, I have several audio files, one or more per person in the dataset, and the files can have different lengths.
As output, I would like the data pipeline to give me batches of shape (batch_size, window_length), where window_length is the size of the time window over which we look at a portion of one user’s audio signal.

In a drawing, it looks like this:

So the challenge is to wrap this data handling up with fastai: load all the files and create a DataBunch that outputs batches where each observation is a fixed-size slice of one user’s audio file.
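
To make this concrete, here is a rough sketch of the kind of Dataset I have in mind, written in plain PyTorch since I don’t know the fastai idiom for this yet (the soundfile loader, the mono assumption and the non-overlapping windows are just placeholders I picked):

```python
import numpy as np
import torch
from torch.utils.data import Dataset
import soundfile as sf  # placeholder: any audio loader would do


class WindowedAudioDataset(Dataset):
    "Cuts variable-length audio files into fixed-size, non-overlapping windows."
    def __init__(self, files, window_length):
        self.files, self.window_length = files, window_length
        # read only the headers to count the full windows in each file
        n_frames = [sf.info(f).frames for f in files]
        self.cumsum = np.cumsum([0] + [n // window_length for n in n_frames])

    def __len__(self):
        return int(self.cumsum[-1])

    def __getitem__(self, idx):
        # map the flat window index back to (file, start sample)
        file_idx = int(np.searchsorted(self.cumsum, idx, side='right') - 1)
        start = (idx - self.cumsum[file_idx]) * self.window_length
        # open the file, read just this one window, close it again
        wav, _ = sf.read(self.files[file_idx], start=start,
                         frames=self.window_length, dtype='float32')
        x = torch.from_numpy(wav)  # shape: (window_length,), assuming mono
        return x, x  # autoencoder: the target is the input itself
```

If I understand the docs correctly, I could then get my (batch_size, window_length) batches with something like `data = DataBunch.create(train_ds, valid_ds, bs=32)`, but I am not sure whether this plain-Dataset route is the recommended way, or whether I should write a custom ItemList instead.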

Besides, working on this made me wonder: how does fastai handle data spread across many files? Is everything loaded into memory once, or are files opened and closed as items are requested?
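
For example, in the sketch above each `__getitem__` call reopens the file. The alternative I imagine would be to decode everything up front, along the lines of this variant (again just a sketch, reusing the hypothetical class above):

```python
class PreloadedAudioDataset(WindowedAudioDataset):
    "Same indexing, but all waveforms are decoded into memory up front."
    def __init__(self, files, window_length):
        super().__init__(files, window_length)
        # decode every file once at construction time
        self.wavs = [sf.read(f, dtype='float32')[0] for f in files]

    def __getitem__(self, idx):
        file_idx = int(np.searchsorted(self.cumsum, idx, side='right') - 1)
        start = (idx - self.cumsum[file_idx]) * self.window_length
        # slice the preloaded waveform instead of touching the disk
        x = torch.from_numpy(self.wavs[file_idx][start:start + self.window_length])
        return x, x
```

Which of these two patterns does fastai use internally, or does it cache something in between?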

Thank you very much!