Loading data in PyTorch

I have a CSV file with two columns: the first holds the path of each training file (each file, say a song, has its own path) and the second holds its label. I would like to load the training samples in batches of a given size using PyTorch, but I don't know how to proceed. Does anyone know how to make a generator that returns a batch of files, like the DataLoader used with torchvision?

Implement a custom Dataset:

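import csv

from torch.utils.data import Dataset

class CSVPathDataset(Dataset):
    def __init__(self, csv_path):
        # rows = [(path1, label1), (path2, label2), ...]
        # assumes a headerless CSV with integer class labels
        with open(csv_path, newline="") as f:
            self.rows = [(row[0], int(row[1])) for row in csv.reader(f) if row]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        path, y = self.rows[i]
        # read_file and convert_to_tensor are placeholders for your own
        # domain-specific loading code (e.g. decoding an audio file)
        x = convert_to_tensor(read_file(path))
        return x, y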

The CSV can be parsed with the standard library csv module. read_file and convert_to_tensor stand for whatever domain-specific loading and conversion your files need.
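As an illustration of such a conversion, here is a minimal sketch assuming the songs are stored as uncompressed 16-bit PCM WAV files (MP3 or other compressed formats would need a decoder library instead). The function name wav_to_tensor is hypothetical; it takes the path directly, so it would replace both read_file and convert_to_tensor. It uses only the standard library wave module plus torch.

```python
import array
import sys
import wave

import torch


def wav_to_tensor(path):
    """Hypothetical loader for 16-bit PCM WAV files.

    Returns a float32 tensor of shape [num_frames, num_channels],
    with samples scaled to the range [-1, 1).
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM samples")
        channels = w.getnchannels()
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    x = torch.tensor(samples.tolist(), dtype=torch.float32) / 32768.0
    return x.view(-1, channels)  # interleaved samples -> [frames, channels]
```

In __getitem__ the call would then become x = wav_to_tensor(path). Converting through a Python list keeps the sketch simple; for long recordings a dedicated audio library will be faster.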

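Then pass the dataset to torch.utils.data.DataLoader with the batch size you want, for example DataLoader(dataset, batch_size=32, shuffle=True), and iterate over the loader to get batches of (x, y). That's it.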
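One caveat for songs: DataLoader's default collate function stacks the samples of a batch into one tensor, which fails when they have different lengths. The sketch below is one way to handle that, under stated assumptions: it pads each batch with torch.nn.utils.rnn.pad_sequence in a custom collate_fn and also returns the original lengths (useful for masking). The toy data are random 1-D tensors saved with torch.save, so torch.load stands in for convert_to_tensor; pad_collate and make_toy_data are hypothetical helpers.

```python
import csv
import os
import tempfile

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class CSVPathDataset(Dataset):
    """(path, label) rows from a headerless CSV; files are loaded lazily."""

    def __init__(self, csv_path):
        with open(csv_path, newline="") as f:
            self.rows = [(row[0], int(row[1])) for row in csv.reader(f) if row]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        path, label = self.rows[i]
        # Stand-in for convert_to_tensor: the toy files are saved tensors
        x = torch.load(path, weights_only=True)
        return x, label


def pad_collate(batch):
    """Zero-pad variable-length samples so they can be stacked into one batch."""
    xs, ys = zip(*batch)
    lengths = torch.tensor([x.shape[0] for x in xs])
    x = pad_sequence(list(xs), batch_first=True)  # [batch, max_len, ...]
    return x, lengths, torch.tensor(ys)


def make_toy_data(root, n=10):
    """Write n random 1-D tensors of different lengths plus a CSV index."""
    csv_path = os.path.join(root, "index.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n):
            path = os.path.join(root, f"song_{i}.pt")
            torch.save(torch.randn(100 + 10 * i), path)
            writer.writerow([path, i % 2])
    return csv_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dataset = CSVPathDataset(make_toy_data(root))
        loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=pad_collate)
        for x, lengths, y in loader:
            # With 10 samples and batch_size=4, the last batch has 2 samples
            print(x.shape, lengths.tolist(), y.tolist())
```

Padding per batch keeps memory proportional to the longest song in that batch rather than in the whole dataset. Cropping or resampling every song to a fixed length is a common alternative, in which case the default collate function works unchanged.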


Thank you so much! 😀