I have a CSV file with two columns: one holds the path of each training file (each file, say a song, has its own path) and the other holds its label. I would like to load the training samples in batches of a given size using PyTorch, but I don't know how to proceed. Does anyone know how to make a generator that returns a batch of files, as with a Torchvision dataloader?
Implement a custom Dataset:
from torch.utils.data import Dataset

class CSVPathDataset(Dataset):
    def __init__(self, csv_path):
        # rows = [(path1, label1), (path2, label2), ...]
        self.rows = parse_csv(csv_path)

    def __len__(self):
        # one sample per CSV row
        return len(self.rows)

    def __getitem__(self, i):
        path, label = self.rows[i]
        # files are read lazily, one per requested sample
        content = read_file(path)
        x = convert_to_tensor(content)
        y = label
        return x, y
The CSV can be parsed with the standard library csv module. parse_csv and read_file are placeholders for your own helpers, and convert_to_tensor
is some domain-specific conversion function.
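For illustration, here is one way the CSV parsing helper might look. This is only a sketch: the has_header flag and the assumption that the path is in the first column and the label in the second are mine, not something stated in the question.

```python
import csv

def parse_csv(csv_path, has_header=False):
    """Return a list of (path, label) tuples from a two-column CSV file.

    Assumes column 0 is the file path and column 1 is the label
    (an assumption about the file layout; adjust as needed).
    """
    rows = []
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        for record in reader:
            if not record:
                continue  # ignore blank lines
            path, label = record[0], record[1]
            rows.append((path.strip(), label.strip()))
    return rows
```

Labels come back as strings here; if the loss function expects integer class indices, map them to ints once in __init__ rather than on every __getitem__ call.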
Then, just pass the dataset to DataLoader, and you’re done.
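A rough sketch of the DataLoader step, in case it helps. ToyDataset is a hypothetical in-memory stand-in for CSVPathDataset so the snippet runs without any files on disk; the batch size of 4 is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in for CSVPathDataset: fixed-length samples, 0/1 labels."""

    def __init__(self, n=10, length=16):
        self.data = torch.arange(n * length, dtype=torch.float32).reshape(n, length)
        self.labels = torch.arange(n) % 2

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i], self.labels[i]

# The DataLoader handles batching and shuffling on top of the Dataset.
loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)

batch_shapes = []
for x, y in loader:
    # x has shape (batch, length); y has shape (batch,)
    batch_shapes.append((tuple(x.shape), tuple(y.shape)))
```

Note that the default collation stacks samples into one tensor, so every x in a batch must have the same shape. For variable-length data such as songs, either crop or pad to a fixed length inside __getitem__, or pass your own collate_fn to DataLoader. Setting num_workers above 0 lets the file reads run in parallel worker processes.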
Thank you so much!