When using multiple datasets with ConcatDataset, I encounter a bug

WBSUN · July 16, 2021, 4:38am

I’m trying to merge two datasets, and the error I get leaves me very confused. Here is a minimal reproduction:

```python
# define two datasets
import torch
import torch.utils.data as data
from torch.utils.data import DataLoader


class test_data1(data.Dataset):
    def __init__(self):
        super(test_data1, self).__init__()

    def __getitem__(self, index):
        # every sample from this dataset has the keys img/mask/filename/task
        inputs = {}
        inputs['img'] = 1
        inputs['mask'] = 2
        inputs['filename'] = 3
        inputs['task'] = 4
        return inputs

    def __len__(self):
        return 50


class test_data2(data.Dataset):
    def __init__(self):
        super(test_data2, self).__init__()

    def __getitem__(self, index):
        # samples from this dataset have different keys: depth/apple/banana/task
        inputs = {}
        inputs['depth'] = 5
        inputs['apple'] = 6
        inputs['banana'] = 7
        inputs['task'] = 8
        return inputs

    def __len__(self):
        return 50

# call ConcatDataset to combine these two datasets
all_train_dataset = torch.utils.data.ConcatDataset([test_data1(), test_data2()])
# batch_size=4 and num_workers=0 stand in for self.opt.batch_size / self.opt.num_workers
train_loader = DataLoader(
    all_train_dataset, batch_size=4, shuffle=True,
    num_workers=0, pin_memory=True, drop_last=True)

for batch_idx, inputs in enumerate(train_loader):
    print(inputs['task'])
```

And the following error occurred:

How can I solve this error?
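A likely cause, sketched under the assumption that the loader uses PyTorch's default collate function: with shuffle=True a batch can mix samples from both datasets, and default_collate builds the batch dict from the keys of the first sample, then looks each of those keys up in every other sample. The helper missing_key below is hypothetical; it only reports which key the collation fails on.

```python
from torch.utils.data.dataloader import default_collate

# One sample from each dataset in the question (values copied from the repro)
sample1 = {'img': 1, 'mask': 2, 'filename': 3, 'task': 4}   # like test_data1
sample2 = {'depth': 5, 'apple': 6, 'banana': 7, 'task': 8}  # like test_data2


def missing_key(batch):
    """Hypothetical helper: return the key default_collate fails on, or None."""
    try:
        default_collate(batch)
    except KeyError as err:
        return err.args[0]
    return None


print(missing_key([sample2, sample1]))  # first sample from test_data2 -> depth
print(missing_key([sample1, sample2]))  # first sample from test_data1 -> img
print(default_collate([sample1, sample1])['task'])  # tensor([4, 4])
```

So the reported key depends only on which dataset the first sample of the batch came from; batches drawn from a single dataset collate without problems.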

WBSUN · July 16, 2021, 6:54am

If I make the two datasets return the same type of data, this error does not occur. So how can I merge two datasets that return different types of data? I just want each batch to contain samples from only one dataset, while different batches can come from different datasets.
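One way to get single-dataset batches, sketched under assumptions: keep ConcatDataset, but pass DataLoader a custom batch_sampler that only groups indices belonging to the same sub-dataset. PerDatasetBatchSampler and DictDataset are hypothetical names, not PyTorch APIs; the sampler relies on ConcatDataset.cumulative_sizes to find where each sub-dataset's indices start and end.

```python
import random

from torch.utils.data import ConcatDataset, DataLoader, Dataset, Sampler


class DictDataset(Dataset):
    """Hypothetical stand-in for test_data1/test_data2: returns a fixed dict."""

    def __init__(self, sample, length):
        self.sample = sample
        self.length = length

    def __getitem__(self, index):
        return dict(self.sample)

    def __len__(self):
        return self.length


class PerDatasetBatchSampler(Sampler):
    """Yields lists of ConcatDataset indices; every list stays inside one sub-dataset."""

    def __init__(self, concat_dataset, batch_size, shuffle=True, drop_last=True):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        # cumulative_sizes[i] is the (exclusive) end index of sub-dataset i
        ends = concat_dataset.cumulative_sizes
        starts = [0] + ends[:-1]
        self.ranges = [range(s, e) for s, e in zip(starts, ends)]

    def __iter__(self):
        batches = []
        for index_range in self.ranges:
            indices = list(index_range)
            if self.shuffle:
                random.shuffle(indices)
            for i in range(0, len(indices), self.batch_size):
                batch = indices[i:i + self.batch_size]
                if len(batch) == self.batch_size or not self.drop_last:
                    batches.append(batch)
        if self.shuffle:
            random.shuffle(batches)  # interleave batches from different datasets
        return iter(batches)

    def __len__(self):
        if self.drop_last:
            return sum(len(r) // self.batch_size for r in self.ranges)
        return sum(-(-len(r) // self.batch_size) for r in self.ranges)


dataset1 = DictDataset({'img': 1, 'mask': 2, 'filename': 3, 'task': 4}, 50)
dataset2 = DictDataset({'depth': 5, 'apple': 6, 'banana': 7, 'task': 8}, 50)
all_train_dataset = ConcatDataset([dataset1, dataset2])

# batch_sampler replaces batch_size/shuffle/drop_last in the DataLoader call
train_loader = DataLoader(
    all_train_dataset,
    batch_sampler=PerDatasetBatchSampler(all_train_dataset, batch_size=4))

for inputs in train_loader:
    print(inputs['task'])  # every batch is all 4s or all 8s
```

Because the split is derived from cumulative_sizes, the same sketch should work for more than two sub-datasets; with drop_last=False the last, shorter batch of each sub-dataset is kept instead of dropped.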

WBSUN · July 16, 2021, 7:08am

In other words, how can I merge two DataLoaders?
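If you prefer to keep two separate DataLoaders, one hedged option is to interleave them yourself in the training loop. alternate_loaders below is a hypothetical helper, not part of PyTorch (DictDataset is likewise a hypothetical stand-in for the two datasets): at each step it picks a random loader that still has batches left, so every batch comes from exactly one dataset and both loaders are fully consumed once per epoch.

```python
import random

from torch.utils.data import DataLoader, Dataset


class DictDataset(Dataset):
    """Hypothetical stand-in for test_data1/test_data2: returns a fixed dict."""

    def __init__(self, sample, length):
        self.sample = sample
        self.length = length

    def __getitem__(self, index):
        return dict(self.sample)

    def __len__(self):
        return self.length


def alternate_loaders(*loaders, seed=None):
    """Yield (loader_index, batch), drawing each step from a random unfinished loader."""
    rng = random.Random(seed)
    iterators = [iter(loader) for loader in loaders]
    active = list(range(len(iterators)))
    while active:
        i = rng.choice(active)
        try:
            batch = next(iterators[i])
        except StopIteration:
            active.remove(i)  # this loader is exhausted for the current epoch
            continue
        yield i, batch


loader1 = DataLoader(DictDataset({'img': 1, 'mask': 2, 'filename': 3, 'task': 4}, 50),
                     batch_size=4, shuffle=True, drop_last=True)
loader2 = DataLoader(DictDataset({'depth': 5, 'apple': 6, 'banana': 7, 'task': 8}, 50),
                     batch_size=4, shuffle=True, drop_last=True)

for source, inputs in alternate_loaders(loader1, loader2, seed=0):
    print(source, inputs['task'])
```

Because each step picks uniformly among the loaders that still have data, a much longer loader will supply most of the batches near the end of the epoch.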

JackByte · July 16, 2021, 4:37pm

Do you have two CSV files as the source? If so, you could read them into pandas DataFrames, join them, and then use a single dataset based on the joined DataFrame.
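A minimal sketch of that suggestion, assuming both tables share a key column (called id here, which is hypothetical) and using small in-memory DataFrames in place of pd.read_csv(...) so the example runs on its own. DataFrameDataset is also a hypothetical name.

```python
import pandas as pd
from torch.utils.data import Dataset

# Hypothetical stand-ins for the two CSV files; with real files use pd.read_csv(...)
left = pd.DataFrame({'id': [0, 1, 2], 'img': [10, 11, 12]})
right = pd.DataFrame({'id': [1, 2, 3], 'depth': [21, 22, 23]})

# SQL-style inner join on 'id': only ids present in both tables are kept
joined = pd.merge(left, right, on='id', how='inner')


class DataFrameDataset(Dataset):
    """Hypothetical Dataset that returns one DataFrame row as a dict."""

    def __init__(self, df):
        self.df = df.reset_index(drop=True)

    def __getitem__(self, index):
        return self.df.iloc[index].to_dict()

    def __len__(self):
        return len(self.df)


dataset = DataFrameDataset(joined)
print(len(dataset), dataset[0])
```

Every sample of the joined dataset then has the same keys, so the default collate function no longer runs into missing keys.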

I have the feeling that you want to join the two datasets like an SQL join (i.e. the number of rows stays the same, but each row now has the columns of both datasets). Based on the error message (KeyError: 'depth'), I think that concat works more like an SQL union (i.e. the number of columns stays the same, but the number of rows is the sum of both datasets).
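To make the join-versus-union distinction concrete, here is a small sketch. ConcatDataset is the real PyTorch class; ZipDictDataset and DictDataset are hypothetical helpers that assume both datasets have the same length and that sample i of one belongs with sample i of the other.

```python
from torch.utils.data import ConcatDataset, Dataset


class DictDataset(Dataset):
    """Hypothetical stand-in for test_data1/test_data2: returns a fixed dict."""

    def __init__(self, sample, length):
        self.sample = sample
        self.length = length

    def __getitem__(self, index):
        return dict(self.sample)

    def __len__(self):
        return self.length


class ZipDictDataset(Dataset):
    """Hypothetical 'join': sample i merges sample i of every wrapped dataset."""

    def __init__(self, *datasets):
        if len({len(d) for d in datasets}) != 1:
            raise ValueError('all datasets must have the same length')
        self.datasets = datasets

    def __getitem__(self, index):
        merged = {}
        for dataset in self.datasets:
            merged.update(dataset[index])  # later datasets overwrite shared keys
        return merged

    def __len__(self):
        return len(self.datasets[0])


dataset1 = DictDataset({'img': 1, 'mask': 2, 'filename': 3, 'task': 4}, 50)
dataset2 = DictDataset({'depth': 5, 'apple': 6, 'banana': 7, 'task': 8}, 50)

union = ConcatDataset([dataset1, dataset2])  # like SQL UNION: rows are stacked
joined = ZipDictDataset(dataset1, dataset2)  # like SQL JOIN: columns are combined

print(len(union), sorted(union[0]), sorted(union[50]))
print(len(joined), sorted(joined[0]))
```

Both datasets in the question define 'task', so in this sketch the value from the later dataset overwrites the earlier one; rename one of the keys if both are needed.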