Aggregate/Concat Datasets

Hello, I’m trying to aggregate different Image Datasets together based on some mapping rules between the different datasets.

Example:

train1 = (ImageList.from_csv(path, csv_name='train_labels.csv', folder='dataset1/train')
          .split_from_df(col=6)
          .label_from_df(cols=1)
          .transform(augmentations, size=size, resize_method=ResizeMethod.SQUISH))

train2 = (ImageList.from_csv(path, csv_name='train_labels.csv', folder='datset2/train')
          .split_from_df(col=6)
          .label_from_df(cols=1)
          .transform(augmentations, size=size, resize_method=ResizeMethod.SQUISH))

I have no problem building the datasets independently, but how do I merge them? For example, train1 has 100 classes and train2 has 150 classes, and all of them are unique. So I want the labels of train1 to be 1-100 and the labels of train2 to be 101-250, or something like this.
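A minimal plain-Python sketch of this kind of mapping, independent of fastai: the function name `offset_labels` and the toy class names are hypothetical, and the indices are 0-based (so 0-99 and 100-249 rather than 1-100 and 101-250).

```python
def offset_labels(labels_a, labels_b):
    """Map two disjoint label sets onto one contiguous index range."""
    classes_a = sorted(set(labels_a))
    classes_b = sorted(set(labels_b))
    # The first dataset takes indices 0..len(classes_a)-1.
    mapping_a = {c: i for i, c in enumerate(classes_a)}
    # The second dataset continues right after the first one.
    mapping_b = {c: i + len(classes_a) for i, c in enumerate(classes_b)}
    return mapping_a, mapping_b


# Toy labels standing in for the label columns of the two CSV files.
map1, map2 = offset_labels(["cat", "dog"], ["car", "bus", "bike"])
```

Here `map1` covers indices 0-1 and `map2` covers 2-4, so the two label ranges never collide.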

What is the best way to go about this with the new fastai v1.0 library?
I would also like to know: if both datasets share some classes, how can I merge them without having to change the label values?
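For the shared-class case, one possible approach (a sketch in plain Python, not fastai API) is to build a single vocabulary from the union of the class names, so a class that appears in both datasets gets the same index. The name `shared_vocab` and the toy labels are hypothetical.

```python
def shared_vocab(labels_a, labels_b):
    """Build one class-to-index mapping from the union of both label sets."""
    classes = sorted(set(labels_a) | set(labels_b))
    return {c: i for i, c in enumerate(classes)}


# "dog" appears in both toy datasets and must map to a single index.
labels1 = ["cat", "dog"]
labels2 = ["dog", "bird"]
vocab = shared_vocab(labels1, labels2)
encoded1 = [vocab[l] for l in labels1]
encoded2 = [vocab[l] for l in labels2]
```

This only works if the shared classes use identical names in both CSV files; otherwise an explicit renaming step would be needed first.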

@sgugger, it seems you are the only one answering questions here, so whenever you are available, feel free to comment on this question.

There is an add method in ItemList that would probably do what you want: it combines the items of two ItemLists into one. This would look like this:

(ImageList.from_csv(path, csv_name='train_labels.csv', folder='dataset1/train')
  .add(ImageList.from_csv(path, csv_name='train_labels.csv', folder='datset2/train'))
  .split_from_df(col=6)
  .label_from_df(cols=1)
  .transform(augmentations, size=size, resize_method=ResizeMethod.SQUISH))

You will need to use a dev install for it to work properly as I have just pushed some changes to make it work smoothly (as long as the labelling and splitting columns are the same in the two dataframes).
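Before combining, it may help to verify that condition. The following is a rough standard-library sketch, assuming "the same" means the header names at the split and label column indices (1 and 6 in the example above) match; `columns_match` and the header names are hypothetical, not part of fastai.

```python
import csv
import io


def columns_match(csv_a, csv_b, cols=(1, 6)):
    """Return True if both CSV headers agree at every index in `cols`."""
    header_a = next(csv.reader(csv_a))
    header_b = next(csv.reader(csv_b))
    return all(
        i < len(header_a) and i < len(header_b) and header_a[i] == header_b[i]
        for i in cols
    )


# Hypothetical headers standing in for the two train_labels.csv files.
file_a = io.StringIO("file,label,c2,c3,c4,c5,is_valid\n")
file_b = io.StringIO("file,label,x2,x3,x4,x5,is_valid\n")
ok = columns_match(file_a, file_b)
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles on the two CSVs.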


Hello!
I'm trying to do the same thing. Has the add method been added to the master branch of fastai?