Proposal for ObjectDetectionData

The hope is that this will automate some of the manual data creation in lessons 8 / 9 and make the core educational content (model / loss creation, visualization) stand out more. It should also make working with other datasets quicker and less error-prone.

Please find the gist here.

We feed the class a csv of the following format (header optional):

file_name,bbox_coords,category
000012.jpg,96 155 269 350,car
000017.jpg,61 184 198 278 77 89 335 402,car person
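
To make the format concrete, here is a minimal sketch of how such rows could be unpacked with only the standard library. The helper name `parse_detection_csv` is hypothetical and not part of the gist, and the sketch assumes four integers per box and one category per box.

```python
import csv
import io

def parse_detection_csv(text, skip_header=True):
    """Hypothetical helper: parse rows of `file_name,bbox_coords,category`."""
    # Drop empty rows so a trailing blank line does not break unpacking.
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if skip_header:
        rows = rows[1:]
    parsed = []
    for file_name, bbox_coords, category in rows:
        coords = [int(c) for c in bbox_coords.split()]
        # Group the flat coordinate list into boxes of four numbers each
        # (assumption: every box is described by exactly four integers).
        boxes = [coords[i:i + 4] for i in range(0, len(coords), 4)]
        labels = category.split()
        assert len(boxes) == len(labels), f"{file_name}: box/label count mismatch"
        parsed.append((file_name, boxes, labels))
    return parsed

sample = """file_name,bbox_coords,category
000012.jpg,96 155 269 350,car
000017.jpg,61 184 198 278 77 89 335 402,car person
"""

records = parse_detection_csv(sample)
# Map each category name to an integer id; sorting keeps the ids stable.
classes = sorted({label for _, _, labels in records for label in labels})
class2id = {name: i for i, name in enumerate(classes)}
```

Each record ends up as a file name, a list of boxes, and a list of labels of the same length, which is the pairing the detection model needs.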

It gives us a data object that provides effectively what we need for the detection model from lesson 9, though I have only done minimal testing thus far.

This is an early version and I tried to keep the functionality together for easier discussion. Among other things, we might want to take a look at how / if we can integrate the reading of the csv with what the ImageClassifierData is already doing.

I also would like to see if I can extend this in a nice way to one hot encode labels for localization (single object per image like we did for the largest bb in lesson 8).
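
As a rough sketch of what that localization extension might involve, the snippet below picks the largest box per image and one hot encodes its label. Both helper names are hypothetical, and the area computation assumes each box is four numbers giving two opposite corners; nothing here is taken from the gist.

```python
def largest_box(boxes, labels):
    """Return the (box, label) pair with the largest area.

    Assumes each box holds two opposite corners as four numbers.
    """
    def area(b):
        return abs(b[2] - b[0]) * abs(b[3] - b[1])
    idx = max(range(len(boxes)), key=lambda i: area(boxes[i]))
    return boxes[idx], labels[idx]

def one_hot(label, classes):
    """Encode a single label as a list of 0s with a 1 at the label's position."""
    return [1 if c == label else 0 for c in classes]

classes = ["car", "person"]
boxes = [[61, 184, 198, 278], [77, 89, 335, 402]]
labels = ["car", "person"]

box, label = largest_box(boxes, labels)
encoded = one_hot(label, classes)
```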

This is not complete by any means, but I wanted to share it nonetheless in case it is useful to someone, and to kick off the discussion as per Jeremy’s suggestion on github.

The next steps I am going to take will be to integrate this into the notebooks and continue to test / add functionality / refactor.

This works with the current version of the library but I think the work that @piotr.czapla is doing can make this much nicer. I will follow what he does to see how we can use the proposed changes to how transforms are managed to get greater flexibility in creating datasets. Hopefully we will be able to get rid of concat_datasets_for_detection and make the ConcatDataset into something nicer and more useful across a broader spectrum of applications.

You can find the branch for this here.

BTW this uses the method proposed by @binga for working directly with the dataframes which ends up being quite concise and probably easier to read than falling down to Python primitives. That was a neat gist you shared with us @binga :slight_smile: :+1:

EDIT: Here is a notebook using the new functionality that reproduces what we did in lesson 8.


This isn’t getting a whole lot of discussion. I think that might be due to my wall-of-text post, and possibly because right now is not a good time for such functionality to be considered for addition to the library.

All this is cool - I will keep this floating around as I feel it makes my life easier and will continue to keep it in working condition. Maybe at some point we might want to revisit this or maybe the library will take some other architectural turn which will make this obsolete.

I didn’t expect to bring this to working condition so quickly, so to some extent my earlier considerations regarding this (especially the ones from github) are void. I only wanted to share my current plan for this before letting the thread fade away.

Seems I forgot to push the changes. Here is the branch with this updated against current master.

It should all work (with transforms, etc). I last ran it before the rebase, but all should be well.

Since I am starting to drown in the utils files I create, I am trying to put everything in one place. I also don’t want to constantly maintain this branch just for my own use (I want to stick to using fastai master), so I will leave it as it is. But if anyone wants to use it and encounters issues, let me know and I will try to help.

Probably most useful would be to link the diff showing this branch vs master?

@kcturgutlu @groverpr @piotr.czapla might be interested.

@radek This would definitely reduce the redundancy of repeating the same stuff over and over for every object detection experiment/notebook. Probably, adding another optional column to the CSV file for mask coordinates could also bring segmentation data into this. (Just thinking out loud :slight_smile: )

This is the code that I added:

class ConcatDataset(Dataset):
    """Concatenates a dataset and an iterable of appropriate size."""
    def __init__(self, ds, y2):
        assert(len(ds)==len(y2))
        self.ds,self.y2 = ds,y2
    def __len__(self): return len(self.ds)
    def __getitem__(self, i):
        x,y = self.ds[i]
        return (x, (self.y2[i],y))
    def denorm(self, im): return self.ds.denorm(im)


def concat_datasets_for_detection(fnames, ys, transform, path):
    """
    Arguments:
        fnames: image file names
        ys[0]: an array of labels for each example
        ys[1]: bounding box coordinates
        transform: transformations applied to each image
        path: root path of the data
    fnames, ys[0] and ys[1] need to be in corresponding order.
    Returns:
        ConcatDataset
    """
    return ConcatDataset(FilesIndexArrayRegressionDataset(fnames, ys[1], transform, path), ys[0])


class ObjectDetectionData(ImageData):
    @classmethod
    def from_csv(cls, path, folder, csv_fname, bs=64, tfms=(None,None),
               val_idxs=None, suffix='', test_name=None, skip_header=True, num_workers=8):
        """ Read in images and associated bounding boxes with labels given as a CSV file.
        The csv file needs to contain three columns - first one containing file names, second one classes and the third one
        bounding box coordinates.
        Example:
            file_name,category,bbox_coords
            000012.jpg,car,96 155 269 350
            000017.jpg,car person,61 184 198 278 77 89 335 402
        Arguments:
            path: a root path of the data (used for storing trained models, precomputed values, etc)
            folder: a name of the folder in which training images are contained.
            csv_fname: a name of the CSV file which contains target labels.
            suffix: suffix to add to image names in CSV file (sometimes CSV only contains the file name without file
                    extension e.g. '.jpg' - in which case, you can set suffix as '.jpg')
            bs: batch size
            tfms: transformations (for data augmentations). e.g. output of `tfms_from_model`
            val_idxs: index of images to be used for validation. e.g. output of `get_cv_idxs`.
                If None, default arguments to get_cv_idxs are used.
            test_name: a name of the folder which contains test images.
            skip_header: skip the first row of the CSV file.
            num_workers: number of workers
        Returns:
            ObjectDetectionData
        """
        df = pd.read_csv(csv_fname, index_col=0, header=0 if skip_header else None, dtype=str)
        for i in range(df.shape[1]): df.iloc[:,i] = df.iloc[:,i].str.split(' ')
        labels = []
        for row in df.iloc[:, 0]: labels += row
        classes = sorted(list(set(labels)))
        class2id = {l: i for i, l in enumerate(classes)}
        df.iloc[:, 0] = df.iloc[:, 0].apply(lambda row: np.array(list(map(lambda i: class2id[i], row))))
        for col in range(1, df.shape[1]):
            df.iloc[:, col] = df.iloc[:, col].apply(lambda row: np.array(list(map(int, row))))
        fnames,y = df.index.values,[df.values[:, i] for i in range(df.shape[1])]
        full_names = [os.path.join(folder,str(fn)+suffix) for fn in fnames]
        return cls.from_names_and_arrays(path, full_names, y, classes, val_idxs, test_name,
                num_workers=num_workers, tfms=tfms, bs=bs)

    @classmethod
    def from_names_and_arrays(cls, path, fnames, y, classes, val_idxs=None, test_name=None,
            num_workers=8, tfms=(None,None), bs=64):
        val_idxs = get_cv_idxs(len(fnames)) if val_idxs is None else val_idxs
        (val_fnames, trn_fnames), *ys = split_by_idx(val_idxs, np.array(fnames), *y)

        test_fnames = read_dir(path, test_name) if test_name else None
        datasets = cls.get_ds(concat_datasets_for_detection, (trn_fnames,[y[1] for y in ys]), (val_fnames,[y[0] for y in ys]), tfms,
                               path=path, test=test_fnames)
        return cls(path, datasets, bs, num_workers, classes=classes)

Need to update the description for concat_datasets_for_detection.
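
To illustrate what `ConcatDataset` yields per item, here is a self-contained sketch that reuses the same indexing logic with a stand-in for the inner dataset. `ToyDataset` is hypothetical (the real inner dataset is `FilesIndexArrayRegressionDataset`), and this copy of `ConcatDataset` leaves out the `Dataset` base class and `denorm` so that it runs without the library.

```python
class ToyDataset:
    """Hypothetical stand-in that returns (image, bbox) pairs."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys
    def __len__(self):
        return len(self.xs)
    def __getitem__(self, i):
        return self.xs[i], self.ys[i]

class ConcatDataset:
    """Same indexing logic as the class above, without the base class and denorm."""
    def __init__(self, ds, y2):
        assert len(ds) == len(y2)
        self.ds, self.y2 = ds, y2
    def __len__(self):
        return len(self.ds)
    def __getitem__(self, i):
        x, y = self.ds[i]
        # The extra target comes first, the inner dataset's target second.
        return x, (self.y2[i], y)

images = ["img0", "img1"]  # placeholders for image arrays
bboxes = [[96, 155, 269, 350], [61, 184, 198, 278, 77, 89, 335, 402]]
class_ids = [[0], [0, 1]]

ds = ConcatDataset(ToyDataset(images, bboxes), class_ids)
x, (cls_ids, bbox) = ds[1]
```

So each item is the image plus a tuple of (class ids, bounding box coordinates), in that order.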

A diff against what was fastai master at the time of this writing can be found here (didn’t want to spam PRs so the diff is against a branch in my fork imitating upstream master).

I use a version of this with an additional val_ratio argument in ObjectDetectionData.from_csv, since otherwise I would need to read in the CSV to get its length before I could call get_cv_idxs.
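
A minimal sketch of how such an argument could be turned into validation indices once the row count is known. The helper `val_idxs_from_ratio` and its `seed` parameter are hypothetical and use the standard library only; the version described above may well work differently.

```python
import random

def val_idxs_from_ratio(n, val_ratio=0.2, seed=42):
    """Hypothetical helper: pick int(n * val_ratio) random validation indices."""
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    idxs = list(range(n))
    rng.shuffle(idxs)
    return sorted(idxs[:int(n * val_ratio)])

n_rows = 10  # inside from_csv this would be the number of rows read from the CSV
val_idxs = val_idxs_from_ratio(n_rows, val_ratio=0.2)
```

Inside `from_csv` the result could then be passed on as `val_idxs`, so the caller never has to read the CSV just to learn its length.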

Agree :slight_smile:
