Understanding the ideas behind ImageClassifierData

Hello everyone,

I’m a relatively new student in this course (I started in January but lost some time due to problems with Linux).
I watched the first 3 lessons of the first course on deep learning (v2). I was able to more or less reproduce the results of the first notebook on a personal dataset (I did image classification on two types of cars). But I’m still very confused about the implementation.

This is mainly because I’m a beginner. I have looked at the code of the scripts.

Basically, I would like to have an idea of what these famous three (more precisely the second) lines of code are doing:

arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)

For the first line, I guess that the variable arch now encodes some predefined architecture for the neural network: number and types of layers, number of channels, types of kernels and activation functions.
For the second line, I’m much less confident. The name obviously indicates that it somehow builds an object which encodes the data, or maybe explains how to manipulate it. I understand that it encodes several things about the data: where it is located and how to access it, what transformations to apply to it before processing, and how many pictures to look at at the same time (batch size bs). But I have difficulty understanding the actual implementation, so I thought someone could give me a kind of “overall idea” or “high-level explanation” of the code. There are several aspects of this object (the ImageClassifierData one) which I’m curious about:

  1. How is it that we don’t have to specify the extension of the pictures? There doesn’t seem to be a default extension in the from_paths method or in the get_ds method of this class in dataset.py. I also didn’t find anything inside dataloader.py.
  2. How are the transformations of the data stored? Let’s say we want to add some horizontal flipping of our cats and dogs pictures. I don’t expect our ImageClassifierData to actually copy and flip the image files and then feed them in as normal pictures, because I guess that would double the size of the dataset. I imagine instead that ImageClassifierData contains a list of “some kind of pointers” (this is really based on my imagination and nothing more), and that some of these pointers indicate that the image (or tensor at this level) has to be flipped before it is given to the neural network. I apologize if all of this looks like nonsense.

If there are some specific functions/methods in the code that I should look at, it would be great if someone could point me to them with a brief explanation of their role.

Best

For your first question you want to look again at dataset.py, specifically the open_image function (snipped a bit; it has comments in the source):

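def open_image(fn):
    # OpenCV read flags; as far as I recall this line sits at the top of the function in the source
    flags = cv2.IMREAD_UNCHANGED+cv2.IMREAD_ANYDEPTH+cv2.IMREAD_ANYCOLOR
    if not os.path.exists(fn):
        raise OSError('No such file or directory: {}'.format(fn))
    elif os.path.isdir(fn):
        raise OSError('Is a directory: {}'.format(fn))
    else:
        try:
            # read with OpenCV (BGR), convert to RGB, scale to floats in [0, 1]
            return cv2.cvtColor(cv2.imread(fn, flags), cv2.COLOR_BGR2RGB).astype(np.float32)/255
        except Exception as e:
            raise OSError('Error handling image at: {}'.format(fn)) from e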

That code probably gives you a good clue about what’s happening: given a path, it makes sure the path exists and isn’t a directory, and then it attempts to read it as an image. That’s why you don’t need to tell it anything about your images: it just assumes you’ve given it a path where it can find images and nothing else!
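
As far as I know, OpenCV's imread also works out the image format from the file's contents rather than from its name, which is another reason the extension never comes up. Here is a minimal, stdlib-only sketch of that idea. Note that sniff_image_format and SIGNATURES are made-up names for illustration, not part of fastai or OpenCV, and only two formats are covered:

```python
import os
import tempfile

# Hypothetical helper for illustration only: identify an image by its
# leading "magic" bytes instead of its file extension.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",   # standard 8-byte PNG signature
    b"\xff\xd8\xff": "jpeg",       # JPEG start-of-image marker
}

def sniff_image_format(path):
    """Return 'png', 'jpeg', or None based on the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None

# A file that starts with the PNG signature but has a misleading extension.
tmp_dir = tempfile.mkdtemp()
fake_path = os.path.join(tmp_dir, "cat.txt")
with open(fake_path, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)

# A plain text file that is not an image at all.
text_path = os.path.join(tmp_dir, "notes.png")
with open(text_path, "wb") as f:
    f.write(b"just some text")

detected = sniff_image_format(fake_path)
not_an_image = sniff_image_format(text_path)
```

In this sketch the file named cat.txt is recognised as a PNG and the file named notes.png is not, so the name plays no part in the decision.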

The question you’re probably asking now is who calls that and how it knows what to pass in, so let’s trace backward. It’s called by get_x(i) of FilesDataset:

def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))

It’s not super easy to backtrace the code this way, so let’s look at the implementation of ImageClassifierData.from_paths (again snipping out the comments):

@classmethod
def from_paths(cls, path, bs=64, tfms=(None,None), trn_name='train', val_name='valid', test_name=None, num_workers=8):
    trn,val = [folder_source(path, o) for o in (trn_name, val_name)]
    test_fnames = read_dir(path, test_name) if test_name else None
    datasets = cls.get_ds(FilesIndexArrayDataset, trn, val, tfms, path=path, test=test_fnames)
    return cls(path, datasets, bs, num_workers, classes=trn[2])

There’s lots of stuff to note here! Firstly, look at the defaults in the signature for trn_name and val_name: that’s why we didn’t need to give the exact path! Now we can start to see what’s happening here: we gave it the top path, it defaulted to PATH/train and PATH/valid, and in there it assumes everything is a picture and attempts to read it as such.

To join the dots together, FilesIndexArrayDataset (3rd line of that method) is a subclass of FilesDataset and inherits that get_x(i) method we saw that reads the given image path.
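
To make the folder-scanning step a bit more concrete, here is a rough sketch of what something like folder_source could be doing. This is not the fastai code: list_labelled_files is a hypothetical stand-in, and I'm assuming one sub-folder per class and that every file found is kept, whatever its extension. The three return values mirror the (filenames, labels, classes) shape suggested by classes=trn[2] above.

```python
import os
import tempfile

def list_labelled_files(path, folder):
    """Hypothetical stand-in for folder_source: one label per sub-folder,
    and every file inside it is kept, regardless of extension."""
    root = os.path.join(path, folder)
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    fnames, labels = [], []
    for idx, cls_name in enumerate(classes):
        for fname in sorted(os.listdir(os.path.join(root, cls_name))):
            # store paths relative to `path`, like self.fnames in get_x
            fnames.append(os.path.join(folder, cls_name, fname))
            labels.append(idx)
    return fnames, labels, classes

# Build a tiny PATH/train/{cats,dogs} layout with mixed extensions.
demo_path = tempfile.mkdtemp()
for cls_name, fname in [("cats", "a.jpg"), ("cats", "b.png"), ("dogs", "c.jpeg")]:
    os.makedirs(os.path.join(demo_path, "train", cls_name), exist_ok=True)
    open(os.path.join(demo_path, "train", cls_name, fname), "wb").close()

fnames, labels, classes = list_labelled_files(demo_path, "train")
```

The relative paths in fnames are the kind of thing get_x would later join with self.path before handing them to open_image.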


For your second question, you’re completely right that there’s no actual copying of the files or increase in the training set to support augmentation. You want BaseDataset's __getitem__ method (if you’re not familiar with that Python method, it lets you index an instance of the class, e.g. x[i] calls x.__getitem__(i)):

def __getitem__(self, idx):
    x,y = self.get_x(idx),self.get_y(idx)
    return self.get(self.transform, x, y)

So that calls our get_x we’ve seen before, which gets us the requested image. It then passes that through whichever transforms have been asked for before returning it. So the transforms are done on image fetch, in memory without creating any more images.
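
If it helps, here is a toy version of the same pattern that runs without fastai. Everything in it is made up for illustration (LazyFlipDataset, flip_prob, and images as nested lists instead of tensors); the point is only that the flip happens inside __getitem__ and the stored data is never modified or duplicated.

```python
import random

class LazyFlipDataset:
    """Hypothetical toy dataset: images are nested lists, stored once."""

    def __init__(self, images, labels, flip_prob=0.5, seed=0):
        self.images, self.labels = images, labels
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.images)

    def transform(self, x):
        # Horizontal flip: reverse every row. A new list is built,
        # so the stored image is left untouched.
        if self.rng.random() < self.flip_prob:
            return [row[::-1] for row in x]
        return x

    def __getitem__(self, idx):
        # Fetch first, then transform on the fly, as in BaseDataset above.
        x, y = self.images[idx], self.labels[idx]
        return self.transform(x), y

images = [[[1, 2, 3], [4, 5, 6]]]
ds = LazyFlipDataset(images, labels=[0], flip_prob=1.0)  # always flip
flipped, label = ds[0]

never = LazyFlipDataset(images, labels=[0], flip_prob=0.0)  # never flip
unflipped, _ = never[0]
```

With a probability between 0 and 1, the same index can come back flipped in one epoch and unflipped in the next, which is how the network sees more variety without the dataset getting any bigger.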

I hope that helps a bit! :slight_smile:


Thank you so much for this wonderful answer mcintyre1994!

I understand this code much better now!

I think I will try to dive further into this implementation, as it seems to be a great exercise.

Best!
