Hello everyone,
I’m a relatively new student in this course (I started in January but lost some time due to problems with Linux).
I watched the first three lessons of the first deep learning course (v2). I was able to more or less reproduce the results of the first notebook on a personal dataset (I did image classification on two types of cars). But I’m still very confused about the implementation.
This is mainly because I’m a beginner. I have looked at the code of the scripts.
Basically, I would like to understand what these famous three lines of code (more precisely, the second one) are doing:
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)
For the first line, I guess that the variable arch now encodes some predefined architecture for the neural network: number and types of layers, number of channels, types of kernels and activation functions.
For the second line, I’m much less confident. The name obviously indicates that it somehow builds an object which encodes the data, or maybe describes how to manipulate it. I understand that it encodes several things about the data: where it is located and how to access it, which transformations to apply to it before processing, and how many pictures to look at at the same time (batch size bs). But I have difficulty understanding the actual implementation, so I thought someone could give me a kind of "overall idea" or "high-level explanation of the code". There are several aspects of this object (the ImageClassifierData one) which I’m curious about:
- How is it that we don’t have to specify the extension of the pictures? There doesn’t seem to be a default extension in the from_paths method or in the get_ds method of this class in dataset.py. I also didn’t find anything in dataloader.py.
- How are the transformations of the data stored? Let’s say we want to add horizontal flipping to our cats and dogs pictures. I don’t expect ImageClassifierData to actually copy and flip the image files and then feed them in as normal pictures, because I guess that would double the size of the dataset. I imagine instead that ImageClassifierData contains a list of "some kind of pointers" (this is based purely on my imagination) and that some of these pointers indicate that the image (or tensor, at this level) has to be flipped before it is given to the neural network. I apologize if all of this sounds like nonsense.
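To make the second point more concrete, here is a minimal Python sketch of what I have in mind. It is not fastai code, only my guess at the general idea: the names LazyFlipDataset, horizontal_flip and flip_prob are made up for illustration, and images are plain nested lists rather than tensors. The stored images are never modified or duplicated; the flip is applied only at the moment an item is requested.

```python
import random


def horizontal_flip(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


class LazyFlipDataset:
    """Hypothetical dataset: keeps each image once and flips it on access."""

    def __init__(self, images, labels, flip_prob=0.5, seed=None):
        self.images = images        # original images, stored only once
        self.labels = labels
        self.flip_prob = flip_prob  # chance of flipping on each access
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # The transformation happens here, on the fly, not on disk.
        if self.rng.random() < self.flip_prob:
            image = horizontal_flip(image)
        return image, self.labels[idx]


# A single 2x3 "image"; flip_prob=1.0 forces the flip for the demo.
images = [[[1, 2, 3], [4, 5, 6]]]
dataset = LazyFlipDataset(images, labels=["cat"], flip_prob=1.0)
flipped, label = dataset[0]
```

If this picture is roughly right, the dataset size on disk stays the same and every epoch can see a differently transformed version of the same picture.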
If there are specific functions or methods in the code that I should look at, it would be great if someone could point me to them with a brief explanation of their role.
Best