Part 2 Lesson 8 wiki

This is mostly a language question. To find the bounding boxes, pascal.ipynb uses an ImageClassifierData with continuous=True.

Um, I thought that if you had discrete values, the term was “Classification” and if the values were continuous, the term was “Regression”. Am I missing a subtlety here?

(I went looking through dataset.py and couldn’t find ImageRegressionFoo, so had to go look at pascal.ipynb, which I was trying not to do.)

Here’s my attempt at replicating Lec 8 on the Pascal VOC 2012 dataset. (Not the whole notebook, only Lec 8) https://gist.github.com/suvash/b95fa40fd548403f9e6c2aeb8c46b10a

I have dealt with json data loading/preparation a bit differently compared to the lesson. Also, some in-notebook variable and function names are updated to help my understanding. (I got lazy towards the end though)


Also, I’m a bit curious as to where the number 25088 comes from in the custom head linear layer addition.

nn.Sequential(Flatten(), nn.Linear(25088,4))

I’m assuming that one would either inspect the architecture first, if it was based on a standard architecture, or know this beforehand because they assembled it. Is there any other intuition I’m missing?

I think it’s because the final ResNet layers are 512x7x7.
But for some reason it does not work on ResNeXt, even though it is also 512x7x7.


There was a reference to using the fastai CPU version to browse the source code on the local machine. Can someone please post how to do that? Thanks!

This number is 512 * 7 * 7, from the last layers of the model architecture, before flattening. Instead of pooling we have a Flatten layer with an output vector of 25088 values, and the Linear layer connects each of those to the 4 output activations (25088 * 4 weights plus 4 biases = 100356 parameters).

Still trying to better imagine how the 512 x 7 x 7 values map to the 25088-wide linear layer. Not doing pooling makes sense; I guess this makes sure the relative positions needed for the bbox are not lost in the model.

('BasicBlock-122',
 OrderedDict([('input_shape', [-1, 512, 7, 7]),
              ('output_shape', [-1, 512, 7, 7]),
              ('nb_params', 0)])),
('Flatten-123',
 OrderedDict([('input_shape', [-1, 512, 7, 7]),
              ('output_shape', [-1, 25088]),
              ('nb_params', 0)])),
('Linear-124',
 OrderedDict([('input_shape', [-1, 25088]),
              ('output_shape', [-1, 4]),
              ('trainable', True),
              ('nb_params', 100356)]))])
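
If it helps to see the shapes outside of fastai, here is a minimal standalone PyTorch sketch of the same bookkeeping (my own example, not the lesson code):

import torch
import torch.nn as nn

x = torch.randn(1, 512, 7, 7)        # final conv feature map from the backbone
flat = x.view(x.size(0), -1)         # Flatten: 512 * 7 * 7 = 25088 values per image
head = nn.Linear(25088, 4)           # fully connect to the 4 bbox coordinates
print(flat.shape, head(flat).shape)  # torch.Size([1, 25088]) torch.Size([1, 4])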


Sure. I can see that when inspecting with learn.summary(). I was only wondering if there was anything more to it, other than fully connecting the layers to the desired number of outputs.

There isn’t! :slight_smile:


Great job!


A simple git clone locally should be good enough for browsing the fastai source. You’ll only need to install other libs locally if you want to jump/explore further into the source of the libraries that fastai depends on.


As @suvash said, if you just want to browse the code on your GPU-challenged machine, you can just do a git clone. If you want to actually run code slowly on a CPU, do this:

conda env update -f environment-cpu.yml

I think we are just being sneaky here and using ImageClassifierData to give us dataloaders that do something different from what this class was originally intended for :slight_smile:

When you call it with ImageClassifierData.from_paths it does what it says it would do - return dataloaders fetching images with their corresponding classes.

ImageClassifierData.from_csv can be used in a similar fashion (where the 2nd column of the read-in csv would be the class id), but we sort of take all of this machinery and feed it a doctored column: a string of coordinates that gets split deep in the internals, so instead of getting class indexes we get our 4 continuous floats per image.

Not trying to comment on the naming, just thought this background info might be of help to you.
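
For reference, the call in the lesson notebook looks roughly like this (PATH, JPEGS, BB_CSV and tfms stand in for whatever you have set up; the key bit is continuous=True):

# Rough sketch, not copied verbatim from pascal.ipynb: reuse the classifier
# machinery but hand it 4 continuous floats per image as the "label".
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True)
x, y = next(iter(md.trn_dl))  # y is now a batch of 4 bbox coordinates, not class ids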

@Ducky There seem to be two parts to it.

  • Extracting the labels from the csv
  • Setting the continuous parameter to True, which turns on the is_reg property that’s used later when building the model, training, etc.

As @radek mentioned, the labels from the space-separated second column are extracted as follows.

from_csv
https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L422-L423
↓
csv_source
https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L447
↓
parse_csv_labels
https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L136
↓
def parse_csv_labels
https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L103-L128
I haven't bothered inspecting it properly, but I think this is where the extraction happens
> df.iloc[:,0].str.split(cat_separator)
https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L127
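
To make that concrete, here’s a tiny standalone sketch of what that split amounts to (a hypothetical two-row CSV, not the real BB_CSV; as far as I can tell the default cat_separator is a single space):

import pandas as pd
from io import StringIO

# Hypothetical two-row version of the bounding-box CSV: the second column is a
# single space-separated string of 4 coordinates per image.
csv = StringIO("fn,bbox\n000012.jpg,96 155 269 350\n000017.jpg,77 89 335 402\n")
df = pd.read_csv(csv, index_col=0)

# Roughly what parse_csv_labels does: split the label column on spaces, so each
# image ends up with 4 values instead of a single class id.
labels = df.iloc[:, 0].str.split(' ')
coords = [[float(v) for v in row] for row in labels]
print(coords)  # [[96.0, 155.0, 269.0, 350.0], [77.0, 89.0, 335.0, 402.0]]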

To continue playing detective and learning about the library, I would like to add that I think is_reg is only used in the automated construction of the custom head and for automatically picking the cost function:

def get_fc_layers(self):
    res=[]
    ni=self.nf
    for i,nf in enumerate(self.xtra_fc):
        res += self.create_fc_layer(ni, nf, p=self.ps[i], actn=nn.ReLU())
        ni=nf
    final_actn = nn.Sigmoid() if self.is_multi else nn.LogSoftmax()
    if self.is_reg: final_actn = None
    res += self.create_fc_layer(ni, self.c, p=self.ps[-1], actn=final_actn)
    return res

class ConvLearner(Learner):
    def __init__(self, data, models, precompute=False, **kwargs):
        self.precompute = False
        super().__init__(data, models, **kwargs)
        if hasattr(data, 'is_multi'):
            self.crit = F.binary_cross_entropy if data.is_multi else F.nll_loss
            if data.is_reg: self.crit = F.l1_loss
            elif self.metrics is None:
                self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
        if precompute: self.save_fc1()
        self.freeze()
        self.precompute = precompute

So it is the data object that dictates what model we will get out of ConvLearner.pretrained (which in turn calls ConvnetBuilder), and what cost function the ConvLearner constructor will automatically pick for us.
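
As a toy illustration of that dispatch (my own sketch, not fastai code), this mirrors the branching in ConvLearner.__init__ above: with is_reg on we end up with an L1 loss, which is what we want for box coordinates.

import torch.nn.functional as F

# Hypothetical helper mirroring the criterion selection shown above.
def pick_crit(is_multi, is_reg):
    crit = F.binary_cross_entropy if is_multi else F.nll_loss
    if is_reg: crit = F.l1_loss
    return crit

print(pick_crit(is_multi=False, is_reg=True).__name__)   # l1_loss -> bbox regression
print(pick_crit(is_multi=True,  is_reg=False).__name__)  # binary_cross_entropy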


I was playing with the nb and found that I got the right label (diningtable) instead of a chair. Probably their probabilities are too close?

I was trying to use Visual Studio Code on a Mac with a git clone of fastai. I also created the fastai-cpu environment. But after selecting the folder (the fastai folder) and the interpreter (fastai-cpu), I tried to find 'open_image' using Command+T, which in VS Code opens "Go to Symbol in Workspace". Typing 'open_image' there shows no symbols found. Shift+Command+F (Find in All Files) works, but it shows all the files that have open_image in them. Am I missing something in Visual Studio Code, or is it just the way it is on a Mac?

Hello, I’m not a Mac user, but there are instructions for Mac in the wiki, in case you’ve not seen them. :slight_smile:

Thanks Vikrant! From what I could understand, they were meant for users who have PyCharm set up on their Mac rather than Visual Studio Code.

Ah! Sorry for the confusion then.

Yeah this is the kind of refactoring and cleaning up which will happen later - but in part 2 of the course you get to see the “in process” development work before that kind of cleaning up happens.

(By the time this gets to the MOOC stage it’s likely that this particular issue will be fixed by moving stuff into a parent or sibling class.)
