Retina net notebook merge idx issue

heye0507 · July 9, 2019, 6:05am

Hi all,

I am working on the RetinaNet notebook. One line seems to me to deviate from the focal loss paper.
I am not sure whether it is a bug or whether I have misunderstood something.

self.merges = nn.ModuleList([LateralUpsampleMerge(chs, sfs_szs[idx][1], hook) 
                                     for idx,hook in zip(sfs_idxs[-2:-4:-1], self.sfs[-2:-4:-1])])

sfs_idxs is the list [6,5,4,2], which corresponds to the layers of the ResNet-50 model where the grid size changes.

My understanding is,

layer idx 6 corresponds to C4, which is the layer with 16 by 16 by 1024
layer idx 5 corresponds to C3, which is the layer with 32 by 32 by 512
layer idx 4 corresponds to C2, which is the layer with 64 by 64 by 256
layer idx 2 corresponds to C1, which is the layer with 128 by 128 by 64
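
As a quick check of the sizes listed above, here is a small sketch that derives each hooked layer's spatial size from its cumulative stride. It assumes a 256x256 input and the usual ResNet-50 strides (2, 4, 8, 16 for layer indices 2, 4, 5, 6); the table and function names are illustrative only and are not taken from the notebook.

```python
# Assumed mapping: layer index -> (level name, cumulative stride, channels).
# The level names and channel counts follow the list in this post.
LAYERS = {
    2: ("C1", 2, 64),
    4: ("C2", 4, 256),
    5: ("C3", 8, 512),
    6: ("C4", 16, 1024),
}

def feature_map_shape(layer_idx, img_size=256):
    """Return (name, height, width, channels) for a hooked layer."""
    name, stride, channels = LAYERS[layer_idx]
    side = img_size // stride
    return name, side, side, channels

for idx in [6, 5, 4, 2]:
    print(idx, feature_map_shape(idx))
```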

Therefore, if we zip the idx and hook, the slice [-2:-4:-1] actually gives idx [4,5], which in the upsampling part gives the hook outputs of C2 and C3 (P2 = P3 + C2, P3 = P4 + C3).

I am a bit confused. I think the slice should be [0:2:1], which gives [6,5],
where layer idx 6 is C4, P4 = P5 + C4
and layer idx 5 is C3, P3 = P4 + C3

As we know from the paper, we are capturing the feature map level from P3-P7.
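
The two slices can be checked directly in plain Python. This is only a sanity check on the list described above, not code from the notebook:

```python
sfs_idxs = [6, 5, 4, 2]  # deepest hooked layer first, as described above

original = sfs_idxs[-2:-4:-1]  # starts at the second-to-last item and walks backwards
proposed = sfs_idxs[0:2]       # simply takes the first two items

print(original)  # [4, 5]
print(proposed)  # [6, 5]
```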

tsar · September 22, 2019, 11:34pm

Actually, this really looks like a mistake. I tried to check your assumptions in practice.
You are right that sfs_idxs == [6,5,4,2] and that the slice sfs_idxs[-2:-4:-1] == [4,5].

I added some debug output to the forward method of LateralUpsampleMerge:

    def forward(self, x):
        conv_lat_hook = self.conv_lat(self.hook.stored)
        print("conv_lat_hook.shape:", conv_lat_hook.shape, "+ x.shape:", x.shape)
        return conv_lat_hook + F.interpolate(x, self.hook.stored.shape[-2:], mode='nearest')

When I run learn.summary() for a learner with 256x256 images in the data, these are the first lines of output:

conv_lat_hook.shape: torch.Size([1, 256, 64, 64]) + x.shape: torch.Size([1, 256, 8, 8])
conv_lat_hook.shape: torch.Size([1, 256, 32, 32]) + x.shape: torch.Size([1, 256, 64, 64])

I tried changing the slice from [-2:-4:-1] to [0:2:1] and got this:

conv_lat_hook.shape: torch.Size([1, 256, 16, 16]) + x.shape: torch.Size([1, 256, 8, 8])
conv_lat_hook.shape: torch.Size([1, 256, 32, 32]) + x.shape: torch.Size([1, 256, 16, 16])

That’s better. It seems the author of the code forgot that sfs_idxs, the list of encoder layers that change the image size, is already reversed.
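
To make the difference easier to see without running the model, here is a small simulation of the merge order. It is a sketch only: the hook sizes and the 8x8 top-level map are taken from the debug output above, and merge_trace and only_upsamples are hypothetical helpers, not part of the notebook.

```python
# Spatial size of each hooked layer for a 256x256 input (from the debug output above).
HOOK_SIZE = {6: 16, 5: 32, 4: 64, 2: 128}
sfs_idxs = [6, 5, 4, 2]

def merge_trace(selected, top_size=8):
    """Return (hook_size, incoming_size) pairs for each top-down merge step."""
    trace, x = [], top_size
    for idx in selected:
        hook = HOOK_SIZE[idx]
        trace.append((hook, x))
        x = hook  # the merge resizes x to the hook's size before adding
    return trace

def only_upsamples(trace):
    """True if every step resizes the incoming map to a larger one."""
    return all(hook > x for hook, x in trace)

print(merge_trace(sfs_idxs[-2:-4:-1]))  # [(64, 8), (32, 64)]
print(merge_trace(sfs_idxs[0:2]))       # [(16, 8), (32, 16)]
```

With the original slice the second step shrinks a 64x64 map to 32x32, so the "upsample" path actually downsamples; with [0:2] every step doubles the size.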

Looking forward to hearing from the authors.

Maybe we are wrong and that “mistake” was made on purpose and gives better results.
I can’t check that, because I still can’t get the notebook working: Having problems running pascal.ipynb notebook

heye0507 · September 23, 2019, 1:09am

I have both the SSD and RetinaNet notebooks working, if you want to take a look.

I forgot to update this post after I finished the retina net.

Anyway, here is the link:

https://github.com/heye0507/dl_related/blob/master/play_ground/Retina_net_dev.ipynb

Henry912 · September 23, 2019, 5:17am

First, to convert these n values to probabilities! we apply the softmax activation function to them.

tsar · September 23, 2019, 11:26am

Thank you!
I looked at your RetinaNet, but it seems to use the loss function from SSD, not focal loss. Can you explain this, please?

tsar · September 23, 2019, 11:27am

I can’t understand how this is related to the topic. Could you give some more context?

heye0507 · September 23, 2019, 1:04pm

My bad, that was laziness on my part; it is focal loss though. I was doing RetinaNet and focal loss for an interview and recycled as much code as possible from SSD. It is also true that focal loss is not much different from the SSD loss; the only difference is the get_weight() call.

I should rename it to focal loss, instead of adding an option that says focal_loss = True
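
For readers who have not seen the SSD notebook, the idea that focal loss is binary cross-entropy multiplied by a per-element weight can be sketched as below. This is a plain-Python illustration, not the notebook's get_weight(); the function names are made up for this example, and alpha = 0.25, gamma = 2 are the defaults reported in the focal loss paper.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    return -math.log(p) if target == 1 else -math.log(1.0 - p)

def focal_weight(p, target, alpha=0.25, gamma=2.0):
    """Weight alpha_t * (1 - p_t)**gamma that turns BCE into focal loss."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return alpha_t * (1.0 - p_t) ** gamma

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one element: the weight times plain BCE."""
    return focal_weight(p, target, alpha, gamma) * bce(p, target)

# A well-classified background anchor contributes far less than a hard one.
print(focal_loss(0.01, 0), focal_loss(0.9, 0))
```

With gamma = 0 and alpha_t = 1 the weight is 1 and the loss reduces to ordinary BCE, which is why the two losses can share almost all of their code.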

tsar · September 23, 2019, 2:06pm

And did you make any tweaks to the model, apart from fixing the slicing and refactoring?

heye0507 · September 23, 2019, 3:18pm

I didn’t use the latest bounding boxes introduced in the 2019 RetinaNet notebook. I’m still using the scale of the 2018 bounding boxes, with the coordinate system changed from 0 to 1 to -1 to 1 (you can check my scale; I was rushing to get results, so I’m not 100% sure).
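
The change of coordinate range mentioned here is a simple linear rescale. A minimal sketch, assuming boxes are flat lists of coordinates already normalized to 0..1; the function names are hypothetical and not taken from either notebook:

```python
def to_minus1_1(box):
    """Map box coordinates from the [0, 1] range to [-1, 1]."""
    return [2.0 * c - 1.0 for c in box]

def to_0_1(box):
    """Inverse mapping, from [-1, 1] back to [0, 1]."""
    return [(c + 1.0) / 2.0 for c in box]

print(to_minus1_1([0.0, 0.25, 0.5, 1.0]))  # [-1.0, -0.5, 0.0, 1.0]
```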

I didn’t implement the extra conv for smoothing out the upsampling artifacts.

Last, I didn’t have time to implement non-max suppression or to output the confidence probability on the graph.

Those are all the differences I can think of. Oh, I also didn’t change the bias part, so when training from scratch for the first time, the model will need more epochs to adjust the initial weights.
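
The "bias part" presumably refers to the classification-head bias initialization described in the focal loss paper, where the final bias is set so that every anchor starts with a small foreground probability pi (0.01 in the paper). A minimal sketch of that formula, with hypothetical function names:

```python
import math

def prior_bias(pi=0.01):
    """Bias b such that sigmoid(b) == pi, i.e. b = -log((1 - pi) / pi)."""
    return -math.log((1.0 - pi) / pi)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

b = prior_bias()
print(round(b, 3))  # -4.595
```

Without this initialization, the many background anchors start near probability 0.5 and dominate the early loss, which fits the remark that more epochs are needed.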

But you can find all of what I just said in 2019 retina notebook.

Hope this helps

tsar · September 23, 2019, 4:33pm

Thank you a lot!

Currently I am still fighting with pascal.ipynb notebook and have some progress: Having problems running pascal.ipynb notebook

tsar · September 24, 2019, 1:04pm

I finally fixed the pascal.ipynb notebook and did some testing.

The results show that the model with the original slicing [-2:-4:-1] gives worse results than with [0:2] (which seems to be correct according to the RetinaNet paper).
The model with the original slicing also runs much slower.

Here are the losses:

Training stage (run in this order)   Slicing      Final train loss   Final valid loss
Fit on 128, model frozen             [-2:-4:-1]   1.469326           1.805385
Fit on 128, model frozen             [0:2]        1.170873           1.387025
Fit on 128, model unfrozen           [-2:-4:-1]   1.043817           1.393275
Fit on 128, model unfrozen           [0:2]        0.850711           1.052559
Fit on 192, model frozen             [-2:-4:-1]   1.022913           1.303066
Fit on 192, model frozen             [0:2]        0.861015           1.055111
Fit on 192, model unfrozen           [-2:-4:-1]   0.757558           1.097417
Fit on 192, model unfrozen           [0:2]        0.688136           0.899993
Fit on 256, model frozen             [-2:-4:-1]   0.767718           1.063900
Fit on 256, model frozen             [0:2]        0.732141           0.907611
Fit on 256, model unfrozen           [-2:-4:-1]   0.619473           0.929040
Fit on 256, model unfrozen           [0:2]        0.564977           0.825705

The fixed model with slicing [0:2] gives lower train and validation losses at every stage.

Here are notebooks with all results:

Right now I am creating a PR with fixes to the pascal.ipynb notebook in the course-v3 repository. Then I plan to create a PR to the fastai-dev repository.

P.S. I should mention that I’m currently testing on a GTX 860M, so I have to reduce the batch size a lot (make batches 8 times smaller) and wait a long time. In a week I’ll be home and will run more tests on an RTX 2070.

heye0507 · September 24, 2019, 3:55pm

The reason I didn’t submit a PR when I noticed the problem is that you would want to visualize the feature map output first. According to the paper, the implementation in the fastai repo is off; however, I don’t 100% understand the bounding box implementation.

As you could expect, if the feature map is wrong, the size of the receptive field should be much smaller than the one expected in the P3/P4 layers (for example, instead of 32x32, they probably output 64x64). According to the paper, that should even give a better result: if you ever plot the Pascal images, the missed objects are the very small ones.

Loss is not a good metric here. You will need to calculate the mean average precision (mAP) for both the fastai version and yours, and compare them to tell whether the change indeed improved the result.
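
As a rough illustration of what that metric involves, here is a sketch of the 11-point interpolated average precision used for Pascal VOC 2007. It is not the notebook's metric code: it assumes each detection has already been matched against the ground truth (for example by an IoU threshold) and flagged as a true or false positive, and both function names are hypothetical.

```python
def average_precision_11pt(is_tp, n_gt):
    """11-point interpolated AP for one class.

    is_tp: booleans for detections sorted by descending confidence.
    n_gt:  number of ground-truth boxes for that class (must be > 0).
    """
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    ap = 0.0
    for i in range(11):
        t = i / 10  # recall thresholds 0.0, 0.1, ..., 1.0
        candidates = [p for p, r in zip(precisions, recalls) if r >= t]
        ap += max(candidates, default=0.0)
    return ap / 11

def mean_average_precision(per_class):
    """per_class: list of (is_tp, n_gt) pairs, one per class."""
    return sum(average_precision_11pt(h, n) for h, n in per_class) / len(per_class)

print(average_precision_11pt([True, False, True], n_gt=2))
```

Comparing this number for the two slicings, on the same validation split, says more about detection quality than the training loss does.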

Also, since they are calculating more smaller receptive fields, it makes sense that your loss is lower.

heye0507 · September 24, 2019, 4:01pm

@radek if you don’t mind, I’m adding you to this. I think you have probably gone over the RetinaNet implementation. Do you think the merge idx is wrong?

The discussion is that, in the fastai repo, the RetinaNet implementation is outputting C2-C3 instead of C3-C4 because of the slicing (see my first post).

Thanks in advance

radek · September 24, 2019, 4:10pm

Sorry, I haven’t looked at this part of the codebase in ages.

heye0507 · September 24, 2019, 4:24pm

It’s all cool. I will just wait until Jeremy gets to object detection in the supplementary course materials.

Thanks anyway

tsar · September 24, 2019, 6:53pm

If the code that follows training in the notebook is correct, then I have already calculated mAP.

You can look at the end of these notebooks:

But I haven’t studied and verified the metrics calculation code yet.

BTW, my PR: Made pascal notebook work and fixed model to get better results by Tsar · Pull Request #415 · fastai/course-v3 · GitHub

tsar · September 25, 2019, 10:25pm

I also noticed that data normalization was forgotten in the original pascal notebook.

But I see it here.

Added data normalization to my PR. Will run more tests soon and post the results.

Does anyone know the state-of-the-art mAP for the Pascal 2007 dataset? I found some numbers that are a lot higher than my results, but I do not know whether the train/validation datasets were split the same way there.

heye0507 · September 26, 2019, 4:30am

I thought the Pascal dataset comes with a validation set when you download the data. I think valid.json is the annotation file for the validation set.

I googled it. It seems to me the SOTA is around 87% mAP.

heye0507 · September 26, 2019, 4:32am

Also, if you read the RetinaNet paper, I think the input image size is 512. If you want to reach the mAP they had, I think you will need to train with a similar image size (your receptive field will be different, as you can see when I grow from 128 to 256).

I used a P100 at the time; you can probably push it further with a V100 and mixed precision.

tsar · September 26, 2019, 7:14pm

Results are almost the same. Here is the notebook.

@heye0507, did you get state-of-the-art mAP on any well-known dataset using your version of the model and loss function?