I am working on the RetinaNet notebook. There is one line that seems to me to deviate from the focal loss paper.
I am not sure whether it is a bug or whether I have misunderstood something.
self.merges = nn.ModuleList([LateralUpsampleMerge(chs, sfs_szs[idx], hook) for idx,hook in zip(sfs_idxs[-2:-4:-1], self.sfs[-2:-4:-1])])
sfs_idxs is the list [6,5,4,2], which corresponds to the layers of the ResNet-50 model where the grid size changes.
My understanding is:
layer idx 6 corresponds to C4, the layer with output 16 by 16 by 1024
layer idx 5 corresponds to C3, the layer with output 32 by 32 by 512
layer idx 4 corresponds to C2, the layer with output 64 by 64 by 256
layer idx 2 corresponds to C1, the layer with output 128 by 128 by 64
Therefore, if we zip the idxs and hooks with the slice [-2:-4:-1], we actually get idxs [4,5], which in the upsampling part gives the hook outputs of C2 and C3 (P2 = P3 + C2, P3 = P4 + C3).
I am a bit confused. I think the slice should be [0:2:1], which gives [6,5],
where layer idx 6 is C4, so P4 = P5 + C4,
and layer idx 5 is C3, so P3 = P4 + C3.
As we know from the paper, we are capturing the feature map levels P3 to P7.
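
A quick sanity check of the two slices in plain Python, as a minimal sketch. The list values are the ones quoted above; the `level` dict is only my own labelling of the layers (my assumption about which idx is which C level), not something that exists in the notebook.

```python
# Layer indices quoted in the post.
sfs_idxs = [6, 5, 4, 2]

# Hypothetical labelling, based on my reading of the ResNet-50 stages.
level = {6: "C4", 5: "C3", 4: "C2", 2: "C1"}

current = sfs_idxs[-2:-4:-1]   # slice used in the notebook line
proposed = sfs_idxs[0:2:1]     # slice I think it should be

print(current, [level[i] for i in current])     # [4, 5] ['C2', 'C3']
print(proposed, [level[i] for i in proposed])   # [6, 5] ['C4', 'C3']
```

So the two slices really do pick different pairs of layers; whether [4,5] is intended depends on whether my C-level labelling matches what the notebook means.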