I was trying to modify the loss function so that it looks more like the YOLO loss function. To do this I had to modify
ssd_1_loss with the following two lines (the commented-out line is the original loc_loss):
actn_bbox = bb_to_actn(gt_bbox[pos_idx], self.anchors[pos_idx], self.grid_sizes[pos_idx])
# loc_loss = ((a_ic[pos_idx] - gt_bbox[pos_idx]).abs()).mean()
loc_loss = ((b_bb[pos_idx] - actn_bbox).abs()).mean()
Notice how I choose self.anchors[pos_idx]. This selects the anchor boxes that have at least 40% IoU with the ground truth boxes. Converting the ground truth boxes in one example, I get this (I have a separate function to convert ground truth to center, height, width):
(Pdb) torch.cat([ctr,hw], dim=1)
Variable containing:
 0.5223  0.5312  0.3661  0.7857
 0.5223  0.5312  0.3661  0.7857
 0.5223  0.5312  0.3661  0.7857
And the anchor boxes for this particular ground truth box which had at least 40% IoU are as follows:
(Pdb) anchors
Variable containing:
 0.5000  0.5000  0.7000  0.7000
 0.5000  0.5000  0.3500  0.7000
 0.5000  0.5000  0.5000  1.0000
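Notice, especially for the second anchor, that the ground truth height and width are larger than the height and width allowed by this anchor (0.3661 > 0.3500 and 0.7857 > 0.7000). This leads to NaNs when calculating the inverse tanh function, which feeds into the YOLO loss, because inverse tanh is only defined for inputs strictly between -1 and 1.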
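To make the NaN concrete, here is a minimal sketch using the numbers above. It assumes the size target is the inverse tanh of the ratio ground truth size / anchor size, which is my guess at what bb_to_actn does for height and width, not its actual code. Plain Python's math.atanh raises an error outside (-1, 1), so the hypothetical helper safe_atanh returns NaN instead to mirror what I see with tensors.

```python
import math

# Ground truth (height, width) and the three matched anchors' sizes,
# copied from the pdb output in this post.
gt_hw = (0.3661, 0.7857)
anchors_hw = [(0.70, 0.70), (0.35, 0.70), (0.50, 1.00)]

def safe_atanh(x):
    """Return atanh(x), or NaN when x is outside the open interval (-1, 1)."""
    return math.atanh(x) if -1.0 < x < 1.0 else float("nan")

def size_targets(gt_hw, anchor_hw):
    """Assumed inverse: target = atanh(gt_size / anchor_size) per dimension."""
    return tuple(safe_atanh(g / a) for g, a in zip(gt_hw, anchor_hw))

targets = [size_targets(gt_hw, a) for a in anchors_hw]
for anchor, target in zip(anchors_hw, targets):
    print(anchor, target)
```

Under this assumption the second anchor gives NaN in both dimensions, the first anchor gives NaN in one dimension (0.7857 > 0.7000), and only the third anchor gives finite targets.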
So the question is:
Shouldn’t we be choosing anchor boxes that both have high IoU with the ground truth and can actually reach it, given the range of heights and widths each anchor is allowed to produce?
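A rough sketch of what I mean by combining the two conditions, in plain Python rather than the tensor code in the loss. The helpers iou and reachable are hypothetical names of mine, boxes are (center, center, size, size) as in the pdb output above, and the allowed scale range (0, 1) is an assumption about the current setup.

```python
def to_corners(box):
    """Convert (c0, c1, s0, s1) center/size form to (min0, min1, max0, max1)."""
    c0, c1, s0, s1 = box
    return (c0 - s0 / 2, c1 - s1 / 2, c0 + s0 / 2, c1 + s1 / 2)

def iou(box_a, box_b):
    """Intersection over union of two center/size boxes."""
    a, b = to_corners(box_a), to_corners(box_b)
    inter0 = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter1 = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter0 * inter1
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

def reachable(gt, anchor, lo=0.0, hi=1.0):
    """True if gt_size / anchor_size lies strictly inside (lo, hi) in both dimensions."""
    return all(lo < g / a < hi for g, a in zip(gt[2:], anchor[2:]))

gt = (0.5223, 0.5312, 0.3661, 0.7857)
anchors = [
    (0.5, 0.5, 0.70, 0.70),
    (0.5, 0.5, 0.35, 0.70),
    (0.5, 0.5, 0.50, 1.00),
]

# Keep an anchor only if it overlaps enough AND can represent the ground truth size.
pos_idx = [i for i, a in enumerate(anchors) if iou(gt, a) > 0.4 and reachable(gt, a)]
print(pos_idx)
```

With these numbers all three anchors pass the 40% IoU test, but only the last one passes the reachability test when the scale is limited to (0, 1).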
This also leads into my other question, which I asked on a separate thread: should the predicted tanh(activation) scale the anchor by a factor in (0, 1) or in (0.5, 1.5)? If a factor up to 1.5 is allowed, I would not get the NaN values, since the ground truth height and width would then be ‘allowed’ for these anchors.
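Here is a small check of that claim. It assumes the (0.5, 1.5) variant is implemented as scale = 1 + tanh(activation) / 2, so the inverse is atanh(2 * (ratio - 1)); that formula and the name hw_to_actn are my assumptions for illustration, not necessarily what the notebook does.

```python
import math

gt_hw = (0.3661, 0.7857)
anchors_hw = [(0.70, 0.70), (0.35, 0.70), (0.50, 1.00)]

def hw_to_actn(gt_hw, anchor_hw):
    """Assumed inverse of size = anchor * (1 + tanh(actn) / 2), scale in (0.5, 1.5)."""
    return tuple(math.atanh(2.0 * (g / a - 1.0)) for g, a in zip(gt_hw, anchor_hw))

def actn_to_hw(actn, anchor_hw):
    """Forward mapping under the same assumption."""
    return tuple(a * (1.0 + math.tanh(t) / 2.0) for t, a in zip(actn, anchor_hw))

# math.atanh would raise ValueError here if any ratio fell outside (0.5, 1.5).
actns = [hw_to_actn(gt_hw, a) for a in anchors_hw]
recovered = [actn_to_hw(t, a) for t, a in zip(actns, anchors_hw)]
for anchor, actn in zip(anchors_hw, actns):
    print(anchor, actn)
```

For all three anchors the ratios fall inside (0.5, 1.5), so every target is finite and mapping it forward again recovers the ground truth size.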