Part 2 Lesson 9 wiki

So that’s why! I have been wondering all week long where you pulled the -4 from.

I’ve been trying to implement the mAP loss Jeremy talked about and was wondering if I was doing it right. It begins the same way as the SSD loss, then I loop through all the ground truth objects to see how many times they have been detected with the right class.

def mAP_1sample(b_c, b_bb, bbox, clas):
    bbox, clas = get_y(bbox, clas)
    a_ic = actn_to_bb(b_bb, anchors)
    overlaps = jaccard(bbox.data, anchor_cnr.data)
    gt_overlap, gt_idx = map_to_ground_truth(overlaps)
    pos = gt_overlap > 0.4
    pr_act, pr_clas = torch.max(b_c, dim=1)  # Convert our predictions into labels.
    tp = np.zeros(n_clas)
    fp = np.zeros(n_clas)
    fn = np.zeros(n_clas)
    for i in range(clas.size(0)):
        # Ground truth object i is detected if it's in gt_idx (it can be detected multiple times).
        # We check if those detections go with the right class prediction.
        cl = clas[i].data  # the class we are trying to detect
        detect = ((gt_idx[pos] == i) * (pr_clas[pos].data == cl)).sum()
        if detect == 0:
            fn[cl] += 1
        else:
            tp[cl] += 1
            fp[cl] += detect - 1
    ap = tp/np.where((tp+fp) > 0, tp+fp, 1)      # avoid a division by zero
    recall = tp/np.where((tp+fn) > 0, tp+fn, 1)  # in case we want recall as well
    return ap.mean()

If I understood correctly:

  • false negatives: number of ground truth objects not detected at all
  • true positives: number of ground truth objects correctly detected (at least once, with the right class)
  • false positives: number of correct-class detections of a given ground truth object, minus one (see the toy example below)
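
As a toy sanity check of those definitions (made-up counts, nothing computed by the code above):

import numpy as np

# One GT of class 0 detected 3 times with the right class, one GT of class 1 never detected.
tp = np.array([1., 0.])   # class 0 detected, class 1 not
fp = np.array([2., 0.])   # the 2 extra detections of the class-0 object
fn = np.array([0., 1.])   # the missed class-1 object
precision = tp / np.where(tp + fp > 0, tp + fp, 1)  # [0.333, 0.]
recall    = tp / np.where(tp + fn > 0, tp + fn, 1)  # [1., 0.]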

Does that sound right?

1 Like

To clarify - I meant to suggest you use it as a metric, not as a loss. You would use the metric after the nms step to test the end-to-end effectiveness of your trained model.

I believe that this is how most model results are presented in papers and evaluated in competitions.

Looks like you got it! :slight_smile:

1 Like

Sorry, I meant metric, not loss. Also I hadn’t understood that we have to sum all the true positives/false positives over all the pictures of the validation set before applying the formula for AP/mAP, so the code above won’t work.
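
Something along these lines should do it instead (a rough sketch, assuming a hypothetical counts_1sample helper that returns the per-image tp/fp/fn arrays from the loop above, plus n_clas and numpy from the earlier code):

# Accumulate per-class counts over the whole validation set, then compute precision once.
tp = np.zeros(n_clas); fp = np.zeros(n_clas); fn = np.zeros(n_clas)
for b_c, b_bb, bbox, clas in val_preds:                      # assumed iterable of predictions/targets
    tp_i, fp_i, fn_i = counts_1sample(b_c, b_bb, bbox, clas)  # hypothetical per-image counts
    tp += tp_i; fp += fp_i; fn += fn_i
precision = tp / np.where(tp + fp > 0, tp + fp, 1)
recall    = tp / np.where(tp + fn > 0, tp + fn, 1)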

I’m not quite sure I understand the definition of false positives though, and searching on the net has made me even more confused. When we have five predictions with the right class for ground truth object number one, I gather it’s one TP and four FP, but how do we count a prediction matched to a ground truth object that doesn’t have the right class?
For instance, let’s say I have a ground truth object of class one for which I have 5 predictions, 3 of class one, 2 of class two. How many false positives does that give? 4?

2 posts were split to a new topic: Mean average precision (mAP)

I’m unable to prevent overfitting. The train_loss is much lower than it was before the background-removal change, for almost the same validation_loss.

Is anyone else able to bring the validation_loss below 10?

I am having trouble understanding how the “find bounding boxes AND labels” section works. In particular,

tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, tfm_y=TfmType.COORD, aug_tfms=aug_tfms)
md = ImageClassifierData.from_csv(PATH, JPEGS, MBB_CSV, tfms=tfms, continuous=True, num_workers=4)

we set things up to find bounding boxes (with tfm_y=TfmType.COORD and continuous=True), but then we shove the category ID into the dataset as well. How does the model know that it should not treat the categories as continuous, or apply COORD-style transforms to them?

The bounding box dataset doesn’t get modified, only added to.
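
For reference, the notebook does this with a small wrapper dataset, roughly like the sketch below (reproduced from memory, so the details may differ slightly from the actual source):

from torch.utils.data import Dataset

class ConcatLblDataset(Dataset):
    # Wraps the bbox dataset and tacks the class ids onto the target:
    # the bbox target y still goes through the tfm_y/continuous handling,
    # while the class ids are appended untouched.
    def __init__(self, ds, y2): self.ds, self.y2 = ds, y2
    def __len__(self): return len(self.ds)
    def __getitem__(self, i):
        x, y = self.ds[i]
        return (x, (y, self.y2[i]))

So something like trn_ds2 = ConcatLblDataset(md.trn_ds, trn_mcs) (names from memory) leaves the original bounding-box dataset untouched and only adds the labels on top.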

That answer didn’t make sense to me until I realized it is true if and only if tfm_y and continuous have no effect after the from_csv() call; once I saw that, it made perfect sense.

I don’t fully understand what tfm_y and continuous do, which is I think at the root of my problem. (That’s probably from working on refugees instead of part1v2 homework, sorry.)

1 Like

The refugees’ needs may be more urgent…

Python version

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(6, 6))

pt = np.arange(0.01, 1.01, step=0.01)  # start at 0.01 to avoid log(0)
CE = -np.log(pt)                       # cross-entropy uses the natural log

# legend color: gamma
g = {'b': 0.0, 'r': 0.5, 'y': 1.0, 'm': 2.0, 'g': 5.0}

for i in g:
    FL = (1 - pt)**g[i] * CE
    ax.plot(pt, FL, c=i, label=r'$\gamma$ = ' + str(g[i]))

ax.legend()
ax.set_xlabel('Probability of ground truth class')
ax.set_ylabel('Loss')
ax.set_title('Focal Loss')

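For reference, this plots the focal loss from the RetinaNet paper, FL(pt) = -(1 - pt)^γ · log(pt); with γ = 0 it reduces to plain cross-entropy.
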
[plot: focal loss curves for γ = 0, 0.5, 1, 2 and 5]

12 Likes

I think there might be a problem with RandomRotate for bounding box coordinates.

I pick an image with a very well fitted bounding box:

[image: car with a tightly fitted bounding box]

I then apply the RandomRotate transformation:

(apologies for the redefined augs in the code making it less clear to read).

The results don’t seem right. In particular, why is there such a big difference between images 1 and 2? I remember the discussion from class about this, but given how tight the bounding box is and how rectangular the object is, I am concerned there might be something else amiss here.

I would be very grateful if someone could confirm whether they are seeing similar behavior.

1 Like

Interesting. Maybe try putting the original target bounding box in the image itself and then augmenting that, to see how the original box fits inside the new one? That might make it easier to see what to look for if there is something that can be improved.
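
Something like this would let you burn the original box into the pixels before augmenting (a quick sketch with a hypothetical helper; the box is assumed to be [x, y, w, h] in pixel coordinates):

import cv2

def burn_in_box(im, b, color=(255, 0, 0)):
    # Draw the original box into the image pixels so it gets augmented
    # along with everything else; b is [x, y, width, height].
    im = im.copy()
    x, y, w, h = [int(v) for v in b]
    cv2.rectangle(im, (x, y), (x + w, y + h), color, 2)
    return im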

Yeah that does look odd. Would love help debugging this! :slight_smile:

Without a doubt, that is a really, really neat way of preprocessing data :slight_smile: I think @binga was not paranoid enough though (and neither was I when re-implementing this :wink: ). As a result, I ended up questioning my life choices and the meaning of it all we so trivially call life. Becoming a castaway and living on an island with no electricity started to seem like a very appealing lifestyle. I think anyone who has ever debugged a model will understand :wink:

It all started with my models giving me slightly worse performance (2-3% lower accuracy) than the ones I worked on earlier for lesson 8. I started working back from the end of the notebook… to spare you the sappy details of this story, this is what you need to change to get results like in the lecture:

Fun fact: the datasets will still not be exactly the same, as pd.merge will grab more than one entry if the area of the bb in a given image is exactly the same (which it is for a single image with multiple aeroplanes).
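
A tiny made-up illustration of that pd.merge caveat:

import pandas as pd

# Made-up data: one image, two aeroplanes with exactly the same bbox area.
anno    = pd.DataFrame({'id': [7, 7], 'area': [1200, 1200],
                        'bbox': ['10 10 30 40', '60 80 30 40']})
largest = pd.DataFrame({'id': [7], 'area': [1200]})
print(pd.merge(largest, anno, on=['id', 'area']))
# -> both rows come back, not just one "largest object" per image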

In case anyone wonders why the difference in performance: some objects can be considered hard, for example if they are not fully visible or for other reasons. If so, they are marked with ignore = 1. Here is an example:

[image: scene where the largest bounding box is a barely visible table]

Here the biggest bb is for the… table. We can infer it is a table, but just looking at the picture it might be hard to figure out what this white circular blob is. Hence, if we ignore such annotations, we would go for the bottle instead (the biggest bb in the image with ignore == 0).

3 Likes

You have room for 1 more person?

3 Likes

I think I found where it comes from. There is a weird patch of zeros appearing when we rotate the box after it has been transformed into a square image:


When the second picture is turned back into a bbox, it tries to include this weird patch, which makes the box larger.

This is obtained by redoing the steps of the RandomRotate transform with the picture and its box; my code is:

rot_tfm = RandomRotate(10, tfm_y=TfmType.COORD)
rot_tfm.set_state()
rot_tfm.store.rdeg = 30                          # force a fixed 30-degree rotation
y_squared = CoordTransform.make_square(box, im)  # bbox turned into a square mask image
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
show_img(rot_tfm.do_transform(im, False), axis=axes[0])         # rotate the image (x)
show_img(rot_tfm.do_transform(y_squared, True), axis=axes[1])   # rotate the bbox mask (y)

where im contains the image of the car and box the associated bbox in a numpy array.

1 Like

The problem seems to be the self.BORDER_REFLECT flag: it shouldn’t be used for y. I’m making a pull request on GitHub.
It seems to work well once corrected:

6 Likes

You rock :slight_smile: Awesome job.

Your PR isn’t quite right (although I merged it since it’s better than what we have). You should check whether it’s TfmType.COORD or TfmType.CLASS and only use constant border mode in those cases. For TfmType.PIXEL we probably want reflection padding.
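
Roughly what that would look like (a sketch, not the actual fastai source, assuming fastai’s TfmType enum and OpenCV are in scope):

import cv2

def rotate_cv(im, deg, tfm_y=None, interpolation=cv2.INTER_AREA):
    # Sketch: constant (zero) padding for COORD/CLASS targets so no reflected
    # pixels leak into the y image; reflection padding for everything else.
    mode = cv2.BORDER_CONSTANT if tfm_y in (TfmType.COORD, TfmType.CLASS) else cv2.BORDER_REFLECT
    r, c = im.shape[:2]
    M = cv2.getRotationMatrix2D((c / 2, r / 2), deg, 1)
    return cv2.warpAffine(im, M, (c, r), flags=interpolation, borderMode=mode)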