Multi-Object Detection Issue

Hi all,

I have been watching the lectures and have tried to reimplement the models to run predictions on my own data.
I have collected around 500 images and labelled the objects of interest with labelImg, which outputs an .xml annotation file for each image. I then implemented the functions needed to build the same variables (dictionaries) as in the lecture notebooks.

I have successfully run lesson 8 (single object detection) and am currently struggling to debug the code from lesson 9 (multi-object detection). @jeremy, I already watched these lectures last year, but I have to admit I was lazy back then and just went through the code you wrote. This time, being forced to adapt the code to my own data and to a specific purpose, I have run into a number of problems along the way, and I respect your implementation much more, even though I found the lessons helpful last year too. I would really recommend that everyone attending these lectures, especially anyone who intends to apply the skills in the real world, reimplement everything themselves. There is a huge difference between just running the code and writing it line by line.

Back to the issue. After writing all the helper functions that parse the .xml files, I got a CSV file of bounding boxes in the following format:

fn,bbox
img2-0166.jpg,140 73 172 106 161 399 190 425
img2-0235.jpg,109 285 151 326 421 165 467 217
img3-0019-2.jpg,21 54 43 73 64 19 83 37 72 76 89 94
img4-0290.jpg,209 374 250 409
img3-0039-1.jpg,27 14 45 31 32 88 47 104
img4-0173.jpg,46 43 78 88
img3-0011-3.jpg,24 16 40 28
img4-0178.jpg,129 74 155 111
img3-0079-3.jpg,29 39 65 67
img3-0085-2.jpg,72 35 127 86 103 242 141 272
img4-0252.jpg,242 147 282 186 274 761 321 804
img2-0376-2.jpg,42 57 58 77
img1-0298.jpg,211 604 258 663
img3-0048-1.jpg,10 27 43 60
img4-0295.jpg,82 338 147 416 523 379 579 443 544 286 573 318 839 89 877 142 849 181 868 199 828 256 858 291 825 329 863 372 858 584 878 607 858 621 884 636 860 634 885 651 858 647 879 668
... 

I believe up to this point everything is correct.
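
For anyone curious, a minimal sketch of such an XML-to-CSV conversion could look like the following, assuming labelImg's default Pascal VOC output. The folder paths are placeholders, and the coordinate order follows the pascal-multi convention (top-left row, top-left column, bottom-right row, bottom-right column), concatenated per image:

import xml.etree.ElementTree as ET
from pathlib import Path
import pandas as pd

ANNO_PATH = Path('data/annotations')   # placeholder: folder with the labelImg .xml files
CSV_OUT   = Path('data/mbb.csv')       # placeholder: output csv

rows = []
for xml_file in sorted(ANNO_PATH.glob('*.xml')):
    root = ET.parse(xml_file).getroot()
    fn = root.find('filename').text
    coords = []
    for obj in root.findall('object'):
        bb = obj.find('bndbox')
        xmin, ymin = int(bb.find('xmin').text), int(bb.find('ymin').text)
        xmax, ymax = int(bb.find('xmax').text), int(bb.find('ymax').text)
        # pascal-multi convention: top-left row, top-left col, bottom-right row, bottom-right col
        coords += [ymin, xmin, ymax, xmax]
    if coords:
        rows.append({'fn': fn, 'bbox': ' '.join(str(c) for c in coords)})

pd.DataFrame(rows, columns=['fn', 'bbox']).to_csv(CSV_OUT, index=False)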

Then, when I run the code that is supposed to display a 3 x 4 grid of example images with their bounding boxes, many "objects" of size 1 x 1 appear in the upper-left corner (0, 0, 1, 1).

This is the part of the code that outputs the images:

fig, axes = plt.subplots(3, 4, figsize=(16, 12))
for i, ax in enumerate(axes.flat):
    show_ground_truth(ax, x[i], y[0][i], y[1][i])
plt.tight_layout()

I have modified the function show_ground_truth to print the y values, and the values are indeed unexpected:

def show_ground_truth(ax, im, bbox, clas=None, prs=None, thresh=0.3):
    bb = [bb_hw(o) for o in bbox.reshape(-1,4)]
    if prs is None:  prs  = [None]*len(bb)
    if clas is None: clas = [None]*len(bb)
    ax = show_img(im, ax=ax)
    # print(im.shape)
    # print(bb)
    for i,(b,c,pr) in enumerate(zip(bb, clas, prs)):
        if((b[2]>0) and (pr is None or pr > thresh)):
            print("{}: {}".format(i, b)) # this line outputs the bounding boxes that are overlayed
            draw_rect(ax, b, color=colr_list[i%num_colr])
            txt = f'{i}: '
            if c is not None: txt += ('bg' if c==len(id2cat) else id2cat[c])
            if pr is not None: txt += f' {pr:.2f}'
            draw_text(ax, b[:2], txt, color=colr_list[i%num_colr])

This is the output:

0: [0. 0. 1. 1.]
1: [0. 0. 1. 1.]
2: [0. 0. 1. 1.]
3: [0. 0. 1. 1.]
4: [0. 0. 1. 1.]
5: [0. 0. 1. 1.]
6: [0. 0. 1. 1.]
7: [0. 0. 1. 1.]
8: [0. 0. 1. 1.]
9: [0. 0. 1. 1.]
10: [0. 0. 1. 1.]
11: [0. 0. 1. 1.]
12: [0. 0. 1. 1.]
13: [0. 0. 1. 1.]
14: [108. 106.  37.  34.]
0: [0. 0. 1. 1.]
1: [0. 0. 1. 1.]
2: [0. 0. 1. 1.]
...

Obviously, somewhere along the way the bounding boxes have been reshaped incorrectly. For the first image there are 15 bounding boxes; the first 14 sit in the upper-left corner and only the last one correctly marks the object in the image. Similarly, in the following images there are many bounding boxes in the upper-left corner, along with others that correctly mark the objects.

So this is the point where something goes wrong, and I haven't found the cause yet. I would really appreciate any help. If anyone has encountered something similar and fixed it, I would be thankful for a suggestion on where to look for the cause.

I nevertheless tried to run learn.lr_find() and got the following error:

RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

One more thing, which I don't think is causing the problem (though I might be wrong): not all images in the JPEGs folder are annotated, so the number of lines in the CSV file is smaller than the number of images, although every image listed in the CSV does exist.
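
(For completeness, this is roughly how such a mismatch can be checked; CSV_PATH and JPEGS below are placeholder paths:)

import pandas as pd
from pathlib import Path

CSV_PATH = Path('data/mbb.csv')       # placeholder paths
JPEGS    = Path('data/JPEGImages')

df      = pd.read_csv(CSV_PATH)
listed  = set(df['fn'])
on_disk = set(p.name for p in JPEGS.glob('*.jpg'))

print('listed in csv but missing on disk:', listed - on_disk)   # should be an empty set
print('images on disk without annotation:', len(on_disk - listed))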

Edit
I have finally found the cause of the RuntimeError noted above. Some objects are really small, so after the images (and their boxes) are scaled down, some bounding boxes end up with zero width and height. After cropping the images to increase the size of the objects of interest relative to the image size, the code executed without errors.
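
In essence, the cropping just cuts a window around the annotated objects and shifts the box coordinates by the crop offset. A simplified sketch, not the exact code I ran (the margin value is an arbitrary choice, and the boxes are assumed to be stored as top-left row, top-left column, bottom-right row, bottom-right column):

import numpy as np
from PIL import Image

def crop_around_boxes(img_path, boxes, margin=50):
    # boxes: array of [row_min, col_min, row_max, col_max]; margin (in pixels) is arbitrary
    im = Image.open(img_path)
    w, h = im.size
    boxes = np.asarray(boxes).reshape(-1, 4)
    r0 = max(int(boxes[:, 0].min()) - margin, 0)
    c0 = max(int(boxes[:, 1].min()) - margin, 0)
    r1 = min(int(boxes[:, 2].max()) + margin, h)
    c1 = min(int(boxes[:, 3].max()) + margin, w)
    cropped = im.crop((c0, r0, c1, r1))            # PIL crop takes (left, upper, right, lower)
    shifted = boxes - np.array([r0, c0, r0, c0])   # move the boxes into the cropped frame
    return cropped, shifted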

Still, I didn't know why so many ground-truth objects appear in the upper-left corner. I tried to rerun the original pascal-multi notebook, and the objects appear in the upper-left corner there as well. The cause is actually in the function bb_hw:

def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])

For every 0 0 0 0 sequence (which represents ‘the object’ with both the upper-left and bottom-right corners at (0, 0)), the calculated width and height come out as 1, which is not true. Anyway, it is not hard to modify the notebook code to skip such cases (see the sketch below), but it would still be helpful if anyone could comment on this ‘issue’. Last year, when I went through this notebook and everything worked as expected, I used older versions of PyTorch and the fastai library. A week ago I updated PyTorch and fastai and made a symbolic link to the old version. I assume these changes shouldn't affect the results discussed here.
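
One easy way to skip those cases is to drop the all-zero rows before they reach show_ground_truth; as far as I can tell they are just padding, added so that every image in a batch carries the same number of boxes. A sketch of the modified plotting loop (same x and y as in the notebook, already converted to numpy arrays):

fig, axes = plt.subplots(3, 4, figsize=(16, 12))
for i, ax in enumerate(axes.flat):
    bbox = y[0][i].reshape(-1, 4)
    keep = bbox.sum(axis=1) > 0          # all-zero rows are padding, not real objects
    show_ground_truth(ax, x[i], bbox[keep], y[1][i][keep])
plt.tight_layout()

With the padding rows filtered out, only the real ground-truth boxes are drawn, and the +1 in bb_hw no longer turns the zero rows into 1 x 1 boxes.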

This is a preview of the ground-truth instances:


Hi there, I am running into the same problem with regard to that RuntimeError. You say you got around it by cropping the images; could you tell me how you did that, please, so that I can try it as well?

Thank you