Single object detection where object may not exist in the image

I am currently working on the RSNA pneumonia Kaggle challenge (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/), where the objective is to use bounding boxes to detect lung opacities in images of lungs with pneumonia. Each image belongs to category 0 (not pneumonia) or category 1 (pneumonia). I am following classes 8 and 9, but I am having difficulty understanding how to approach this problem for the following reasons:

  1. Bounding boxes are not guaranteed in every image (images of category 0 don't have bounding boxes).
  2. There may be more than one bounding box per image.

Regarding the first reason, I tried setting the missing bounding box coordinates (which are NaN values in the csv) to 0,0,0,0. However, after training, the model would not draw any bounding boxes. I don't know if it is correct to set these values to 0, since these bounding boxes do not exist.

As for the second reason, I assume that providing the model with multiple bounding boxes per image will be enough for it to predict more than one. I have not been able to confirm this yet, because I have been working with only the largest bounding box in every image in order to follow the lesson more easily.
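To make the "largest bounding box" step concrete, here is a minimal sketch of the selection in plain Python. It assumes each csv row has already been parsed into a tuple of image id plus x, y, width and height, with NaN coordinates for category 0 images; the function name and the toy rows are made up for illustration.

```python
import math

def largest_box_per_image(rows):
    """Map each image id to its largest (x, y, w, h) box, or None if it has no box."""
    best = {}
    for image_id, x, y, w, h in rows:
        # Category 0 rows carry NaN coordinates: remember the image, but store no box.
        if any(math.isnan(v) for v in (x, y, w, h)):
            best.setdefault(image_id, None)
            continue
        current = best.get(image_id)
        # Keep the box with the largest area (width * height).
        if current is None or w * h > current[2] * current[3]:
            best[image_id] = (x, y, w, h)
    return best

nan = float("nan")
rows = [
    ("a", 10.0, 20.0, 100.0, 50.0),   # hypothetical image with two boxes
    ("a", 30.0, 40.0, 200.0, 80.0),
    ("b", nan, nan, nan, nan),        # hypothetical image without pneumonia
]
boxes = largest_box_per_image(rows)
```

Keeping `None` for images without a box, instead of a fake 0,0,0,0 box, leaves it explicit which images have a real target.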

Thanks in advance

Ok, so now I have a better understanding after reading a bit more on the forum and doing some tests. I now have a more specific question.

I am just predicting the largest bounding box in every image. However, since there are images without bounding boxes, I need to avoid using these in the bounding box training.

I tried setting them to 0,0,0,0 before, but then the model would predict very small or even negative coordinates for every image. All the predicted boxes were pushed into the top left corner and were too small to be visible.
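My guess at why this happens, as a toy illustration only: assuming the box coordinates are trained with an L1 loss, the best single value to predict is the median of the targets, so if most targets are padded zeros the optimum collapses to 0. The numbers below are invented, and a real network conditions on the image, so this is an intuition and not a proof.

```python
from statistics import median

# Toy x-coordinates of the box's left edge: three real boxes...
real_x = [120.0, 150.0, 180.0]
# ...plus five images without pneumonia whose box was padded with 0.
padded_x = real_x + [0.0] * 5

def l1_total(pred, targets):
    """Total L1 error of predicting the same value for every target."""
    return sum(abs(pred - t) for t in targets)

# Under an L1 loss the best constant prediction is the median of the targets.
best_real = median(real_x)
best_padded = median(padded_x)

# Predicting 0 everywhere is cheaper than predicting a sensible coordinate.
error_at_zero = l1_total(0.0, padded_x)
error_at_real = l1_total(best_real, padded_x)
```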

So the more precise question is: how can I modify the classifier head so that the bounding box regression is only trained on images where the model predicts pneumonia? Or should I instead train the bounding boxes only on images whose ground truth is pneumonia, and then at prediction time show boxes only for the images the model classifies as pneumonia?
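To make the second option concrete, here is a rough sketch of the loss I have in mind, written in plain Python so the arithmetic is easy to check; in practice the same logic would be written with PyTorch tensors inside the custom loss function. The function name, the argument layout and `box_weight` are my own assumptions, not anything from the library.

```python
import math

def masked_detection_loss(pred_boxes, pred_probs, true_boxes, true_labels,
                          box_weight=1.0):
    """Classification loss on every image plus L1 box loss on positive images only.

    pred_boxes / true_boxes: lists of (x, y, w, h). Entries of true_boxes for
    negative images are ignored, so any placeholder can be stored there.
    pred_probs: predicted probability of pneumonia per image.
    true_labels: 1 for pneumonia, 0 otherwise.
    """
    eps = 1e-7
    cls_loss, box_loss, n_pos = 0.0, 0.0, 0
    for pb, p, tb, label in zip(pred_boxes, pred_probs, true_boxes, true_labels):
        # Binary cross-entropy, clamped to avoid log(0).
        p = min(max(p, eps), 1 - eps)
        cls_loss += -(label * math.log(p) + (1 - label) * math.log(1 - p))
        # The box term only counts when the ground truth says a box exists.
        if label == 1:
            box_loss += sum(abs(a - b) for a, b in zip(pb, tb)) / 4
            n_pos += 1
    cls_loss /= len(true_labels)
    if n_pos:
        box_loss /= n_pos
    return cls_loss + box_weight * box_loss

# For a negative image the predicted box has no effect on the loss.
loss_a = masked_detection_loss([(0, 0, 0, 0)], [0.1], [(0, 0, 0, 0)], [0])
loss_b = masked_detection_loss([(50, 60, 70, 80)], [0.1], [(0, 0, 0, 0)], [0])
```

Masking on the ground truth rather than on the model's own prediction keeps the box targets fixed during training, which is why I lean towards this option.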

Just in case it is not clear, I am using fast.ai 0.7 and I am not implementing SSD, just the simpler algorithm in which the classification loss and the bounding box regression loss are summed at the end.
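With a single head like this, showing a box only for images classified as pneumonia could look roughly like the sketch below. The output layout (four box activations followed by one pneumonia logit) and the 0.5 threshold are assumptions for illustration; a head with one logit per class would need a softmax instead.

```python
import math

def decode_prediction(output, threshold=0.5):
    """Split one head output into a pneumonia probability and an optional box.

    output: (x, y, w, h, logit), a hypothetical layout for the combined head.
    Returns (probability, box), where box is None below the threshold.
    """
    *box, logit = output
    # Sigmoid turns the single logit into a probability of pneumonia.
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, (tuple(box) if prob >= threshold else None)

prob_pos, box_pos = decode_prediction((10.0, 20.0, 30.0, 40.0, 2.0))
prob_neg, box_neg = decode_prediction((1.0, 2.0, 3.0, 4.0, -2.0))
```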