I’m creating this topic to discuss the `ObjectDetectionInterpretation` class I’m developing to inspect results from Object Detection with a similar API. Any suggestion is most welcome!
I have a question for you @sgugger if you don’t mind. One issue I’m running into is in the `loss_batch` function (in `basic_train`), on this line:

```python
if not loss_func: return to_detach(out), yb[0].detach()
```

Only the first element of `yb` is returned (I don’t really know why?). In Object Detection, `yb[0]` is the bounding box targets and `yb[1]` is the class targets, and both are needed for `get_preds`. So what would you suggest?
Can I modify `loss_batch` directly to also return the full `yb` (in that case `validate` would also need to be modified, at least)? I’m not sure how that would impact other applications, so that’s why I’m asking. The alternative is to develop a `get_preds` method specific to Object Detection. Could you advise me on the best course of action?
Ah that’s tricky. It returns `yb[0]` because `yb` has been listified and we need the actual element in that case. Not too sure how it’s going to work with a listy target. In your case you can try with `to_detach(yb)`, and in general I think it should be something like that, with a potential squeeze to remove useless dims, but it’d need more tests.
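The idea of applying `to_detach` to the whole target can be sketched as a recursive detach over listy containers. This is a simplified pure-Python illustration, not fastai’s actual code: `detach_one` stands in for `Tensor.detach`, and `detach_listy` is a hypothetical name for the pattern.

```python
# Sketch of recursively detaching a possibly-listy target (illustration
# only; `detach_one` stands in for Tensor.detach).
def is_listy(x):
    return isinstance(x, (list, tuple))

def detach_listy(b, detach_one):
    # Recurse so a target like yb = [bboxes, classes] is detached
    # element-wise, instead of keeping only yb[0].
    if is_listy(b):
        return type(b)(detach_listy(o, detach_one) for o in b)
    return detach_one(b)
```

With something like this, `loss_batch` could hand back both the bounding-box and class targets for `get_preds` without special-casing Object Detection.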
Oh right, I forgot it was listified.
I’ll try to make something work then, thanks for your input!
If I may, I have another question on the design of the Object Detection API you want for fastai (I’ll do my best not to bother you with too many questions, but I feel this is an important one as it concerns fastai’s design).
As you know, I’m working off your implementation of RetinaNet. In it, the forward pass of the model outputs three things in a list:
- the classification predictions, of size `batch size x number of anchors x number of classes`
- the BBox predictions, of size `batch size x number of anchors x 4`
- (if I understood correctly) the sizes of the feature maps that the FPN outputs
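For concreteness, the three outputs above can be mocked with NumPy. The batch size, anchor and class counts here are made up for illustration; the real model returns PyTorch tensors.

```python
import numpy as np

# Made-up sizes for illustration only.
bs, n_anchors, n_classes = 2, 100, 20

clas_preds = np.zeros((bs, n_anchors, n_classes))  # per-anchor class scores
bbox_preds = np.zeros((bs, n_anchors, 4))          # per-anchor box regression
fmap_sizes = [(32, 32), (16, 16), (8, 8)]          # FPN feature-map sizes

# The list returned by the forward pass, as described above.
output = [clas_preds, bbox_preds, fmap_sizes]
```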
Can I assume that the first two are what an Object Detection model will always output?
Similar question for the labels: in the RetinaNet notebook, for each image the class labels are not one-hot encoded as they are in the model output. Can I assume that will always be the case?
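To make that encoding mismatch concrete (made-up labels, sketched with NumPy, not fastai code): the targets carry one class index per object, while the model output has one score per class, so any shared `get_preds`/interpretation code would need to convert between the two.

```python
import numpy as np

n_classes = 4
# Target-style labels: one class index per ground-truth object.
target_idx = np.array([2, 0, 3])

# Model-style encoding: one score per class. Here we build the exact
# one-hot version of the targets, for illustration.
one_hot = np.eye(n_classes)[target_idx]
```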
Since I haven’t had the time to train and finish this part, I can’t answer those questions now. So for now do your best, and we’ll adapt whatever you come up with if there is a change of API.