@chingjunehao: The output of the SSD detector is a tensor containing N_box “box tensors”. Each “box tensor” holds the data for one box generated by the model, in an order like this (yours might be different): [x_center, y_center, width, height, x_center_variance, y_center_variance, width_variance, height_variance, class1_score, …, classN_score, x_center_offset, y_center_offset, width_offset, height_offset].
The scores and offsets are the values that SSD predicts; the other values (the default box coordinates and their variances) are fixed.
So to crop your image, follow these steps:
- Have a decoder function to compute the predicted position of the boxes in pixels.
- Filter the boxes: discard the low-scoring ones, then apply non-max suppression so that among overlapping boxes only the highest-scoring one is kept.
- Finally use the positions of the remaining boxes to crop your image.
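
Here is a rough NumPy sketch of these three steps, not a definitive implementation. It assumes the exact column order listed above, default box coordinates normalized to [0, 1], the usual SSD decoding formulas, and no background column among the class scores (drop that column first if your model has one). The function names `decode_boxes`, `nms` and `crop_detections` and the thresholds are made up for illustration, and the NMS here is class-agnostic for brevity, whereas many implementations run it per class.

```python
import numpy as np

def decode_boxes(pred, n_classes, img_w, img_h):
    """pred: (N_box, 8 + n_classes + 4) array in the assumed column order."""
    anchors = pred[:, 0:4]                    # cx, cy, w, h of the default boxes
    variances = pred[:, 4:8]
    scores = pred[:, 8:8 + n_classes]
    offsets = pred[:, 8 + n_classes:]
    # Standard SSD decoding: shift the centers, scale the sizes.
    cx = offsets[:, 0] * variances[:, 0] * anchors[:, 2] + anchors[:, 0]
    cy = offsets[:, 1] * variances[:, 1] * anchors[:, 3] + anchors[:, 1]
    w = np.exp(offsets[:, 2] * variances[:, 2]) * anchors[:, 2]
    h = np.exp(offsets[:, 3] * variances[:, 3]) * anchors[:, 3]
    # Convert to pixel corners (x1, y1, x2, y2).
    boxes = np.stack([(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                      (cx + w / 2) * img_w, (cy + h / 2) * img_h], axis=1)
    return boxes, scores

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy non-max suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter + 1e-9)
        order = rest[iou <= iou_threshold]    # drop boxes overlapping the best one
    return keep

def crop_detections(image, pred, n_classes, score_threshold=0.5, iou_threshold=0.45):
    """image: (H, W, C) array. Returns a list of cropped sub-images."""
    img_h, img_w = image.shape[:2]
    boxes, scores = decode_boxes(pred, n_classes, img_w, img_h)
    best = scores.max(axis=1)                 # best class score per box
    mask = best >= score_threshold
    boxes, best = boxes[mask], best[mask]
    crops = []
    for i in nms(boxes, best, iou_threshold):
        x1, y1, x2, y2 = boxes[i]
        x1, x2 = np.clip([x1, x2], 0, img_w).round().astype(int)
        y1, y2 = np.clip([y1, y2], 0, img_h).round().astype(int)
        if x2 > x1 and y2 > y1:               # skip boxes that fall outside the image
            crops.append(image[y1:y2, x1:x2])
    return crops
```

You would call it like `crops = crop_detections(image, model_output[0], n_classes)`, where `model_output[0]` is the (N_box, ...) tensor for one image converted to a NumPy array. If your column order or decoding convention differs, only the slicing and formulas in `decode_boxes` need to change.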