Finetuning DETR for object detection on custom dataset

That’s pretty much what I thought. Good point about the box embed, though: I suppose it is just the number of classes and the number of queries that should be modified. I am also interested in trying to replace the num_classes fc layer with a small network, just to give it a few more parameters when finetuning.
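
To make that concrete, here is a rough sketch of what I have in mind. It assumes the model exposes `class_embed` (a linear layer with `num_classes + 1` outputs, the extra one being "no object") and `query_embed` (an embedding of shape `num_queries x hidden_dim`), which is how I understand the reference DETR implementation to be laid out. `TinyDetrStandIn`, `replace_heads`, and `head_hidden` are hypothetical names of my own, used only so the snippet runs without downloading weights.

```python
import torch
from torch import nn


class TinyDetrStandIn(nn.Module):
    """Hypothetical stand-in that only mimics the head attribute names."""

    def __init__(self, hidden_dim=256, num_classes=91, num_queries=100):
        super().__init__()
        # +1 output for the "no object" class
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)
        self.query_embed = nn.Embedding(num_queries, hidden_dim)


def replace_heads(model, num_classes, num_queries=None, head_hidden=None):
    """Swap the classification head (and optionally the queries) for finetuning."""
    hidden_dim = model.query_embed.embedding_dim
    out_dim = num_classes + 1  # keep the "no object" slot
    if head_hidden is None:
        # Plain replacement: a fresh linear layer sized for the new classes
        model.class_embed = nn.Linear(hidden_dim, out_dim)
    else:
        # Small network instead of a single fc layer (my assumption, untested)
        model.class_embed = nn.Sequential(
            nn.Linear(hidden_dim, head_hidden),
            nn.ReLU(),
            nn.Linear(head_hidden, out_dim),
        )
    if num_queries is not None:
        # New queries are randomly initialised; the pretrained ones are discarded
        model.query_embed = nn.Embedding(num_queries, hidden_dim)
    return model


model = replace_heads(TinyDetrStandIn(), num_classes=3, num_queries=20, head_hidden=128)
decoder_out = torch.randn(2, 20, 256)  # (batch, queries, hidden_dim)
logits = model.class_embed(decoder_out)
```

The box head is left untouched here, since its output size does not depend on the number of classes. Whether the extra layer actually helps is something I would still need to test.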

Also, really cool that you are using DETR at work! How have you found it to train generally? The batch size seems to be very low in most examples. Do you just fine-tune in the usual way, freezing the backbone and the criterion first? Additionally, have you had a chance to gauge the performance on small objects? I’ve read in a few places that it suffers there slightly compared to EfficientDet.