YOLO Architecture

  1. Is it possible to combine Fast R-CNN (2-stage) and YOLO (1-stage), and if so, how?

  2. Why was the input resolution changed to 416 x 416 when anchor boxes were added? Why does using anchor boxes cause a small decrease in accuracy? How does using anchor boxes decouple the class prediction mechanism from the spatial location?

  3. Why does standard k-means with Euclidean distance generate more error for larger boxes than for smaller boxes? How is d(box, centroid) = 1 - IOU(box, centroid) derived?

  4. How is Pr(object) * IOU(b, object) = σ(t_o) derived in YOLO v2? Why is this expression not used in YOLO v3?

  5. Why is the sigmoid function applied to t_x and t_y? Why is the exponential function applied to t_w and t_h, which then scale the anchor dimensions p_w and p_h?

  6. In yolov3-spp.cfg, I do not see anything about the 3 different scales. Could anyone advise?

  7. Why does the YOLO v3 tensor size need to be multiplied by N*N? What does N represent?
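
For reference, here is a minimal Python sketch of the formulas behind questions 3, 5 and 7, as I read them in the YOLO v2 and v3 papers. The function and parameter names (iou_wh, kmeans_distance, decode_box, output_tensor_shape) are my own and do not come from Darknet; the sketch assumes that boxes are compared by width and height only (as if they shared a center) for k-means, and that c_x, c_y are the grid cell offsets.

```python
import math


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes, assumed to share the same center."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_distance(box, centroid):
    """Distance from question 3: d(box, centroid) = 1 - IOU(box, centroid)."""
    return 1.0 - iou_wh(box, centroid)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Box decoding from question 5 (assumed YOLO v2 parameterization).

    The sigmoid keeps the predicted center inside its grid cell; the
    exponential keeps the width and height positive multiples of the
    anchor dimensions p_w and p_h.
    """
    b_x = sigmoid(t_x) + c_x
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def output_tensor_shape(n, boxes_per_cell=3, num_classes=80):
    """Shape from question 7: N x N x [boxes * (4 + 1 + classes)].

    n is taken here to be the number of grid cells along each side.
    """
    return (n, n, boxes_per_cell * (4 + 1 + num_classes))


# The IoU-based distance is the same for a small pair of boxes and for the
# same pair scaled up 10x, which is not true of Euclidean distance.
print(kmeans_distance((1, 1), (2, 2)))
print(kmeans_distance((10, 10), (20, 20)))
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0))
print(output_tensor_shape(13))
```

With all raw outputs at zero, the decoded box sits in the middle of its cell and has exactly the anchor's size, which is how I understand the role of the anchors as priors.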