He didn’t really mean 4+4+21=26?
Does creating pandas DataFrames introduce any overhead that might compromise performance?
Maybe it works. We might have to do an experiment and see for ourselves.
Yes, because if one kind of input has its loss around 10 and the other around 1, the former will dominate the loss, and the model won’t learn to do a good job on the latter.
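To make the scale problem concrete, here is a minimal sketch in plain Python. The values 10 and 1 are the ones from the post above; `class_weight` is a hypothetical scaling factor picked for illustration, not necessarily what the lesson notebook uses.

```python
def loss_shares(bbox_loss, class_loss, class_weight=1.0):
    """Return the fraction of the summed loss contributed by each term.

    class_weight is a hypothetical multiplier applied to the smaller term
    so that both terms end up on a comparable scale.
    """
    weighted_class = class_weight * class_loss
    total = bbox_loss + weighted_class
    return bbox_loss / total, weighted_class / total


# Unweighted: the term around 10 makes up most of the total.
unweighted = loss_shares(10.0, 1.0)

# Scaling the smaller term by 10 puts both on an equal footing.
balanced = loss_shares(10.0, 1.0, class_weight=10.0)
```

In practice the multiplier is usually found by printing both loss terms for a batch and choosing a factor that brings them to roughly the same magnitude.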
The 3 objects that print out are all indexed somewhere. Where is that again?
Just want to say Lecture 8 CS231N W2016 and Lecture 11 CS231N S2017 make excellent companion lectures to this lesson. The first halves are more or less on R-CNN, but the latter halves cover YOLO/SSD.
Use a conv layer that outputs that… not necessarily stride 2.
Does anybody have a good article to explain Jaccard Indexes?
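Not an article, but a short sketch may help: the Jaccard index of two boxes is the area of their intersection divided by the area of their union (also called IoU). The box format `(x_min, y_min, x_max, y_max)` and the function name `jaccard` are assumptions made for this example, not the lesson's own code.

```python
def jaccard(box_a, box_b):
    """Intersection over union for two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlapping rectangle, clamped at zero
    # so that disjoint boxes give an intersection of 0.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes offset by 1 in each direction share a 1x1 square:
# intersection = 1, union = 4 + 4 - 1 = 7.
overlap = jaccard((0, 0, 2, 2), (1, 1, 3, 3))
```

The score is 1 for identical boxes and 0 for boxes that do not touch, which is what makes it convenient for matching predicted boxes to ground truth.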
Random question: what does the difficulty of object detection in an image say about captchas? Are they harder for a bot because it’s computationally expensive to do what humans are asked to do, namely identify what is and isn’t in an image (usually used for password protection/security questions)?
I want to know the answer to this too!
Why would a spatial transformer like this not work?
http://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html
Your captcha answers are being used as data to train neural nets, so I would bet that if deep learning can’t break captchas yet, it will be able to very soon.
You could even build this yourself for a small fee by getting a site to crowdsource solving these for your data.
How do the bounding boxes span across the anchor boxes? Aren’t we predicting if the object is in the anchor box?
Okay, I missed something: why are the bounding boxes found not exactly equal to the anchor boxes?
Probably merging boxes with the same label that span across anchor boxes.
The anchor boxes evenly divide the image (say into a 2x2 or 4x4 grid), while the bounding boxes are still the rectangles that closely surround the object
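A rough sketch of that distinction, assuming coordinates normalized to the range 0 to 1 and boxes written as `(x_min, y_min, x_max, y_max)`. The helper names and the example ground-truth box are made up for illustration.

```python
def grid_anchors(n):
    """Split the unit square into an n x n grid of equally sized anchor cells."""
    size = 1.0 / n
    return [
        (col * size, row * size, (col + 1) * size, (row + 1) * size)
        for row in range(n)
        for col in range(n)
    ]


def overlaps(box_a, box_b):
    """True if the two boxes share any area."""
    return (
        min(box_a[2], box_b[2]) > max(box_a[0], box_b[0])
        and min(box_a[3], box_b[3]) > max(box_a[1], box_b[1])
    )


anchors_2x2 = grid_anchors(2)

# A hypothetical ground-truth box that hugs an object near the image centre.
# It is not equal to any single anchor cell and touches several of them.
ground_truth = (0.2, 0.3, 0.9, 0.8)
touched = [a for a in anchors_2x2 if overlaps(a, ground_truth)]
```

Here the ground-truth box touches all four cells of the 2x2 grid, so the anchors are fixed reference regions while the bounding box still has its own coordinates.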
Okay, I missed this point:
Are t, x one-hot encoded?
What are their dimensions?
How do we decide how the bounding box is aligned? Stretched horizontally or vertically?
Or are we simply combining anchor boxes based on IOU?
Who came up with the Jaccard Index Trick? It’s pretty cool!
And with the one-hot encoding in the loss function, why did he add one and then immediately subtract it? How is that different from not adding one and not subtracting it?