In my experience, the dropout rate is just something you have to try different values for until you find one that works well for you.
It’s here.
You can apply some kind of hyperparameter optimization algorithm to it if you want to be systematic.
Can Jeremy explain use_clr=(32,5) usage in learn.fit?
Any reason why, in the bbox-only model, you did not use sigmoid * 224 to bound the output of the bbox prediction, but you do use it in the combined bbox-and-cat model's loss function?
When designing a loss function with inputs of multiple kinds (e.g. L1 loss for bounding boxes and maximum-likelihood loss for classes), how do we control the weights between those terms? Jeremy did it in the lecture by multiplying one of them by 20, but is there a way to do it without manual examination?
Well, you’ll have to throw away your old model unless you adjust the weights.
So, do you just use a new model with new dropout rates every time and see which one has the best loss?
Is this concept the method behind most fully convolutional nets? I'm thinking back to this paper: https://vision.cornell.edu/se3/wp-content/uploads/2017/07/LCDet_CVPRW.pdf
How do we know how many objects we will have in an image?
The idea is to essentially bring both the losses to the same scale range. That’s still manual. Maybe there’s a better way of doing it.
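One (hypothetical) way to make the rescaling less manual is to divide each term by a running average of its own magnitude, so both terms hover around the same scale without a hand-picked constant like 20. A minimal numpy sketch, where the function names and the 20x default are just illustrations of the idea from the lecture:

```python
import numpy as np

def detection_loss(bb_pred, bb_true, cls_pred, cls_true, cls_weight=20.0):
    # L1 loss for the 4 bounding-box coordinates
    bb_loss = np.abs(bb_pred - bb_true).mean()
    # cross-entropy for the class probabilities (cls_true is one-hot per box)
    cls_loss = -(cls_true * np.log(cls_pred + 1e-9)).sum(axis=1).mean()
    # cls_weight rescales the class term so both terms contribute on a
    # similar scale; 20 is the hand-tuned value from the lecture
    return bb_loss + cls_weight * cls_loss

# perfect predictions drive both terms (and so the total) to ~0
bb = np.array([[0.5, 0.5, 0.2, 0.2]])
cls = np.array([[1.0, 0.0]])
total = detection_loss(bb, bb, cls, cls)
```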
With stride 2, how would we come up with the channels being 4+c?
Does it matter how many objects are in the image?
Are we just mapping the remaining outputs to all 0 when there are fewer than 16 objects?
Again, please forgive my ignorance. What does YOLO stand for in this case?
How do we choose anchor boxes? Does it matter if an anchor box contains only part of an object and another box contains the rest?
You Only Look Once
Anchor boxes are initially chosen by dividing the space equally as a grid.
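That equal grid division can be sketched in a few lines. This is an illustration only (the function name and the unit-square convention are mine, not from the lecture notebook):

```python
import numpy as np

def make_anchor_grid(n=4):
    """Centres and sizes of an n x n grid of square anchor boxes
    dividing the unit image equally, as described above."""
    cell = 1.0 / n
    centres = (np.arange(n) + 0.5) * cell        # cell midpoints along one axis
    cx, cy = np.meshgrid(centres, centres)       # all (x, y) centre pairs
    # each row: (centre_x, centre_y, width, height)
    return np.stack([cx.ravel(), cy.ravel(),
                     np.full(n * n, cell), np.full(n * n, cell)], axis=1)

anchors = make_anchor_grid(4)   # 16 anchors for a 4x4 grid
```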
What is this Excel wizardry?
Quote:
"Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.
We use a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities."
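The single-pass layout described in that quote can be sketched with array shapes alone. Assuming the YOLO v1 defaults (7x7 grid, 2 boxes per cell, 20 classes) and using random numbers as a stand-in for the network output:

```python
import numpy as np

S, B, C = 7, 2, 20                       # grid size, boxes per cell, classes
raw = np.random.rand(S, S, B * 5 + C)    # stand-in for one forward pass

# per cell: B boxes of (x, y, w, h, confidence), then C class probabilities
boxes = raw[..., :B * 5].reshape(S, S, B, 5)
class_probs = raw[..., B * 5:]

# "bounding boxes are weighted by the predicted probabilities":
# final score for each box/class = box confidence * class probability
scores = boxes[..., 4:5] * class_probs[:, :, None, :]
```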
Is there anything like a non-local receptive field, i.e. using data farther from the center pixel?