Thanks for sharing the photo! It was really helpful, and working through it with you all was great learning for me too.

A correction to make at the bottom-left of the photo where it says “Output = (64, 189, 4)”:

- 64 is the batch size, not channels
- 189 is the number of predictions for each of the 64 images in the batch. This corresponds to the 189 anchor boxes that we defined up top.
- 4 is the set of bounding box coordinates (corners) predicted for each anchor box (x 189 from the 2nd dimension). This is the bounding-box output, `torch.cat([o1l,o2l,o3l], dim=1)`, which is the second of the 2 tensors in the returned list.

The other output, returned first in the list, has 21 elements in the 3rd dimension - full shape would be (64, 189, 21) - representing the per-class predictions for the 20 categories + 1 ‘bg’ category (one slot per class, matching a one-hot encoding). This is `torch.cat([o1c,o2c,o3c], dim=1)` from the return step of the forward pass:

```
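import torch
import torch.nn as nn
import torch.nn.functional as F

# StdConv and OutConv are defined elsewhere (not shown here).
# `drop` is assumed to be a dropout probability defined at module level.
class SSD_MultiHead(nn.Module):
    def __init__(self, k, bias):
        super().__init__()
        self.drop = nn.Dropout(drop)
        self.sconv1 = StdConv(512,256, drop=drop)
        self.sconv2 = StdConv(256,256, drop=drop)
        self.sconv3 = StdConv(256,256, drop=drop)
        self.out0 = OutConv(k, 256, bias)  # not used in forward()
        self.out1 = OutConv(k, 256, bias)
        self.out2 = OutConv(k, 256, bias)
        self.out3 = OutConv(k, 256, bias)

    def forward(self, x):
        x = self.drop(F.relu(x))
        x = self.sconv1(x)
        # each head returns (class activations, location activations)
        o1c,o1l = self.out1(x)
        x = self.sconv2(x)
        o2c,o2l = self.out2(x)
        x = self.sconv3(x)
        o3c,o3l = self.out3(x)
        # concatenate the three heads along the anchor-box dimension (dim=1)
        return [torch.cat([o1c,o2c,o3c], dim=1),   # classes: (64, 189, 21)
                torch.cat([o1l,o2l,o3l], dim=1)]   # boxes:   (64, 189, 4)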
```
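
To double-check the shapes without running the model, here is a small shape-bookkeeping sketch in plain Python. It assumes the three heads see 4x4, 2x2 and 1x1 grids with k = 9 anchor boxes per cell; that split is my assumption (it happens to add up to 189, but the post only gives the total), and `head_shapes` / `cat_dim1` are made-up helper names, not part of any library.

```python
# Shape bookkeeping only; no PyTorch needed. All names here are illustrative.
BATCH = 64          # batch size (1st dimension)
K = 9               # anchor boxes per grid cell (assumed)
GRIDS = [4, 2, 1]   # grid sizes seen by out1, out2, out3 (assumed)
N_CLASSES = 20 + 1  # 20 categories + 1 'bg'

def head_shapes(grid):
    """Shapes of the (class, location) outputs of one head after flattening."""
    n = grid * grid * K
    return (BATCH, n, N_CLASSES), (BATCH, n, 4)

def cat_dim1(shapes):
    """Shape you get when concatenating tensors of these shapes along dim=1."""
    first = shapes[0]
    # every dimension except dim=1 must match for the concatenation to be valid
    assert all(s[0] == first[0] and s[2] == first[2] for s in shapes)
    return (first[0], sum(s[1] for s in shapes), first[2])

cls_shapes, loc_shapes = zip(*(head_shapes(g) for g in GRIDS))
cls_out = cat_dim1(cls_shapes)   # like torch.cat([o1c,o2c,o3c], dim=1)
loc_out = cat_dim1(loc_shapes)   # like torch.cat([o1l,o2l,o3l], dim=1)

print(cls_out)  # (64, 189, 21)
print(loc_out)  # (64, 189, 4)
```

Under those assumptions the per-head anchor counts are 144 + 36 + 9 = 189, so only the 2nd dimension grows when the heads are concatenated; the batch size and the last dimension (21 or 4) stay the same.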