Part 2 Lesson 9 wiki

I still have version 0.3.

I think the problem I’m seeing has been solved with v0.4; there were some issues with ByteTensor, and support for it was added in that release. I have tried removing all instances from the CPU and putting everything on the GPU, but it didn’t quite work out. Could you share the working notebook which you yourself have tried?

I had a similar problem in the pascal notebook with pytorch 0.4.1 and I was able to fix it with two changes:

1.) In the functions “detn_loss” and “detn_l1”, change “F.sigmoid” to “torch.sigmoid” to avoid the deprecation messages during learn.lr_find() and learn.fit(). (They are not a real problem, but the output looks much cleaner; see the sketch at the end of this post.)

2.) The learn.fit() was always throwing an error when calculating the metrics:
Expected object of type torch.LongTensor but found type torch.cuda.LongTensor for argument #2 'other'
I could fix this by copying the accuracy_np function into the notebook and adapting it as follows (the original version is shown after the #):

def accuracy_np(preds, targs):
    preds = np.argmax(preds, 1)
    return np.mean(to_np((preds==targs.cpu()))) # (preds==targs).mean()

Here you also see the nice info from the debugger (started with “%debug” in a cell):

> <ipython-input-199-b9616952fd70>(3)accuracy_np()
      1 def accuracy_np(preds, targs):
      2     preds = np.argmax(preds, 1)
----> 3     return np.mean(to_np((preds==targs))) # (preds==targs).mean()

ipdb> p preds
tensor([14, 17,  2, 14, 14,  6, 13,  8,  2,  9, 15,  6, 17,  7, 12,  0, 14, 14,
         7, 19,  1, 14, 14, 13, 14, 14, 14,  6,  9, 18, 13,  0,  2,  6, 18, 11,
        14,  6,  0, 10,  6, 13, 12, 14,  3, 13, 14,  7, 14, 13,  9, 14, 13,  2,
        14, 11,  6,  0,  2,  2,  2, 14, 14,  9])
ipdb> p targs
tensor([14, 17,  2, 14, 14,  5, 13, 10,  2,  9, 15,  6,  8,  7, 12,  0, 14, 14,
         7, 19,  1, 14, 14, 13, 14, 14, 14,  5,  9, 18, 13,  0,  2,  6, 18, 11,
        14,  3,  0, 10,  6, 13, 12, 14,  3, 13, 14,  7, 12, 13,  9, 14,  1,  2,
        14, 11,  6,  0,  2,  2,  2, 17, 14,  9], device='cuda:0')
ipdb> p preds==targs
*** RuntimeError: Expected object of type torch.LongTensor but found type torch.cuda.LongTensor for argument #2 'other'
ipdb> p preds==targs.cpu()
tensor([1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], dtype=torch.uint8)
ipdb> p to_np(preds==targs.cpu())
array([1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], dtype=uint8)
ipdb> p np.mean(to_np((preds==targs.cpu())))
0.875
ipdb> q

Maybe this debugging overview helps somebody who is not familiar with it.
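
As for fix 1, the loss function ends up looking roughly like this (a minimal sketch from memory of the pascal notebook, so details such as the *20 multiplier may differ in your copy):

import torch
import torch.nn.functional as F

def detn_loss(input, target):
    bb_t, c_t = target                      # target bbox coords and class
    bb_i, c_i = input[:, :4], input[:, 4:]  # predicted bbox coords and class scores
    # fix 1: torch.sigmoid instead of the deprecated F.sigmoid,
    # then scale the (0, 1) outputs up to 224x224 image coordinates
    bb_i = torch.sigmoid(bb_i) * 224
    # the *20 roughly balances the two loss terms against each other
    return F.l1_loss(bb_i, bb_t) + F.cross_entropy(c_i, c_t) * 20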

Best regards
Michael


Around 20:36, I was looking at head_reg4 and wondering why we output 256 features from the Linear layer. Could someone explain, or point me to a source on, how to choose the input and output feature sizes?
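
For reference, head_reg4 in the notebook looks roughly like this (reproduced from memory, so details may differ; it assumes the notebook’s imports, Flatten from the fastai library, and cats, the notebook’s list of categories):

head_reg4 = nn.Sequential(
    Flatten(),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(25088, 256),          # 25088 = 512*7*7, the flattened ResNet34 feature map
    nn.ReLU(),
    nn.BatchNorm1d(256),
    nn.Dropout(0.5),
    nn.Linear(256, 4 + len(cats)),  # 4 bbox coordinates plus one score per class
)

As far as I can tell, the input size is fixed by the backbone and the output size by the task; the 256 in between is just a hand-picked intermediate width.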

This helped me a lot.
Thank you!!!

I got the same error multiple times. I had to replace .cpu() with .cuda() on every line to get rid of it.

For multibox detection, when we one-hot encode the labels, if the ground truth is only background the vector will be all zeros, so any vector of activations will give a cross-entropy loss of zero… isn’t that a big problem?
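
To illustrate the concern, assuming the one-hot form of cross entropy, -sum(t * log(p)), an all-zero target really does make the sum zero no matter what the activations are:

import torch

p = torch.softmax(torch.randn(5), dim=0)  # arbitrary predicted probabilities
t = torch.zeros(5)                        # background: all-zero one-hot target
print(-(t * torch.log(p)).sum())          # 0, whatever p is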

@guptapankaj1993 I am having the same problem; did you find any solution?

@Mihar To get my work running, I used a try-except to skip those erroneous instances.

@jeremy
Why do we need to multiply the sigmoid output by 224 in detn_loss when we have already transformed the input bbox coordinates (tfm_y=TfmType.COORD), which makes them fall in the range 0 to 224?

Around 1:04 in the lesson, @jeremy explains how to convert the activations to bounding boxes. The function in the notebook is actn_to_bb(). Can someone please explain what is going on there?
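
For anyone else puzzling over it, here is a self-contained sketch of what I believe the function is doing (the helper hw2corners matches the notebook; the toy anchor values and the explicit grid_sizes argument are just for illustration):

import torch

def hw2corners(ctr, hw):
    # convert (center, height/width) boxes to (top-left, bottom-right) corners
    return torch.cat([ctr - hw/2, ctr + hw/2], dim=1)

def actn_to_bb(actn, anchors, grid_sizes):
    actn_bbs = torch.tanh(actn)  # squash raw activations into (-1, 1)
    # move each anchor center by at most half a grid cell
    actn_centers = (actn_bbs[:, :2]/2 * grid_sizes) + anchors[:, :2]
    # scale each anchor's height/width to between 0.5x and 1.5x
    actn_hw = (actn_bbs[:, 2:]/2 + 1) * anchors[:, 2:]
    return hw2corners(actn_centers, actn_hw)

# toy example: one anchor centered at (0.5, 0.5) with size 0.25 on a 4x4 grid
anchors = torch.tensor([[0.5, 0.5, 0.25, 0.25]])
grid_sizes = torch.tensor([[0.25, 0.25]])
print(actn_to_bb(torch.zeros(1, 4), anchors, grid_sizes))  # tensor([[0.3750, 0.3750, 0.6250, 0.6250]])

So the network never predicts boxes directly; it predicts small offsets and scalings relative to each anchor box.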


Why is it showing “aeroplane” at the top left of all the images with ground truth boxes?

I was getting this same thing, but didn’t dig into it enough to find out…

I wrote a piece that attempts to explain SSD Multibox in less technical terms. Radek looked through it and seemed to think it was okay, so I’ve added it to the wiki post: SSD Multibox in plain English. I hope that’s okay.
If anyone has more feedback, I would be happy to hear it!

Hey Hiromi, did you manage to try the activations without the +1? How did the result turn out? Sorry, it’s kind of a dated post you made in March.


Thank you so much, Jeremy!!

To me, it is easier to understand having a “background class” in the target value, because I would rather see a target class index than keep track of it in the one-hot encoding.

For the convolutional output, I will certainly try without the +1 and report back on how it does :slight_smile: Thank you very much for the quick response and clarifications!!

edit: oh nvm, @binga did it in a later post

I am having trouble understanding the BCE_Loss part, and I would be really grateful if someone could shed some light on it.

def one_hot_embedding(labels, num_classes):
    # index rows of an identity matrix to one-hot encode the labels
    return torch.eye(num_classes)[labels.data.cpu()]

class BCE_Loss(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes

    def forward(self, pred, targ):
        # one-hot encode with one extra column for the background class...
        t = one_hot_embedding(targ, self.num_classes+1)
        # ...then slice that background column off again
        t = V(t[:,:-1].contiguous())#.cpu()
        x = pred[:,:-1]  # drop the background activation as well
        w = self.get_weight(x,t)
        return F.binary_cross_entropy_with_logits(x, t, w, size_average=False)/self.num_classes

    def get_weight(self,x,t): return None

I understand that it calculates the one-hot embedding using one extra class for background (so a background box will have a target of zero for all foreground classes), and then slices the background component off the tensors t and pred before calculating binary_cross_entropy_with_logits. Why is it being sliced off?
Doesn’t this mean the activation for background never gets optimized, since the losses are calculated only for the foreground classes?
How is background detected, then?

Thanks in advance,

This was quite mind-bending for me too when I went through it. Here is my understanding of how it learns to detect background:

Suppose there is a table in the image and it is matched with a particular grid cell. The one-hot embedding will be 1 for table and 0 for the remaining classes, say 19 of them. Since we are using binary cross entropy as the loss function, during optimization the weights will learn to predict a higher probability (closer to 1) for table and lower probabilities (closer to 0) for the other 19 classes. In this manner the NN has just learned to predict, for this grid cell, not-chair, not-human, not-car… etc., alongside predicting a table.

So the next time there is no object in that grid cell, it will predict low probabilities for all classes, i.e. background.
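
A quick way to convince yourself: with binary cross entropy, an all-zero target still yields a large loss whenever the activations are confidently high, so the background case does get optimized (illustrative values):

import torch
import torch.nn.functional as F

num_classes = 20
targ = torch.zeros(1, num_classes)  # background grid cell: all-zero target

confident = torch.full((1, num_classes), 5.0)   # wrongly "detects" everything
cautious  = torch.full((1, num_classes), -5.0)  # predicts none of the classes

print(F.binary_cross_entropy_with_logits(confident, targ))  # large loss (~5.0)
print(F.binary_cross_entropy_with_logits(cautious, targ))   # near zero (~0.007)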


Thanks for answering. That makes sense: a background box would produce low probabilities for all foreground classes. However, it still seems odd that there is a non-optimized background classifier in the network, since the outputs per box are 4+num_classes+1. :thinking:

Not quite sure what your question is, but if you are asking why we predict classes+1 in the first place, this might be helpful.


Hi, why does the data have the shape ((64, 224, 224, 3), (64, 56), (64, 14))? If we have 21 classes including background, why does it have 14 in the last dimension?