Part 2 Lesson 9 wiki

@jeremy the get_y(bbox, clas) function defined in the lesson notebook breaks down when there is no bounding box for an image in the batch. In that case, all elements of the input bbox are zero and, as a result, the bb_keep variable ends up empty.

This might be a frequent case in datasets where objects are sparse.

What can be done to fix this error?
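Not a definitive answer, but here is a minimal sketch of one possible guard, assuming get_y looks roughly like the pascal-multi version (the sz argument here is just a placeholder for the image size used for scaling):

import torch

def get_y(bbox, clas, sz=224):
    # reshape the flat bbox tensor to (n, 4) and scale coordinates to [0, 1]
    bbox = bbox.view(-1, 4) / sz
    # keep only boxes with non-zero height (padding rows are all zeros)
    bb_keep = ((bbox[:, 2] - bbox[:, 0]) > 0).nonzero()
    if bb_keep.numel() == 0:
        # no real boxes in this image: return empty tensors instead of failing
        return bbox.new_zeros((0, 4)), clas.new_zeros((0,), dtype=torch.long)
    bb_keep = bb_keep[:, 0]
    return bbox[bb_keep], clas[bb_keep]

The downstream loss would still need to treat the empty case sensibly, but at least this avoids the crash inside get_y itself.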


I also have similar questions to @guptapankaj1993.

  1. How could we handle training images that have no positive bounding box? For example, if we wanted to train a “cat” detector and some training images don’t contain a cat but a dog instead.

My naive solution is to set all bbox coordinates to zero, as below. Please comment if you have a more elegant solution.
fnames, bbox
cat1111.jpg, 0 0 0 0

  2. Besides bounding boxes, I am also interested in predicting segmentation masks (polygons around “cats”). However, different masks in different training images have different lengths. So my question is: how do I handle training labels with different lengths?

My naive solution is to pad each polygon with zeros until its length reaches a pre-defined maximum length Nmax, so that all training labels have the same length. Please comment if you have a more elegant solution.
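If it helps the discussion, here is a minimal sketch of that padding idea; Nmax and the flat [x1, y1, x2, y2, ...] layout are assumptions of mine, not anything from the lesson:

import numpy as np

def pad_polygon(poly, n_max):
    """Pad a flat [x1, y1, x2, y2, ...] polygon with zeros up to length n_max."""
    poly = np.asarray(poly, dtype=np.float32)
    assert len(poly) <= n_max, "polygon longer than the chosen maximum"
    padded = np.zeros(n_max, dtype=np.float32)
    padded[:len(poly)] = poly
    # returning the true length lets the loss function ignore the padded zeros
    return padded, len(poly)

coords, n = pad_polygon([10, 20, 30, 20, 20, 40], n_max=16)

Keeping the true length (or a mask) around, rather than treating the zeros as real coordinates, is probably the important part.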

Thank you in advance and your feedback is very much appreciated!

Hi,

I am confused by Jeremy’s explanation in the video of the reasoning behind using two Conv2d’s in the OutConv class (0:45:15) and hopefully someone can clarify.

These two Conv2d’s are described as being “nearly the same thing” as using a single Conv2d and that using two separate Conv2d layers “lets these layers specialize just a little bit” by “sharing every single layer except the last one”

However, surely the filters in a convolution layer are totally independent of each other? Since these two Conv2d layers have exactly the same inputs, strides and filter sizes, there is really no difference in this case between a single Conv2d and two separate Conv2d layers: using a single Conv2d and slicing the required parts of the output volume to feed into the appropriate parts of the loss function would have exactly the same effect.

It is true that they share every layer except the last one (the one they are in) but that would be the case whether a single Conv2d or two Conv2d’s with the same number of filters was used.

So to my mind the ‘intuition’ that separating them will somehow improve the ability of the network to learn the necessary function is wrong. While I do not know how convolution layers are implemented on GPUs, I would not be surprised if there were a performance benefit to using a single layer with more filters, as I can imagine there would be opportunities for further parallelism/vectorization.

(Note: I have not been able to test if there is any performance difference as my Jupyter notebook is not running at present but I would love to hear if anyone tries it.)
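For anyone following along, here is a small self-contained sketch of the two options being compared, using made-up shapes rather than the notebook's: two parallel Conv2d heads versus one wider Conv2d whose output is sliced. The output shapes are identical either way, which is exactly the point being argued above:

import torch
import torch.nn as nn

n_in, n_clas, k = 256, 21, 9

class TwoHeads(nn.Module):
    def __init__(self):
        super().__init__()
        self.oconv1 = nn.Conv2d(n_in, n_clas * k, 3, padding=1)  # class scores
        self.oconv2 = nn.Conv2d(n_in, 4 * k, 3, padding=1)       # box activations
    def forward(self, x):
        return self.oconv1(x), self.oconv2(x)

class OneHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.oconv = nn.Conv2d(n_in, (n_clas + 4) * k, 3, padding=1)
    def forward(self, x):
        out = self.oconv(x)
        # slice off the same channel groups that the two separate heads would produce
        return out[:, :n_clas * k], out[:, n_clas * k:]

x = torch.randn(2, n_in, 4, 4)
c1, b1 = TwoHeads()(x)
c2, b2 = OneHead()(x)
print(c1.shape, b1.shape, c2.shape, b2.shape)  # identical shapes either way

Whether the split brings any learning benefit is the open question; parameter-wise the two formulations are equivalent.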

Around 21:00, Jeremy shows the loss function for the object classification and bounding box regression. For the regression, he uses the sigmoid to force the bounding box coordinates inside the frame (to make it easier for the model to learn). In that function, why didn’t he use a softmax on the classification before getting the cross-entropy loss?
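For reference on the softmax part: as far as I know, PyTorch's F.cross_entropy applies log-softmax internally, so an explicit softmax beforehand would be redundant. A rough sketch of what the combined loss might look like (the img_sz argument and the weighting factor of 20 are my guesses at its shape, not the notebook code verbatim):

import torch
import torch.nn.functional as F

def detn_loss(input, target, img_sz=224.0):
    bb_t, c_t = target
    bb_i, c_i = input[:, :4], input[:, 4:]
    # sigmoid squashes the raw activations into (0, 1); scaling by the image size
    # then keeps the predicted corners inside the frame
    bb_i = torch.sigmoid(bb_i) * img_sz
    # F.cross_entropy applies log-softmax internally, so no explicit softmax is needed
    return F.l1_loss(bb_i, bb_t) + F.cross_entropy(c_i, c_t) * 20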

In the final model we have grid sizes of [4, 2, 1] and k=9 for a total of 189 anchor boxes. When we ask the model to predict on an image, we get a [189, 21] tensor of class predictions and a [189, 4] tensor of bounding box predictions.

As I understand it, we determine what predictions/boxes are meaningful/accurate by calculating the IOU for each prediction relative to a threshold.

How would we use this type of model for prediction without a known ground truth? How would we determine what bounding box predictions are meaningful? Would we do something like put the class predictions through a sigmoid and compare them to some cutoff value, or is there a better way?
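In case a concrete sketch is useful here: one common approach at inference time, not necessarily what the lesson does, is to put the class activations through a sigmoid, keep anchors whose best foreground score clears a threshold, and then run non-maximum suppression on what's left. Something like:

import torch

def filter_predictions(clas_pred, bbox_pred, thresh=0.25):
    """clas_pred: [189, 21] raw class activations; bbox_pred: [189, 4] boxes."""
    probs = torch.sigmoid(clas_pred)
    # this sketch assumes background is column 0, so we look at the best
    # foreground probability for each anchor box
    conf, cls = probs[:, 1:].max(dim=1)
    keep = conf > thresh
    # a real pipeline would follow this with non-maximum suppression
    # to drop overlapping duplicates of the same object
    return bbox_pred[keep], cls[keep] + 1, conf[keep]

The threshold and the NMS overlap cutoff are both tuning knobs; no ground truth is needed at this stage.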

I’m facing a problem here (I’m using a GPU).
When I use the CPU to execute ssd_loss(batch, y, True) as given in the notebook, I get this:

RuntimeError: Expected object of type Variable[torch.FloatTensor] but found type Variable[torch.cuda.FloatTensor] for argument #1 'other'

I did make one minor tweak: after I get x,y and convert them to Variable, I had to do this differently:

batch = learn.model(x.cpu())

When I put everything on the GPU, I get this when running ssd_loss(batch, y, True):

TypeError: Performing basic indexing on a tensor and encountered an error indexing dim 0 with an object of type torch.cuda.LongTensor. The only supported types are integers, slices, numpy scalars, or if indexing with a torch.LongTensor or torch.ByteTensor only a single Tensor may be passed.

How do I deal with this problem? If someone has a working notebook or solution, please share it.

At 1:18:30, Jeremy draws a vector containing the ground truth’s bounding boxes. He also refers to it as the dependent variable. In what way is it a dependent variable? What’s the independent variable?

Hi everyone,

I am having difficulties with getting a test set to work for the bounding boxes.
I am getting an indexing error:

~/fastai/fastai/zeroshot/fastai/transforms.py in make_square(y, x)
195 y1 = np.zeros((r, c))
196 y = y.astype(np.int)
--> 197 y1[y[0]:y[2], y[1]:y[3]] = 1.
198 return y1
199

IndexError: index 2 is out of bounds for axis 0 with size 1

If you have gotten a test set to work with this notebook then please let me know; any help with this would be appreciated.

Hey guys, check out my blog post on Introduction to Object Detection. Hope you enjoy it, and feel free to comment in case of any queries or suggestions.

I’m also stuck on this.

Have you tried removing the .cpu()?
So it looks like this: batch = learn.model(x)

Removing .cpu() if you are running on a GPU worked for me.

I already tried that, but it’s not working. Which PyTorch version are you using, btw?

I still have version 0.3.

I think the problem I’m seeing was solved in v0.4; there were some issues with ByteTensor and support for it was added in that release. I have tried removing all .cpu() calls and putting everything on the GPU, but it didn’t quite work out. Could you share the working notebook that you yourself have tried?

I had a similar problem in the pascal notebook with pytorch 0.4.1 and I was able to fix it with two changes:

1.) In the functions “detn_loss” and “detn_l1”, change “F.sigmoid” to “torch.sigmoid” to avoid the deprecation warnings during learn.lr_find() and learn.fit(). (This is not a real problem, but the output looks much cleaner.)

2.) The learn.fit() was always throwing an error when calculating the metrics:
Expected object of type torch.LongTensor but found type torch.cuda.LongTensor for argument #2 'other'
I could fix this by copying the accuracy_np function into the notebook and adapting it as follows (after the # you see the original version):

def accuracy_np(preds, targs):
    preds = np.argmax(preds, 1)
    # move targs to the CPU so it can be compared with the CPU-side preds
    return np.mean(to_np((preds==targs.cpu()))) # (preds==targs).mean()

Here you also see the nice info from the debugger (started with “%debug” in a cell):

> <ipython-input-199-b9616952fd70>(3)accuracy_np()
      1 def accuracy_np(preds, targs):
      2     preds = np.argmax(preds, 1)
----> 3     return np.mean(to_np((preds==targs))) # (preds==targs).mean()

ipdb> p preds
tensor([14, 17,  2, 14, 14,  6, 13,  8,  2,  9, 15,  6, 17,  7, 12,  0, 14, 14,
         7, 19,  1, 14, 14, 13, 14, 14, 14,  6,  9, 18, 13,  0,  2,  6, 18, 11,
        14,  6,  0, 10,  6, 13, 12, 14,  3, 13, 14,  7, 14, 13,  9, 14, 13,  2,
        14, 11,  6,  0,  2,  2,  2, 14, 14,  9])
ipdb> p targs
tensor([14, 17,  2, 14, 14,  5, 13, 10,  2,  9, 15,  6,  8,  7, 12,  0, 14, 14,
         7, 19,  1, 14, 14, 13, 14, 14, 14,  5,  9, 18, 13,  0,  2,  6, 18, 11,
        14,  3,  0, 10,  6, 13, 12, 14,  3, 13, 14,  7, 12, 13,  9, 14,  1,  2,
        14, 11,  6,  0,  2,  2,  2, 17, 14,  9], device='cuda:0')
ipdb> p preds==targs
*** RuntimeError: Expected object of type torch.LongTensor but found type torch.cuda.LongTensor for argument #2 'other'
ipdb> p preds==targs.cpu()
tensor([1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], dtype=torch.uint8)
ipdb> p to_np(preds==targs.cpu())
array([1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], dtype=uint8)
ipdb> p np.mean(to_np((preds==targs.cpu())))
0.875
ipdb> q

Maybe the debugging overview helps somebody not familiar with it.

Best regards
Michael


Around 20:36 I was looking at head_reg4 and was wondering why we output 256 features from the Linear layer. Could someone explain, or refer me to a source on how to go about choosing input and output feature sizes, please?
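For context, that 256 is essentially an arbitrary bottleneck width between the flattened conv features and the final 4 + n_classes outputs. A rough reconstruction of that kind of head (the exact layer sizes and n_classes here are assumptions, not necessarily the notebook's) would be:

import torch.nn as nn

n_classes = 20  # assumption: one score per Pascal VOC class
head_reg4 = nn.Sequential(
    nn.Flatten(),                   # flatten the final conv feature map
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512 * 7 * 7, 256),    # 256 is a fairly arbitrary bottleneck width
    nn.ReLU(),
    nn.BatchNorm1d(256),
    nn.Dropout(0.5),
    nn.Linear(256, 4 + n_classes),  # 4 bbox coordinates + one score per class
)

In other words, it is a capacity hyperparameter you could tune like any other; there is no principled reason it has to be 256.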

This helped me a lot.
Thank you!!!

I got the same error multiple times. I had to replace .cpu() with .cuda() on every line to get rid of the error.

For multibox detection, when we one-hot encode the labels, if the ground truth is only background the target vector will be full of zeros, so any vector of activations will give a cross-entropy loss of zero… isn’t that a big problem?
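A quick numeric check may help frame this: with a binary-cross-entropy style loss over the one-hot vector (which is what per-class encoding implies), an all-zero target does not make the loss vanish, because confident foreground activations are still penalised by the -log(1 - p) term. A tiny, purely illustrative sketch:

import torch
import torch.nn.functional as F

target = torch.zeros(4)                      # background only: an all-zero one-hot vector
acts = torch.tensor([3.0, -2.0, 0.5, -4.0])  # some arbitrary class activations
loss = F.binary_cross_entropy_with_logits(acts, target)
print(loss)  # non-zero: the confident activation for the first class is penalised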

@guptapankaj1993 I am having the same problem; did you find any solution?