Bounding box regression using resnet34 in fastai v1

I am trying to do bbox regression (from fastai v2 lesson 8) on the PASCAL VOC dataset using the fastai v1 library. I got the csv from lesson 8 of fastai v2. It has 2 columns: filenames and bbox. bbox holds 4 integers specifying the top-left and bottom-right coordinates of the bounding box.
I am getting errors when creating the databunch and the learn object.
I tried creating the databunch like this:

data = (ObjectItemList.from_df(df=lab_csv, cols='fn', path=path_img)
        .transform(get_transforms(), tfm_y=True, size=(224,224))
  • lab_csv is the dataframe from csv
  • fn is the name of column specifying file names
  • path_img is the path of folder containing images
  • get_label is a function which returns a torch tensor containing 4 integers of bbox coordinates

I am getting the error:
TypeError: iteration over a 0-d tensor
on line:

How can I fix this error?
Also, do I need to explicitly specify the custom head in the learn object, or will the library create it automatically because I am using ObjectItemList?

Thanks in advance :slight_smile:

I have also been trying to recreate the previous lesson 8 in fastai v1.

I used the get_annotations() function to read the PASCAL VOC json file directly rather than using the csv, and that worked OK. It required a few modifications to find the largest bounding boxes, but I have the first part, the classifier, working now.
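For anyone following along, picking the largest box per image might look roughly like the sketch below. It assumes each annotation is a `[bboxes, labels]` pair with boxes given as `[top, left, bottom, right]`; `box_area` and `largest_box` are hypothetical helper names, not part of fastai, and the sample annotation is made up.

```python
def box_area(box):
    """Area of a [top, left, bottom, right] box (assumed coordinate order)."""
    top, left, bottom, right = box
    return max(0, bottom - top) * max(0, right - left)

def largest_box(annotation):
    """Keep only the largest box (and its label) from one [bboxes, labels] pair.

    Hypothetical helper; returns the same nested [bboxes, labels] layout
    so it can stand in for the original annotation.
    """
    bboxes, labels = annotation
    idx = max(range(len(bboxes)), key=lambda i: box_area(bboxes[i]))
    return [[bboxes[idx]], [labels[idx]]]

# Made-up annotation with two objects in one image
ann = [[[96, 155, 270, 351], [10, 20, 50, 60]], ['car', 'person']]
print(largest_box(ann))  # [[[96, 155, 270, 351]], ['car']]
```

Keeping the nested layout means the result still has one list of boxes and one list of labels per image, just with a single entry in each.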

I have not yet worked out how to load a regression-only (boxes without classes) dataset using the data block API; however, it does work if I include the classes.

Still, create_cnn() does not appear to build the correct custom_head for a linear regression problem by itself, so I am currently working through setting my own custom_head and loss functions.

Did you ever get this figured out? I was able to get the data loaded using the ObjectItemList.from_folder method with a custom label_from_func that matched the return of the fastai get_annotations method: [ [ [y,x,y,x], ['label'] ] ]

Lesson 8:
head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088,4))
learn = ConvLearner.pretrained(f_model, md, custom_head=head_reg4)
learn.opt_fn = optim.Adam
learn.crit = nn.L1Loss()

Do you have to use the create_head method in current fastai to build the custom_head parameter for create_cnn?

I skipped the regression-only problem as I haven’t yet worked out how to create a bounding-box-only data loader. I wasn’t sure what effect it would have to include the classes in the mini-batches but not in the loss function.

I do, however, have the final part of lesson 8 (bbox + class) implemented now, at least up to training. I used the custom head from lesson 8 and slightly modified the loss and metric functions from the lesson.

def detn_loss(input, target, c_t):
    bb_t = target
    bb_i,c_i = input[:, :4], input[:, 4:]
    bb_i = torch.sigmoid(bb_i)*224
    bb_t = torch.sigmoid(bb_t)*224
    # I looked at these quantities separately first then picked a multiplier
    #   to make them approximately equal
    return F.l1_loss(bb_i, bb_t) + F.cross_entropy(c_i, c_t.flatten())*15

That is the loss function I am using. Basically, the ground truth values are now unpacked when they are passed to the loss function, so similar changes are required for the metric functions. Previously they were unpacked within the loss (and metric) functions, as in the lesson. I suspect I may not need the sigmoids in the loss function, but until I can see the results I am just leaving them there.
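As a rough worked example of how the two terms in a loss like this add up for a single sample, here is a plain-Python sketch. It leaves out the sigmoid scaling and the batching, reimplements the L1 and cross-entropy terms by hand purely for illustration, and uses made-up numbers; only the multiplier of 15 comes from the code above.

```python
import math

def l1_loss(pred, target):
    """Mean absolute error over the box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, class_idx):
    """Cross entropy for one sample: log-sum-exp of the logits minus the true-class logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[class_idx]

def combined_loss(output, bb_target, class_idx, weight=15):
    """Split one output into 4 box values plus class logits, then add the two terms."""
    bb_pred, logits = output[:4], output[4:]
    return l1_loss(bb_pred, bb_target) + weight * cross_entropy(logits, class_idx)

# Made-up numbers: 4 box coordinates followed by 3 class logits
output = [100.0, 150.0, 260.0, 340.0, 2.0, 0.5, -1.0]
target_box = [96, 155, 270, 351]

print(l1_loss(output[:4], target_box))       # box term: 7.5
print(cross_entropy(output[4:], 0))          # class term, before the multiplier
print(combined_loss(output, target_box, 0))  # box term + 15 * class term
```

Printing the two terms separately like this is one way to see how far apart they are before settling on a multiplier.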

Currently create_cnn generates a classification head when using the ObjectItemList data loader, so yes, you do have to pass in the correct custom_head for your problem. Also, data.c doesn’t seem to be correct: I would expect data.c = 4 + len(data.classes), but by default it is just set to the number of classes.
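To make the head sizes concrete, here is a small arithmetic sketch. It assumes a resnet34 body (512 output channels, overall stride of 32), 224x224 inputs, and the 20 PASCAL VOC object classes; the function names are hypothetical and only illustrate where 25088 and 4 + number of classes come from.

```python
# Assumed setup: resnet34 body, 224x224 input, PASCAL VOC's 20 object classes
IMG_SIZE = 224
BODY_STRIDE = 32      # resnet34 downsamples the input by 32 overall
BODY_CHANNELS = 512   # channels in resnet34's final conv block
N_BOX_COORDS = 4

def flattened_features(img_size=IMG_SIZE, stride=BODY_STRIDE, channels=BODY_CHANNELS):
    """Number of inputs the Linear layer sees after Flatten()."""
    side = img_size // stride        # 224 // 32 = 7
    return channels * side * side    # 512 * 7 * 7

def head_outputs(n_classes, with_classes=True):
    """Outputs the head needs: 4 box coordinates, plus one logit per class if classifying."""
    return N_BOX_COORDS + (n_classes if with_classes else 0)

print(flattened_features())                  # 25088
print(head_outputs(20))                      # 24
print(head_outputs(20, with_classes=False))  # 4
```

So a box-only head would be Linear(25088, 4), while a box-plus-class head needs 4 more outputs than the number of classes.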

Anyway my model is training but none of the visualizations work yet, also due to the unpacking of ground truth arguments when getting passed around.

I will post more complete code once I work out this last bit and can see what I am training! I haven’t had time to look closely into the visualization issues yet.

I managed to do it by changing the loss function. I used the normal ObjectItemList, but it has to be given a label for each bounding box, so I used the same label, ‘label’, for every bounding box. My loss function ignores the label and computes the loss from the bounding box coordinates only.

label_dict = {filename:bbox for filename,bbox in zip(filenames, bbox)}

# label_dict['000012.jpg'] -> Returns [96, 155, 270, 351]

def bbox_label_func(filepath):
    # Look up the box by file name and wrap it as [list of boxes, list of labels]
    filename = Path(filepath).name
    bbox = [label_dict[filename]]
    return [bbox, ['label']]

data = (ObjectItemList.from_df(df, path, cols='filenames', folder='train')
       .transform(get_transforms(), size=224, tfm_y=True)

Creating the learner with custom head

head = nn.Sequential(Flatten(), nn.Linear(25088,4))
learn = cnn_learner(data, models.resnet34, custom_head=head)

Custom loss function to ignore labels

def my_loss(x,y, *args):
  y = y[0].squeeze(1)
  return F.l1_loss(x,y)

Changing the learner loss function

learn.loss_func = my_loss

I was able to train the model. However, I can’t visualize the results using learn.show_results(). I think we need to make a custom ItemList to be able to visualize the results.


Hi jaye,
Do you know what the bb_pad_collate function does here?
Where is it defined? Are we using the one from the 102a_Coco notebook?

    def bb_pad_collate(samples:BatchSamples, pad_idx:int=0, pad_first:bool=True) -> Tuple[FloatTensor, Tuple[LongTensor, LongTensor]]:
        "Function that collect samples and adds padding."
        max_len = max([len(s[1].data[1]) for s in samples])
        bboxes = torch.zeros(len(samples), max_len, 4)
        labels = torch.zeros(len(samples), max_len).long() + pad_idx
        imgs = []
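        for i,s in enumerate(samples):
            imgs.append(s[0].data[None])  # add a batch dimension to each image
            bbs, lbls = s[1].data
            # fill from the end, so padding comes first in each row
            bboxes[i,-len(lbls):] = bbs
            labels[i,-len(lbls):] = lbls
        return torch.cat(imgs,0), (bboxes,labels)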

Hi, may I ask what loss you get?

Here is my issue: my loss is stuck at 0.3 and the bounding box is nowhere close to the result of 2019.

For the visualization part, I didn’t get time to figure out how to change show_results.

But it is easy if you just use your prediction. Suppose you use the default fastai ObjectItemList; you can do

x,y = next(iter(data.train_dl))
pred = learn.model(x)

pred[0], y[0]

It should be a vector of shape 1,4

box = ImageBBox.create(*tensor([size,size]).float(), pred[None,0], scale=False, labels=[0], classes=['label'])


This is your prediction, and you can do the same thing with y[0]. y[0] is already 1,4, so you don’t have to add [None] to make it 1,4.

I hope this makes sense. Let me know if you have any issue.

I believe this pads the boxes and classes so that the tensor you’re training from has the same shape for each image. If image A has 2 objects and image B has 10 objects, you couldn’t train them in the same mini-batch unless you made them all 10 (padded with 0’s for the objects not in that image).
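The padding idea can be sketched in plain Python with lists instead of tensors. This is only an illustration, not the fastai implementation: `pad_targets` is a hypothetical name, the sample values are made up, and the padding is placed at the front of each row to mirror the `[i,-len(lbls):]` indexing in the function quoted above.

```python
def pad_targets(samples, pad_idx=0):
    """Pad each image's boxes and labels at the front so every row has max_len entries.

    `samples` is a list of (bboxes, labels) pairs; labels are assumed to be
    integer class indices, with pad_idx reserved for "no object".
    """
    max_len = max(len(labels) for _, labels in samples)
    padded_boxes, padded_labels = [], []
    for bboxes, labels in samples:
        n_pad = max_len - len(labels)
        padded_boxes.append([[0, 0, 0, 0]] * n_pad + list(bboxes))
        padded_labels.append([pad_idx] * n_pad + list(labels))
    return padded_boxes, padded_labels

# Image A has 2 objects, image B has 1 (made-up values)
batch = [
    ([[10, 20, 50, 60], [30, 40, 90, 120]], [3, 7]),
    ([[96, 155, 270, 351]], [5]),
]
boxes, labels = pad_targets(batch)
print(labels)    # [[3, 7], [0, 5]]
print(boxes[1])  # [[0, 0, 0, 0], [96, 155, 270, 351]]
```

After padding, every image in the batch has the same number of rows, so the boxes and labels can be stacked into tensors of fixed shape.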

There is a good explanation in this article.

@jayeshsaita, I am trying to tackle this same challenge. However, I have run into countless problems and am starting to get discouraged. Regardless of whether you have finished, could you share your draft so I can have some basic guidance?

Thank you anyway, and congratulations on all the knowledge. Amazing!