Part 2 Lesson 8 wiki

If these would be equivalent representations in image processing, why would they change the NN results? Or would this only be initially and would even out over training? Are there any papers you could suggest that talk about this?

Not really that I know of - it's under-studied. But intuitively speaking, I could imagine it being a different difficulty to calculate where the bottom right of an object is vs. how big it is. I don't know whether that's true or not, but I'd at least say it's possible…

I hope someone here tests it out and maybe writes a paper or blog post with their findings!

I have an error when using trn_anno.items() to fill the trn_lrg_anno dict.

The error is reproduced below, I have found that there are at least 2 IDs returned from trn_anno.items() that have no bounding box arrays - they have IDs of 0 and 1. Have I missed something that causes these entries?

trn_anno.get(0), trn_anno.get(1), trn_anno.get(12)
([], [], [(array([ 96, 155, 269, 350]), 7)])

In the meantime, if there are no bounding boxes, is it really a good idea to raise an exception? Perhaps we would be better off just returning an empty box and carrying on?

trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-80-ba3930bfe75a> in <module>()
----> 1 trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}

<ipython-input-80-ba3930bfe75a> in <dictcomp>(.0)
----> 1 trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}

<ipython-input-78-794d053568de> in get_lrg(b)
      1 def get_lrg(b):
----> 2     if not b: raise Exception()
      3     b = sorted(b, key=lambda x: np.product(x[0][-2:]-x[0][:2]), reverse=True)
      4     return b[0]

Exception:

Personal preference, but I'd rather have the data in the form I'm expecting instead of just ignoring cases during training.

If a picture doesn't have a bounding box, I'd probably toss it.

First, trying to look up IDs 0 and 1 is incorrect, because they do not exist as keys in the trn_anno dict.

# reason

x = []
for o in trn_j[ANNOTATIONS]:
    if not o['ignore']:
        x.append(o[IMG_ID])
print(len(set(x)))
sorted(set(x))

2501

[12,
17,
23,
26, …

I tried running the notebook multiple times with kernel restarts, and only once was I able to get the same error. I did the below to understand better:

for a,b in trn_anno.items():
    if not b:
        print(a)

[0,
1,
3,
4]

But I could not reproduce it after multiple kernel restarts or git reset --hard origin/master.

So my best guess is that it was a memory leak and you just have to re-run, but a pro Python developer would be the best person to answer this.

Secondly, I strongly believe that we need to raise Exception() when there is no bounding box, to catch leaks like this. By design, there should not be a case where an ID is found without a bounding box. My reservation about that scenario is that it would mislead the network into understanding that the full image can also be a bounding box, and then even though it might perfectly classify the image, it may still choose to bound the full image for a lower L1 loss. Again, this is only my guess and @Jeremy may answer it better.

PS: If you find a foolproof way to reproduce this error, feel free to drop a message. I would love to deep-dive.


Thanks for the effort you put into this, Pranjal - and it was a relief to find you could at least reproduce this, even if only once! I'll try restarting and see if I get the error again; meanwhile I am just handling it before I call get_lrg(), with an if b != [] filter on the end of the dictionary comprehension expression…
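For reference, a minimal sketch of that filtering workaround. The trn_anno contents and the simplified get_lrg here are hypothetical stand-ins for the notebook's real data and function:

```python
from collections import defaultdict

# Hypothetical stand-in for the notebook's annotation dict
trn_anno = defaultdict(list)
trn_anno[12].append(([96, 155, 269, 350], 7))
trn_anno[0]   # subscripting a defaultdict silently creates an empty entry

def get_lrg(b):
    # largest box by area; assumes each item is (bbox, class) with
    # bbox = [row1, col1, row2, col2]
    return max(b, key=lambda x: (x[0][2] - x[0][0]) * (x[0][3] - x[0][1]))

# the `if b` guard skips ids (like 0 here) that have no boxes,
# instead of raising inside get_lrg
trn_lrg_anno = {a: get_lrg(b) for a, b in trn_anno.items() if b}
print(trn_lrg_anno)
```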


One important thing that I missed: the way trn_anno is defined!
Since it uses collections.defaultdict(), Jeremy clearly said that it would create a dictionary entry if the key wasn't found, and otherwise update the value. I'll just check whether calling trn_anno.get(x) would create an entry like x: [ ] in the dictionary or not.


OK, but I didn't call the .get() function of the dict in the first place; I just did that in my example to handle an error in case the key didn't exist.

If you call trn_anno.get(5) (i.e. any number not actually a key) you should get back None - this is what I found, I just didn't include that in my example - so I don't think an element is added when .get() is called with an unknown key…
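A quick check of both behaviours, using a toy defaultdict in place of the real trn_anno: .get() does not trigger the default factory, but plain subscripting does, which would explain the mysterious empty entries for IDs 0 and 1.

```python
from collections import defaultdict

trn_anno = defaultdict(lambda: [])
trn_anno[12].append(([96, 155, 269, 350], 7))

# .get() on a missing key returns None and does NOT create an entry...
print(trn_anno.get(5))        # None
print(5 in trn_anno)          # False

# ...but plain subscripting DOES create (and store) an empty list
print(trn_anno[0])            # []
print(0 in trn_anno)          # True
```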

Hi all!

There are several instances in the notebook where Jeremy sets the differential learning rates lrs = np.array([lr/100, lr/10, lr]), but then freezes the model up to the last 2 (or 3) layers and calls learn.fit(lrs, …). By freezing almost all the layers, wouldn't this defeat the purpose of setting lrs? As I remember, differential learning rates are meant to set different learning rates for different layer groups to fine-tune them.

I was confused about the same thing earlier. But learn.freeze_to(-2) freezes everything but the last two layer GROUPS, not individual layers. The model has 3 layer groups, the last one being the head that is attached with random weights according to the problem.

This means that the last two layer groups (the second part of the original resnet and our head) are unfrozen. I assume that passing three learning rates is necessary even if the first value does not actually do anything in this case. Please correct me if I am wrong.
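As a toy illustration (a hypothetical helper, not the actual fastai source), freeze_to(-2) on a 3-group model freezes only the first group, so the last two learning rates still take effect:

```python
lrs = [1e-4, 1e-3, 1e-2]          # one learning rate per layer group

def frozen_groups(n, n_groups=3):
    # freeze_to(n) freezes all groups before index n;
    # a negative n counts from the end, as with list slicing
    return list(range(n_groups))[:n]

print(frozen_groups(-2))          # [0]: only the first group is frozen
for i, lr in enumerate(lrs):
    state = 'frozen' if i in frozen_groups(-2) else f'trainable at lr={lr}'
    print(f'group {i}: {state}')
```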


I think you are right. I looked at the source code, and inside the freeze_to function there is a get_layer_groups() call that returns 3 groups in this case, so technically he only froze the first group. Thanks for the help!


Here are the notebooks running successfully on Colab with the help of others (thanks!), compiled as one.

(Sign in to your Gmail account beforehand to gain access to the notebook.)

Everyone has edit access, so make changes accordingly to help others…

  • Let me know if there is any directory error.

  • You might receive a CUDA runtime error on some cells, but just re-running the cell will make it disappear…

  • The whole notebook ran successfully on my end.


@quan.tran Check out the field "trainable" on each layer when you run learn.summary().


I tried to understand, and this is my conclusion. Please correct me if I'm wrong.

We have 2 columns: the image file (X -> {pixels}) and the bbox (Y -> {x1, y1, x2, y2}). We are trying to frame a regression model that takes the pixels of an image as input and predicts the bbox values (x1, y1, x2, y2).


An equivalent form of

collections.defaultdict(lambda: [])

is

 collections.defaultdict(list)

Not wrong!


Good point - that's much better! :slight_smile:


In my understanding, that's what Jeremy and others said at the very end of the lecture…
We are trying to predict the bbox coordinates with regression, seemingly using an L1 loss.

(Jeremy will continue with this in the next lecture and the understanding will become stronger.)

A bounding box, being a rectangle, has 4 vertices: (x1, y1), (x2, y2), (x4, y4), (x3, y3). Assume clockwise order, starting from the top-left. Now, instead of predicting all 8 values given an image, a simpler alternative is to predict just 4 values:

x1, y1, height (y3 - y1), width (x2 - x1).

These four values are enough for us to obtain all four coordinates of the rectangle.

Remember, all the values are real-valued numbers. Hence regression, and it's commonly called bounding box regression with 4 targets.


This is a comment on the bounding box representation discussion.

If (width, height) is just a linear combination of (x1, y1) and (x2, y2), does it really matter whether we use (x2, y2) or (width, height) as targets?
To find the bounding boxes we are doing regression, so the last layer is a matrix product, which is also linear and could figure out that (x2, y2) = (x1, y1) + (w, h).
Does the reasoning above make any sense?
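One way to make that concrete: the change of representation between the two target forms is itself a fixed linear map M, so a final linear layer could in principle absorb it and emit either parameterization (whether that changes how easily the network learns is the open question above). A sketch in plain Python, using the (r1, c1, r2, c2) corner convention:

```python
# M maps corner targets (r1, c1, r2, c2) to (r1, c1, h, w)
M = [[ 1,  0, 0, 0],
     [ 0,  1, 0, 0],
     [-1,  0, 1, 0],
     [ 0, -1, 0, 1]]
corners = [96, 155, 269, 350]

def matvec(M, v):
    # plain matrix-vector product
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

print(matvec(M, corners))   # [96, 155, 173, 195]
```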