Part 2 Lesson 8 wiki

If these would be equivalent representations in image processing, why would they change the NN results? Or would this only be initially and would even out over training? Are there any papers you could suggest that talk about this?

Not really that I know of - it's under-studied. But intuitively speaking, I could imagine it being a different difficulty to calculate where the bottom right of an object is vs. how big it is. I don't know whether that's true or not, but I'd at least say it's possible…

I hope someone here tests it out and maybe writes a paper or blog post with their findings!

I have an error when using trn_anno.items() to fill the trn_lrg_anno dict.

The error is reproduced below, I have found that there are at least 2 IDs returned from trn_anno.items() that have no bounding box arrays - they have IDs of 0 and 1. Have I missed something that causes these entries?

trn_anno.get(0), trn_anno.get(1), trn_anno.get(12)
([], [], [(array([ 96, 155, 269, 350]), 7)])

In the meantime, if there are no bounding boxes, is it really a good idea to raise an exception? Perhaps we would be better off just returning an empty box and carrying on?

trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-80-ba3930bfe75a> in <module>()
----> 1 trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}

<ipython-input-80-ba3930bfe75a> in <dictcomp>(.0)
----> 1 trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}

<ipython-input-78-794d053568de> in get_lrg(b)
      1 def get_lrg(b):
----> 2     if not b: raise Exception()
      3     b = sorted(b, key=lambda x: np.product(x[0][-2:]-x[0][:2]), reverse=True)
      4     return b[0]

Exception:

Personal preference, but I'd rather have the data in the form I'm expecting instead of just ignoring cases during training.

If a picture doesn't have a bounding box, I'd probably toss it.

First, trying to look up IDs 0 and 1 is incorrect, because they do not exist as keys in the trn_anno dict.

# reason

x = []
for o in trn_j[ANNOTATIONS]:
    if not o['ignore']:
        x.append(o[IMG_ID])
print(len(set(x)))
sorted(set(x))

2501

[12,
17,
23,
26, …

I tried running the notebook multiple times with kernel restarts, and only once was I able to get the same error. I did the below to understand better:

for a,b in trn_anno.items():
    if not b:
        print(a)

[0,
1,
3,
4]

But I could not reproduce it after multiple kernel restarts or git reset --hard origin/master.

So my best guess is that it was a memory leak and you just have to re-run, but a pro Python developer would be the best person to answer this.

Secondly, I strongly believe that we need to raise Exception() when there is no bounding box, to catch leaks like this. By design, there should not be a case where an ID is found without a bounding box. My reservation about that scenario is that it would mislead the network into understanding that the full image can also be a bounding box, and then even though it might perfectly classify the image, it may still choose to bound the full image for a lower L1 loss. Again, this is only my guess and @Jeremy may answer it better.

PS: If you find a foolproof way to reproduce this error, feel free to drop a message. I would love to deep-dive.


Thanks for the effort you put into this, Pranjal - and it was a relief to find you could at least reproduce this, even if only once! I'll try restarting and see if I get the error again; meanwhile I am just handling it before I call get_lrg(), with an if b != [] filter on the end of the dictionary comprehension expression…
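For reference, a minimal sketch of that filtering workaround. The trn_anno contents and the simplified get_lrg here are hypothetical stand-ins for the notebook's real data and function:

```python
from collections import defaultdict

# Hypothetical stand-in for the notebook's annotation dict
trn_anno = defaultdict(list)
trn_anno[12].append(([96, 155, 269, 350], 7))
trn_anno[0]   # subscripting a defaultdict silently creates an empty entry

def get_lrg(b):
    # largest box by area; assumes each item is (bbox, class) with
    # bbox = [row1, col1, row2, col2]
    return max(b, key=lambda x: (x[0][2] - x[0][0]) * (x[0][3] - x[0][1]))

# the `if b` guard skips ids (like 0 here) that have no boxes,
# instead of raising inside get_lrg
trn_lrg_anno = {a: get_lrg(b) for a, b in trn_anno.items() if b}
print(trn_lrg_anno)
```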


One important thing that I missed: the way trn_anno is defined!
Since it uses collections.defaultdict(), Jeremy clearly said that it would create a dictionary entry if the key wasn't found, and otherwise update the value. I'll just check whether calling trn_anno.get(x) would create an entry like x: [ ] in the dictionary or not.


OK, but I didn't call the .get() function of the dict in the first place; I just did that in my example to handle an error in case the key didn't exist.

If you call trn_anno.get(5) (i.e. any number not actually a key) you should get back None - this is what I found, I just didn't include that in my example - so I don't think an element is added when .get() is called with an unknown key…
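A quick check of both behaviours, using a toy defaultdict in place of the real trn_anno: .get() does not trigger the default factory, but plain subscripting does, which would explain the mysterious empty entries for IDs 0 and 1.

```python
from collections import defaultdict

trn_anno = defaultdict(lambda: [])
trn_anno[12].append(([96, 155, 269, 350], 7))

# .get() on a missing key returns None and does NOT create an entry...
print(trn_anno.get(5))        # None
print(5 in trn_anno)          # False

# ...but plain subscripting DOES create (and store) an empty list
print(trn_anno[0])            # []
print(0 in trn_anno)          # True
```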

Hi all!

There are several instances in the notebook where Jeremy sets the differential learning rates lrs = np.array([lr/100, lr/10, lr]), but then freezes the model up to the last 2 (or 3) layers and calls learn.fit(lrs, …). By freezing almost all the layers, wouldn't this defeat the purpose of setting lrs? As I remember, differential learning rates are meant to set different learning rates for different layer groups to fine-tune them.

I was confused about the same thing earlier. But learn.freeze_to(-2) freezes everything but the last two layer GROUPS, not individual layers. The model has 3 layer groups, the last one being the head that is attached with random weights according to the problem.

This means that the last two layer groups (the second part of the original resnet and our head) are unfrozen. I assume that passing three learning rates is necessary even if the first value does not actually do anything in this case. Please correct me if I am wrong.
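As a toy illustration (a hypothetical helper, not the actual fastai source), freeze_to(-2) on a 3-group model freezes only the first group, so the last two learning rates still take effect:

```python
lrs = [1e-4, 1e-3, 1e-2]          # one learning rate per layer group

def frozen_groups(n, n_groups=3):
    # freeze_to(n) freezes all groups before index n;
    # a negative n counts from the end, as with list slicing
    return list(range(n_groups))[:n]

print(frozen_groups(-2))          # [0]: only the first group is frozen
for i, lr in enumerate(lrs):
    state = 'frozen' if i in frozen_groups(-2) else f'trainable at lr={lr}'
    print(f'group {i}: {state}')
```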


I think you are right. I looked at the source code, and inside the freeze_to function there is a get_layer_groups() call that returns 3 groups in this case, so technically he only froze the first group. Thanks for the help!


Here are the notebooks running successfully on Colab with the help of others (thanks!), compiled as one.

(Sign in to your Gmail account beforehand to gain access to the notebook.)

Everyone has edit access, so make changes accordingly to help others…

  • Let me know if there is any directory error.

  • You might receive a CUDA runtime error on some cells, but just re-running the cell will make it disappear…

  • The whole notebook ran successfully on my end.


@quan.tran Check out the field "trainable" on each layer when you run learn.summary().


I tried to understand, and this is my conclusion. Please correct me if I'm wrong.

We have 2 columns: the image file (X -> {pixels}) and the bbox (Y -> {x1, y1, x2, y2}). We are trying to frame a regression model that takes the pixels of an image as input and predicts the bbox values (x1, y1, x2, y2).


An equivalent form of

collections.defaultdict(lambda: [])

is

 collections.defaultdict(list)

Not wrong!


Good point - that's much better! :slight_smile:


In my understanding, that's what Jeremy and others said at the very end of the lecture…
We are trying to predict the bbox coordinates with regression, seemingly using an L1 loss.

(Jeremy will continue with this in the next lecture and the understanding will become stronger.)

A bounding box, being a rectangle, has 4 vertices: (x1, y1), (x2, y2), (x4, y4), (x3, y3). Assume clockwise order, starting from the top-left. Now, instead of predicting all 8 values given an image, a simpler alternative is to predict just 4 values:

x1, y1, height (y3 - y1), width (x2 - x1).

These four values are enough for us to obtain all four coordinates of the rectangle.

Remember, all the values are real-valued numbers. Hence regression, and it's commonly called bounding box regression with 4 targets.


This is a comment on the bounding box representation discussion.

If (width, height) is just a linear combination of (x1, y1) and (x2, y2), does it really matter whether we use (x2, y2) or (width, height) as targets?
To find the bounding boxes we are doing regression, so the last layer is a matrix product, which is also linear and could figure out that (x2, y2) = (x1, y1) + (w, h).
Does the reasoning above make any sense?
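One way to make that concrete: the change of representation between the two target forms is itself a fixed linear map M, so a final linear layer could in principle absorb it and emit either parameterization (whether that changes how easily the network learns is the open question above). A sketch in plain Python, using the (r1, c1, r2, c2) corner convention:

```python
# M maps corner targets (r1, c1, r2, c2) to (r1, c1, h, w)
M = [[ 1,  0, 0, 0],
     [ 0,  1, 0, 0],
     [-1,  0, 1, 0],
     [ 0, -1, 0, 1]]
corners = [96, 155, 269, 350]

def matvec(M, v):
    # plain matrix-vector product
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

print(matvec(M, corners))   # [96, 155, 173, 195]
```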