I find there is a gap for me between redoing the notebooks and actually doing things on my own. And I am not planning on making the same mistake I did in part 1 - very loosely starting with what was covered in class and going off on a fanciful tangent, missing many key points. This time around I will first very thoroughly learn the course material.
Redoing the lesson notebook, with it open in one tab, is a great starting point. But after that, how do I internalize the learnings? Starting a new notebook from scratch would be the way to go, but I need something to build on - I need to at least have the names of the classes / key params used in the lesson!
Keeping the other notebook open can lead to inadvertent cheating - copying large chunks of code without thinking. My new plan is to finish redoing the lesson notebook and then take notes of the key terms from lesson 8. The terms one needs to jot down will vary from person to person, as someone may be more familiar with one aspect of the API than another. I, for example, definitely have a bit of learning to do with matplotlib, plotting shapes, etc.
Anyhow, I will test drive creating the 2nd notebook from notes (with a slight additional twist I have in mind), hopefully over the weekend. I will post both my notes here and my findings on how it went in case they might be useful to someone.
All the best to everyone in our attempts to grok the lesson 8 material!
I have gone through the notebook and here are the notes I am hoping will suffice for recreating it without looking at it (a couple of small sketches of the plotting helpers and the loss follow after the list).
pathlib, pathlib.Path(...).iterdir
matplotlib.patches, matplotlib.patheffects
open_image
ax.get_xaxis().set_visible(False)
ax.add_patch(patches.Rectangle(...))
o.set_path_effects(...)
ax.text, ax.add_patch
tfms -> aug=side_on, crop=No # these are not correct param names
show_img, md.xxx_ds.denorm
learn.freeze_to # how do we get layer groups? experiment with that
tfm_y -> COORD
ImageClassifierData(continuous=True) # can I set the loss manually? what about preventing one-hot encoding?
ConvLearner.pretrained(..., custom_head=...)
L1loss
get_cv_idxs
models = ConvnetBuilder(f_model, 0, 0, 0, custom_head=head_reg4) # how does this differ from ConvLearner.pretrained?
learn = ConvLearner(md, models)
md.trn_dl.dataset = trn_ds2 # replacing the dataset
loss for both classification + localization: sigmoid(...) * 224 + 20 * cross_entropy(cat)
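For the plotting-related notes above (patches, path effects, ax.text, hiding the axes), this is roughly the shape of the helpers I have in mind - a minimal sketch in plain matplotlib, not necessarily identical to the lesson's functions:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.patheffects as patheffects

def draw_outline(o, lw):
    # black stroke behind the artist so it stays visible on any background
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                        patheffects.Normal()])

def draw_rect(ax, b):
    # b is assumed to be in matplotlib's (x, y), width, height convention
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[2:],
                                           fill=False, edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, verticalalignment='top',
                   color='white', fontsize=sz, weight='bold')
    draw_outline(text, 1)

# quick check on a dummy image
fig, ax = plt.subplots()
ax.imshow(np.zeros((224, 224, 3)))
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
draw_rect(ax, [50, 30, 120, 80])
draw_text(ax, (50, 30), 'person')
```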
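And for the combined loss note just above, here is a hedged sketch in plain PyTorch of what I think is going on - the head layout (4 bbox activations followed by the class scores) and the exact scaling are my assumptions from these notes, not code copied from the notebook:

```python
import torch
import torch.nn.functional as F

def detn_loss(pred, bb_targ, class_targ):
    # split the single head's output: first 4 activations for the bbox,
    # the rest for the class scores (assumed layout)
    bb_pred, class_pred = pred[:, :4], pred[:, 4:]
    # squash the bbox activations into [0, 224] since the inputs are 224x224 images
    bb_pred = torch.sigmoid(bb_pred) * 224
    # L1 loss for localization + cross entropy for classification,
    # with the classification term scaled up to a comparable magnitude
    return F.l1_loss(bb_pred, bb_targ) + 20 * F.cross_entropy(class_pred, class_targ)
```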
Something I did was go through the pascal notebook and change all the variables to have more verbose names. (Jeremy has his reasons for liking terse variable names, but we disagree: terse names may be easier for the author, but I believe they make the code harder for anyone who is new to it, which is going to be most people.) That meant I had to understand what each of the variables was and what it was doing. (It also meant that I understood what the format of the annotations was.)
For a next step, I’m going to try to reproduce the notebook by myself, without looking at pascal.ipynb. I’m going to allow myself to look at ANY OTHER notebook, but not pascal.ipynb.
First, I go through the whole notebook without re-typing anything, just to make sure that everything works as expected and so I can quickly visualize the results.
Next, I go back and re-run the notebook, this time re-writing every line one by one and executing each one to make sure I know what it is doing and why it's doing that. If there is some library it's using that I'm not familiar with, I spend a bit more time exploring that library (even beyond the functions that were used) just to see what else is possible.
Finally, what I like to do in the end is re-implement the notebook on my own, but this time with my own spin on it, i.e. changing values, changing the model architecture, using a different dataset (Pascal 2012 vs 2007), or taking what I learned from it on a test drive with a Kaggle comp.
Doing all three of the above lets you see each lesson from 3 very different perspectives and reinforces each piece of the lesson so you can hopefully get the most out of it.
I start with watching the video over again so that I can rewind (I almost always miss at least one major section from the previous live part). Then comes the study group, where I try to run the notebook as is to make sure it works, and then I usually try to apply it to another area that has a similar use-case. I am not sure how I will do that last piece with the bounding boxes, so I will probably work on recreating it with my own spin. The biggest thing I try to focus on is how each of our lessons can be used in a real-world scenario.
They do not seem to follow the triangular policy from the Leslie Smith paper.
What is this magic!
Just like there are dense nets, there are dense notebooks. There is just sooooo much content in the lesson 8 notebook, even though it seemed so innocuous when Jeremy presented it. Not that this is happening to me for the first time in my life - more a case of Radek never learns.
Not sure what would need to happen for me to complete my grand plan for this weekend (maybe less procrastination on the forums), but as things are shaping up, I will probably be done with this notebook by summer 2019.
I’m doing something similar. Check to make sure there are no notebook errors. Next iteration, run each line, add a meaningful comment, and print out variable values and dimensions. I’m trying to avoid going off on tangents in the first couple of runs. Each new iteration lets me learn something deeper, or at least that’s the plan!
I am not sure - I think we need to wait for the tech report from Leslie Smith to come out.
In the meantime I looked up his email on arxiv and sent him a thank you for the amazing papers that he publishes. Hard to say if that was a move that made a lot of sense, but I guess it might be nice for a person to hear that there are people across the globe interested in their work and waiting for what they are about to publish.
I got a reply from Leslie Smith! Maybe saying thank you to researchers / OSS maintainers whose work we rely on is not such a bad idea after all?
That’s probably the least we can do. If I ever run a company and don’t contribute financially to the tools I use, please refer me to this comment and kick me in the ankle.
Anyhow, Leslie Smith was very kind to send me a reply and said that there is a new paper coming to arxiv soon! Cool!
I have failed to not look at the lesson 8 notebook. The ordering of dimensions that fastai expects for the bounding boxes tripped me up.
This is, I believe, the bbox ordering for both COCO and Pascal VOC: [<x_coord>, <y_coord>, width, height], but fastai expects top-left and bottom-right coords (2 points), with the first coord being the y coordinate (to mimic the layout of a numpy array) - the switching of height with width is something I didn't realize.
I do wonder why we subtract the 1, though: np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])
Aaaah, I get it now. Because if we start at x = 10 and have a width of 5, then the pixels that constitute the border are [10, 11, 12, 13, 14] (5 of them), and if we want the coord of the rightmost point, doing 10+5 takes us one pixel too far!
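To make sure I have this straight, here is a tiny sketch of that conversion (the helper name is mine, not necessarily what fastai calls it):

```python
import numpy as np

def hw_bb(bb):
    # bb is [x, y, width, height] (the COCO / Pascal VOC convention);
    # returns [top_y, left_x, bottom_y, right_x] (the numpy-style convention).
    # The -1 is because a box starting at x=10 with width 5 covers pixels
    # 10..14, so the rightmost pixel is 10 + 5 - 1, not 10 + 5.
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

hw_bb([10, 20, 5, 8])  # -> array([20, 10, 27, 14])
```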
I have failed to complete my project over the weekend but am hoping to finish it early next week! Wishing everyone a pleasant Sunday, I am shutting down operations for the day!
I’m playing with building a checklist. We spend a lot of time learning the how, and it’s very easy to forget what to do when you start a new blank page.
The checklist is a roadmap: a very high-level series of steps that are more/less general to types of problems, and you take care of the implementation.
Trying it out w/ learning Random Forests in the ML course right now. This is for after you re-write & study the notebooks.
Hmm, thinking about this more now: this can grow to full diagrams for larger projects - like a computation graph. I feel like offloading ‘high-level / overview-mode’ thinking is a big help when working on the low-level / actual implementation.
1. Skim through the nb - especially for learners who are not very proficient in coding
2. Create a story - crisp or long, based on the individual's comfort
3. Open a new nb and write code from scratch - using the story!
4. Optionally, look into the nb to see how the exact code is written - do not copy, rather skim through it.
The idea is not to kill time thinking about what the next step should be - as with Jeremy's top-down approach, we'll go through the code multiple times and get used to it - but rather to get comfortable with the code and try different permutations and combinations. Point 4 is important for learners with less experience in coding. Each nb is huge, and the many small functions can be overwhelming to complete in the 1st iteration.
Yeah, I mentioned in the lesson that we use numpy/pytorch coordinates, whereas COCO and PIL use computer graphics coordinates. It’s annoying that there are different standards!
I think it was clearly explained in the lecture. I was even able to reproduce the lesson 8 notebook just fine (looking at it in another browser tab). It was only when I set out to do localization from scratch that I noticed I had made some incorrect assumptions that I didn't verify while rewriting the code.
Not that it matters a lot, but I think I like the convention fastai adopted more than the other one. One just needs to think about how indexing into an array works, and that contains all the information we need.
A couple of fastai style bounding boxes along with upper left and lower right point coordinates in case someone might find it helpful
(the numbering of images and their correspondence to rows of coordinates is I believe [[0, 1], [2, 3]])
That is an interesting thing you found there, Asif! I still think, though, that the subtraction is because of what I tried to verbalize, probably not in the greatest way, a little bit above in this thread:
To go from the computer graphics coordinates to numpy/pytorch/fastai coords, we go from a point given as x1 + width to the resulting coord x2.
Now, I could be horribly wrong on this one, but if we have a starting point of 10 and want the x coordinate that would end a segment of length 3, we need to do 10 + 3 - 1. Pixel 12 will be the end pixel of the segment.
| _ | _ | _ |
10 11 12
A bit crazy, because we operate on discrete pixels and not the continuous coordinate system we are used to thinking about. In a continuous coord system, a segment of length 3 would go from x = 10 to x2 = 13, because points have no width. Here we are really operating on numbered squares of dim 1 x 1, so we want to grab the 12th square and not the 13th!
Asif - I might be completely wrong on this one, but I can't seem to convince myself that I am. If anyone could shoot holes in my reasoning, that would be greatly appreciated. It's already after midnight and I have a NN that needs my attention, so I think I will leave my reasoning where it is, even though I do not completely trust it myself.