Part 2 Lesson 9 wiki

Please post your questions about the lesson here. This is a wiki post. Please add any resources or tips that are likely to be helpful to other students.

<<< Wiki: Lesson 8Wiki: Lesson 10 >>>

Lesson resources

Papers

You should understand these by now

  • Pathlib
  • JSON
  • Dictionary comprehensions - Tutorials
  • defaultdict
  • How to jump around fastai source - Visual Studio Code / Source Graph / PyCham
  • matplotlib object-oriented API - Python Plotting with Matplotlib (Guide)
  • lambda functions
  • bounding box co-ordinates
  • custom head and bounding box regression
  • everything in Part1

Timeline

  • (0:00:01) Object detection Approach
  • (0:00:55) What you should know by now
  • (0:01:40) What you should know from Part 1 of the course - model input & model output
  • (0:03:00) Working through pascal notebook
  • (0:03:20) Data Augmentations
  • (0:03:35) Image classifier continuous true explain
  • (0:04:40) Create Data augmentations
  • (0:05:55) Showing bound boxes on pictures with augmentations
  • (0:06:25) why we need to transform the bounding box
  • (0:07:15) tfm_y parameter
  • (0:09:45) Running summary model
  • (0:10:40) Putting 2 pieces together done last time
  • (0:10:45) Things needed to train a neural network
  • (0:12:00) Creating data by concatenating
  • (0:13:40) Using the new datasets created
  • (0:14:00) Creating the architecture
  • (0:15:50) Creating Loss function
  • (0:18:00) BatchNorm before or after ReLU
  • (0:19:25) Dropout after BatchNorm
  • (0:21:50) Detection accuracy
  • (0:22:50) L1 when doing both bounding box and classification at the same time is better
  • (0:25:30) Multi-label classification
  • (0:26:25) Pandas defaultdict alternative
  • (0:27:10) reuse smaller models for pre-trained weights for larger models
  • (0:29:15) architecture for going from largest object detector to 16 object detector
  • (0:33:48) YOLO, SSD
  • (0:35:05) 2x2 Grid
  • (0:37:31) Receptive fields
  • (0:41:20) Back to Archiecture
  • (0:41:40) SSD Head code
  • (0:42:40) Research code copy paste problem
  • (0:43:00) fast ai style guide
  • (0:44:42) Reusing code. Back to SSD code
  • (0:45:15) OutConv - 2 conv layers for 2 tasks that we are doing
  • (0:47:20) flattening the outputs of convolution
  • (0:47:52) Loss function needs explained
  • (0:48:36) Difficulty in the matching problem
  • (0:49:25) Break and problem statement for matching problem
  • (0:50:00) Goal for matching problem with visuals
  • (0:51:50) running code of architecture line by line on validation set
  • (0:55:00) anchor boxes, prior boxes, default boxes
  • (0:55:23) Matching problem
  • (0:55:43) jaccard index or jaccard overlap or IOU (Intersection over union)
  • (0:57:35) anchors, overlaps
  • (1:00:00) Map to ground truth function
  • (1:01:50) See the classes for each anchor box should be predicting
  • (1:03:15) Seeing the bounding boxes
  • (1:04:16) Interpret the activations
  • (1:05:36) Binary cross entropy loss
  • (1:09:55) SSD loss function
  • (1:13:52) Create more anchor boxes
  • (1:14:10) Anchor boxes vs bounding boxes
  • (1:14:45) Create more anchor boxes
  • (1:15:25) Why are we not multiplying categorical loss with constant
  • (1:17:20) code for generating more anchor boxes
  • (1:17:59) Diagram - how object detection maps to neural net approach
  • (1:19:50) Rachael - Challenge is making the architecture
  • (1:20:15) Jeremy - There are only 2 architectures
  • (1:20:35) Rachael - Challenge is with anchor boxes
  • (1:20:48) Jermey - Entirely in loss architecture of SSD
  • (1:21:08) Forget the architecture, focus on the loss function
  • (1:22:16) Matching problem
  • (1:23:14) We are using SSD not YOLO so matching problem is easier
  • (1:23:49) Easier way would have to teach YOLO then go to SSD
  • (1:24:25) Loss function needs to be consistent task
  • (1:24:45) Question - 4 by 4 is same as the 16 is a coincidence?
  • (1:25:16) Part 2 is going to assume that comfortable with Part 1
  • (1:26:41) Explaining multiple anchor boxes is next step from last lesson
  • (1:27:46) Code for detection loss function
  • (1:28:32) This class is by far going to be the most conceptually challenging
  • (1:29:40) For every grid cell different size, orientation, zoom
  • (1:30:15) Convolutional layer does not need that many filters
  • (1:30:56) Need to know k = No. of zoom by no. of aspect ratios
  • (1:31:13) Architecture - Number of stride 2 convolutions
  • (1:31:43) We are grab set of outputs from convolutions
  • (1:32:20) Concatenate all outputs
  • (1:32:53) Criterian
  • (1:33:01) Pictures after train - big objects are ok small are not
  • (1:33:55) History of Object detection
  • (1:34:05) Multibox Method Paper - Matching problem introduce
  • (1:34:30) Trying to figure out how to make this better
  • (1:34:41) RCNN - 2 stage network - computer vision and deep learning
  • (1:36:09) YOLO and SSD - same performance with 1 stage
  • (1:37:08) Focal Loss RetinaNet - figured out why mess of boxes is happening
  • (1:38:48) Question - 4 by 4 grid of receptive field with 1 anchor box each, why we need more anchor boxes?
  • (1:40:38) Focal loss for Dense Object detection
  • (1:41:00) Picture of probability of ground truth vs loss
  • (1:41:45) Importance of the picture - why the mess was happening
  • (1:44:05) Not blue but blue or purple loss
  • (1:45:01) Discussing the fantastic paper
  • (1:46:15) Cross entropy
  • (1:48:09) Dealing with class imbalance
  • (1:49:18) Great paper to read how papers should be
  • (1:49:45) Focal Loss function code
  • (1:51:00) Paper tables for variable values
  • (1:52:00) Last step - figure out to pull out interesting parts
  • (1:52:48) NMS - Non Maximum suppression copied code
  • (1:53:50) Lesson 14 Feature pyramids
  • (1:54:15) Deep learning 2 part/complicated to single deep learning
  • (1:55:42) SSD paper model description
  • (2:01:30) Read back citations

Other resources

Blog posts

Videos

Stanford CS231N

Coursera Andrew Ng videos:

Other videos

  • YOLO CVPR 2016 talk – the idea of using grid cells and treating detection as a regression problem is focused on in more detail.
  • YOLOv2 talk – there is some good information in this talk, although some drawn explanations are omitted from the video. What I found interesting was the bit on learning anchor boxes from the dataset. There’s also the crossover with NLP at the end.
  • Focal Loss ICCV17 talk

Other Useful Information

Frequently sought pieces of information in this thread

27 Likes

Because we are using a custom head, it seems that using setting continuous to true in

md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True)

is doesn’t do anything—as it simply sets the final activation function which is then removed when the custom head is added. Am I reading the FastAI code correctly?

checkout how it’s used in dataset.py

Update: sorry not that helpful- point is, there is type coercion going on!

label_arr = np.array([np.array(csv_labels[i]).astype(np.float32) for i in fnames])

Could we use random rotations on CNN models based not on images but on word vectors?

I think with continuous=False your labels get one-hot encoded, so continuos=False prevents that

2 Likes

What if we crop (or perhaps mask) the image which we want to classify based on the bounding box?

Why does it matter if the bounding box is a rectangle as far as rotation? We are predicting coordinates as far as the network is concerned, right?

4 Likes

Ah, nice catch.

You would need more coordinates, more than four numbers/two points, to do a rotated rectangle.

3 Likes

I mean, the alternative would be oval, with center, lenth, width, and the direction of rotation.

After rotation, rectangle is not a rectangle any more, but oval is still an oval.

The definition of these rectangles—the numbers we are asking the neural net to predict—are a top-left and top -right coordinate and a bottom-left and bottom-right coordinate. Rotation can’t be encoded with these.

2 Likes

Optimiser is very important too!

Not really. We don’t need it to know it is a rectangle. We don’t model any shape in the problem anyway. It’s just a regression problem. So we can take the 4 coordinates and draw lines as we see fit

Couldn’t the “rotated bounding box method” be used to straighten images?

3 Likes

Interesting idea.

What’s the intuition behind using a Dropout with p=0.5 after a BatchNorm1d—doesn’t batch norm already do a good job of regularizing?

4 Likes

You have only two coordinates, there’s no way to draw an area only linking this points, so we assume it is a rectangle

1 Like

Reason box gets bigger from rotation:

23 Likes

As a general rule, is it better to put batchnorm before or after a ReLU?

6 Likes

Good point. But we could take the original bbox and get 4 (x, y) points from them.

1 Like