Deep Learning Brasília - Lição 8

Other references

Reminder

  • Part 1 covered the best practical DL techniques (differentiable layers, transfer learning, architecture design, handling over-fitting, embeddings), grouped by application area (computer vision, structured data, natural language processing)
  • Part 2 is cutting edge : it goes into more detail on current techniques, but without 100% confidence (they are still under research)
  • Lesson 8 (part 2) : object detection (creating a much richer CNN structure)
  • Differentiable layers : (from Yann LeCun) DL = Differentiable Programming (setting up a differentiable function, the loss function that scores how good the outputs are, and using it to update the weights in the NN)
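The differentiable-programming idea can be sketched with a toy example in plain Python (a stand-in for what autograd frameworks do automatically; all names here are illustrative): define a differentiable loss, compute its gradient, and use it to update the weight.

```python
# Toy "differentiable programming" loop: fit y = w * x by gradient descent.

def loss(w, xs, ys):
    # Mean squared error: differentiable with respect to w.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    # Analytic derivative of the loss above: d/dw (w*x - y)^2 = 2*x*(w*x - y).
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # true relation: y = 2x
w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w, xs, ys)  # update the weight with the gradient

print(round(w, 3))  # converges towards 2.0
```

In a real framework the gradient is not written by hand: the loss is built from differentiable layers and the framework back-propagates through them.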

What we’ve learned so far :

  1. Transfer learning : the single most important thing for using DL effectively (starting from “random weights”, i.e. from zero, is rarely the right approach)
    1.1) replace last layer
    1.2) fine-tune new layers
    1.3) fine-tune more layers (the whole model)
  2. Architecture design : choose to adapt to dataset
    ** CNNs for fixed-size ordered data
    ** RNNs for sequences
    ** Softmax for single categorical outcome (sigmoids if you’ve got multiple outcomes)
    ** ReLU for inner activations
  3. Avoid overfitting in 5 steps (but start with a model that overfits, then improve it step by step)
    3.1) more data
    3.2) data augmentation
    3.3) normalization : generalizable architectures (Batch Norm layers, Dense Nets (Densely Connected Convolutional Networks))
    3.4) regularization (weight decay, dropout)
    3.5) (last resort) reduce architecture complexity (fewer layers, fewer activations)
  4. Embeddings allow us to use categorical data (widely used since the release of part 1)
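Step 3.4's dropout can be sketched in a few lines of plain Python (this is "inverted" dropout, the variant modern frameworks implement; the function name is illustrative):

```python
import random

def dropout(activations, p, training=True):
    """Inverted dropout: zero each activation with probability p during
    training, and scale the survivors by 1/(1-p) so the expected value
    of the output matches the input. At inference time it is a no-op."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() < keep else 0.0 for a in activations]

acts = [0.5, 1.0, 2.0, 3.0]
print(dropout(acts, p=0.0))  # unchanged: [0.5, 1.0, 2.0, 3.0]
print(dropout(acts, p=0.5))  # roughly half the values zeroed, survivors doubled
```

Randomly deleting activations forces the network not to rely on any single feature, which is why it acts as a regularizer.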

From part 1 to part 2 :

  • requires significant fastai customization
  • need to understand python well
  • other code resources will generally be research quality
  • code samples online nearly always have problems
  • each lesson will be incomplete - ideas to explore

Jupyter notebook

  • don’t copy/paste : try yourself by typing code
  • you must practice !

Get a local GPU or use an online GPU

Read academic papers

There are many opportunities for you in this class

  • Your homework will be at the cutting edge
  • There are few DL practitioners that know what you know now
  • Experiment lots, especially in your area of expertise
  • Much of what you find will not have been written about before
  • Don’t wait to be perfect before you start communicating
  • If you don’t have a blog, try medium.com

What we’ll study in part 2

  • Generative models
    ** CNNs beyond classification : localization, enhancement (colorization, super-resolution, artistic style)
    ** NLP beyond classification (translation, Seq2seq, attention, large vocabularies)
  • Large data sets (large images, lots of data points, large outputs)
  • but not time-series and tabular data (cf. part 1 and the ML course), as we already covered most of those techniques

Object detection

  • We want to classify multiple objects (multi-class) that can overlap (detect bounding boxes and label them)
  • bounding box : a rectangle that contains an object almost entirely, together with its label
  • Step 1 : classify and localize the largest object in each image (classify, localize, then classify & localize in a bounding box)

1) Analyse training data

  • We use the notebook pascal.ipynb
  • Download data (see HowTo)
  • Choose GPU if you have many : torch.cuda.set_device(x)
  • PATH.iterdir() : a generator (for example, you can get the list of files in a folder with list(PATH.iterdir()))
  • How to deal with a json file (in the video : get content from json.load and open, get dictionary with dict_keys…) : trn_j = json.load((PATH/'pascal_train2007.json').open())
  • Create a dictionary with image id as key and annotations (a list with bounding-box values and category id) as value : use collections.defaultdict
  • MAKE THINGS CONSISTENT : to be numpy compatible, we change the [X, Y, Width, Height] coordinates of the bounding box to [Y, X, Y+H-1, X+W-1] (top-left and bottom-right corners).
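The parsing steps above can be sketched like this. The file is normally loaded with trn_j = json.load((PATH/'pascal_train2007.json').open()); here a tiny made-up sample in the same COCO-style layout stands in for it so the sketch is self-contained:

```python
from collections import defaultdict

# Made-up two-annotation sample in the layout of pascal_train2007.json.
trn_j = {
    'annotations': [
        {'image_id': 12, 'bbox': [96, 155, 269, 350], 'category_id': 7},
        {'image_id': 12, 'bbox': [10, 20, 30, 40], 'category_id': 3},
    ],
}

def hw_bb(bb):
    # VOC gives [x, y, width, height]; convert to the numpy-friendly
    # [top-left y, top-left x, bottom-right y, bottom-right x].
    x, y, w, h = bb
    return [y, x, y + h - 1, x + w - 1]

# Dictionary from image id to its list of (bounding box, category) pairs.
trn_anno = defaultdict(list)
for anno in trn_j['annotations']:
    trn_anno[anno['image_id']].append((hw_bb(anno['bbox']), anno['category_id']))

print(trn_anno[12][0])  # ([155, 96, 504, 364], 7)
```

defaultdict(list) saves the usual "create the list if the key is new" boilerplate when grouping annotations by image.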

How to navigate through the Fastai code by using an editor

2) Visualize training images

  • Some libs take VOC format bounding boxes, so this lets us convert back when required : def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])
  • import an image : use open_image(), a fastai function that uses OpenCV, which is 5 to 10 times faster than PyTorch Vision (Pillow/PIL is faster than PyTorch Vision but not as fast as OpenCV) : im = open_image(IMG_PATH/im0_d[FILE_NAME])
  • display an image : use matplotlib :
    ** plt.subplots() returns fig and ax. Really useful wrapper for creating plots, regardless of whether you have more than one subplot.
    ** Create functions to draw bounding boxes (rectangles) with labels on training images (a white line with a black outline, so it contrasts with both light and dark image regions)
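A minimal sketch of such drawing helpers, assuming matplotlib (the black-outline-around-white trick uses matplotlib's patheffects, as in the notebook; the image and label here are stand-ins):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib import patheffects
import numpy as np

def draw_outline(artist, lw):
    # Black stroke behind the artist so white lines/text stay visible
    # on any background.
    artist.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                             patheffects.Normal()])

def draw_rect(ax, b):
    # b is [x, y, width, height], the format matplotlib expects.
    patch = ax.add_patch(plt.Rectangle(b[:2], *b[-2:], fill=False,
                                       edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt):
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=14, weight='bold')
    draw_outline(text, 1)

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(np.zeros((224, 224, 3)))   # stand-in for a training image
draw_rect(ax, [96, 155, 120, 60])    # a bounding box in [x, y, w, h]
draw_text(ax, (96, 155), 'car')      # hypothetical label
```

The same ax-taking helpers work whether you plot one image or a grid of subplots, which is why passing ax around is so convenient.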

Largest item classifier

  • Define a function get_lrg(b) that returns the annotation with the largest bounding box
  • Create a dataframe with columns fn (files names) and category.
  • Create a CSV file from this DataFrame.
  • Then, you can use the fastai function ImageClassifierData.from_csv to get a classifier model!
  • In the tfms used to create the data object (md), do not crop! Use crop_type=CropType.NO
  • Fit the model as usual.
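The notebook builds these steps with a pandas DataFrame and df.to_csv; here is a stdlib-only sketch of the same logic on made-up annotations, with "largest" taken to mean biggest bounding-box area:

```python
import csv, io

# Made-up annotations: file name -> list of ([y1, x1, y2, x2], category).
trn_anno = {
    '000012.jpg': [([155, 96, 504, 364], 'car'), ([10, 20, 40, 60], 'person')],
    '000017.jpg': [([61, 184, 199, 279], 'person')],
}

def get_lrg(annos):
    # Largest object = bounding box with the biggest area.
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])
    return max(annos, key=lambda a: area(a[0]))

# Build the fn/category rows and write them as CSV, ready for
# ImageClassifierData.from_csv.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['fn', 'category'])
for fn, annos in trn_anno.items():
    writer.writerow([fn, get_lrg(annos)[1]])

print(buf.getvalue())
```

Writing the intermediate result to CSV means the (slow) largest-object computation runs once and the classifier data loader can simply re-read the file.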

Python debugger

Do you know how to create a bounding box model?

  • Now we’ll try to find the bounding box of the largest object. This is simply a regression with 4 outputs !
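A sketch of that 4-output regression head, assuming PyTorch (the backbone feature size of 512 x 7 x 7 is illustrative; the notebook instead attaches a custom head to a pretrained ResNet via fastai):

```python
import torch
from torch import nn

# A ResNet-style backbone ends in a (batch, 512, 7, 7) feature map.
# Replacing the classifier head with a 4-output linear layer turns the
# same network into a bounding-box regressor.
head = nn.Sequential(nn.Flatten(), nn.Linear(512 * 7 * 7, 4))

features = torch.randn(2, 512, 7, 7)         # fake backbone output, batch of 2
pred = head(features)                        # predicted [y1, x1, y2, x2]
target = torch.tensor([[155., 96., 504., 364.],
                       [61., 184., 199., 279.]])
loss = nn.functional.l1_loss(pred, target)   # L1 is a common choice for coords

print(pred.shape)  # torch.Size([2, 4])
```

The only change from the classifier is the output size (4 continuous values instead of class scores) and the loss (L1 instead of cross-entropy).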