Deep Learning Brasília - Lesson 8

[ Português ] Esta thread permite que os participantes do “Grupo de Estudo do Deep Learning de Brasília (DLB)” estudem coletivamente (em reunião presencial e online) a lição 8 (parte 2) do curso, mas com certeza, está aberto para todos. Todos os vídeos desse curso estão online em

O idioma dessa thread é principalmente o português (e o inglês quando evitar traduções inúteis).

Por favor, use essa thread para perguntar e responder às perguntas da lição 8 (parte 2), mas antes de postar, leia a thread seguinte : Part 2 Lesson 8 wiki

[ English ] This thread allows the participants of the “Brasília Deep Learning (DLB) Study Group” to study lesson 8 (part 2) of the fast.ai course together (face-to-face and online), but of course it is open to all. All the course videos are online at

The language of this thread is mainly Portuguese (and English when it avoids useless translations).

Please use this thread to ask and answer questions on lesson 8 (part 2), but before posting in this thread, please read : Part 2 Lesson 8 wiki

(from @jeremy : Part 2 Lesson 8 wiki)

Lesson resources


  • Quick summary of how to download the dataset:
     cd ~/fastai/courses/dl2
     ln -s ~/data data && cd $_
     mkdir pascal && cd $_
     curl -OL
     curl -OL
     tar -xf VOCtrainval_06-Nov-2007.tar
     mv PASCAL_VOC/*.json .
     rmdir PASCAL_VOC

Returning to AWS?

  • Log in to AWS and get your public IP (xx.xx.xx.xx). Then run the following commands:
ssh ubuntu@xx.xx.xx.xx -L8888:localhost:8888
git pull
conda env update
jupyter notebook
  • For PyCharm users on a Mac - the PyCharm equivalents of the shortcuts Jeremy provided for Visual Studio Code:
    • Action (PyCharm + Mac shortcut)
    • Command palette- (Shift + Command + A)
    • Select interpreter (for fastai env) - (Shift+Command+A) and then look for “interpreter”
    • Select terminal shell- (Shift+Command+A) and then look for “terminal shell”
    • Go to symbol (Option + Command + O)
    • Find references (Command + G) (go down in the references), (Command + Shift + G) (go up), (Command + Function + F7) (look for all)
    • Go to definition (Command + Down Arrow Key)
    • Go back (Command + [ )
    • View documentation (Option + Space) for viewing source and (Function + F1) for viewing documentation
    • Zen mode (Control + Command + F) and same to get out too
    • Hide sidebar (Command + 1) redoing it will bring it back
    • Find them all with the (Shift + Command+ A) palette option for reference.

Time line for videos

Resources related to stuff that Jeremy mentioned we can learn/Homework


Other references


  • Part 1 was full of best-practice DL techniques (differentiable layers, transfer learning, architecture design, handling over-fitting, embeddings), grouped by application area (computer vision, structured data, natural language)
  • Part 2 is cutting edge : it goes into more detail on current techniques, but without 100% confidence in them (they are still being researched)
  • Lesson 8 (part 2) : object detection (creating much richer CNN structures)
  • Differentiable layers : (from Yann LeCun) DL = Differentiable Programming (setting up a differentiable function, the loss function that scores how right the outputs are, and using it to update the weights of the NN)
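
A minimal sketch of the "differentiable programming" idea in plain Python: a one-weight model, a differentiable loss, and weight updates driven by its gradient. The data, the learning rate and the function names are illustrative assumptions, not code from the lesson.

```python
# Toy example: fit y = w * x by gradient descent.
# All names and numbers here are illustrative assumptions.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with the "true" weight 2.0

def loss(w):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Derivative of the loss with respect to w (worked out by hand)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0      # start from an arbitrary weight
lr = 0.05    # learning rate
for _ in range(100):
    w -= lr * grad(w)   # step against the gradient to reduce the loss

print(round(w, 3), round(loss(w), 6))
```

In a real network the gradient is not written by hand: a framework such as PyTorch computes it automatically for every weight.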

What we’ve learned so far :

  1. Transfer learning : the single most important thing for using DL effectively (starting from “random weights”, i.e. from zero, is almost never the best choice)
    1.1) replace last layer
    1.2) fine-tune new layers
    1.3) fine-tune more layers (the whole model)
  2. Architecture design : choose the architecture to suit the dataset
    ** CNNs for fixed-size ordered data
    ** RNNs for sequences
    ** Softmax for single categorical outcome (sigmoids if you’ve got multiple outcomes)
    ** ReLU for inner activations
  3. Avoid overfitting in 5 steps (but start with a model that overfits, then improve it)
    3.1) more data
    3.2) data augmentation
    3.3) normalization : generalizable architectures (Batch Norm layers, Dense Nets (Densely Connected Convolutional Networks))
    3.4) regularization (weight decay, dropout)
    3.5) (last) reduce architecture complexity (fewer layers, fewer activations)
  4. Embeddings allow us to use categorical data (they have become widely used since the release of part 1)
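
To make the activation choices in point 2 concrete, here is a small plain-Python sketch of softmax (one label per item), sigmoid (several independent labels) and ReLU. The function names and the example scores are assumptions made for illustration; in practice these come from the deep learning library.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (single label)."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    """Squash each score independently into (0, 1) (multiple labels)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def relu(values):
    """Keep positive values, replace negative ones with 0 (inner activations)."""
    return [max(0.0, v) for v in values]

scores = [2.0, 1.0, -1.0]
probs_single = softmax(scores)   # competes across classes, sums to 1
probs_multi = sigmoid(scores)    # each class judged on its own
```

With softmax, raising one class's probability lowers the others; with sigmoid, several classes can be close to 1 at the same time.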

From part 1 to part 2 :

  • requires significant fastai customization
  • need to understand Python well
  • other code resources will generally be research quality
  • code samples online nearly always have problems
  • each lesson will be incomplete - ideas to explore

Jupyter notebook

  • don’t copy/paste : type the code yourself
  • you must practice !

Get a local GPU or use an online GPU

Read academic papers

There are many opportunities for you in this class

  • Your homework will be at the cutting edge
  • There are few DL practitioners who know what you know now
  • Experiment lots, especially in your area of expertise
  • Much of what you find will not have been written about before
  • Don’t wait to be perfect before you start communicating
  • If you don’t have a blog, try

What we’ll study in part 2

  • Generative models
    ** CNNs beyond classification : localization, enhancement (colorization, super-resolution, artistic style)
    ** NLP beyond classification (translation, Seq2seq, attention, large vocabularies)
  • Large data sets (large images, lots of data points, large outputs)
  • but not time-series and tabular data (cf. part 1 and the ML course), as we have already seen almost all the techniques

Object detection

  • We want to classify multiple objects (multi-class) that can overlap (detect bounding boxes and label them)
  • bounding box : a rectangle that contains an object almost entirely, together with its label
  • Step 1 : classify and localize the largest object in each image (classify, localize, classify & localize in a bounding box)

1) Analyse training data

  • We use the notebook pascal.ipynb
  • Download data (see HowTo)
  • Choose a GPU if you have several : torch.cuda.set_device(x)
  • PATH.iterdir() : generator (for example, you can get the list of files in a folder with list(PATH.iterdir()))
  • How to deal with a json file (in the video : get content from json.load and open, get dictionary with dict_keys…) : trn_j = json.load((PATH/'pascal_train2007.json').open())
  • Create a dictionary with the image id as key and the annotations (a list of bounding box values and category ids) as value : defaultdict
  • MAKE THINGS CONSISTENT : to be numpy compatible (rows first, then columns), we change the [X, Y, Width, Height] coordinates of the bounding box to [Y, X, Y+H-1, X+W-1], i.e. the top-left and bottom-right corners.
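
The steps above can be sketched with the standard library only. The sample file below is made up, and its key names (images, annotations, bbox, image_id, category_id) are assumptions about the layout of pascal_train2007.json; the helper name hw_bb is also just illustrative. It assumes each bbox is stored as [x, y, width, height].

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# A tiny made-up annotation file (assumed layout, not the real dataset).
sample = {
    "images": [{"id": 12, "file_name": "000012.jpg"}],
    "annotations": [
        {"image_id": 12, "bbox": [155, 96, 196, 174], "category_id": 7},
    ],
    "categories": [{"id": 7, "name": "car"}],
}

def hw_bb(bb):
    """[x, y, width, height] -> [top, left, bottom, right] (row first, like numpy)."""
    x, y, w, h = bb
    return [y, x, y + h - 1, x + w - 1]

with tempfile.TemporaryDirectory() as tmp:
    PATH = Path(tmp)
    (PATH / "sample.json").write_text(json.dumps(sample))

    files = list(PATH.iterdir())          # iterdir() is a generator
    with (PATH / "sample.json").open() as f:
        trn_j = json.load(f)              # plain dict of lists and dicts

# image id -> list of (bounding box, category id)
trn_anno = defaultdict(list)
for o in trn_j["annotations"]:
    trn_anno[o["image_id"]].append((hw_bb(o["bbox"]), o["category_id"]))
```

Using defaultdict(list) means an image's list is created the first time one of its annotations is appended, so no "does the key exist yet" check is needed.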

How to navigate through the Fastai code by using an editor

2) Visualize training images

  • Some libs take VOC format bounding boxes, so this lets us convert back when required : def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])
  • import an image : use open_image(), a fastai function that uses OpenCV, which is 5 to 10 times faster than PyTorch Vision (Pillow PIL is faster than PyTorch Vision but not as fast as OpenCV) : im = open_image(IMG_PATH/im0_d[FILE_NAME])
  • display an image : use matplotlib :
    ** plt.subplots() returns fig and ax. Really useful wrapper for creating plots, regardless of whether you have more than one subplot.
    ** Create functions to draw bounding boxes (rectangles) with labels on training images (black/white/black lines, so that they contrast with any background)
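
A possible sketch of these drawing helpers with matplotlib. The helper names draw_outline, draw_rect and draw_text, the random placeholder image and the example box are assumptions for illustration; with real data the image would come from open_image().

```python
import matplotlib
matplotlib.use("Agg")                     # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import patches, patheffects

def bb_hw(a):
    """[top, left, bottom, right] -> [x, y, width, height]."""
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])

def draw_outline(o, lw):
    """Add a black stroke around an artist so it stays visible on any background."""
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground="black"),
                        patheffects.Normal()])

def draw_rect(ax, b):
    """Draw a white rectangle with a black outline; b is [x, y, width, height]."""
    patch = ax.add_patch(patches.Rectangle(tuple(b[:2]), b[2], b[3],
                                           fill=False, edgecolor="white", lw=2))
    draw_outline(patch, 4)
    return patch

def draw_text(ax, xy, txt, sz=14):
    """Write a white, outlined label with its top-left corner at xy."""
    text = ax.text(xy[0], xy[1], txt, verticalalignment="top",
                   color="white", fontsize=sz, weight="bold")
    draw_outline(text, 1)
    return text

im = np.random.rand(333, 500, 3)          # placeholder for a real image
fig, ax = plt.subplots(figsize=(8, 6))
ax.imshow(im)
b = bb_hw([96, 155, 269, 350])            # made-up box in [top, left, bottom, right]
rect = draw_rect(ax, b)
label = draw_text(ax, b[:2], "car")
```

matplotlib's Rectangle takes the (x, y) corner plus width and height, which is why the box is converted back with bb_hw before drawing.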

Largest item classifier

  • Define a function get_lrg(b)
  • Create a DataFrame with columns fn (file names) and category.
  • Create a CSV file from this DataFrame.
  • Then, you can use the fastai function ImageClassifierData.from_csv to build the data object for the classifier !
  • In the tfms used to create the data object (md), do not crop ! Use crop_type=CropType.NO
  • Fit the model as usual.
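
The first three steps can be sketched as follows. This is only an illustration: the sample annotations are made up, box size is measured as (bottom - top) * (right - left), and the standard csv module stands in for the pandas DataFrame used in the notebook.

```python
import csv
import io

# image id -> list of ([top, left, bottom, right], category id); made-up sample
trn_anno = {
    12: [([96, 155, 269, 350], 7)],
    17: [([61, 184, 198, 278], 15), ([77, 89, 335, 402], 13)],
}
cats = {7: "car", 13: "horse", 15: "person"}
trn_fns = {12: "000012.jpg", 17: "000017.jpg"}

def area(bb):
    """Size of a [top, left, bottom, right] box."""
    top, left, bottom, right = bb
    return (bottom - top) * (right - left)

def get_lrg(b):
    """Return the (bounding box, category id) pair with the largest box."""
    if not b:
        raise ValueError("image has no annotations")
    return max(b, key=lambda ann: area(ann[0]))

trn_lrg_anno = {img_id: get_lrg(anns) for img_id, anns in trn_anno.items()}

# One row per image: file name and category of its largest object.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["fn", "category"])
for img_id, (bb, cat_id) in trn_lrg_anno.items():
    writer.writerow([trn_fns[img_id], cats[cat_id]])
csv_text = buf.getvalue()
```

Writing to a real file instead of io.StringIO gives the CSV that ImageClassifierData.from_csv expects: one file name and one label per row.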

Python debugger

Do you know how to create a bounding box model?

  • Now we’ll try to find the bounding box of the largest object. This is simply a regression with 4 outputs !
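
One way to read "a regression with 4 outputs" : the model predicts four numbers per image and a regression loss compares them with the target box. The sketch below uses mean absolute error (L1) as that loss, which is an assumption here rather than something stated in these notes; the predicted values are made up.

```python
def l1_loss(pred, target):
    """Mean absolute error between two boxes given as [top, left, bottom, right]."""
    assert len(pred) == len(target) == 4
    return sum(abs(p - t) for p, t in zip(pred, target)) / 4

target = [96, 155, 269, 350]           # ground-truth box of the largest object
pred = [100.0, 150.0, 260.0, 355.0]    # four raw outputs of a hypothetical model

loss = l1_loss(pred, target)           # shrinks as the predicted box approaches the target
```

Unlike classification, there is no softmax or sigmoid on the outputs: the four numbers are used directly as coordinates.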