Other references
- Video lesson 8
- Part 2 Lesson 8 wiki
- Thread from @timlee : Notes on the first hour of lesson 8
- Post from @hiromi : Deep Learning 2: Part 2 Lesson 8
Reminder
- Part 1 was full of best practical techniques for DL (differentiable layers, transfer learning, architecture design, handling over-fitting, embeddings), grouped by application area (computer vision, structured data, natural language)
- Part 2 is cutting edge : the same ideas in more depth, with current techniques, but without 100% confidence (still active research)
- Lesson 8 (part 2) : object detection (creating much richer CNN structure)
- Differentiable layers : (from Yann Lecun) DL = Differentiable Programming (setting up a differentiable function - the loss function that describes right scores - and using it to update weights in the NN)
What we’ve learned so far :
- Transfer learning : the single most important thing for using DL effectively (starting from “random weights” - from zero - is not)
1.1) replace the last layer
1.2) fine-tune the new layers
1.3) fine-tune more layers (up to the whole model)
- Architecture design : choose to fit the dataset
** CNNs for fixed-size ordered data
** RNNs for sequences
** Softmax for a single categorical outcome (sigmoids if you’ve got multiple outcomes)
** ReLU for inner activations
- Avoid overfitting in 5 steps (but start with a model that overfits, then improve it) :
3.1) more data
3.2) data augmentation
3.3) normalization : generalizable architectures (Batch Norm layers, DenseNets (Densely Connected Convolutional Networks))
3.4) regularization (weight decay, dropout)
3.5) (last resort) reduce architecture complexity (fewer layers, fewer activations)
- Embeddings allow us to use categorical data (widely used since the release of part 1)
From part 1 to part 2 :
- requires significant fastai customization
- need to understand python well
- other code resources will generally be research quality
- code samples online nearly always have problems
- each lesson will be incomplete - ideas to explore
Jupyter notebook
- don’t copy/paste : type the code yourself
- you must practice !
Get a GPU local or use an online GPU
Read academic papers
There are many opportunities for you in this class
- Your homework will be at the cutting edge
- There are few DL practitioners who know what you know now
- Experiment lots, especially in your area of expertise
- Much of what you find will not have been written about before
- Don’t wait to be perfect before you start communicating
- If you don’t have a blog, try medium.com
What we’ll study in part 2
- Generative models
** CNNs beyond classification : localization, enhancement (colorization, super-resolution, artistic style)
** NLP beyond classification (translation, Seq2seq, attention, large vocabularies)
- Large data sets (large images, lots of data points, large outputs)
- but not time-series and tabular data (cf. part 1 and the ML course), as we already covered most of those techniques
Object detection
- We want to classify multiple objects (multi-class) that can overlap : detect bounding boxes and label them
- bounding box : a rectangle that contains an object (almost) entirely, together with its class label
- Step 1 : classify and localize the largest object in each image (classify, localize, classify & localize in a bounding box)
1) Analyse training data
- We use the notebook pascal.ipynb
- Download data (see HowTo)
- Choose a GPU if you have several : `torch.cuda.set_device(x)`
- `PATH.iterdir()` returns a generator (for example, get the list of files in a folder : `list(PATH.iterdir())`)
- How to load a json file (in the video : get its content with `json.load` and `open`, giving a dictionary with `dict_keys`…) : `trn_j = json.load((PATH/'pascal_train2007.json').open())`
- Create a dictionary with the image id as key and the annotations (a list with bounding-box values and category id) as value : `defaultdict`
- MAKE THINGS CONSISTENT : to be numpy compatible, we convert the VOC `[X, Y, Width, Height]` coordinates of the bounding box to `[Y, X, Y+H-1, X+W-1]` (top-left and bottom-right corners).
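A minimal sketch of this conversion; the helper name `hw_bb` follows the lesson notebook:

```python
import numpy as np

def hw_bb(bb):
    # VOC bbox is [x, y, width, height]; convert to [y1, x1, y2, x2]
    # (top-left and bottom-right corners, numpy row/column order)
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

# e.g. a 10x20 box whose top-left corner is at (x=5, y=3):
hw_bb([5, 3, 10, 20])  # -> array([ 3,  5, 22, 14])
```

Subtracting 1 makes the second corner the last pixel *inside* the box rather than one past it.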
How to navigate through the Fastai code by using an editor
- Visual Studio Code : good choice as editor
- Setup : download and install, open the fastai folder, change the interpreter to the python of your fastai environment (`CTRL+SHIFT+P`), open the integrated terminal
- vscode shortcuts
2) Visualize training images
- Some libs take VOC-format bounding boxes, so this lets us convert back when required :
def bb_hw(a): return np.array([a[1],a[0],a[3]-a[1]+1,a[2]-a[0]+1])
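`bb_hw` is the inverse of the numpy-style conversion; a quick round-trip check (`hw_bb` is reproduced here so the snippet is self-contained):

```python
import numpy as np

def hw_bb(bb):  # VOC [x, y, w, h] -> numpy [y1, x1, y2, x2]
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

def bb_hw(a):   # numpy [y1, x1, y2, x2] -> VOC [x, y, w, h]
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])

voc = [155, 96, 196, 174]                # [x, y, w, h]
assert (bb_hw(hw_bb(voc)) == voc).all()  # round trip recovers the original
```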
- import an image : use `open_image()`, a fastai function that uses OpenCV, which is 5 to 10 times faster than PyTorch Vision (Pillow/PIL is faster than PyTorch Vision, but not as fast as OpenCV) : `im = open_image(IMG_PATH/im0_d[FILE_NAME])`
- display an image : use matplotlib :
** `plt.subplots()` returns `fig` and `ax`. A really useful wrapper for creating plots, regardless of whether you have more than one subplot.
** Create functions to draw bounding boxes (rectangles) with labels on training images (a white line outlined in black, so it contrasts with any background)
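One possible version of those drawing helpers; the `patheffects` stroke gives the black-outlined white line described above (helper names follow the lesson notebook):

```python
import matplotlib.pyplot as plt
import matplotlib.patheffects as patheffects

def draw_outline(o, lw):
    # black stroke behind the white artist -> visible on any background
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                        patheffects.Normal()])

def draw_rect(ax, b):
    # b is a VOC-style box [x, y, width, height]
    patch = ax.add_patch(plt.Rectangle(b[:2], *b[-2:], fill=False,
                                       edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=sz, weight='bold')
    draw_outline(text, 1)
```

Call `draw_rect(ax, bb_hw(box))` then `draw_text(ax, bb_hw(box)[:2], label)` on an axis showing the image.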
Largest item classifier
- Define a function `get_lrg(b)` that returns the largest bounding box of an image
- Create a DataFrame with columns fn (file names) and category.
- Create a CSV file from this DataFrame.
- Then, you can use the fastai function `ImageClassifierData.from_csv` to get a classifier model !
- In the `tfms` used to create the data object (md), do not crop ! Use `crop_type=CropType.NO`
- Fit the model as usual.
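The DataFrame-to-CSV steps above might look like this; the filenames, labels, CSV path, and column names are illustrative assumptions:

```python
import os
import pandas as pd

# hypothetical mapping: image filename -> label of its largest object
largest = {'000012.jpg': 'car', '000017.jpg': 'person'}

df = pd.DataFrame({'fn': list(largest.keys()),
                   'cat': list(largest.values())}, columns=['fn', 'cat'])

os.makedirs('tmp', exist_ok=True)
df.to_csv('tmp/lrg.csv', index=False)  # index=False: keep only the fn,cat columns
```

A CSV like this is exactly what `ImageClassifierData.from_csv` expects: one column of file names and one of labels.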
Python debugger
- `pdb.set_trace()` to drop into the python debugger (list of commands)
- `%debug` (Jupyter magic) to debug after an exception has been raised
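A tiny illustration of where a breakpoint goes (the function here is a made-up example; the commented-out line is where `set_trace` would pause execution):

```python
import pdb

def running_total(xs):
    total = 0
    for x in xs:
        # pdb.set_trace()  # uncomment to drop into the debugger on each iteration
        total += x
    return total

# inside the debugger: 'n' = next line, 's' = step into, 'c' = continue,
# 'p total' = print a variable, 'l' = list source, 'u'/'d' = move up/down the stack
running_total([1, 2, 3])  # -> 6
```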
Do you know how to create a bounding box model?
- Now we’ll try to find the bounding box of the largest object. This is simply a regression with 4 outputs !
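A minimal PyTorch sketch of such a 4-output regression head; the backbone feature size (512×7×7, typical of a ResNet's last conv block) and the layer names are illustrative assumptions, not the lesson's exact model:

```python
import torch
import torch.nn as nn

# a single linear layer on top of flattened conv features,
# producing 4 numbers: the bounding-box coordinates
head_reg4 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4),
)

feats = torch.randn(2, 512, 7, 7)  # a fake batch of conv activations
bbox = head_reg4(feats)            # would be trained with e.g. L1 loss vs. true boxes
print(bbox.shape)                  # torch.Size([2, 4])
```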