I created the video timelines for Lessons 8 to 14 and added them to each lesson’s wiki: they can serve as a detailed syllabus for newcomers and veterans alike.
Also, many topics are covered, then explored again, across several videos, so I thought I’d make a mega-thread for easier/faster keyword searches.
If you are looking for the Part 1 video collection, please check this link:
Lesson 8 video timeline
00:00:00 Intro and review of Part 1
00:08:00 Moving to Python 3
00:10:30 Moving to Tensorflow and TF Dev Summit videos
00:22:15 Moving to PyTorch
00:27:30 From Part 1 “best practices” to Part 2 “new directions”
00:31:40 Time to build your own box
00:36:20 Time to start reading papers
00:39:30 Time to start writing about your work in this course
00:41:30 What we’ll study in Part 2
00:40:40 Artistic style (or neural style) transfer
00:52:10 Neural style notebook
00:54:15 Mendeley Desktop, an app to track research papers
00:56:15 arXiv-Sanity.com
00:59:00 Jeremy on twitter.com and reddit.com/r/MachineLearning/
01:01:15 Neural style notebook (continued)
01:04:05 Broadcasting, APL as “A Programming Language”, and Jsoftware (a short broadcasting sketch follows this timeline)
01:07:15 Broadcasting with Keras
01:12:00 Recreate input with a VGG model
01:22:45 Optimize the loss function with a deterministic approach
01:33:25 Visualize the iterations through a short video
01:37:30 Recreate a style
01:44:05 Transfer a style
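A quick aside on the broadcasting topic at 01:04:05 / 01:07:15 above: here is a minimal NumPy-style sketch of the idea (a toy example of my own, not the lesson notebook’s code), subtracting a vector from a whole batch with no explicit loop.

import numpy as np

batch = np.arange(12, dtype=np.float32).reshape(3, 4)       # 3 rows of 4 values, shape (3, 4)
channel_mean = np.array([1., 2., 3., 4.], dtype=np.float32)  # shape (4,)

# Broadcasting stretches the (4,) vector across the first axis of the (3, 4) array,
# so the subtraction is applied element-wise without tiling or a Python loop.
centered = batch - channel_mean                              # shape (3, 4)
print(centered)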
Lesson 9 video timeline
00:00:30 Contribute to, and use, the Lesson 8 Wiki
00:02:00 Experiments on Image/Neural Style Transfer
00:05:45 Advanced tips from Keras on Neural Style Transfer
00:10:15 More tips on reading research papers &
“A Neural Algorithm of Artistic Style, Sep-2015”
00:23:00 From Style Transfer to Generative Models
00:32:50 “Perceptual Losses for Real-Time Style Transfer
& Super-Resolution, Mar-2016”
00:39:30 Implementation notebook w/ re-use of ‘bcolz’ arrays from Part 1.
00:43:00 Digress: how “practical” are the tools learnt in Part 2, vs. Part 1?
00:52:10 Two approaches to up-sampling: Deconvolution & Resizing
01:09:30 TQDM library: add a progress meter to your loops (a short TQDM sketch follows this timeline)
01:17:30 Fast Style Transfer w/ “Supplementary Material, Mar-2016”
01:27:45 Ugly artifacts like “checkerboard”: cause and fixes; Keras UpSampling2D
01:31:20 ImageNet Processing in parallel
01:33:15 DeViSE research paper
01:38:00 Digress: Tips on path setup for SSD vs. HD
01:42:00 words, vectors = zip(*w2v_list) (a short unzip sketch follows this timeline)
01:49:30 Resize images
01:52:15 Three ways to make an algorithm faster:
memory locality,
SIMD/vectorization,
parallel processing
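A minimal TQDM usage sketch for the 01:09:30 entry above (my own example, not the lesson notebook): wrapping any iterable in tqdm() prints a live progress bar while the loop runs.

from tqdm import tqdm
import time

total = 0
for i in tqdm(range(100)):   # tqdm wraps the iterable and displays a progress meter
    time.sleep(0.01)         # stand-in for real work per iteration
    total += i
print(total)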
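And a small sketch of the one-liner at 01:42:00 above (the pairs here are made up; in the lesson, w2v_list holds word/vector pairs loaded from word2vec): zip(*pairs) transposes a list of (word, vector) tuples into one tuple of words and one tuple of vectors.

w2v_list = [("cat", [0.1, 0.2]), ("dog", [0.3, 0.4]), ("fish", [0.5, 0.6])]  # hypothetical pairs

words, vectors = zip(*w2v_list)  # "unzip": regroup the pairs by position
print(words)    # ('cat', 'dog', 'fish')
print(vectors)  # ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6])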
Lesson 10 video timeline
00:00:10 Picking an optimizer for Style Transfer (student post on Medium)
Plus other student posts and tips on the class project.
00:07:30 Use Excel to understand Deep Learning concepts
00:09:20 ImageNet Processing (continued from Lesson 9)
& Tips to speed up your model (SIMD & parallel processing)
00:26:45 Adding Preprocessing to Keras ResNet50
00:28:30 Transfer Learning with ResNet in Keras: difficulty #1
00:33:40 Transfer Learning with ResNet in Keras: difficulty #2
00:38:00 Use batches to overcome RAM “Out of Memory”
00:42:00 Final layers to our ResNet model
00:47:00 Nearest Neighbors to look at examples
00:55:00 Fine-Tuning our models and more “Out of Memory” fixes
01:03:00 Find images similar to a word or phrase &
Find images similar to an image!
01:08:15 Homework discussion
01:16:45 How to: multi-input models on large datasets
01:23:15 Generative Adversarial Networks (GAN) in Keras
01:32:00 Multi-Layer-Perceptron (MLP)
01:37:10 Deep Convolutional GAN (DCGAN)
01:40:15 Wasserstein GAN in PyTorch
01:46:30 Introduction to PyTorch
01:55:20 Wasserstein GAN in PyTorch (cont.)
& LSUN dataset
02:05:00 Examples of generated images
02:09:15 Lesson 10 conclusion and assignments for Lesson 11
Lesson 11 video timeline
00:00:30 Tips on using notebooks and reading research papers
00:03:15 Follow-up on lesson 10 and more word-to-image searches
00:07:30 Linear algebra cheat sheet for deep learning (student’s post on Medium)
& Zero-Shot Learning by Convex Combination of Semantic Embeddings (arXiv)
00:10:00 Systematic evaluation of CNN advances on ImageNet (arXiv)
ELU better than ReLU, learning rate annealing, different color transformations,
Max pooling vs Average pooling, learning rate & batch size, design patterns.
00:27:15 Data Science Bowl 2017 (Cancer Diagnosis) on Kaggle
00:36:30 DSB 2017: full preprocessing tutorial, + others.
00:48:30 A non-deep-learning approach to find lung nodules (research)
00:53:00 Clustering (and why Jeremy wasn’t a fan before)
01:08:00 Using PyTorch with GPU for ‘meanshift’ (clustering cont.)
01:22:15 Candidate Generation and LUNA 16 (Kaggle)
01:26:30 Accelerating K-Means on GPU via CUDA (research)
01:27:15 ChatBots! (long section)
Starting with “memory networks” at Facebook (research)
01:57:30 Recurrent Entity Networks: an exciting area of research in Memory Networks
01:58:45 Concept of “Attention” and “Attentional Models”
Lesson 12 video timeline
00:00:05 K-means clustering in TensorFlow
00:06:00 ‘find_initial_centroids’, a simple heuristic
00:12:30 A trick to make TensorFlow feel more like PyTorch
& other tips around broadcasting, GPU tensors and co.
00:24:30 Student’s question about “figuring out the number of clusters”
00:26:00 “Step 1 was to copy our initial_centroids onto the GPU”,
“Step 2 is to assign every point to a cluster”
00:29:30 ‘dynamic_partition’, one of the crazy GPU functions in TensorFlow
00:37:45 Digress: “Jeremy, if you were to start a company today, what would it be?”
00:40:00 Intro to next step: NLP and translation deep-dive, with CMU pronouncing dictionary
via spelling_bee_RNN.ipynb
00:55:15 Create spelling_bee_RNN model with Keras
01:17:30 Question: “Why not treat text problems the same way we do with images?”
01:26:00 Graph for Attentional Model on Neural Translation
01:32:00 Attention Models (cont.)
01:37:20 Neural Machine Translation (research paper)
01:44:00 Grammar as a Foreign Language (research paper)
Lesson 13 video timeline
00:00:10 Fast.ai student accepted into Google Brain Residency program
00:06:30 Cyclical Learning Rates for Training Neural Networks (another student’s paper)
& updates on Style Transfer, GAN, and Mean Shift Clustering research papers
00:13:45 Tiramisu: combining Mean Shift Clustering and Approximate Nearest Neighbors
00:22:15 Facebook AI Similarity Search (FAISS)
00:28:15 The BiLSTM Hegemony
00:35:00 Implementing the BiLSTM, and Grammar as a Foreign Language (research)
00:45:30 Reminder on how RNNs work, from Lesson #5 (Part 1)
00:47:20 Why Attentional Models use “such” a simple architecture
& “Tacotron: a Fully End-To-End Text-To-Speech Synthesis Model” (research)
00:50:15 Continuing with the spelling_bee_RNN notebook (Attention Model), from Lesson 12
00:58:40 Building the Attention Layer and the ‘attention_wrapper.py’ walk-through
01:15:40 Impressive student experiment with a different mathematical technique for Style Transfer
01:18:00 Translate English into French, with PyTorch
01:31:20 Translate English into French: using Keras to prepare the data
Note: the latest PyTorch version now supports broadcasting
01:38:50 Writing and running the ‘Train & Test’ code with PyTorch
01:44:00 NLP Programming Tutorial, by Graham Neubig (NAIST)
01:48:25 Question: “Could we translate Chinese to English with that technique?”
& new technique: Neural Machine Translation of Rare Words with Subword Units (Research)
01:54:45 Leaving Translation aside and moving to Image Segmentation,
with “The 100 Layers Tiramisu: Fully Convolutional DenseNets” (research)
and “Densely Connected Convolutional Networks” (research)
Lesson 14 video timeline
00:01:25 Time-Series and Structured Data
& “Patient Mortality Risk Predictions in Pediatric Intensive Care, using RNNs” (research)
00:07:30 Time-Series with Rossmann Store Sales (Kaggle)
& 3rd place solution with “a very uncool NN ;-)”.
00:18:00 Implementing the Rossmann solution with Keras + TensorFlow + Pandas + Sklearn
Building Tables & Exploratory Data Analysis (EDA)
00:27:15 Digress: categorical variable encodings and “Vtreat for R”
00:30:15 Back to Rossmann solution
& “Python for Data Analysis” (book)
00:36:30 What Jeremy does every time he sees a ‘date’ in a structured ML model
& other tips
00:43:00 Dealing with duration of special events (holidays, promotions) in Time-Series
00:52:00 Using ‘inplace=True’ in .drop(), & a look at our final ‘feature engineering’ results
00:53:40 Starting to feed our NN
& using ‘pickle.dump()’ for storing encodings
01:00:45 “Their big mistake” and how they could have finished #1
01:05:30 Splitting into Training and Test, but not randomly
01:08:20 Why they modified their Sales Target with ‘np.log()/max_log_y’ (a short scaling sketch follows this timeline)
01:11:20 A look at our basic model
01:14:45 Training our model and questions
01:16:45 Running the same model with XGBoost
01:20:10 “The really, really, really weird things here!”
& the end of the Rossmann competition ;-)
01:26:30 Taxi Trajectory Prediction (Kaggle) with “another uncool NN” winning solution
01:38:00 “Start with a Conv layer and pass it to an RNN” question and research
01:42:40 The 100-layers Tiramisu: Fully Convolutional DenseNets, for Image Segmentation (Lesson 13 cont.)
01:58:00 Building and training the Tiramisu model
02:02:50 ENet and LinkNet models: better than the Tiramisu?
02:04:00 Part 2: conclusion and next steps
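One last sketch, for the target scaling at 01:08:20 above (toy numbers of my own, not the competitors’ actual code): as I understand it, taking the log of sales compresses the long right tail, and dividing by the maximum log value squashes the targets into (0, 1], which suits a sigmoid output layer.

import numpy as np

sales = np.array([100., 1000., 10000.])     # hypothetical daily sales figures
log_y = np.log(sales)
max_log_y = log_y.max()

y_scaled = log_y / max_log_y                # now in (0, 1], ready for a sigmoid-activated output
print(y_scaled)

y_recovered = np.exp(y_scaled * max_log_y)  # invert the transform to get sales back
print(y_recovered)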
Don’t forget to use each lesson’s wiki for additional resources, like Jupyter notebooks and links to research papers.
Eric PB