<<< Wiki: Lesson 5 | Wiki: Lesson 7 >>>
Lesson resources
 Lesson video
 Video timelines for Lesson 6
 Lesson notes from @timlee
 Lesson notes from @hiromi
 The PowerPoint presentation is in the GitHub repo
 Optimization for Deep Learning Highlights in 2017 by Sebastian Ruder
 “Entity Embeddings of Categorical Variables” paper by Guo and Berkhahn
 Nietzsche notes for lesson 6 from @amritv
 Configure tags for code navigation in vim
Video timelines for Lesson 6

00:00:10 Review of articles and works
“Optimization for Deep Learning Highlights in 2017” by Sebastian Ruder,
“Implementation of AdamW/SGDW paper in Fastai”,
“Improving the way we work with learning rate”,
“The Cyclical Learning Rate technique” 
00:02:10 Review of last week’s “Deep Dive into Collaborative Filtering” with MovieLens: analyzing our model, ‘movie bias’, ‘@property’, ‘self.models.model’, ‘learn.models’, ‘CollabFilterModel’, ‘get_layer_groups(self)’, ‘lesson5-movielens.ipynb’

00:12:10 Jeremy: “I try to use NumPy for everything, except when I need to run it on the GPU, or need derivatives”,
Question: “How do you bring the model from GPU to CPU for production?”; move the model to CPU with ‘m.cpu()’ and ‘load_model(m, p)’, back to GPU with ‘m.cuda()’; the ‘zip()’ function in Python
00:16:10 Sorting the movies; “Battlefield Earth”, the John Travolta Scientology film, as worst movie of all time; ‘key=itemgetter()’, ‘key=lambda’
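The sorting shown at this timestamp can be sketched in plain Python; the (movie, bias) pairs below are made-up illustrative values, not the actual MovieLens results.

```python
from operator import itemgetter

# Hypothetical (movie, bias) pairs -- illustrative values only
movie_bias = [
    ("Battlefield Earth", -0.61),
    ("The Shawshank Redemption", 0.42),
    ("Titanic", 0.11),
]

# Two equivalent ways to sort by the bias term (ascending, so the
# lowest-bias, i.e. "worst", movies come first)
worst_first = sorted(movie_bias, key=itemgetter(1))
worst_first_lambda = sorted(movie_bias, key=lambda o: o[1])

print(worst_first[0][0])  # the movie with the most negative bias
```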

00:18:30 Embedding interpretation, using ‘PCA’ from ‘sklearn.decomposition’ to reduce the embedding dimensions
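The notebook uses sklearn’s ‘PCA’ on the movie embeddings; as a minimal sketch, the same computation can be done with a NumPy SVD (equivalent to sklearn’s PCA up to sign). The embedding matrix here is random, standing in for the learned one.

```python
import numpy as np

# Hypothetical embedding matrix: 10 "movies" x 6 embedding dims
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 6))

# PCA via SVD: centre the data, then take the top right singular vectors.
# This mirrors what sklearn.decomposition.PCA(n_components=3) computes.
centred = emb - emb.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
components = Vt[:3]                  # 3 principal axes in embedding space
projected = centred @ components.T   # each movie as 3 latent factors

print(projected.shape)  # (10, 3)
```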

00:24:15 Looking at the “Rossmann Retail / Store” Kaggle competition with the ‘Entity Embeddings of Categorical Variables’ paper.

00:41:02 “Rossmann” data cleaning / feature engineering, using a test set properly, creating features (check the Machine Learning “ML1” course for details), ‘apply_cats’ instead of ‘train_cats’, ‘pred_test = m.predict(True)’, results on the Kaggle Public Leaderboard vs the Private Leaderboard with a poor validation set. Example: the Statoil/Iceberg competition.

00:47:10 A mistake made by the Rossmann 3rd-place winner; more on the Rossmann model.

00:53:20 “How to write something that is different from the Fastai library”

PAUSE

00:59:55 More on SGD with the ‘lesson6-sgd.ipynb’ notebook: a linear regression problem with continuous outputs, ‘a*x+b’ and the mean squared error (MSE) loss function with ‘y_hat’

01:02:55 Gradient descent implemented in PyTorch: ‘loss.backward()’, ‘.grad.data.zero_()’, the ‘optim.SGD’ class

01:07:05 Gradient descent with NumPy
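The NumPy version of the loop can be sketched as below: fit ‘a*x+b’ by MSE with hand-derived gradients, the same updates PyTorch produces automatically via ‘loss.backward()’. The data, learning rate, and iteration count are illustrative choices, not the notebook’s exact values.

```python
import numpy as np

# Synthetic data from a known line y = 3x + 8, for illustration
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 100)
y = 3 * x + 8 + rng.normal(0, 0.1, 100)

a, b = 0.0, 0.0   # parameters to learn
lr = 0.5          # learning rate

for _ in range(2000):
    y_hat = a * x + b
    # MSE loss is mean((y_hat - y)**2); these are its gradients by hand
    grad_a = 2 * ((y_hat - y) * x).mean()
    grad_b = 2 * (y_hat - y).mean()
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # should end up close to 3 and 8
```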

01:09:15 RNNs with the ‘lesson6-rnn.ipynb’ notebook on Nietzsche; SwiftKey post on a smartphone keyboard powered by neural networks

01:12:05 A basic NN with a single hidden layer (rectangle, arrow, circle, triangle), drawn by Jeremy;
an image CNN with a single dense hidden layer.
01:23:25 Three-char model; question on the ‘in1, in2, in3’ dimensions

01:36:05 Testing the model with ‘get_next(inp)’;
let’s create our first RNN: why use the same weight matrices?
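The “same weight matrices” idea can be illustrated with a minimal NumPy forward pass: one input-to-hidden matrix and one hidden-to-hidden matrix are reused at every character position. Vocabulary size, hidden size, and weights here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, n_hidden = 5, 8

W_xh = rng.normal(0, 0.1, (vocab, n_hidden))     # input -> hidden, shared
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden, shared
W_hy = rng.normal(0, 0.1, (n_hidden, vocab))     # hidden -> output

def rnn_forward(char_ids):
    """Apply the same two weight matrices at every time step."""
    h = np.zeros(n_hidden)
    for c in char_ids:
        x = np.eye(vocab)[c]              # one-hot encode the character
        h = np.tanh(x @ W_xh + h @ W_hh)  # identical weights each step
    return h @ W_hy                       # logits for the next character

logits = rnn_forward([0, 3, 1])  # a three-character input, as in the lesson
print(logits.shape)  # (5,)
```

Sharing the matrices is what lets the same model handle inputs of any length.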
01:48:45 RNN with PyTorch; question: “What does the hidden state represent?”

01:57:55 Multioutput model

02:05:55 Question on ‘sequence length vs batch size’

02:09:15 The identity matrix (init!), a paper co-authored by Geoffrey Hinton: “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units”
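A minimal sketch of that paper’s trick (often called IRNN): initialize the hidden-to-hidden weights to the identity matrix, so that with a ReLU activation the recurrence initially passes the non-negative part of the hidden state through unchanged rather than scrambling it. The hidden size and state values below are arbitrary.

```python
import numpy as np

n_hidden = 4
W_hh = np.eye(n_hidden)  # identity init for the recurrent weights
h = np.array([0.5, -1.0, 2.0, 0.0])

# With identity weights and no input contribution, one ReLU recurrence
# step keeps the non-negative entries of the hidden state as-is
h_next = np.maximum(0, h @ W_hh)
print(h_next)  # [0.5 0.  2.  0. ]
```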