Typing [1] is a shortcut.
Lesson 6 In-Class Discussion
Regarding the point that we will want to do inference on CPU since we don't want to deal with splitting the data into batches: what if inference on CPU is far too slow to be useful for our industrial purposes?
Muting Jeremy's notebook might help record the video lectures without stray sounds.
Understood. I've been dealing with very overfitted models in one of the Kaggle competitions, so I'm probably being hypersensitive.
I meant to ask if we have a return-sequences flag to keep the triangle outside the box… but in hindsight, it's not a big deal to pull the last element out of the list, and it doesn't need any more code. Thanks!
Starting to really like PyTorch. It's much easier to go up and down the layers of abstraction. Can't believe Jeremy inspected different layers in the network by passing values directly to them to check their shapes, and did the entire RNN with just linear layers and a for loop. That was so clear and concise, and it would be so much harder in a static framework.
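The pattern described above, an RNN built out of plain linear layers and a for loop, can be sketched roughly like this (the layer sizes and names here are illustrative, not the notebook's exact values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharLoopModel(nn.Module):
    # A character model built from an embedding, plain linear layers
    # and a for loop -- in the spirit of the lesson, sizes illustrative.
    def __init__(self, vocab_size, n_fac, n_hidden):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)       # char index -> dense vector
        self.l_in = nn.Linear(n_fac, n_hidden)         # input -> hidden
        self.l_hidden = nn.Linear(n_hidden, n_hidden)  # hidden -> hidden
        self.l_out = nn.Linear(n_hidden, vocab_size)   # hidden -> char scores
        self.n_hidden = n_hidden

    def forward(self, *chars):
        bs = chars[0].size(0)
        h = torch.zeros(bs, self.n_hidden)             # initial hidden state
        for c in chars:                                # one loop step per char
            inp = F.relu(self.l_in(self.e(c)))
            h = torch.tanh(self.l_hidden(h + inp))
        return F.log_softmax(self.l_out(h), dim=-1)

m = CharLoopModel(vocab_size=85, n_fac=42, n_hidden=256)
c1, c2, c3 = [torch.randint(0, 85, (512,)) for _ in range(3)]
out = m(c1, c2, c3)
print(out.shape)  # torch.Size([512, 85])
```

Because it's just Python, you can also call the individual layers directly (e.g. `m.e(c1)`) and print shapes, which is exactly the kind of layer-by-layer inspection mentioned above.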
You can either use more CPUs, or switch to using a GPU for inference.
@yinterian Can you please share the slides shown in class related to the entity embedding paper, and also the simple diagrams at the end?
I have a few questions here
In the RNN lesson, we build an RNN model from scratch using PyTorch. In the init function we have
self.e = nn.Embedding(vocab_size, n_fac)
But in the forward function we use self.e(c1). How do these tally? c1 is of a certain sequence length, not equal to vocab_size. Could someone explain how this fits together?
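One way to see how these tally: `nn.Embedding(vocab_size, n_fac)` is just a lookup table with `vocab_size` rows, so `vocab_size` sets the number of rows in the table, not the expected input size. The argument to `self.e(c1)` is a tensor of indices into that table, and it can have any batch/sequence shape (numbers below are illustrative):

```python
import torch
import torch.nn as nn

# nn.Embedding(85, 42): a lookup table with 85 rows (one per character
# in the vocab), each a 42-dimensional vector.
e = nn.Embedding(85, 42)

# The input is a tensor of *indices* into the table, so its shape is
# the batch shape -- it has nothing to do with vocab_size.
c1 = torch.randint(0, 85, (512,))      # a batch of 512 character indices
print(e(c1).shape)                     # torch.Size([512, 42])

seq = torch.randint(0, 85, (512, 8))   # lookups work on sequences too
print(e(seq).shape)                    # torch.Size([512, 8, 42])
```

Each index is replaced by its row from the table, and an extra `n_fac` dimension is appended to whatever shape the input had.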
Are we using md to feed in the various batches of c1, c2, c3?
md = ColumnarModelData.from_arrays('.', [1], np.stack([x1,x2,x3], axis=1), y, bs=512)

Towards the end of the class there was a question about sequence length and the initial sequence being a bunch of zeros. I didn't get the question, and hence the answer. It would be great if this could be explained as well.

Are we setting only the first hidden-state layer's weights to the identity matrix with the code below?
m.rnn.weight_hh_l0.data.copy_(torch.eye(n_hidden))
How does this help to contain exploding or vanishing gradients if the weights of other layers are different from this?
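For what it's worth, here is a sketch of what that one line does, following the IRNN idea from the Hinton et al. paper mentioned in the timeline: only the hidden-to-hidden weight matrix is set to the identity (paired with ReLU activations), while the other weights keep their default initialization. At initialization each step then passes the previous hidden state through unchanged, so repeated multiplication neither shrinks nor amplifies the signal (sizes below are illustrative):

```python
import torch
import torch.nn as nn

n_hidden = 256
rnn = nn.RNN(42, n_hidden, nonlinearity='relu')  # IRNN pairs identity init with ReLU

# Copy the identity into the hidden-to-hidden weights only; the
# input-to-hidden weights keep their usual random initialization.
rnn.weight_hh_l0.data.copy_(torch.eye(n_hidden))

# At init each step computes h_t = relu(I @ h_{t-1} + W_ih @ x_t + b),
# i.e. the previous hidden state is carried forward unchanged. Repeated
# multiplication by I has eigenvalues of exactly 1, which is where the
# help with vanishing/exploding gradients comes from -- it is only a
# starting point, and the weights move away from I during training.
assert torch.equal(rnn.weight_hh_l0.data, torch.eye(n_hidden))
```

So the other layers being different is fine: the identity is just a sensible starting point for the recurrent matrix, which is the one that gets multiplied into the state over and over.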
Lastly, everything I have known about RNNs involves LSTMs. I did not hear Jeremy mention them in class. Are we going to see them in this part or in the next part?
From these forum threads, a group of motivated students under Jeremy's guidance could easily create a book: "Deep Learning in fastai".
Video timelines for Lesson 6

00:00:10 Review of articles and works
"Optimization for Deep Learning Highlights in 2017" by Sebastian Ruder,
"Implementation of AdamW/SGDW paper in Fastai",
"Improving the way we work with learning rate",
"The Cyclical Learning Rate technique"
00:02:10 Review of last week's "Deep Dive into Collaborative Filtering" with MovieLens, analyzing our model, "movie bias", "@property", "self.models.model", "learn.models", "CollabFilterModel", "get_layer_groups(self)", "lesson5-movielens.ipynb"

00:12:10 Jeremy: "I try to use Numpy for everything, except when I need to run it on GPU, or derivatives",
Question: "Bring the model from GPU to CPU into production?", move the model to CPU with "m.cpu()", "load_model(m, p)", back to GPU with "m.cuda()", "zip()" function in Python

00:16:10 Sort the movies; John Travolta's Scientology film "Battlefield Earth", worst movie of all time, "key=itemgetter()", "key=lambda"

00:18:30 Embedding interpretation, using "PCA" from "sklearn.decomposition" for linear algebra

00:24:15 Looking at the "Rossmann Retail / Store" Kaggle competition with the "Entity Embeddings of Categorical Variables" paper.

00:41:02 "Rossmann" Data Cleaning / Feature Engineering, using a test set properly, creating features (check the Machine Learning "ML1" course for details), "apply_cats" instead of "train_cats", "pred_test = m.predict(True)", result on the Kaggle Public Leaderboard vs Private Leaderboard with a poor validation set. Example: the Statoil/Iceberg challenge/competition.

00:47:10 A mistake made by the Rossmann 3rd-place winner; more on the Rossmann model.

00:53:20 "How to write something that is different than the Fastai library"

PAUSE

00:59:55 More into SGD with the "lesson6-sgd.ipynb" notebook, a linear regression problem with continuous outputs; "a*x+b" and the mean squared error (MSE) loss function with "y_hat"

01:02:55 Gradient Descent implemented in PyTorch, "loss.backward()", ".grad.data.zero_()" in the "optim.sgd" class

01:07:05 Gradient Descent with Numpy

01:09:15 RNNs with the "lesson6-rnn.ipynb" notebook on Nietzsche, SwiftKey post on a smartphone keyboard powered by neural networks

01:12:05 A basic NN with a single hidden layer (rectangle, arrow, circle, triangle), by Jeremy;
an image CNN with a single dense hidden layer.
01:23:25 Three char model, question on "in1, in2, in3" dimensions

01:36:05 Test model with "get_next(inp)";
let's create our first RNN, why use the same weight matrices?
01:48:45 RNN with PyTorch, question: "What does the hidden state represent?"

01:57:55 Multioutput model

02:05:55 Question on "sequence length vs batch size"

02:09:15 The Identity Matrix (init!), a paper from Geoffrey Hinton, "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units"
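The SGD segment in the timeline above (00:59:55 to 01:07:05) can be sketched in a few lines: fit y = a*x + b by gradient descent on an MSE loss, calling `loss.backward()` and zeroing the gradients by hand each step. This is a minimal sketch along the lines of the lesson's SGD notebook, with made-up data (true a=3, b=8):

```python
import torch

# Noisy synthetic data for y = a*x + b with a=3, b=8
n = 100
x = torch.rand(n)
y = 3.0 * x + 8.0 + 0.1 * torch.randn(n)

a = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.5

for t in range(3000):
    y_hat = a * x + b
    loss = ((y_hat - y) ** 2).mean()   # MSE loss
    loss.backward()                     # compute d(loss)/da and d(loss)/db
    with torch.no_grad():
        a -= lr * a.grad
        b -= lr * b.grad
        a.grad.zero_()                  # PyTorch *accumulates* gradients,
        b.grad.zero_()                  # so they must be reset each step

print(a.item(), b.item())  # close to 3 and 8
```

The explicit `.grad.zero_()` calls are the hand-rolled version of what an optimizer's `zero_grad()` does for you, which is the point of that part of the lesson.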
Deep Learning Brasília Lesson 6
Wiki: Lesson 6
I really didn't get the EmbeddingDotBias
object from this lesson, or the "Movie bias" part of the lesson5-movielens notebook.
How is a bias embedding matrix able to tell us what the best or worst movie of all time is? How can we infer that bias = best/worst movie in our case? (BTW, we did a lookup on the top 3000 movies with movie_bias = to_np(m.ib(V(topMovieIdx))),
so how are we supposed to find the worst movies?)
It's even more confusing as @jeremy takes a different approach for "Embedding interpretation", where he plots the scores of the dimensionally-reduced embedding and then guesses at the relationship between the top and bottom items.
Can anyone shed some light on this? Thanks
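For what it's worth, here is the usual reasoning, sketched with plain PyTorch (the names and numbers below are hypothetical, not the fastai API): the item bias is a single learned number per movie that gets added to every prediction for that movie, whoever the user is. After training it therefore absorbs "how well is this movie rated in general", so sorting that one lookup ascending gives the worst movies and descending gives the best; the same `m.ib(...)` lookup serves both.

```python
import torch
import torch.nn as nn

# Toy stand-in for the model's item-bias table (m.ib in the lesson):
# one learned scalar per movie. Titles and bias values are made up.
n_movies = 5
movie_names = ['A', 'B', 'C', 'D', 'E']
ib = nn.Embedding(n_movies, 1)
ib.weight.data = torch.tensor([[0.3], [-1.2], [0.9], [0.1], [-0.5]])

idx = torch.arange(n_movies)
movie_bias = ib(idx).squeeze(1).detach()   # one number per movie

# Ascending sort puts the most negative biases (worst movies) first;
# reversing the order puts the best first -- same lookup, two sort orders.
order = movie_bias.argsort().tolist()
print('worst:', [movie_names[i] for i in order[:2]])        # ['B', 'E']
print('best: ', [movie_names[i] for i in order[::-1][:2]])  # ['C', 'A']
```

The embedding-interpretation part is answering a different question: the bias is one number (overall quality), while the embedding vectors need PCA and some guesswork because each reduced dimension is an unnamed latent factor rather than a single score.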