Lesson 10 in class

Questions and comments here please - don’t forget to ‘like’ anything you want to see discussed in class.

Can we kill the lights to make the screen easier to see? (I’m happy to do this myself, but wanted to check if there was some reason they were on)


Is there a reason Jeremy is using max_workers = 16 when he has 20 threads available?

Never mind, he answered this question!
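For anyone else curious, here is a minimal sketch of what `max_workers` controls in `concurrent.futures`. The function and file names are hypothetical, and capping at 16 on a 20-thread machine is just illustrative headroom for the main process and OS, not Jeremy's stated reasoning.

```python
from concurrent.futures import ThreadPoolExecutor

def resize_image(path):
    # placeholder for per-file work (e.g. resizing an image on disk)
    return path.upper()

paths = ["a.jpg", "b.jpg", "c.jpg"]

# max_workers caps the number of worker threads the pool will spawn
with ThreadPoolExecutor(max_workers=16) as ex:
    results = list(ex.map(resize_image, paths))  # preserves input order

print(results)  # ['A.JPG', 'B.JPG', 'C.JPG']
```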

Not related to lesson 10, but did anyone have any luck buying the Nvidia GeForce GTX 1080 Ti graphics card for building their own DL server? It's the latest graphics card released by Nvidia, and it is out of stock everywhere I looked. If you bought one of these cards, where did you buy it? Any info is appreciated.

Have we talked in class about why we do the merge operation in some of these more complex models?

If possible, could you discuss the relationship between preprocessing and the hidden layers? What is optimal?

Keep checking here for availability: http://www.nowinstock.net/computers/videocards/nvidia/gtx1080ti/


If I recall correctly, it allows the net to keep learning well even as the depth increases. The merge lets the gradient pass through, so the net learns faster (the deeper the network, the harder it is to train, since gradients vanish as they propagate back). Hope that helps.

Can you please clarify what you would do if you had more time and a bigger hard drive? I am not sure I exactly understand the alternative.

Would it be possible to repeat the questions asked in the class, especially the ones without the microphone please?


@brendan In the case of ResNet, adding the identity lets us learn the residual.
Identity + Layer = Answer
Layer = Answer - Identity
Layer = Residual
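That algebra can be sketched numerically; `f` here is a made-up toy layer standing in for a real ResNet block, just to show that with a skip connection the layer only has to model the residual.

```python
import numpy as np

def f(x, W):
    # toy "layer": a ReLU of a linear transform (not a real ResNet block)
    return np.maximum(0, x @ W)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # the identity / skip input
W = rng.standard_normal((8, 8)) * 0.01

answer = x + f(x, W)                  # Identity + Layer = Answer
residual = answer - x                 # Layer = Answer - Identity

# the layer's output is exactly the residual the block needed to learn
assert np.allclose(residual, f(x, W))
```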

Are the word vectors from ImageNet? Are they the word vectors for the training set?

@resdntalien The words are from ImageNet, and are then looked up in word2vec to get vectors.

But it doesn’t have to be, right? We could use some other arbitrary set of words and find the nearest neighbors in that set. E.g. we could have just the words “dog” and “cat” in the set we want to find nearest neighbors in… this would kind of work because the word2vec embedding learned the semantic meaning of the words.
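A minimal sketch of that idea with made-up 4-d vectors (real word2vec vectors are ~300-d, and these numbers are invented for illustration): the candidate set can be as small as two words, and the nearest neighbor by cosine similarity is whatever the set contains.

```python
import numpy as np

# hypothetical "word vectors" standing in for word2vec embeddings
vectors = {
    "dog": np.array([1.0, 0.9, 0.1, 0.0]),
    "cat": np.array([0.9, 1.0, 0.0, 0.1]),
}

def nearest(query, vectors):
    # cosine similarity against every word in the (arbitrary) candidate set
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vectors, key=lambda w: cos(query, vectors[w]))

query = np.array([1.0, 0.8, 0.2, 0.0])  # e.g. a model's predicted vector
print(nearest(query, vectors))          # "dog"
```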

Going back to the k-nearest neighbor problem, back in my GIS days we used a lot of quadtrees or octrees to pre-compute this sort of thing. Is there an accepted cousin of these structures when dimensions exceed 3? (Is this essentially what the Locality Sensitive Hashing Forest is?)
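A hedged sketch: the k-d tree is the arbitrary-dimension cousin of quad/octrees, available in SciPy as `cKDTree`. It gives exact answers but its query time degrades as dimensionality grows, which is roughly why approximate structures like the LSH Forest exist (they trade exactness for speed in high dimensions). The data here is random and purely illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.standard_normal((1000, 10))  # 1000 points in 10-d

tree = cKDTree(points)                    # build the k-d tree once
dist, idx = tree.query(points[0], k=3)    # 3 nearest neighbours of point 0

# a point that is in the set is its own nearest neighbour, at distance 0
print(idx[0], dist[0])  # 0 0.0
```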

@resdntalien Our model is taking images as inputs and outputting word vectors. For the training set, we need images with labels. Once it's trained, we can run it on new images and look up word vectors for them.

Because of the earlier ::-1 problem, we were actually training this model not to learn about colors (we swapped color channels instead of width pixels). Could it be that this model would have been better had it been able to learn that, for instance, trombones are brass-colored?
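To make the ::-1 mix-up concrete, a small sketch on a toy array (shapes are made up): on a `(height, width, channels)` image, the axis you reverse decides whether you swap colors or mirror the image.

```python
import numpy as np

img = np.arange(24).reshape(2, 4, 3)  # toy (h=2, w=4, c=3) "image"

channel_swapped = img[:, :, ::-1]  # reverses channels: RGB -> BGR color swap
mirrored = img[:, ::-1]            # reverses width: a left-right mirror

# colors swapped, geometry unchanged
assert channel_swapped[0, 0, 0] == img[0, 0, 2]
# geometry flipped, colors unchanged
assert (mirrored[:, 0] == img[:, -1]).all()
```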

@rachel Thanks. Literally the next section of this lecture answered my question – only Jeremy went up in the number of words (from 1000 -> 80K), whereas my question was about going down (1000 -> 2). I kind of see where this is going. We are really teaching this neural net to “understand and caption” images – e.g. label images using words in the English language.

Have you ever tried using Dask for out-of-memory data?

Dask arrays work in Keras’ model.fit() without putting everything in memory, and I think it works with bcolz…
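I haven't shown Dask itself here; as a minimal stand-in, `numpy.memmap` illustrates the same out-of-core idea Dask builds on: the data lives on disk and is read in chunks, so the full array never has to fit in RAM (Dask adds chunked task scheduling on top). The file name and shapes below are arbitrary.

```python
import os
import tempfile
import numpy as np

# create a disk-backed array and fill it without holding copies in RAM
path = os.path.join(tempfile.mkdtemp(), "big.dat")
arr = np.memmap(path, dtype="float32", mode="w+", shape=(10_000, 100))
arr[:] = 1.0
arr.flush()

# re-open read-only and process in fixed-size chunks, much as a Dask
# array (or a bcolz carray) would stream blocks through a computation
ro = np.memmap(path, dtype="float32", mode="r", shape=(10_000, 100))
total = sum(ro[i:i + 1000].sum() for i in range(0, 10_000, 1000))
print(total)  # 1000000.0
```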