Lesson 10 in class

Questions and comments here please - don’t forget to ‘like’ anything you want to see discussed in class.

Can we kill the lights to make the screen easier to see? (I’m happy to do this myself, but wanted to check if there was some reason they were on)

3 Likes

Is there a reason Jeremy is using max_workers = 16 when he has 20 threads available?

Never mind, he answered this question!

1 Like
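For anyone reading later, here is a minimal sketch of what max_workers controls, assuming the call in question is a concurrent.futures pool (the resize helper and the images folder are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def resize_img(path):
    # stand-in for whatever per-file work is being parallelized
    return path

paths = list(Path('images').glob('*.jpg'))   # hypothetical input folder

# max_workers caps the pool size; leaving a few of the 20 hardware threads
# free keeps the machine responsive while the job runs.
with ThreadPoolExecutor(max_workers=16) as ex:
    results = list(ex.map(resize_img, paths))
```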

Not related to lesson 10, but has anyone had any luck buying the Nvidia GeForce GTX 1080 Ti graphics card for building their own DL server? It's the latest graphics card released by Nvidia, and it is out of stock everywhere I've looked. If you bought one of these cards, where did you buy it? Any info is appreciated.

1 Like

Have we talked in class about why we do the merge operation in some of these more complex models?

1 Like

If possible, could you discuss the relationship between preprocessing and the hidden layers? What is optimal?

Keep checking here for availability: http://www.nowinstock.net/computers/videocards/nvidia/gtx1080ti/

2 Likes

If I recall correctly, it allows the net to learn better even as the depth increases. The merge lets the gradient pass straight through, so the net learns faster (the deeper the network, the harder it is to train, because the gradients propagate slowly). Hope that helps.
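A minimal sketch of that merge as a residual block, written with the Keras 2 add helper (the course notebooks may use the older merge(..., mode='sum') API; layer sizes here are just illustrative):

```python
# The skip connection gives the gradient a direct path around the convolutions,
# so the block only has to learn the residual on top of the identity.
from keras.layers import Input, Conv2D, BatchNormalization, Activation, add
from keras.models import Model

inp = Input(shape=(56, 56, 64))
x = Conv2D(64, (3, 3), padding='same')(inp)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(64, (3, 3), padding='same')(x)
x = BatchNormalization()(x)

# The merge: identity + learned layers
out = Activation('relu')(add([x, inp]))
block = Model(inp, out)
```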

Can you please clarify what you would do if you had more time and a bigger hard drive? I am not sure I exactly understand the alternative.

Would it be possible to repeat the questions asked in the class, especially the ones without the microphone please?

2 Likes

@brendan In the case of ResNet, adding the identity lets us learn the residual.
roughly:
Identity + Layer = Answer
Layer = Answer - Identity
Layer = Residual

1 Like

Are the word vectors from ImageNet? Are they the word vectors for the training set?

@resdntalien The words are from ImageNet, and are then looked up in word2vec to get their vectors.

1 Like

But it doesn't have to be, right? We can use some other arbitrary set of words and find the nearest neighbors in that set. E.g. we could just have the words "dog" and "cat" in the set we want to find nearest neighbors in… this would kind of work because the word2vec embedding learned the semantic meaning of the words.
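A rough sketch of restricting the nearest-neighbor search to an arbitrary word set, assuming gensim and the GoogleNews word2vec binary (the file path is an assumption):

```python
import numpy as np
from gensim.models import KeyedVectors

# hypothetical path to the pretrained GoogleNews vectors
wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',
                                        binary=True)

candidates = ['dog', 'cat']                       # the restricted search set
cand_vecs = np.stack([wv[w] for w in candidates])
cand_vecs /= np.linalg.norm(cand_vecs, axis=1, keepdims=True)

def nearest(vec):
    """Return the candidate word whose vector is most cosine-similar to vec."""
    vec = vec / np.linalg.norm(vec)
    return candidates[int(np.argmax(cand_vecs @ vec))]

print(nearest(wv['puppy']))   # most similar candidate to 'puppy' (likely 'dog')
```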

Going back to the k-nearest neighbor problem: back in my GIS days we used a lot of quadtrees or octrees to pre-compute this sort of thing. Is there an acceptable cousin of these structures when dimensions exceed 3? (Is this essentially what the Locality Sensitive Hashing Forest is?)

1 Like
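For reference, LSH forests are one standard structure for approximate nearest neighbors in high dimensions. A minimal sketch with scikit-learn's LSHForest (it has since been deprecated and removed in newer scikit-learn releases; libraries like Annoy fill the same role). The data here is random, just to show the shape of the API:

```python
import numpy as np
from sklearn.neighbors import LSHForest

vectors = np.random.randn(10000, 300).astype('float32')  # e.g. word vectors

lshf = LSHForest(n_estimators=10, n_candidates=50)
lshf.fit(vectors)

query = vectors[:1]
distances, indices = lshf.kneighbors(query, n_neighbors=5)
```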

@resdntalien Our model takes images as inputs and outputs word vectors. For the training set, we need images with labels. Once it's trained, we can run it on new images and look up the predicted word vectors.
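A minimal sketch of that idea (not the exact lesson model): a pretrained CNN maps each image to a 300-d vector, trained against the word2vec vector of its label with a cosine loss; at inference you predict a vector for a new image and do a nearest-neighbor lookup among word vectors. The base network, sizes, and variable names below are assumptions:

```python
from keras.applications import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
word_vec = Dense(300, activation='linear')(x)   # 300 = word2vec dimension

model = Model(base.input, word_vec)
model.compile(optimizer='adam', loss='cosine_proximity')

# model.fit(train_images, train_label_vectors, ...)  # labels = word2vec vectors
# preds = model.predict(new_images)                  # then nearest-neighbor lookup
```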

Because of the earlier ::-1 problem, we were actually training this model not to learn about colors (we swapped color channels instead of width pixels). Could it be that this model would have been better had it been able to learn, for instance, that trombones are brass-colored?
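For reference, a quick illustration of how the ::-1 slip happens, using a toy channels-last array:

```python
import numpy as np

img = np.arange(2 * 4 * 3).reshape(2, 4, 3)   # toy (height, width, channels) image

flipped_width  = img[:, ::-1, :]   # mirrors the image left-right
swapped_colors = img[:, :, ::-1]   # reverses the channel order (RGB -> BGR)

# With a channels-first layout (channels, height, width) the same slices hit
# different axes, which is how a width flip can silently become a color swap.
```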

@rachel Thanks. Literally the next section of this lecture answered my question; only Jeremy went up in the number of words (from 1,000 to 80K), whereas my question was about going down (1,000 to 2). I kind of see where this is going. We are really teaching this neural net to "understand and caption" images, e.g. to label images using words in the English language.

1 Like

Have you ever tried using Dask for out of memory data?

Dask arrays work in Keras' model.fit() without putting everything in memory, and I think it works with bcolz…

4 Likes
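A sketch of that suggestion, assuming the features were saved with bcolz (the file names are hypothetical, and passing dask arrays straight to model.fit is per the post above, not verified here):

```python
import bcolz
import dask.array as da

x_carray = bcolz.open('trn_features.bc', mode='r')   # on-disk bcolz carrays
y_carray = bcolz.open('trn_labels.bc', mode='r')

# Wrap the carrays as dask arrays; chunks control how much is read at a time.
x = da.from_array(x_carray, chunks=(1024,) + x_carray.shape[1:])
y = da.from_array(y_carray, chunks=(1024,) + y_carray.shape[1:])

# model.fit(x, y, batch_size=64, epochs=1)   # per the post above, data is
#                                            # pulled in chunks rather than
#                                            # loaded into memory all at once
```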