Lesson 12 in class

jeremy · March 28, 2017, 12:16am

Looking forward to your questions!

hamelsmu · March 28, 2017, 2:08am

Can you explain what happens if you don’t create an InteractiveSession()? What is InteractiveSession() doing again?

garima.agarwal · March 28, 2017, 2:14am

How could we extend this to non-numeric data types such as words, images, etc.?

davecg · March 28, 2017, 2:14am

http://scikit-learn.org/stable/modules/clustering.html

Clustering performance evaluation

Particularly the silhouette coefficient (intracluster distance vs intercluster distance).
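To make the silhouette idea concrete, here is a minimal pure-Python sketch (not scikit-learn's implementation). For each point, `a` is the mean distance to the other points in its own cluster and `b` is the smallest mean distance to any other cluster; the score is `(b - a) / max(a, b)`. The function name and the toy points are made up for illustration, and it assumes at least two clusters.

```python
import math

def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; assumes at least two distinct labels."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster (excluding p itself).
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            # p is alone in its cluster; by convention its score is 0.
            scores.append(0.0)
            continue
        a = mean_dist[own]                                      # intracluster
        b = min(d for c, d in mean_dist.items() if c != own)    # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # grouping that splits the pairs
```

Scores near 1 mean tight, well-separated clusters; scores below 0 mean points sit closer to another cluster than to their own.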

davecg · March 28, 2017, 2:20am

@garima.agarwal

If you can create a feature vector from it, you can cluster it.

e.g. Image --> VGG output vector

Word --> embedding vector

Sentence --> LSTM output

paulm · March 28, 2017, 2:27am

Because we evolve the centroids by pulling in the points closest to each one and then re-assigning each centroid to be the center (average) of those points, we wind up with points assigned to centroids that they are not closest to (you can see it in the picture).

Ideas for maybe fixing this? Adding in some negative effect for how close the centroid is to things not in the group?
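One way to look at this, sketched here as plain k-means (Lloyd's algorithm) in pure Python rather than the TensorFlow code from class: the mismatch appears whenever you stop right after a centroid update. Re-running the assignment step after every update, and stopping only when no point changes cluster, leaves every point with its nearest centroid. The helper names and toy points are hypothetical.

```python
def closest(point, centroids):
    """Index of the centroid nearest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((x - c) ** 2 for x, c in zip(point, centroids[k])))

def kmeans(points, init_centroids, max_iters=100):
    centroids = list(init_centroids)
    assignments = [closest(p, centroids) for p in points]
    new_assignments = assignments
    for _ in range(max_iters):
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
        # Assignment step: centroids moved, so recompute who is closest to what.
        new_assignments = [closest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # nothing changed, so the centroids will not move again
        assignments = new_assignments
    # new_assignments was always computed against the latest centroids.
    return centroids, new_assignments

# Hypothetical toy data: two blobs, with deliberately poor starting centroids.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, assignments = kmeans(points, [(0, 0), (1, 0)])
```

Because the function always ends on an assignment step, the returned labels agree with the returned centroids even if `max_iters` is hit first.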

davecg · March 28, 2017, 2:40am

Not sure who that was talking about sarcoidosis and Kaiser, but feel free to message me. /radiologist

Kjeanclaude · March 28, 2017, 2:47am

Why is it important to sort the phonemes?

kpatnaik · March 28, 2017, 2:58am

Is it a best practice to do dropout across time/sequence? (dropout_W)

hamelsmu · March 28, 2017, 3:00am

Can you explain consume_less='gpu'?

resdntalien · March 28, 2017, 3:02am

I am getting confused. Are we:

1. Mapping each set of phonemes (that is, a word) into a 120-dim embedding of floating point numbers? What is the 70 here? Is it the maximum length of a single word (in terms of phonemes)? Are we padding words that are shorter than that up to 70?
2. Then taking a sequence of these phonemes (represented as 120-dim floating point vectors) and using an RNN to create a sequence of word2vec embeddings (which we will then reverse to actual words)?
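On the padding part of the question, here is a minimal sketch of what padding variable-length sequences to a fixed length looks like. It mimics the default behaviour of Keras's `pad_sequences` (pad and truncate at the front) in pure Python. `MAX_LEN = 5` and the integer ids are made up for illustration; this does not settle what the 70 in the notebook refers to.

```python
def pad_pre(seq, max_len, pad_value=0):
    """Left-pad (or left-truncate) a list of integer ids to exactly max_len.

    Assumes max_len is a positive integer.
    """
    seq = list(seq)[-max_len:]  # keep the last max_len items if too long
    return [pad_value] * (max_len - len(seq)) + seq

MAX_LEN = 5  # hypothetical fixed length, not the value from the lesson

# Hypothetical phoneme ids for three words of different lengths.
words = [[7, 3], [4, 9, 2, 6, 1], [8, 8, 5, 2, 3, 1, 4]]
padded = [pad_pre(w, MAX_LEN) for w in words]
# padded[0] is [0, 0, 0, 7, 3]; every row now has length MAX_LEN
```

Each padded row of ids can then go through an embedding layer, giving one vector per time step.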

hamelsmu · March 28, 2017, 3:11am

Does the bidirectional RNN concat 2 RNNs, or does it “stack” them on top of each other?
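For reference, the Keras `Bidirectional` wrapper defaults to `merge_mode='concat'`: one RNN reads the sequence forwards, another reads it backwards, and their outputs are joined at each time step. The toy sketch below shows that idea in pure Python with a scalar hidden state per direction; the weights are arbitrary and nothing here is the Keras implementation.

```python
import math

def simple_rnn(xs, w_x, w_h):
    """Toy RNN with a scalar hidden state; returns the state at every time step."""
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs

def bidirectional_concat(xs, fwd_weights, bwd_weights):
    forward = simple_rnn(xs, *fwd_weights)
    # Run on the reversed sequence, then flip back so time steps line up.
    backward = simple_rnn(xs[::-1], *bwd_weights)[::-1]
    # Concatenate the two directions at each time step (side by side, not stacked).
    return [(f, b) for f, b in zip(forward, backward)]

# Hypothetical inputs and weights.
out = bidirectional_concat([1.0, 2.0, 3.0], (0.5, 0.1), (0.3, 0.2))
```

Each time step's output is twice as wide as a single direction's: the first half has seen everything before that step, the second half everything after it.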

thunderingtyphoons · March 28, 2017, 3:16am

Why can’t we have return_sequences=True for the second bidirectional LSTM?

Even · March 28, 2017, 3:18am

Are we doing this because it’s not feasible to pass the entire stack of embeddings into a CNN?

I wouldn’t think a 16x120 dimensional input vector would be too large.

I guess this stems from an underlying question I have regarding why we don’t just treat text problems the same way we do images. Images have relationships between pixels and shapes that are complex and rely on positional information. Why doesn’t that work with word or phoneme embeddings?

janardhanp22 · March 28, 2017, 3:20am

Can you please repeat why we need a bidirectional LSTM?

rachel · March 28, 2017, 3:21am

@janardhanp22 bidirectional is because you want to know the phonemes that came both before and after a given phoneme

hamelsmu · March 28, 2017, 3:22am

When return_sequences=True, does the output have to be exactly 1 time step ahead? Can the RNN do n steps ahead instead?

hamelsmu · March 28, 2017, 3:24am

Also, what happens exactly when you have two RNNs stacked on top of each other with return_sequences=True? Do you get nested sequences?
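A shape-only sketch may help here. In Keras, a recurrent layer with `return_sequences=True` emits one vector per time step, so its output is still a single flat sequence (timesteps x units) that the next layer consumes step by step; nothing gets nested. The toy layer below uses fixed, made-up weights in pure Python purely to show the shapes.

```python
import math

def rnn_layer(seq, units, return_sequences=True):
    """Toy recurrent layer with fixed, hypothetical weights (not trained)."""
    h = [0.0] * units
    outputs = []
    for x in seq:
        total = sum(x)
        # New hidden state depends on the current input and the previous state.
        h = [math.tanh(0.1 * (u + 1) * total + 0.5 * h[u]) for u in range(units)]
        outputs.append(h)
    # Either the full sequence of states, or just the final state.
    return outputs if return_sequences else outputs[-1]

seq = [[0.1, 0.2, 0.3] for _ in range(4)]   # 4 time steps, 3 features each
layer1 = rnn_layer(seq, units=5)            # 4 steps x 5 units
layer2 = rnn_layer(layer1, units=2)         # 4 steps x 2 units, still one sequence
last_only = rnn_layer(layer1, units=2, return_sequences=False)  # 2 numbers
```

With `return_sequences=False` the layer returns only its final state, which is the form a following non-recurrent layer usually expects when you want one output per input sequence.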

harveyslash · March 28, 2017, 3:25am

For translation problems, how does the network know when to stop?
Say my input is
PAD PAD hi how are you
Is the output (in another language)
PAD PAD hi how are you PAD?

Does the neural network have to learn when to stop?

rachel · March 28, 2017, 3:26am

@harveyslash yes, typically an end-of-sentence token EOS is added in the training set
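As a rough illustration of how an EOS token is used at prediction time: the decoder is called one token at a time, and generation stops as soon as it emits EOS (with a length cap as a safety net). `greedy_decode`, `toy_next_token`, and the token strings below are hypothetical stand-ins, not code from the lesson.

```python
EOS = "<eos>"

def greedy_decode(next_token, max_len=20):
    """Call next_token(prefix) until it returns EOS or max_len tokens are produced."""
    output = []
    while len(output) < max_len:
        token = next_token(output)
        if token == EOS:
            break  # the model has signalled that the sentence is finished
        output.append(token)
    return output

def toy_next_token(prefix):
    """Hypothetical stand-in for a trained decoder: emits three tokens, then EOS."""
    canned = ["w1", "w2", "w3", EOS]
    return canned[len(prefix)] if len(prefix) < len(canned) else EOS

decoded = greedy_decode(toy_next_token)
capped = greedy_decode(lambda prefix: "x", max_len=5)  # never emits EOS
```

Because the training targets end with EOS, the network learns to predict it in the right place; the `max_len` cap only matters when it fails to.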