Lesson 12 in class

Looking forward to your questions!

Can you explain what happens if you don't create an InteractiveSession()? What is InteractiveSession() doing again?


How could we extrapolate this to other non-numeric data types such as words, images etc?


Clustering performance evaluation

Particularly the silhouette coefficient (intracluster distance vs intercluster distance).
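The silhouette coefficient for one point is s = (b − a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. A minimal numpy sketch on two toy clusters (data and function name are illustrative):

```python
import numpy as np

def silhouette(point, own_cluster, other_cluster):
    """Silhouette coefficient s = (b - a) / max(a, b) for a single point.
    a = mean distance to the point's own cluster (excluding itself),
    b = mean distance to the nearest other cluster."""
    a = np.mean([np.linalg.norm(point - p)
                 for p in own_cluster if not np.array_equal(p, point)])
    b = np.mean([np.linalg.norm(point - p) for p in other_cluster])
    return (b - a) / max(a, b)

# Two well-separated toy clusters
cluster_a = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
cluster_b = np.array([[5.0, 5.0], [5.1, 5.0]])

s = silhouette(cluster_a[0], cluster_a, cluster_b)
print(s)  # close to 1: tight inside its own cluster, far from the other
```

Values near 1 mean the point is well matched to its cluster; values near −1 suggest it belongs to the neighboring cluster instead.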



If you can create a feature vector from it, you can cluster it.

e.g. Image --> VGG output vector

Word --> embedding vector

Sentence --> LSTM output

Because we move the centroids, pull in the points that are closest to each one, and then re-assign each centroid to the center (average) of those points, we can end up with points assigned to centroids they are no longer closest to (you can see it in the picture).

Any ideas for fixing this? Maybe adding a penalty for how close a centroid is to points outside its own group?
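The assign-then-recenter loop described above can be sketched in a few lines of numpy (a toy k-means, not the lesson's exact code). Note the labels come from the assignment step *before* the final centroid move, which is exactly why points can momentarily sit with a centroid they are no longer closest to:

```python
import numpy as np

def kmeans(points, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # initialise centroids from k random data points
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

points = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
centroids, labels = kmeans(points, k=2)
print(labels)  # the two left points share one label, the two right points the other
```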

Not sure who that was talking about sarcoidosis and Kaiser, but feel free to message me. :slight_smile: :skull_crossbones: /radiologist

Why is it important to sort the phonemes?

Is it a best practice to do dropout across time/sequence? (dropout_W)

Can you explain consume_less='gpu'?

I am getting confused. Are we:

  1. Mapping each set of phonemes (that is, a word) into a 120-dim embedding of floating-point numbers? What is the 70 here? Is it the maximum length of a single word (in terms of phonemes)? Are we padding words that are shorter than that up to 70?
  2. Then taking a sequence of these phoneme embeddings (represented as 120-dim floating-point vectors) and using an RNN to create a sequence of word2vec embeddings (which we then map back to actual words)?
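A shape-only numpy sketch of step 1, assuming a max word length of 70 phonemes and a 120-dim embedding (both numbers from the question above; the vocabulary size and embedding matrix here are random stand-ins, just to show the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
n_phonemes, max_len, emb_dim = 45, 70, 120   # 45 is a made-up vocab size

# one 120-dim vector per phoneme in the vocabulary
embedding = rng.normal(size=(n_phonemes, emb_dim))

word = [3, 17, 8]                            # a word as phoneme indices
padded = word + [0] * (max_len - len(word))  # pad with index 0 up to length 70
vectors = embedding[np.array(padded)]        # embedding lookup: shape (70, 120)
print(vectors.shape)  # (70, 120)
```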

Does the bi-directional RNN concat 2 RNNs or does it “stack” them on top of each other?
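A bidirectional layer runs two RNNs side by side, one over the input and one over the reversed input, and merges their per-timestep outputs; with a concatenation merge the feature dimension doubles. A shape-only numpy sketch (the sizes here are arbitrary):

```python
import numpy as np

timesteps, units = 16, 64
# stand-ins for the per-timestep outputs of the forward and backward RNNs
forward = np.zeros((timesteps, units))
backward = np.zeros((timesteps, units))

# concatenation merge: same number of timesteps, doubled feature dimension
merged = np.concatenate([forward, backward], axis=-1)
print(merged.shape)  # (16, 128)
```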

Why can’t we have return_sequences=True for the second bidirectional LSTM?

Are we doing this because it’s not feasible to pass the entire stack of embeddings into a CNN?

I wouldn’t think a 16x120 dimensional input vector would be too large.

I guess this stems from an underlying question I have regarding why we don’t just treat text problems the same way we do images. Images have relationships between pixels and shapes that are complex and rely on positional information. Why doesn’t that work with word or phoneme embeddings?
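Treating a sequence of embeddings as a 1-D “image” is what a 1-D convolution over text does: a filter spans a few tokens at full embedding depth and slides along the time axis. A numpy sketch of a single filter over a 16x120 embedding matrix (sizes from the question above; filter width 3 is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, width = 16, 120, 3
x = rng.normal(size=(seq_len, emb_dim))    # 16 tokens x 120-dim embeddings
filt = rng.normal(size=(width, emb_dim))   # one filter: 3 tokens wide, full depth

# "valid" 1-D convolution: slide the filter along the time axis only
out = np.array([np.sum(x[t:t + width] * filt)
                for t in range(seq_len - width + 1)])
print(out.shape)  # (14,)
```

The key difference from images: the filter only slides along time, since sliding across the embedding dimension has no meaning (embedding dimensions aren't spatially ordered the way pixels are).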

Can you please repeat why we need a bidirectional LSTM?

@janardhanp22 bidirectional is because you want to know the phonemes that came both before and after a given phoneme


When return_sequences=True, does the output have to be exactly 1 time step ahead? Can the RNN do n steps ahead instead?

Also, what exactly happens when you have two RNNs stacked on top of each other with return_sequences=True? Do you get nested sequences?
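No nesting: with return_sequences=True each layer emits one output vector per input timestep, so a stacked RNN just maps a (timesteps, features) sequence to another (timesteps, units) sequence, which the next layer consumes the same way. A shape-only sketch with a toy numpy recurrence (not a real LSTM):

```python
import numpy as np

def toy_rnn(seq, units, seed):
    """Stand-in for an RNN layer with return_sequences=True:
    one output vector per input timestep."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(seq.shape[-1], units))
    h = np.zeros(units)
    outputs = []
    for x_t in seq:                  # step through time
        h = np.tanh(x_t @ w + h)     # simplistic recurrence
        outputs.append(h)
    return np.stack(outputs)         # shape (timesteps, units)

seq = np.zeros((10, 8))                      # 10 timesteps, 8 features
layer1 = toy_rnn(seq, units=32, seed=0)      # (10, 32)
layer2 = toy_rnn(layer1, units=32, seed=1)   # (10, 32) -- still flat, not nested
print(layer2.shape)
```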

For translation problems, how does the network know when to stop?
Say my input is
PAD PAD hi how are you
Is the output (in another language)
PAD PAD hi how are you PAD

Does the neural network have to learn when to stop?

@harveyslash yes, typically an end-of-sentence (EOS) token is added to the training set
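A minimal sketch of how an EOS token lets generation stop: the decoder emits tokens one at a time and halts when it produces EOS (token ids and the fake "model" below are purely illustrative):

```python
# Hypothetical vocabulary ids, just for illustration
PAD, EOS = 0, 1

def decode(step_fn, max_len=20):
    """Greedy decode: ask the model for one token at a time and
    stop when it emits EOS (or hits the max_len safety cap)."""
    out = []
    for _ in range(max_len):
        token = step_fn(out)
        if token == EOS:
            break
        out.append(token)
    return out

# Fake "model" that emits three tokens and then EOS
script = iter([7, 8, 9, EOS])
print(decode(lambda prev: next(script)))  # [7, 8, 9]
```

The max_len cap matters in practice: a model that never learns to emit EOS would otherwise loop forever.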