Looking forward to your questions!
Can you explain what happens if you don't create InteractiveSession()? What is InteractiveSession() doing again?
How could we extrapolate this to other non-numeric data types, such as words, images, etc.?
http://scikit-learn.org/stable/modules/clustering.html
Clustering performance evaluation
Particularly the silhouette coefficient (intra-cluster distance vs. inter-cluster distance).
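For anyone who wants to try it, sklearn exposes this directly; a minimal sketch (the random data is just a placeholder for your own feature vectors):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(500, 10)  # placeholder: swap in your own feature vectors

for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(X)
    # silhouette is in [-1, 1]; higher = tighter, better-separated clusters
    print(k, silhouette_score(X, labels))
```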
If you can create a feature vector from it, you can cluster it.
e.g. Image --> VGG output vector
Word --> embedding vector
Sentence --> LSTM output
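In code the recipe is roughly: run the data through whatever model gives you a fixed-length vector, then hand those vectors to a standard clustering algorithm. A rough sketch for the image case (assuming a Keras install with keras.applications available; the random images are placeholders):

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from sklearn.cluster import KMeans

images = np.random.rand(32, 224, 224, 3).astype('float32') * 255  # placeholder batch

# VGG16 without its classifier head -> one fixed-length feature vector per image
vgg = VGG16(weights='imagenet', include_top=False)
features = vgg.predict(preprocess_input(images))
features = features.reshape(len(features), -1)  # flatten to (n_images, d)

clusters = KMeans(n_clusters=4).fit_predict(features)
print(clusters)
```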
Because we evolve the centroids (assign each point to its closest centroid, then move the centroid to the center (average) of those points), we wind up with points assigned to centroids that they are no longer closest to (you can see it in the picture).
Any ideas for fixing this? Maybe adding some penalty for how close a centroid is to points that are not in its group?
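For reference, this is the plain two-step loop being described (standard Lloyd's k-means written from scratch, not the lesson's exact code); the stale assignments come from step 2 moving the centroids after the points were assigned in step 1:

```python
import numpy as np

def kmeans(X, k, iters=10):
    # start from k randomly chosen points as centroids
    centroids = X[np.random.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # step 1: assign every point to its currently closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # step 2: move each centroid to the mean of its assigned points;
        # until step 1 runs again, some points may now sit closer to another centroid
        centroids = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                              else centroids[j] for j in range(k)])
    return centroids, assign
```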
Not sure who that was talking about sarcoidosis and Kaiser, but feel free to message me. /radiologist
Why is it important to sort the phonemes?
Is it a best practice to do dropout across time/sequence? (dropout_W)
Can you explain consume_less = 'gpu'?
I am getting confused. Are we:
- Mapping each set of phonemes (that is, a word) into a 120-dim embedding of floating-point numbers? What is the 70 here? Is it the maximum length of a single word (in terms of phonemes), and we pad words that are shorter than that up to 70?
- Then taking a sequence of these phonemes (represented as 120-dim floating-point vectors) and using an RNN to create a sequence of word2vec embeddings (which we will then map back to actual words)?
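To make my own question concrete, here is a rough Keras sketch of what I think is happening (assuming 70 is the max phonemes per word with zero-padding and 120 is the phoneme-embedding size; the vocabulary size, LSTM width and word2vec dimension below are made-up numbers):

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

MAX_PHONEMES = 70    # assumed: longest word in phonemes; shorter words are zero-padded
EMBED_DIM    = 120   # assumed: learned phoneme-embedding size
N_PHONEMES   = 85    # hypothetical number of distinct phoneme tokens
W2V_DIM      = 100   # hypothetical word2vec dimension of the target word

model = Sequential([
    # each of the 70 phoneme ids -> a 120-dim float vector, one per time step
    Embedding(N_PHONEMES, EMBED_DIM, input_length=MAX_PHONEMES, mask_zero=True),
    # read the whole phoneme sequence; the final state summarises the word
    LSTM(256),
    # regress that summary onto the word's word2vec embedding
    Dense(W2V_DIM),
])
model.compile(optimizer='adam', loss='mse')
```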
Does the bi-directional RNN concat 2 RNNs, or does it "stack" them on top of each other?
Why can't we have return_sequences=True for the second bidirectional LSTM?
Are we doing this because it's not feasible to pass the entire stack of embeddings into a CNN?
I wouldn't think a 16x120 dimensional input vector would be too large.
I guess this stems from an underlying question I have regarding why we don't just treat text problems the same way we do images. Images have relationships between pixels and shapes that are complex and rely on positional information. Why doesn't that work with word or phoneme embeddings?
Can you please repeat why we need a bidirectional LSTM?
@janardhanp22 bidirectional is because you want to know the phonemes that came both before and after a given phoneme
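On the concat-vs-stack part: as far as I know, the Keras Bidirectional wrapper runs two copies of the layer, one forward and one over the reversed sequence, and by default concatenates their outputs at each timestep (so the feature size doubles); it isn't two layers stacked. Quick check (the 70x120 input shape is just the example from above):

```python
from keras.models import Sequential
from keras.layers import LSTM, Bidirectional

model = Sequential([
    # forward LSTM + backward LSTM over the same input, outputs concatenated,
    # so the per-timestep feature size doubles from 64 to 128
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(70, 120)),
])
model.summary()  # output shape: (None, 70, 128)
```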
When return_sequences=True, does the output have to be exactly 1 time step ahead? Can the RNN do n steps ahead instead?
Also, what exactly happens when you have two RNNs stacked on top of each other with return_sequences=True? Do you get nested sequences?
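As far as I understand, no nesting happens: with return_sequences=True each layer just emits one vector per timestep, so the output stays a (samples, timesteps, features) tensor and the next LSTM consumes it timestep by timestep. And return_sequences doesn't fix "how many steps ahead" at all; it only returns the hidden state at every step, and the offset comes from whatever targets you pair those outputs with. A quick shape check:

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential([
    LSTM(32, return_sequences=True, input_shape=(70, 120)),  # -> (None, 70, 32)
    LSTM(16, return_sequences=True),                          # -> (None, 70, 16)
])
model.summary()
```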
For translation problems, how does the network know when to stop?
Say my input is
PAD PAD hi how are you
is the output (in another language)
PAD PAD hi how are you PAD
Does the neural network have to learn when to stop?
@harveyslash yes, typically an end-of-sentence token (EOS) is added to the target sentences in the training set
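Concretely, the target-side preprocessing looks something like this (token names and lengths are just placeholders); at inference time you stop decoding as soon as the model emits the EOS token:

```python
EOS, PAD = '<eos>', '<pad>'
MAX_LEN = 8

def prep_target(tokens, max_len=MAX_LEN):
    # append the end-of-sentence marker, then pad out to a fixed length
    tokens = tokens + [EOS]
    return tokens + [PAD] * (max_len - len(tokens))

print(prep_target(['hi', 'how', 'are', 'you']))
# ['hi', 'how', 'are', 'you', '<eos>', '<pad>', '<pad>', '<pad>']
```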