This is a very recent research paper on the subject (released last week, 21 Mar 2017): Massive Exploration of Neural Machine Translation Architectures, http://arxiv.org/abs/1703.03906
I have a question regarding mean-shift clustering.
Mean-shift clustering does not require you to specify the number of clusters, which appears to be a significant advantage over k-means. However, it does require a Gaussian kernel width (or bandwidth) parameter, which indirectly determines the number of clusters. My question is: is that really an advantage? First, you still have to provide a parameter, just as with k-means. Second, the kernel width actually seems less intuitive than the number of clusters, and thus harder to set. So is this really an advantage over k-means?
To follow up on this: as Jeremy pointed out, you can choose the bandwidth automatically, e.g. by requiring the kernel to cover 1/3 of the data (or by some other rule). But then we have to set the coverage fraction (1/3), which is also somewhat arbitrary, and may again be harder to choose than the number of clusters.
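For what it’s worth, scikit-learn ships this coverage heuristic as `estimate_bandwidth`, whose `quantile` parameter is essentially the coverage fraction being discussed (its default is 0.3, close to 1/3). A minimal sketch on synthetic, well-separated blobs (the data and parameter values here are purely illustrative):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters (illustrative data only).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [12, 0], [0, 12]],
                  cluster_std=0.5, random_state=0)

# quantile=1/3 is the "cover 1/3 of the data" heuristic: for each point,
# take the distance to the farthest of its (n_samples * quantile) nearest
# neighbors, then average those distances over all points.
bandwidth = estimate_bandwidth(X, quantile=1 / 3)

# The number of clusters now falls out of the bandwidth choice.
labels = MeanShift(bandwidth=bandwidth).fit_predict(X)
print(len(np.unique(labels)))
```

So the parameter hasn’t gone away; it has just been moved from “how many clusters?” to “what fraction of the data should the kernel cover?”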
Just to be clear, mean-shift clustering does have a big advantage in that it does not assume spherical/elliptical clusters, so in that respect it seems like a superior method. I just don’t know whether it has an advantage with regard to selecting the number of clusters. One way or another, you end up having to choose a relatively arbitrary parameter.
I think it’s a reasonable default for the parameter; my guess is that most problems will work well with that choice. By contrast, there’s no obvious default for ‘k’ in k-means (although there are various algorithms you can use to estimate it).
I think that will depend on the application area. For example, in medicine, cluster analysis is applied to gene expression data to identify disease types. In that case, the number of clusters is often approximately known from clinical practice (from observing distinct disease progression patterns).
One way to test this would be to check whether mean-shift clustering with the default coverage of 1/3 produces the clinically expected number of clusters.
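One could try that experiment in simulation first. Real gene expression data isn’t at hand here, so well-separated synthetic blobs stand in for the “clinically known” groups; the centers and sample sizes below are illustrative assumptions, and scikit-learn’s default `quantile=0.3` plays the role of the default coverage:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Stand-ins for clinically distinct groups (illustrative, not real data).
centers = [[0, 0], [12, 0], [0, 12]]

for true_k in (2, 3):
    X, _ = make_blobs(n_samples=100 * true_k, centers=centers[:true_k],
                      cluster_std=0.5, random_state=1)
    # Default coverage heuristic (scikit-learn's default quantile is 0.3).
    bw = estimate_bandwidth(X, quantile=0.3)
    found_k = len(np.unique(MeanShift(bandwidth=bw).fit_predict(X)))
    print(f"expected {true_k}, mean-shift found {found_k}")
```

On clean, well-separated data like this the default coverage recovers the expected count; the interesting question is whether it still does on noisy, high-dimensional expression data, where the coverage fraction starts to matter much more.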
Thanks, Lin, for your work; the transcript files are extremely helpful for quickly searching content. That is a big time saver when I want to review a specific topic. I was just wondering: do you have a list of all the transcript URLs we have so far? I think it would be a lot more convenient to have them all listed in one place. Thanks!