I have a question regarding mean-shift clustering.
Mean-shift clustering does not require specifying the number of
clusters, which appears to be a significant advantage over k-means.
However, it does require a Gaussian kernel width (or bandwidth)
parameter, which indirectly determines the number of clusters. The
question is: is that really an advantage? First, you still have to
provide a parameter, just as for k-means. Second, this kernel width
parameter actually seems less intuitive than the number of clusters,
and thus harder to set. So is this really an advantage over k-means?
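To make the bandwidth/cluster-count link concrete, here is a toy sketch of Gaussian-kernel mean shift (my own simplified version, not any library's implementation; the bandwidth/2 merging threshold is an arbitrary simplification). With two well-separated blobs, a small bandwidth recovers two modes while a very large one smooths the density into a single mode:

```python
import numpy as np

def mean_shift(X, bandwidth, n_iter=50, tol=1e-5):
    """Toy Gaussian-kernel mean shift (illustrative sketch only)."""
    modes = X.astype(float).copy()
    for _ in range(n_iter):
        # squared distances from each current mode estimate to all data points
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))          # Gaussian weights
        new = (w[:, :, None] * X).sum(1) / w.sum(1, keepdims=True)
        if np.abs(new - modes).max() < tol:
            modes = new
            break
        modes = new
    # merge modes that converged to (nearly) the same point
    labels = -np.ones(len(X), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:   # crude merge rule
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

# two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)),
               rng.normal(3, 0.2, (30, 2))])
centers_narrow, _ = mean_shift(X, bandwidth=0.5)  # small bandwidth: 2 modes
centers_wide, _ = mean_shift(X, bandwidth=5.0)    # huge bandwidth: 1 mode
print(len(centers_narrow), len(centers_wide))
```

So the number of clusters is entirely an emergent consequence of the bandwidth, which is exactly why setting it feels indirect.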
To follow up on this, as Jeremy pointed out, you can choose the
bandwidth automatically by requiring that it cover 1/3 of the data (or
by some other criterion). But then we have to set the coverage
parameter (1/3), which is also somewhat arbitrary and may again be
harder to choose than the number of clusters.
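For instance, one rough way to implement the coverage idea (a sketch of my own, approximating "cover 1/3 of the data" as the 1/3-quantile of all pairwise distances; real libraries use their own variants of this heuristic):

```python
import numpy as np

def estimate_bandwidth(X, quantile=1/3):
    """Pick the bandwidth as the `quantile`-th quantile of all pairwise
    distances, so a ball of that radius around a typical point covers
    roughly that fraction of the data. Hypothetical heuristic."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # upper-triangle distances, excluding the zero diagonal
    pd = d[np.triu_indices(len(X), k=1)]
    return np.quantile(pd, quantile)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)),
               rng.normal(4, 0.3, (40, 2))])
bw = estimate_bandwidth(X, quantile=1/3)
print(bw)
```

Note that the `quantile` argument here plays exactly the role of the coverage parameter in the question: the arbitrariness has just moved from the bandwidth to the quantile.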
Just to be clear, mean-shift clustering does have a big advantage in
that it does not assume spherical/elliptical clusters, so in that
respect it seems like a superior method. I just don't know whether it
has an advantage with regard to selecting the number of clusters. One
way or another, you end up having to choose a relatively arbitrary
parameter.