16.7 Issues in Clustering

  • Determining the number of clusters to retain
  • Cross-validation of clusters and cluster sizes
  • All-or-none assignment (each observation is either in or out of a cluster)
  • What to do with observations that really don’t belong in any cluster
  • Consequences of the choice of linkage, dissimilarity measure, and where to cut the dendrogram

16.7.1 Recommendations

Perform clustering with several different choices of parameters (for example linkage, dissimilarity measure, and number of clusters), and look at the full set of results to see which patterns emerge consistently.
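As a rough illustration of this recommendation, the sketch below runs hierarchical clustering over a small grid of linkages and dissimilarity measures and reports how often each pair of results agrees. The toy data, the helper names (rand_index, cluster_grid), the grid of choices, and the use of the unadjusted Rand index as the agreement measure are all assumptions made for this example, not part of the original notes.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def rand_index(a, b):
    """Fraction of observation pairs on which two labelings agree
    (both put the pair together, or both put it apart)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


def cluster_grid(X, k,
                 methods=("complete", "average", "single"),
                 metrics=("euclidean", "correlation")):
    """Cut a dendrogram into at most k clusters for every
    (linkage, dissimilarity) combination; hypothetical helper."""
    results = {}
    for method, metric in itertools.product(methods, metrics):
        d = pdist(X, metric=metric)          # condensed dissimilarities
        Z = linkage(d, method=method)        # build the dendrogram
        results[(method, metric)] = fcluster(Z, t=k, criterion="maxclust")
    return results


# Toy data (assumption): two groups of 30 observations in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])

results = cluster_grid(X, k=2)
for key_a, key_b in itertools.combinations(results, 2):
    agreement = rand_index(results[key_a], results[key_b])
    print(key_a, key_b, round(agreement, 2))
```

Pairs of settings with agreement near 1 produce essentially the same partition. Low agreement suggests that the apparent structure depends on the analyst's choices rather than on the data. The unadjusted Rand index is not corrected for chance, so it is best read comparatively.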

Since clustering can be non-robust, we recommend clustering subsets of the data and evaluating how robust the resulting clusters are.
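One possible way to carry out such a check is sketched below: cluster the full data once, then repeatedly cluster random subsets and compare the subset labels with the full-data labels on the same observations. The 80% subset fraction, the number of repetitions, complete linkage with Euclidean distance, and the helper names (hclust_labels, subsample_stability) are assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def rand_index(a, b):
    """Fraction of observation pairs on which two labelings agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] == same_b[upper]))


def hclust_labels(X, k, method="complete"):
    """Hierarchical clustering (Euclidean distance) cut into at most k clusters."""
    return fcluster(linkage(X, method=method), t=k, criterion="maxclust")


def subsample_stability(X, k, n_rep=20, frac=0.8, seed=0):
    """Agreement between full-data clusters and clusters found on random
    subsets, evaluated only on the observations in each subset."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    full = hclust_labels(X, k)
    scores = []
    for _ in range(n_rep):
        idx = rng.choice(n, size=m, replace=False)  # subset without replacement
        sub = hclust_labels(X[idx], k)
        scores.append(rand_index(full[idx], sub))
    return np.array(scores)


# Toy data (assumption): two groups of 30 observations in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])

scores = subsample_stability(X, k=2)
print("mean agreement:", round(float(scores.mean()), 2))
print("min agreement: ", round(float(scores.min()), 2))
```

Agreement that stays close to 1 across subsets suggests the clusters are not driven by a handful of observations. A wide spread or low values are a sign of the non-robustness mentioned above.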

Most importantly, be careful about how the results of a clustering analysis are reported.

Results should not be taken as the absolute truth about a data set.

Instead, results often constitute a starting point for developing a scientific hypothesis and for further study, preferably on an independent data set.