
Table 1 Common agglomerative algorithms for forming clusters

From: Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis

Average-Linkage [39]

• The distance between 2 clusters is defined as the average distance between all pairs of the 2 clusters’ members
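The average-linkage definition can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and clusters given as lists of coordinate tuples; the function name `average_linkage` and the sample points are hypothetical and do not come from the study.

```python
from itertools import product
from math import dist


def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over every cross-cluster pair of members."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Illustrative data only (assumed, not from the source table).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
avg_d = average_linkage(a, b)
```

With two members per cluster, four pairwise distances are averaged.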

Centroid Method [39]

• Cluster centroids are defined as the mean values of the cluster's observations on each variable

• The distance between 2 clusters is equal to the distance between the two centroids
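As a rough sketch of the centroid method, each cluster can be reduced to its per-variable mean before a single distance is taken. Euclidean distance, the helper names `centroid` and `centroid_distance`, and the sample points are assumptions made for illustration.

```python
from math import dist


def centroid(cluster):
    """Per-variable mean of the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))


def centroid_distance(cluster_a, cluster_b):
    """Euclidean distance between the two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# Illustrative data only (assumed, not from the source table).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
cen_a = centroid(a)
cen_d = centroid_distance(a, b)
```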

Single-Linkage [40–42]

• Also known as “nearest-neighbor” method

• Defines similarity between clusters as the shortest distance from any one object in one cluster to any object in the other

Complete-Linkage [43]

• Also known as the “farthest-neighbor” method

• Assumes the distance between 2 clusters is based on the maximum distance between any 2 members in the 2 clusters
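The nearest-neighbor and farthest-neighbor definitions differ only in whether the minimum or the maximum cross-cluster distance is taken. The sketch below assumes Euclidean distance; the function names and sample points are hypothetical.

```python
from itertools import product
from math import dist


def single_linkage(cluster_a, cluster_b):
    """Shortest distance between any member of one cluster and any member of the other."""
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Largest distance between any member of one cluster and any member of the other."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Illustrative data only (assumed, not from the source table).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
nearest = single_linkage(a, b)
farthest = complete_linkage(a, b)
```

For the same pair of clusters, the single-linkage distance can never exceed the complete-linkage distance.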

Flexible-Beta [44, 45]

• Uses a weighted average distance between pairs of objects in different clusters to decide how far apart they are

• The user sets the value of beta; values less than zero optimize the dissimilarity between clusters
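The table does not state the update formula. One common formulation, assumed here, is the Lance–Williams flexible update, in which the distance from a cluster k to the merger of clusters i and j weights d(k, i) and d(k, j) by (1 − beta)/2 each and adds beta times d(i, j). The function name and the numeric inputs, including beta = −0.25, are illustrative choices only.

```python
def flexible_beta_update(d_ki, d_kj, d_ij, beta):
    """Distance from cluster k to the merged cluster (i, j).

    Assumes the Lance-Williams flexible form:
    alpha_i = alpha_j = (1 - beta) / 2, with no gamma term.
    """
    alpha = (1.0 - beta) / 2.0
    return alpha * d_ki + alpha * d_kj + beta * d_ij


# Illustrative distances only (assumed, not from the source table).
d_negative_beta = flexible_beta_update(4.0, 6.0, 2.0, beta=-0.25)
d_zero_beta = flexible_beta_update(4.0, 6.0, 2.0, beta=0.0)
```

Under this assumed form, beta = 0 reduces to a plain average of the two prior distances, while a negative beta pushes the merged cluster farther from its neighbors.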

McQuitty’s Similarity [46]

• Assumes that each entity is a separate cluster

• When two clusters are to be joined, the distance from the new cluster to any other cluster is calculated as the average of the distances from the two soon-to-be-joined clusters to that other cluster

• Merges the pair of clusters that has the highest average similarity value

• Continues until a specified number of clusters is found, or until the similarity measure between every pair of clusters is less than a predefined cutoff
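The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions: similarity is expressed as Euclidean distance (so the most similar pair is the closest pair), the stopping rule is a target number of clusters of at least one, and the function name `mcquitty` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def mcquitty(points, n_clusters):
    """Agglomerate points until n_clusters remain; returns lists of point indices."""
    # Every entity starts as its own cluster, keyed by an integer id.
    clusters = {i: [i] for i in range(len(points))}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist(points[i], points[j]) for i, j in combinations(clusters, 2)}
    next_id = len(points)
    while len(clusters) > n_clusters:
        i, j = min(d, key=d.get)  # closest pair, i.e. the most similar one
        merged = clusters.pop(i) + clusters.pop(j)
        # Distance from the new cluster to each remaining cluster is the
        # simple average of the two old distances.
        new_d = {}
        for k in clusters:
            d_ki = d[tuple(sorted((k, i)))]
            d_kj = d[tuple(sorted((k, j)))]
            new_d[(k, next_id)] = (d_ki + d_kj) / 2
        # Drop every entry that involved the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Illustrative data only (assumed, not from the source table).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (9.0, 0.0)]
groups = mcquitty(points, n_clusters=2)
```

A cutoff-based stopping rule, as mentioned in the last bullet, could replace the `n_clusters` test by checking whether the smallest remaining distance exceeds a threshold.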

Ward’s Method [47]

• The similarity between two clusters is the sum of squares within the clusters summed over all variables

• Tends to join clusters with a small number of observations

• Strongly biased toward producing clusters with the same shape and with roughly the same number of observations
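A hedged sketch of the sum-of-squares criterion behind Ward's method follows. It assumes the commonly used formulation in which the cost of a merge is the increase in the within-cluster sum of squares, summed over all variables; the function names and sample points are hypothetical.

```python
def within_ss(cluster):
    """Sum of squared deviations from the cluster mean, over all variables."""
    n = len(cluster)
    means = [sum(values) / n for values in zip(*cluster)]
    return sum((x - m) ** 2 for point in cluster for x, m in zip(point, means))


def ward_increase(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters are merged."""
    return within_ss(cluster_a + cluster_b) - within_ss(cluster_a) - within_ss(cluster_b)


# Illustrative data only (assumed, not from the source table).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
cost = ward_increase(a, b)
```

Under this formulation the cost equals n_a·n_b/(n_a + n_b) times the squared distance between the centroids, so merges involving small clusters are cheaper, which is consistent with the tendency noted in the bullets above.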