
Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids (PAM), density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.


CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)



data
a numeric vector, n x k matrix, or data frame. If type = "manual", data must be a matrix that includes a column identifying cluster membership for each observation; that column is identified through the clusters argument.

type
a character string giving the type of clustering method to be used. See Details.

k
the desired number of clusters. Required if type is one of "hclust", "diana", "kmeans", or "pam".

epsilon
radius of the epsilon neighborhood. Required if type = "dbscan".

model
a logical indicating whether clustering model output should be returned. Defaults to FALSE.

adjust
a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. Note that both the CPC and the adjusted CPC are automatically calculated and returned if model = TRUE.

cols
columns of data to be used in the CPC calculation. Only used if type = "manual".

clusters
column of data indicating cluster membership for each observation. Only used if type = "manual".

...
arguments passed to other functions.


If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).
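The difference between the two return modes can be sketched as follows. This is a minimal illustration, assuming the CPC package is installed and exports CPC(); the variable names (x, has_cpc, cpc_plain, cpc_adj, fit) are made up for the example, and the component names of the returned list are inspected with names() rather than assumed.

```r
set.seed(42)
# Two well-separated groups of 25 observations in two dimensions
x <- rbind(matrix(rnorm(50, 0, 1), ncol = 2),
           matrix(rnorm(50, 5, 1), ncol = 2))

# Guard the calls so the sketch still runs when the package is absent
has_cpc <- requireNamespace("CPC", quietly = TRUE)

if (has_cpc) {
  # model = FALSE (default): a single number, the CPC
  cpc_plain <- CPC::CPC(x, "kmeans", k = 2)

  # adjust = TRUE: still a single number, now the adjusted CPC
  cpc_adj <- CPC::CPC(x, "kmeans", k = 2, adjust = TRUE)

  # model = TRUE: a list holding the clustering output and both coefficients
  fit <- CPC::CPC(x, "kmeans", k = 2, model = TRUE)
  print(names(fit))  # inspect the components instead of guessing their names
}
```

Requesting model = TRUE is the more convenient choice when both coefficients or the underlying clustering object are needed, since it avoids running the clustering twice.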


type must take one of six values:
"hclust": agglomerative hierarchical clustering with hclust(),
"diana": divisive hierarchical clustering with diana(),
"kmeans": k-means clustering with kmeans(),
"pam": k-medoids clustering with pam(),
"dbscan": density-based clustering with dbscan(),
"manual": no clustering is necessary, researcher has specified cluster assignments.

For all clustering methods, additional arguments that fine-tune clustering performance, such as the specific algorithm to be used, can be passed to CPC() and are forwarded to the specified clustering function. In particular, if type = "kmeans", a large number of random starts is recommended. Set this with the nstart argument of kmeans(), passed directly to CPC().
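The effect of nstart can be seen with stats::kmeans() on its own, and the same argument can then be handed to CPC(). This is a hedged sketch: the data and the names x, single, multi, and cpc_multi are illustrative, and the CPC() call is guarded because it assumes the CPC package is installed.

```r
set.seed(42)
x <- rbind(matrix(rnorm(50, 0, 1), ncol = 2),
           matrix(rnorm(50, 5, 1), ncol = 2))

# nstart is an argument of stats::kmeans(): the number of random
# initial configurations tried, of which the best fit is kept.
set.seed(1)
single <- kmeans(x, centers = 2, nstart = 1)
set.seed(1)
multi <- kmeans(x, centers = 2, nstart = 25)

# Compare the total within-cluster sums of squares of the two fits
print(c(single = single$tot.withinss, multi = multi$tot.withinss))

# Extra arguments such as nstart are supplied alongside CPC()'s own
# arguments (assumes the CPC package is available).
if (requireNamespace("CPC", quietly = TRUE)) {
  cpc_multi <- CPC::CPC(x, "kmeans", k = 2, nstart = 25)
  print(cpc_multi)
}
```

On data this cleanly separated a single start is usually enough; multiple starts matter more when clusters overlap and k-means can settle in a poor local optimum.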

If type = "manual", data must contain a vector identifying cluster membership for each observation, and cols and clusters must be defined.


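# Simulate 50 observations in two dimensions:
# rows 1-25 are centered at 0, rows 26-50 at 5
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
# Known cluster labels, appended as a third column
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

# Cluster the first two columns with k-means, then compute the CPC
# (no seed is set, so the values below will differ between runs)
CPC(data[, 1:2], "kmeans", k = 2)
#> [1] 0.8773686
# Use the supplied labels in column 3 instead of clustering
CPC(data, "manual", cols = 1:2, clusters = 3)
#> [1] 0.8773686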
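The same kind of data can be run through each of the methods that take k, which makes it easy to compare the resulting coefficients. This is a rough sketch, assuming the CPC package (and the packages it relies on for diana() and pam()) is installed; x, k_types, has_cpc, and cpc_by_type are names invented for the example.

```r
set.seed(42)
x <- rbind(matrix(rnorm(50, 0, 1), ncol = 2),
           matrix(rnorm(50, 5, 1), ncol = 2))

# The four type values for which k is required
k_types <- c("hclust", "diana", "kmeans", "pam")

has_cpc <- requireNamespace("CPC", quietly = TRUE)

if (has_cpc) {
  # With model = FALSE each call returns a numeric vector of length 1,
  # so vapply() can collect the results into a named numeric vector.
  cpc_by_type <- vapply(
    k_types,
    function(tp) CPC::CPC(x, tp, k = 2),
    numeric(1)
  )
  print(cpc_by_type)
}
```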