Cluster-Polarization Coefficient

Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids, density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.

Usage

CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)

Arguments

data: a numeric vector, or an n x k matrix or data frame. If type = "manual", data must be a matrix that includes a column identifying cluster membership for each observation; that column is identified through the clusters argument.
type: a character string giving the type of clustering method to be used. See Details.
k: the desired number of clusters. Required if type is one of "hclust", "diana", "kmeans", or "pam".
epsilon: the radius of the epsilon neighborhood. Required if type = "dbscan".
model: a logical indicating whether clustering model output should be returned. Defaults to FALSE.
adjust: a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. Note that both CPC and adjusted CPC are automatically calculated and returned if model = TRUE.
cols: columns of data to be used in CPC calculation. Only used if type = "manual".
clusters: column of data indicating cluster membership for each observation. Only used if type = "manual".
...: arguments passed to other functions.

Value

If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).

Details

type must take one of six values:
"hclust": agglomerative hierarchical clustering with hclust(),
"diana": divisive hierarchical clustering with diana(),
"kmeans": k-means clustering with kmeans(),
"pam": k-medoids clustering with pam(),
"dbscan": density-based clustering with dbscan(),
"manual": no clustering is performed; the researcher supplies the cluster assignments.
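
As an illustration of how a desired number of clusters k relates to a hierarchical method, the sketch below uses only base R (hclust(), dist(), and cutree() from stats). It is a hedged example on simulated data and does not claim to reproduce what CPC() does internally; the object names x, tree, and membership are made up for this example.

```r
# Sketch only: base R (stats) calls, not the internals of CPC().
set.seed(1)
x <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),  # 25 points centered near (0, 0)
  matrix(rnorm(50, mean = 5), ncol = 2)   # 25 points centered near (5, 5)
)

# Agglomerative clustering builds a full tree from pairwise distances.
tree <- hclust(dist(x))

# Cutting the tree into k groups yields one cluster label per row of x.
k <- 2
membership <- cutree(tree, k = k)

table(membership)
```

The resulting label vector has the same shape as the membership column expected when type = "manual".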

For all clustering methods, additional arguments that fine-tune clustering performance, such as the specific algorithm to be used, can be passed to CPC() and are forwarded to the specified clustering function. In particular, if type = "kmeans", a large number of random starts is recommended. This is set with the nstart argument to kmeans(), passed directly to CPC().
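
The forwarding of extra arguments described above can be sketched with a small wrapper around kmeans(). The function cluster_kmeans() is hypothetical and exists only for this example; it is not part of the package and is not meant to show how CPC() is implemented. Only the base R kmeans() call and its nstart argument are real.

```r
# Hypothetical wrapper (not part of the package) showing how `...`
# forwards extra arguments to the underlying clustering function.
cluster_kmeans <- function(data, k, ...) {
  kmeans(data, centers = k, ...)
}

set.seed(1)
x <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),
  matrix(rnorm(50, mean = 5), ncol = 2)
)

# nstart = 25 asks kmeans() to try 25 random starts and keep the best fit.
fit <- cluster_kmeans(x, k = 2, nstart = 25)
fit$size
```

With the package itself, the corresponding call would take the form CPC(x, "kmeans", k = 2, nstart = 25).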

If type = "manual", data must contain a column identifying cluster membership for each observation, and both cols and clusters must be supplied.
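
To make the manual layout concrete, the sketch below builds a matrix whose first two columns hold values and whose third column holds cluster labels, then computes sums of squares from those labels. The ratio at the end rests on an assumption: it treats the CPC as the between-cluster share of the total sum of squares, a definition this page does not state. Read it as an illustration of the inputs, not as the package's calculation.

```r
# Sketch under an assumption: the CPC is treated here as the between-cluster
# share of the total sum of squares. This page does not state the formula.
set.seed(1)
x <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),
  matrix(rnorm(50, mean = 5), ncol = 2)
)
membership <- rep(1:2, each = 25)
data <- cbind(x, membership)  # columns 1:2 hold values, column 3 the labels

vals <- data[, 1:2]
labs <- data[, 3]

# Total sum of squares: squared deviations from the overall column means.
tss <- sum(scale(vals, center = TRUE, scale = FALSE)^2)

# Within-cluster sum of squares: deviations from each cluster's own means.
wss <- sum(sapply(split(seq_len(nrow(vals)), labs), function(rows) {
  sum(scale(vals[rows, , drop = FALSE], center = TRUE, scale = FALSE)^2)
}))

bss <- tss - wss
ratio <- bss / tss  # assumed definition, for illustration only
ratio
```

In this layout, cols = 1:2 and clusters = 3 would be the matching arguments to CPC() with type = "manual".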

Examples

data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

CPC(data[, 1:2], "kmeans", k = 2)
#> [1] 0.8773686
CPC(data, "manual", cols = 1:2, clusters = 3)
#> [1] 0.8773686