Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids, density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.
Usage
CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)
Arguments
- data
a numeric vector or n x k matrix or data frame. If type = "manual", data must be a matrix containing a vector identifying cluster membership for each observation, to be passed to the clusters argument.
- type
a character string giving the type of clustering method to be used. See Details.
- k
the desired number of clusters. Required if type is one of "hclust", "diana", "kmeans", or "pam".
- epsilon
radius of the epsilon neighborhood. Required if type = "dbscan".
- model
a logical indicating whether clustering model output should be returned. Defaults to FALSE.
- adjust
a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. Note that both CPC and adjusted CPC are automatically calculated and returned if model = TRUE.
- cols
columns of data to be used in CPC calculation. Only used if type = "manual".
- clusters
column of data indicating cluster membership for each observation. Only used if type = "manual".
- ...
arguments passed to other functions.
Value
If model = TRUE, CPC() returns a list with components
containing output from the specified clustering function, all sums of squares, the
CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns
a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if
adjust = TRUE).
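The two return shapes described above can be sketched as follows. This assumes the CPC package is installed; the simulated two-group data are purely illustrative, and the names of the list components are not assumed here, so the sketch only inspects them with str().

```r
library(CPC)

set.seed(42)
# Illustrative data (an assumption): two well-separated groups in two dimensions
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)

# model = FALSE (the default): a numeric vector of length 1
cpc <- CPC(x, type = "kmeans", k = 2)
cpc_adj <- CPC(x, type = "kmeans", k = 2, adjust = TRUE)

# model = TRUE: a list with the clustering output, sums of squares,
# the CPC, and the adjusted CPC
fit <- CPC(x, type = "kmeans", k = 2, model = TRUE)
str(fit, max.level = 1)  # list the top-level components
```

Requesting the list once with model = TRUE avoids running the clustering twice when both the CPC and the adjusted CPC are needed.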
Details
type must take one of six values:
- "hclust": agglomerative hierarchical clustering with hclust()
- "diana": divisive hierarchical clustering with diana()
- "kmeans": k-means clustering with kmeans()
- "pam": k-medoids clustering with pam()
- "dbscan": density-based clustering with dbscan()
- "manual": no clustering is necessary; the researcher has specified cluster assignments
For all clustering methods, additional arguments that fine-tune clustering
performance, such as the specific algorithm to be used, should be supplied to
CPC(), which passes them on to the specified clustering function through
the ... argument. In particular, if type = "kmeans", using a large number of
random starts is recommended. This can be specified with the nstart argument
to kmeans(), passed directly to CPC().
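The reason for recommending many random starts can be illustrated with kmeans() on its own. This is a minimal sketch using base R only; the simulated data, seeds, and the choice of 25 starts are illustrative assumptions rather than recommended values.

```r
set.seed(1)
# Illustrative data (an assumption): three groups in two dimensions
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 4), ncol = 2),
  matrix(rnorm(60, mean = 8), ncol = 2)
)

# Reset the seed before each call so both fits share the same first random start
set.seed(123)
fit_one <- kmeans(x, centers = 3, nstart = 1)
set.seed(123)
fit_many <- kmeans(x, centers = 3, nstart = 25)

# kmeans() keeps the start with the lowest total within-cluster sum of squares,
# so the multi-start fit is never worse than the single-start fit here
c(one = fit_one$tot.withinss, many = fit_many$tot.withinss)
```

With CPC(), the same argument would be supplied as CPC(x, type = "kmeans", k = 3, nstart = 25).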
If type = "manual", data must contain a vector identifying cluster
membership for each observation, and cols and clusters must be
defined.
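A manual call might look like the sketch below. It assumes the CPC package is installed and that cols and clusters accept column positions; the simulated values and the group labels are illustrative only.

```r
library(CPC)

set.seed(42)
# Two numeric variables (illustrative data)
values <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),
  matrix(rnorm(50, mean = 5), ncol = 2)
)
# Researcher-assigned cluster membership, one label per observation
membership <- rep(1:2, each = 25)

# data is a matrix holding both the variables and the membership column
dat <- cbind(values, membership)

# Columns 1-2 enter the CPC calculation; column 3 identifies the clusters
cpc_manual <- CPC(dat, type = "manual", cols = 1:2, clusters = 3)
cpc_manual
```

No k or epsilon is given because no clustering algorithm is run in this mode.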
