Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids, density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.
Usage
CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)
Arguments
- data
a numeric vector or n x k matrix or data frame. If type = "manual", data must be a matrix containing a vector identifying cluster membership for each observation, to be passed to the clusters argument.
- type
a character string giving the type of clustering method to be used. See Details.
- k
the desired number of clusters. Required if type is one of "hclust", "diana", "kmeans", or "pam".
- epsilon
radius of epsilon neighborhood. Required if type = "dbscan".
- model
a logical indicating whether clustering model output should be returned. Defaults to FALSE.
- adjust
a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. Note that both CPC and adjusted CPC are automatically calculated and returned if model = TRUE.
- cols
columns of data to be used in CPC calculation. Only used if type = "manual".
- clusters
column of data indicating cluster membership for each observation. Only used if type = "manual".
- ...
arguments passed to other functions.
Value
If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).
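The two return shapes described above can be illustrated with a minimal sketch. The simulated data, the seed, and the object names (`x`, `cpc_value`, `cpc_adjusted`, `cpc_fit`) are assumptions made for illustration, and the calls sit behind a `requireNamespace()` guard so the snippet also runs where the CPC package is not installed. The names of the list components are not given on this page, so the sketch inspects them with `names()` rather than assuming any.

```r
# Simulated two-dimensional data with two well-separated groups (illustrative only).
set.seed(1)
x <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),
  matrix(rnorm(50, mean = 5), ncol = 2)
)

if (requireNamespace("CPC", quietly = TRUE)) {
  # model = FALSE, adjust = FALSE (the defaults): a single number, the CPC.
  cpc_value <- CPC::CPC(x, type = "kmeans", k = 2)

  # adjust = TRUE: a single number, the adjusted CPC.
  cpc_adjusted <- CPC::CPC(x, type = "kmeans", k = 2, adjust = TRUE)

  # model = TRUE: a list holding the clustering output and summary statistics.
  cpc_fit <- CPC::CPC(x, type = "kmeans", k = 2, model = TRUE)
  print(names(cpc_fit))
}
```

Because model = TRUE already returns both the CPC and the adjusted CPC, setting adjust only matters when model = FALSE.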
Details
type must take one of six values:
- "hclust": agglomerative hierarchical clustering with hclust(),
- "diana": divisive hierarchical clustering with diana(),
- "kmeans": k-means clustering with kmeans(),
- "pam": k-medoids clustering with pam(),
- "dbscan": density-based clustering with dbscan(),
- "manual": no clustering is necessary; the researcher has specified cluster assignments.
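As a rough sketch of how the type argument switches between methods, the snippet below runs the four methods that take k on the same simulated data. The data, the seed, and the names `k_methods` and `cpc_by_method` are illustrative assumptions, and the calls are guarded so the snippet runs even where the CPC package is not installed.

```r
# Simulated data with two groups (illustrative only).
set.seed(2)
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 4), ncol = 2)
)

# The four values of type that require a number of clusters, k.
k_methods <- c("hclust", "diana", "kmeans", "pam")

if (requireNamespace("CPC", quietly = TRUE)) {
  # One CPC value per method, computed on the same data.
  cpc_by_method <- vapply(
    k_methods,
    function(m) CPC::CPC(x, type = m, k = 2),
    numeric(1)
  )
  print(cpc_by_method)
}
```

type = "dbscan" is left out of the loop because it takes epsilon rather than k, and type = "manual" because it takes cols and clusters.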
For all clustering methods, additional arguments to fine-tune clustering performance, such as the specific algorithm to be used, should be passed to CPC() and will be inherited by the specified clustering function. In particular, if type = "kmeans", using a large number of random starts is recommended. This can be specified with the nstart argument to kmeans(), passed directly to CPC().
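A minimal sketch of this pass-through for k-means follows. The data, the seed, and the particular values nstart = 50 and iter.max = 25 are illustrative assumptions, not recommendations from this page; both are arguments of kmeans() and not of CPC(). The call is guarded so the snippet runs even where the CPC package is not installed.

```r
# Simulated data with two moderately separated groups (illustrative only).
set.seed(3)
x <- rbind(
  matrix(rnorm(80, mean = 0), ncol = 2),
  matrix(rnorm(80, mean = 3), ncol = 2)
)

if (requireNamespace("CPC", quietly = TRUE)) {
  # nstart and iter.max are not CPC() arguments, so they travel
  # through `...` to kmeans().
  cpc_kmeans <- CPC::CPC(x, type = "kmeans", k = 2, nstart = 50, iter.max = 25)
  print(cpc_kmeans)
}
```

Many random starts make it less likely that the reported CPC reflects a poor local optimum of a single k-means run.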
If type = "manual", data must contain a vector identifying cluster membership for each observation, and cols and clusters must be defined.
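The manual case can be sketched as follows. This page does not say whether cols and clusters expect column indices or column names; the sketch assumes integer column indices on a plain numeric matrix. The data and the names `features`, `membership`, and `manual_data` are hypothetical, and the call is guarded so the snippet runs even where the CPC package is not installed.

```r
# Two simulated groups of 25 observations each (illustrative only).
set.seed(4)
features <- rbind(
  matrix(rnorm(50, mean = 0), ncol = 2),
  matrix(rnorm(50, mean = 5), ncol = 2)
)
membership <- rep(c(1, 2), each = 25)

# Columns 1-2 hold the features; column 3 holds the cluster labels.
manual_data <- unname(cbind(features, membership))

if (requireNamespace("CPC", quietly = TRUE)) {
  # Assumption: cols and clusters are given as column indices of manual_data.
  cpc_manual <- CPC::CPC(manual_data, type = "manual", cols = 1:2, clusters = 3)
  print(cpc_manual)
}
```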