Centroid Variance or Entropy Cluster Selection


The cluster centroid is the mean expression vector of a cluster. Often the centroid is used to characterize changes in gene expression for a set of elements in a cluster. Centroid Entropy or Variance Ranking cluster selection is an algorithm used to rank a set of clusters based on Centroid variability and then select candidate clusters meeting supplied criteria. This process will tend to find clusters that meet a minimum size and have relatively higher centroid variability over the expression measurements. The selected clusters will tend to have centroids that vary greatly over the measurements. Be certain to understand how using Centroid Variance versus using Centroid Entropy will affect the outcome.

Parameters

Desired Number of Clusters

This parameter indicates the number of clusters that should be selected from the input set. If during execution it turns out that the number of input clusters is smaller than the number of clusters desired, then all input clusters are returned as the result.

Minimum Cluster Population (# of elements)

The minimum cluster size describes the minimum number of genes or experiments that should be in the cluster. In some cases clusters may have low variability but are only composed of a couple of elements.

Rank Clusters on Centroid Variance

This measure takes the sum of squared errors for the centroid vector where the elements are the centroid values and each is compared to the centroid's mean value.

Rank Clusters on Centroid Entropy

Centroid entropy is a measure of the dispersion of centroid values over the observed range of centroid values. In this case, the high entropy centroids would have values that are widely dispersed between the extremes of the centroid. Note that the range of centroid values could be rather narrow for a centroid of high entropy where the values are evenly dispersed. This measure therefore may select centroids that don't have large expression variability in terms of range of values and hence appear rather flat across measurements.