KNNC: K-Nearest Neighbors Validation
Parameter Information
This option validates the training set using leave-one-out cross-validation, without classifying the unknown vectors.
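Leave-one-out cross-validation can be sketched as follows. The sketch assumes the Euclidean k-nearest-neighbor rule described under KNN Classification parameters, where a tie leaves a vector unassigned. Each training vector is held out in turn, classified using the remaining training vectors, and the prediction is compared with its known class. The function names and the accuracy summary are hypothetical; this document does not say what the validation reports.

```python
import math
from collections import Counter


def knn_label(query, train_vectors, train_labels, k):
    """Majority vote among the k training vectors nearest (Euclidean) to query.

    Returns None on a tie, i.e. the vector stays unassigned.
    """
    order = sorted(range(len(train_vectors)),
                   key=lambda i: math.dist(query, train_vectors[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top classes
    return ranked[0][0]


def leave_one_out(vectors, labels, k):
    """Hold out each training vector in turn and classify it with the rest.

    Returns the list of predictions and the fraction predicted correctly;
    an unassigned (None) prediction counts as incorrect.
    """
    predictions = []
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predictions.append(knn_label(vectors[i], rest_vectors, rest_labels, k))
    correct = sum(p == t for p, t in zip(predictions, labels))
    return predictions, correct / len(labels)
```

For two well-separated groups, for example three vectors near (0, 0) labeled "A" and three near (5, 5) labeled "B", every held-out vector is recovered with k = 2.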
In the following description, "vector" refers to a gene or an experiment, depending on what is being classified.
An element of a vector is one of the expression values that constitute that vector. For a gene vector, the elements
are the expression values for that gene across all experiments; for an experiment vector, the elements are the
expression values of all genes in that experiment.
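The distinction between gene vectors and experiment vectors can be sketched as follows, assuming (hypothetically) that the expression data are held as a list of rows with one row per gene and one column per experiment. The matrix values and function names are illustrative only.

```python
# Hypothetical expression matrix: rows are genes, columns are experiments.
expression = [
    [1.2, 0.8, 2.5],  # gene 0 across experiments 0, 1, 2
    [0.3, 1.9, 0.7],  # gene 1 across experiments 0, 1, 2
]


def gene_vector(matrix, g):
    """Elements: expression values of gene g across all experiments (a row)."""
    return list(matrix[g])


def experiment_vector(matrix, e):
    """Elements: expression values of all genes in experiment e (a column)."""
    return [row[e] for row in matrix]
```

Here `gene_vector(expression, 1)` has one element per experiment, while `experiment_vector(expression, 2)` has one element per gene.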
Classify genes or experiments
Selects whether genes or experiments are to be classified.
Correlation filter
The correlation filter removes from the set to be classified those vectors that are not significantly
correlated with at least one member of the training set. The significance of a correlation is given by its
p-value, which is calculated by a permutation test in which each vector is permuted a user-specified number of times.
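A minimal sketch of such a filter follows. This document does not specify the correlation measure, the p-value cutoff, or exactly which vector is shuffled, so the sketch assumes Pearson correlation, a two-sided test that shuffles the elements of the candidate vector, the estimate p = (hits + 1) / (n_perm + 1), and a hypothetical cutoff `alpha`. A candidate is kept if its p-value against at least one training vector is at or below `alpha`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(v, t, n_perm, rng):
    """Two-sided permutation p-value for corr(v, t), shuffling v (assumption)."""
    observed = abs(pearson(v, t))
    shuffled = list(v)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, t)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def correlation_filter(candidates, training, n_perm=100, alpha=0.05, seed=0):
    """Keep candidates significantly correlated with at least one training vector.

    n_perm plays the role of the user-specified number of permutations;
    alpha is a hypothetical significance cutoff.
    """
    rng = random.Random(seed)
    kept = []
    for v in candidates:
        if any(permutation_p_value(v, t, n_perm, rng) <= alpha for t in training):
            kept.append(v)
    return kept
```

With this estimate the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations must be large enough for the chosen cutoff to be reachable.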
KNN Classification parameters
This is where the user specifies the expected number of classes (which is also the number of classes present
in the training set) and the number of neighbors, that is, the number of vectors from the training set that are
chosen as the neighbors of a given vector. Euclidean distance is used to determine the neighborhood. For example, to
classify a gene g, its k nearest neighbors in the training set are found (where k is specified by the user), and
g is assigned to the class most frequently represented among them. In case of a tie, gene g remains unassigned.
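The classification rule can be sketched as follows, assuming vectors are equal-length lists of numbers with no missing values. `classify` and its arguments are hypothetical names, not part of the tool, and `None` stands for an unassigned vector.

```python
import math
from collections import Counter


def classify(g, training_vectors, training_classes, k):
    """Assign g to the class most frequent among its k nearest training vectors.

    Distances are Euclidean. Returns None when two or more classes tie for
    the most votes, i.e. g remains unassigned.
    """
    # Pair each training vector's distance to g with its class, nearest first.
    distances = sorted(
        ((math.dist(g, t), c) for t, c in zip(training_vectors, training_classes)),
        key=lambda pair: pair[0],
    )
    votes = Counter(c for _, c in distances[:k])
    top = max(votes.values())
    winners = [c for c, n in votes.items() if n == top]
    return winners[0] if len(winners) == 1 else None
```

With training vectors (0, 0) and (0, 1) in class 1 and (5, 5), (5, 6), (6, 5) in class 2, the vector (0.5, 0.5) is assigned to class 1 when k = 3, but with k = 4 its neighbors split two votes to two and it remains unassigned.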
Create / import training set
If the user chooses to import a previously created training set, clicking the “Next” button displays a file chooser
from which the training file can be selected. If an appropriate file is chosen, the KNN classification editor is
displayed with the class assignments from the file. If the option to create a new training set from data is chosen,
clicking the “Next” button displays the classification editor directly, with all vectors set to neutral.
Hierarchical Clustering
This checkbox determines whether hierarchical clustering is performed on the elements of each cluster that is created.