KNN Class Prediction: Two Data Sets

To build and test classifiers using the k-nearest-neighbors (KNN) class prediction method and two gene expression data sets:

Before you begin

Use one data set to train the classifier and the other to test it. Each gene expression data set consists of two files:

learn more:
file formats

Step 1: PreprocessDataset

Preprocess gene expression training data to remove platform noise and genes that have little variation. Note: If preprocessing the data removes relevant biological information, skip this step.

Do not preprocess the gene expression test data. The test data should contain all of the genes present in the training data.
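As a rough illustration of this step, the sketch below clips expression values to a floor and ceiling, then drops genes whose values vary too little across samples. It also checks that the test data contains every gene kept in the training data. The function names, the dictionary layout, and the threshold values are assumptions made for this example; they are not taken from the PreprocessDataset module.

```python
def variation_filter(rows, floor=20.0, ceiling=20000.0,
                     min_fold_change=3.0, min_delta=100.0):
    """Keep genes that vary enough across samples.

    rows maps a gene name to its list of expression values (one per sample).
    All thresholds are illustrative values, not module defaults.
    """
    kept = {}
    for gene, values in rows.items():
        # Clip extreme values to reduce platform noise at both ends of the range.
        clipped = [min(max(v, floor), ceiling) for v in values]
        lo, hi = min(clipped), max(clipped)
        # A gene must pass both the relative and the absolute variation test.
        if hi / lo >= min_fold_change and hi - lo >= min_delta:
            kept[gene] = clipped
    return kept


def missing_genes(training_genes, test_genes):
    """Return the training genes that are absent from the test data."""
    test_set = set(test_genes)
    return [g for g in training_genes if g not in test_set]


# Hypothetical training data: three genes measured in four samples.
train = {
    "GENE_A": [50.0, 900.0, 40.0, 1200.0],   # varies strongly, kept
    "GENE_B": [100.0, 110.0, 105.0, 95.0],   # nearly flat, removed
    "GENE_C": [5.0, 10.0, 400.0, 8.0],       # low values are raised to the floor
}
filtered = variation_filter(train)
print(sorted(filtered))
print(missing_genes(filtered, ["GENE_C", "GENE_B"]))
```

Only the training data is filtered here. The test data is left untouched and is checked only for the presence of the retained genes.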

Considerations
learn more:
PreprocessDataset

Step 2: KNNXValidation

KNNXValidation runs KNN class prediction iteratively against a known data set. For each iteration, it leaves one sample out, builds the classifier using the remaining samples, and then tests the classifier on the sample left out. It creates two files:

Choose the best parameter settings for the KNN class prediction method by running KNNXValidation with different parameter values. For example, set the num features parameter to 10, 20 and 30. Choose the parameter values that generate the most accurate classifier.
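The leave-one-out procedure and the parameter sweep can be sketched in a few lines of Python. This is a minimal illustration, not the module's implementation: it assumes two classes, Euclidean distance, an unweighted majority vote, and a simple signal-to-noise-style score for ranking features. The names top_features, knn_predict, and loocv_error_rate are hypothetical, and the toy data uses 1, 2, and 3 features in place of 10, 20, and 30.

```python
from collections import Counter
import math
import statistics


def top_features(samples, labels, num_features):
    """Rank features by |difference of class means| / (sum of class std devs)."""
    first, second = sorted(set(labels))[:2]  # assumes exactly two classes
    a = [s for s, y in zip(samples, labels) if y == first]
    b = [s for s, y in zip(samples, labels) if y == second]
    scores = []
    for j in range(len(samples[0])):
        col_a = [s[j] for s in a]
        col_b = [s[j] for s in b]
        spread = statistics.pstdev(col_a) + statistics.pstdev(col_b)
        diff = abs(statistics.mean(col_a) - statistics.mean(col_b))
        scores.append(diff / (spread + 1e-9))  # small constant avoids dividing by zero
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:num_features]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_error_rate(samples, labels, num_features, k):
    """Leave each sample out once, train on the rest, test on the one left out."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # Features are selected inside the loop so the held-out sample never influences them.
        features = top_features(train_x, train_y, num_features)
        reduced = [[s[j] for j in features] for s in train_x]
        query = [samples[i][j] for j in features]
        if knn_predict(reduced, train_y, query, k) != labels[i]:
            errors += 1
    return errors / len(samples)


# Toy data: feature 0 separates the classes, feature 1 is constant,
# feature 2 is large-magnitude noise unrelated to the class.
samples = [
    [1.0, 5.0, 100.0], [1.2, 5.0, 0.0], [0.8, 5.0, 50.0],
    [3.0, 5.0, 100.0], [3.2, 5.0, 0.0], [2.8, 5.0, 50.0],
]
labels = ["A", "A", "A", "B", "B", "B"]

results = {n: loocv_error_rate(samples, labels, n, k=1) for n in (1, 2, 3)}
best = min(results, key=results.get)
print(results, "best num_features:", best)
```

In this toy data the noisy feature dominates the distance once it is included, so the smallest feature count gives the lowest error rate. With real data the best setting has to be found by comparing the runs.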

learn more:
KNNXValidation

Step 3: View results

To view the prediction results file (*.pred.odf), use the PredictionResultsViewer module. The viewer lists each sample with its actual and predicted class. Error rates for class predictions are averaged across all iterations.

To view the features results file (*.feat.odf), use the FeatureSummaryViewer module. The viewer lists each gene used in a class predictor and the number of times it was used in a predictor.
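The two summaries described above, an error rate averaged over iterations and a count of how often each gene was used, can be reproduced from per-iteration records. The sketch below assumes a hypothetical record layout (one dictionary per cross-validation iteration); it does not read the *.pred.odf or *.feat.odf formats.

```python
from collections import Counter

# Hypothetical records, one per leave-one-out iteration.
iterations = [
    {"sample": "S1", "actual": "A", "predicted": "A", "features": ["G1", "G2"]},
    {"sample": "S2", "actual": "A", "predicted": "B", "features": ["G1", "G3"]},
    {"sample": "S3", "actual": "B", "predicted": "B", "features": ["G1", "G2"]},
    {"sample": "S4", "actual": "B", "predicted": "B", "features": ["G1", "G2"]},
]

# Error rate: fraction of iterations whose held-out sample was misclassified.
error_rate = sum(r["actual"] != r["predicted"] for r in iterations) / len(iterations)

# Feature summary: how many predictors used each gene.
feature_counts = Counter(g for r in iterations for g in r["features"])

print(f"error rate: {error_rate:.2f}")
for gene, count in feature_counts.most_common():
    print(gene, count)
```

Genes that appear in most or all predictors, such as G1 in this made-up example, are the ones the feature selection chose consistently regardless of which sample was left out.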

Considerations
learn more:
PredictionResultsViewer
FeatureSummaryViewer

Step 4: KNN

The KNN module builds and/or tests a classifier by running the KNN class prediction method:

Considerations
learn more:
KNN

Step 5: View results

To view the prediction results file (*.pred.odf), use the PredictionResultsViewer module. The viewer lists each sample with its actual and predicted class.

To view the model file (*.knn.model), click it. The model file contains the classifier (or model) created from the training data set.
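The train-then-test flow with two data sets can be sketched as follows. This is an illustration under stated assumptions rather than the KNN module's behavior: the "model" here is just the stored training samples restricted to the chosen genes, distance is Euclidean, and votes are unweighted. The names build_model and predict, and the dictionary layout, are hypothetical.

```python
from collections import Counter
import math


def build_model(genes, train_matrix, train_labels):
    """Store the training samples, restricted to the chosen genes, with their labels."""
    vectors = [[train_matrix[g][i] for g in genes] for i in range(len(train_labels))]
    return {"genes": list(genes), "vectors": vectors, "labels": list(train_labels)}


def predict(model, test_matrix, num_samples, k=3):
    """Predict a class for each test sample using the genes stored in the model."""
    missing = [g for g in model["genes"] if g not in test_matrix]
    if missing:
        # The test data must contain every gene the classifier was built on.
        raise ValueError(f"test data lacks training genes: {missing}")
    predictions = []
    for i in range(num_samples):
        query = [test_matrix[g][i] for g in model["genes"]]
        nearest = sorted(
            range(len(model["vectors"])),
            key=lambda t: math.dist(model["vectors"][t], query),
        )[:k]
        votes = Counter(model["labels"][t] for t in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical training set: two genes, six samples with known classes.
train = {
    "G1": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],
    "G2": [2.0, 2.1, 1.9, 0.5, 0.4, 0.6],
}
train_labels = ["A", "A", "A", "B", "B", "B"]

# Hypothetical test set: extra genes are allowed and ignored by the model.
test = {"G1": [1.05, 4.9], "G2": [2.0, 0.5], "G3": [7.0, 7.0]}
actual = ["A", "B"]

model = build_model(["G1", "G2"], train, train_labels)
predicted = predict(model, test, num_samples=2)
for a, p in zip(actual, predicted):
    print("actual:", a, "predicted:", p)
```

The same predict call works for samples whose class is unknown; in that case there is simply no actual class to compare against.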

Considerations
learn more:
PredictionResultsViewer

Step 6: Determine the class of an unknown sample

To classify unknown samples using the KNN module:

The module uses the classifier to predict the class of each unknown sample and creates a prediction results file. Use the PredictionResultsViewer module to view the prediction results (*.pred.odf) file: