PNN Class Prediction

Before you begin

Class prediction generally uses two data sets: one to train the classifier and the other to test it. Each gene expression data set consists of two files:

GCT or RES file that contains gene expression data.
Example file (training data): all_aml_train.gct.
Example file (test data): all_aml_test.gct.
CLS file that identifies the class of each sample in the gene expression data.
Example file (training data): all_aml_train.cls.
Example file (test data): all_aml_test.cls.
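The two file types can be read with a short Python sketch: one reader for tab-delimited GCT (version #1.2) text and one for categorical CLS text. It assumes well-formed input and does not handle RES files or missing values. The function names read_gct and read_cls are hypothetical and are not part of GenePattern.

```python
from typing import Dict, List, Tuple


def read_gct(text: str) -> Tuple[List[str], List[str], Dict[str, List[float]]]:
    """Parse GCT #1.2 text into (sample names, gene names, gene -> expression values)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith("#1.2"):
        raise ValueError("expected a '#1.2' version line")
    # Line 2: number of genes (rows) and number of samples (columns)
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    # Line 3: header; the first two columns are Name and Description
    samples = lines[2].split("\t")[2:]
    genes: List[str] = []
    values: Dict[str, List[float]] = {}
    for row in lines[3:3 + n_rows]:
        fields = row.split("\t")
        genes.append(fields[0])
        values[fields[0]] = [float(v) for v in fields[2:2 + n_cols]]
    if len(genes) != n_rows or len(samples) != n_cols:
        raise ValueError("dimension line does not match the data")
    return samples, genes, values


def read_cls(text: str) -> Tuple[List[str], List[str]]:
    """Parse categorical CLS text into (class names, class label of each sample)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Line 1: number of samples, number of classes, and the constant 1
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    # Line 2: '#' followed by the class names
    names = lines[1].lstrip("#").split()
    # Line 3: one label per sample, either a class name or a 0-based index into names
    labels = [names[int(t)] if t.isdigit() else t for t in lines[2].split()]
    if len(labels) != n_samples or len(names) != n_classes:
        raise ValueError("header counts do not match the labels")
    return names, labels
```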

Learn more: file formats

Step 1: PreprocessDataset

Preprocess the gene expression training data to remove platform noise and genes that show little variation. Note: If preprocessing would remove relevant biological information, skip this step.

Do not preprocess the gene expression test data. The test data should contain all of the genes present in the training data.
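Because the test data must cover every training gene, a quick check before running PNN can catch mismatches early. This sketch assumes each data set has already been loaded into a dict that maps gene names to expression values, which is a hypothetical in-memory form. Dropping genes that appear only in the test set is a choice made by this sketch and is not documented module behavior.

```python
from typing import Dict, List


def align_test_to_training(train_genes: List[str],
                           test_values: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return the test rows for exactly the training genes, in training order."""
    missing = [g for g in train_genes if g not in test_values]
    if missing:
        # The test data must contain every gene present in the training data
        raise ValueError(f"test data lacks {len(missing)} training genes, e.g. {missing[:3]}")
    # Genes that appear only in the test data are dropped here
    return {g: test_values[g] for g in train_genes}
```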


Considerations

PreprocessDataset can preprocess the data in one or more ways (in this order):
1. Set threshold and ceiling values. Any value lower than the threshold is reset to the threshold, and any value higher than the ceiling is reset to the ceiling.
2. Convert each expression value to the log base 2 of the value.
3. Remove a gene (row) if a given number of its sample values are less than a given threshold.
4. Remove genes (rows) that do not have a minimum fold change or expression variation.
5. Discretize or normalize the data.
When using ratios to compare gene expression between samples, convert the values to log base 2 to put up- and down-regulated genes on the same scale. For example, ratios of 2 and 0.5, which indicate two-fold up- and down-regulation respectively, are converted to +1 and -1.
If you did not generate the expression data, check whether preprocessing steps have already been taken before running the PreprocessDataset module.
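Steps 1, 2, and 4 from the list above can be sketched in a few lines of Python. The parameter names and the exact filter definitions are assumptions, not PreprocessDataset's own. Fold change is taken as max/min and delta as max minus min, both computed on the clipped linear values. Because log2 is monotonic, this gives the same fold-change decision as checking a log2(min_fold_change) difference after the transform. Steps 3 and 5 are omitted.

```python
import math
from typing import Dict, List


def preprocess(data: Dict[str, List[float]], floor: float, ceiling: float,
               min_fold_change: float = 1.0, min_delta: float = 0.0,
               log_transform: bool = True) -> Dict[str, List[float]]:
    """Clip, variation-filter, and optionally log2-transform each gene (row)."""
    if floor <= 0:
        raise ValueError("floor must be positive so fold change and log2 are defined")
    result: Dict[str, List[float]] = {}
    for gene, row in data.items():
        # Step 1: reset values below the floor or above the ceiling
        clipped = [min(max(v, floor), ceiling) for v in row]
        hi, lo = max(clipped), min(clipped)
        # Step 4: drop genes whose max/min fold change or max-min delta is too small
        if hi / lo < min_fold_change or hi - lo < min_delta:
            continue
        # Step 2: log base 2, so ratios of 2 and 0.5 would become +1 and -1
        result[gene] = [math.log2(v) for v in clipped] if log_transform else clipped
    return result
```

In this sketch a gene such as [1.0, 5.0] with a floor of 20 becomes [20, 20] and is removed because it no longer varies. This is one reason the threshold is applied before the variation filter.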

Learn more: PreprocessDataset

Step 2: PNNXValidationOptimization

PNNXValidationOptimization runs PNN class prediction iteratively on a data set whose sample classes are known. In each iteration, it leaves one sample out, builds the classifier from the remaining samples, and then tests the classifier on the sample that was left out. After testing various parameter settings, PNNXValidationOptimization creates an analysis result file (*.xvopt.odf) that contains the recommended parameter values. The result file is a binary (machine-readable) file that cannot be viewed, but it can be used as input to the PNN module.
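The leave-one-out loop and the parameter search can be sketched in Python as follows. To keep the sketch self-contained, it uses a k-nearest-neighbor classifier purely as a stand-in, and k plays the role of the parameter being tuned. This is not PNN, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Vector = List[float]


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[str], x: Vector, k: int) -> str:
    """Stand-in classifier: majority vote among the k nearest training samples."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), y) for t, y in zip(train_x, train_y)
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def loo_error(samples: List[Vector], labels: List[str], k: int) -> float:
    """Leave each sample out once, train on the rest, and count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) != labels[i]:
            errors += 1
    return errors / len(samples)


def best_setting(samples: List[Vector], labels: List[str], candidates: List[int]) -> int:
    """Return the candidate with the lowest leave-one-out error (the first one wins ties)."""
    return min(candidates, key=lambda k: loo_error(samples, labels, k))
```

Every candidate setting is scored with the same leave-one-out loop, and only the winner is reported. This matches the idea of a single run that recommends parameter values.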

Considerations

Use the num features parameter to specify a range of values to try. The module automatically selects the best parameter settings, so you do not need to run it multiple times.
To experiment with alternative parameter settings, use the PNN module rather than the PNNXValidationOptimization module.
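A range of num features values can be pictured as ranking the genes and then searching over how many top-ranked genes to keep. The ranking statistic used here, a two-class signal-to-noise ratio, is an assumption. So is the error_of hook, which could be a leave-one-out error computed on the selected genes. The module's own feature-selection method may differ.

```python
import statistics
from typing import Callable, Dict, List


def signal_to_noise(values: List[float], labels: List[str], class_a: str, class_b: str) -> float:
    """(mean_a - mean_b) / (sd_a + sd_b), with a small guard against zero spread."""
    a = [v for v, lab in zip(values, labels) if lab == class_a]
    b = [v for v, lab in zip(values, labels) if lab == class_b]
    spread = statistics.pstdev(a) + statistics.pstdev(b)
    return (statistics.mean(a) - statistics.mean(b)) / max(spread, 1e-12)


def top_features(data: Dict[str, List[float]], labels: List[str],
                 class_a: str, class_b: str, k: int) -> List[str]:
    """Rank genes by |signal-to-noise| and keep the k highest."""
    ranked = sorted(data, key=lambda g: abs(signal_to_noise(data[g], labels, class_a, class_b)),
                    reverse=True)
    return ranked[:k]


def choose_num_features(data: Dict[str, List[float]], labels: List[str],
                        class_a: str, class_b: str, num_features: List[int],
                        error_of: Callable[[List[str]], float]) -> int:
    """Try every value in the range and keep the one whose gene set has the lowest error."""
    errors = {k: error_of(top_features(data, labels, class_a, class_b, k)) for k in num_features}
    return min(num_features, key=lambda k: errors[k])
```

Ranking genes on the full data set and then cross-validating lets the left-out sample influence which genes are chosen. Repeating the ranking inside each leave-one-out fold avoids that optimistic bias.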

Learn more: PNNXValidationOptimization

Step 3: PNN

The PNN module builds and/or tests a classifier by running the PNN class prediction method:

To build a classifier, specify the training data set. The module creates a classifier (*.model).
To test a previously built classifier, specify the classifier (*.model) and the test data set. The module creates a prediction results file (*.pred.odf) that assesses the accuracy of the predictor.
To build and test a classifier, specify both the training and test data sets. The module creates a classifier and a prediction results file.
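A probabilistic neural network (PNN) is commonly formulated as a Parzen-window classifier. For each class, it averages a Gaussian kernel over that class's training samples and predicts the class with the highest average. This sketch assumes that textbook form, a single smoothing width sigma, and equal class priors. The GenePattern module's exact kernel, normalization, and feature handling are not described here, and the JSON string is only a stand-in for the binary *.model file.

```python
import json
import math
from typing import Dict, List

Vector = List[float]


def build_pnn(train_x: List[Vector], train_y: List[str], sigma: float) -> dict:
    """Build mode: a PNN "model" is just the stored training patterns plus sigma."""
    return {"sigma": sigma, "x": train_x, "y": train_y}


def predict_pnn(model: dict, test_x: List[Vector]) -> List[str]:
    """Test mode: pick the class with the highest mean Gaussian kernel response."""
    two_sigma_sq = 2 * model["sigma"] ** 2
    predictions = []
    for x in test_x:
        sums: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        for t, label in zip(model["x"], model["y"]):
            d2 = sum((a - b) ** 2 for a, b in zip(t, x))
            # The (2*pi)^(d/2) * sigma^d factor is shared by all classes, so it is omitted
            sums[label] = sums.get(label, 0.0) + math.exp(-d2 / two_sigma_sq)
            counts[label] = counts.get(label, 0) + 1
        predictions.append(max(sums, key=lambda c: sums[c] / counts[c]))
    return predictions


# Build and save a classifier, then reload it and test it on two samples
model = build_pnn([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]],
                  ["ALL", "ALL", "AML", "AML"], sigma=0.5)
saved = json.dumps(model)
print(predict_pnn(json.loads(saved), [[0.1, 0.0], [2.9, 3.2]]))  # ['ALL', 'AML']
```

"Building" a PNN amounts to storing the training samples. This is why the build and test steps can run separately (with a saved classifier) or together in one call.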

Learn more: PNN

Step 4: View results

To view the prediction results file (*.pred.odf), use the PredictionResultsViewer module. For each sample, the viewer lists the actual class, predicted class, and prediction error rates.

The classifier (*.model) is a binary (machine-readable) file. It cannot be viewed, but can be used as input to the PNN module.

Considerations

The PredictionResultsViewer provides an absolute error rate (incorrect cases/total cases) and an ROC error rate (the fraction of true positives versus the fraction of false positives). Use the ROC error rate when comparing results across data sets.
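The quantities named above can be computed from actual and predicted labels in a two-class problem. This page does not state how PredictionResultsViewer combines the true- and false-positive fractions into a single ROC error rate. The sketch therefore stops at the fractions themselves, and the function names are hypothetical.

```python
from typing import List, Tuple


def absolute_error_rate(actual: List[str], predicted: List[str]) -> float:
    """Incorrect cases divided by total cases."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def tpr_fpr(actual: List[str], predicted: List[str], positive: str) -> Tuple[float, float]:
    """True-positive fraction and false-positive fraction for one 'positive' class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    # Guard against classes that are absent from the actual labels
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr
```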

Learn more: PredictionResultsViewer

Step 5: Determine the class of an unknown sample

To classify unknown samples using the PNN module:

Use the saved model filename parameter to specify a previously generated classifier (*.model file).
Use the test filename parameter to specify an expression data set that contains the unknown samples.
The test class filename is a required parameter that specifies the class of each sample in the expression data set. For the unknown samples, create a class file that assigns a placeholder class (for example, "unknown") to each sample.
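Such a class file can be generated with a short Python function. This sketch assumes the categorical CLS layout: the first line holds the sample count, the class count, and 1; the second holds '#' and the class names; the third holds one 0-based class index per sample. The function name and the "unknown" label are illustrative.

```python
def placeholder_cls(num_samples: int, label: str = "unknown") -> str:
    """CLS text that assigns the single class `label` (index 0) to every sample."""
    lines = [
        f"{num_samples} 1 1",             # samples, classes, constant 1
        f"# {label}",                     # class names
        " ".join(["0"] * num_samples),    # class index of each sample
    ]
    return "\n".join(lines) + "\n"
```

The sample count must match the number of samples in the unknown expression data set. Otherwise the class file and the data set will not line up.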

The module uses the classifier to predict the class of each unknown sample and creates a prediction results file. Use the PredictionResultsViewer module to view the prediction results (*.pred.odf) file:

The viewer lists each sample with its actual and predicted class.
Ignore the actual class, which was unknown.
Ignore the error rates, which evaluate the class predictor against "known" data.