CART Class Prediction: Two Data Sets
Use the CART module to build classifiers using the Classification And Regression Trees (CART) class prediction method, test previously generated CART classifiers, or classify unknown samples using previously generated CART classifiers.
Before you begin
Use one data set to train the classifier and the other to test it. Each gene expression data set consists of two files: an expression data file and a class file that gives the class of each sample.
The data sets must contain the same genes; otherwise, CART displays an error message.
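Checking the gene lists before you run the module can save a failed run. The following is a minimal sketch, assuming you have already read the gene identifiers from each expression file into a Python list. The helper names and gene identifiers are hypothetical, and the check compares identifiers only; whether the module also requires the genes to appear in the same order is not stated here.

```python
def compare_gene_sets(train_genes, test_genes):
    """Return the genes that appear in one data set but not the other."""
    train, test = set(train_genes), set(test_genes)
    return {
        "missing_from_test": sorted(train - test),
        "missing_from_train": sorted(test - train),
    }


def same_genes(train_genes, test_genes):
    """True when both data sets list exactly the same gene identifiers."""
    diff = compare_gene_sets(train_genes, test_genes)
    return not diff["missing_from_test"] and not diff["missing_from_train"]


# Made-up gene identifiers, for illustration only.
train_genes = ["GENE_A", "GENE_B", "GENE_C"]
test_genes = ["GENE_A", "GENE_C", "GENE_D"]

print(same_genes(train_genes, test_genes))
print(compare_gene_sets(train_genes, test_genes))
```

If the check fails, the two lists in the result show which genes to add or remove before running CART.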
Step 1: CART
The CART module builds and/or tests a classifier by running the CART class prediction method:
- To build a classifier, specify the training data set. The module creates a classifier (*.cart.model).
- To test a previously built classifier, specify the classifier (*.cart.model) and the test data set. The module creates a prediction results file (*.pred.odf) that assesses the accuracy of the predictor.
- To build and test a classifier, specify both the training and test data sets. The module creates a classifier and a prediction results file.
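The three modes above can be summarized as a mapping from the inputs you supply to the files you should expect. This is only an illustrative sketch: the function and argument names are hypothetical and are not the module's actual parameter names, and it does not call the module.

```python
def expected_outputs(train_data=None, test_data=None, saved_model=None):
    """Hypothetical summary of which files each input combination yields."""
    if train_data and test_data:
        # Build and test: classifier plus prediction results.
        return ["*.cart.model", "*.pred.odf"]
    if train_data:
        # Build only: classifier.
        return ["*.cart.model"]
    if saved_model and test_data:
        # Test a previously built classifier: prediction results.
        return ["*.pred.odf"]
    raise ValueError(
        "specify a training data set, or a saved classifier plus a test data set"
    )


print(expected_outputs(train_data="train"))
print(expected_outputs(saved_model="saved.cart.model", test_data="test"))
print(expected_outputs(train_data="train", test_data="test"))
```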
Step 2: View results
To view the prediction results file (*.pred.odf), use the PredictionResultsViewer module.
The viewer lists each sample, its actual class, its predicted class, and prediction error rates.
The classifier (*.cart.model) is a binary (machine-readable) file. However, a matching PDF file (*.tree.pdf) shows the classification tree. To view the PDF file, click it.
Considerations
- The PredictionResultsViewer provides an absolute error rate (incorrect cases/total cases) and an ROC error rate (fraction of true positives versus the fraction of false positives). Use the ROC error rate for comparing results across data sets.
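To make the two measures concrete, the sketch below computes the absolute error rate and the two fractions that an ROC measure is built from, given lists of actual and predicted classes. The function names and class labels are hypothetical. How the viewer combines the two fractions into a single ROC error rate is not described here, so the sketch stops at the fractions themselves.

```python
def absolute_error_rate(actual, predicted):
    """Incorrect cases divided by total cases."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    incorrect = sum(a != p for a, p in zip(actual, predicted))
    return incorrect / len(actual)


def roc_fractions(actual, predicted, positive):
    """Return (true positive fraction, false positive fraction)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up labels, for illustration only.
actual = ["tumor", "tumor", "normal", "normal", "normal"]
predicted = ["tumor", "normal", "normal", "normal", "tumor"]

print(absolute_error_rate(actual, predicted))         # 2 of 5 wrong
print(roc_fractions(actual, predicted, "tumor"))      # (1/2, 1/3)
```

Because the two fractions are each normalized by the size of their own class, they are less sensitive to class balance than the absolute error rate, which is one reason to prefer an ROC-based measure when comparing data sets.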
Step 3: Determine the class of an unknown sample
To classify unknown samples using the CART module:
- Use the saved model filename parameter to specify a previously generated classifier (*.cart.model file).
- Use the test filename parameter to specify an expression data set that contains the unknown samples.
- The test class filename is a required parameter that specifies the class of each sample in the expression data set. For the unknown samples, create a class file that assigns some class (for example, "unknown") to each sample.
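A placeholder class file can be generated rather than typed by hand. The sketch below assumes the class file uses GenePattern's CLS format: a first line with the number of samples, the number of classes, and the number 1; a second line starting with # that names the classes; and a third line with one label per sample. The text above does not name the format, and it does not say whether the module accepts a class file with only one class, so treat both as assumptions to verify. The function name and output filename are hypothetical.

```python
def placeholder_cls_text(num_samples, label="unknown"):
    """Build CLS-style text that assigns one placeholder class to every sample."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    header = f"{num_samples} 1 1"                    # samples, classes, always 1
    names = f"# {label}"                             # class names line
    assignments = " ".join(["0"] * num_samples)      # class index 0 for each sample
    return "\n".join([header, names, assignments]) + "\n"


text = placeholder_cls_text(3)
print(text, end="")

# To save it (hypothetical filename):
# with open("unknown_samples.cls", "w") as handle:
#     handle.write(text)
```

The number of samples must match the number of samples in the expression data set that you pass as the test filename.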
The module uses the classifier to predict the class of each unknown sample and creates a prediction results file. Use the PredictionResultsViewer module to view the prediction results (*.pred.odf) file:
- The viewer lists each sample with its actual and predicted class.
- Ignore the actual class, which was unknown.
- Ignore the error rates, which are meaningful only when the actual class of each sample is known.