Differential Expression Analysis
Find genes that are significantly differentially expressed between classes of samples.
Step 1: PreprocessDataset
Preprocess gene expression data
to remove platform noise and genes that have little variation.
Considerations
- PreprocessDataset can preprocess the data in one or more ways (in this order):
  - Set threshold and ceiling values. Any value below the threshold is reset to the
    threshold value, and any value above the ceiling is reset to the ceiling value.
  - Convert each expression value to the log base 2 of the value.
  - Remove genes (rows) if a given number of their sample values are less than
    a given threshold.
  - Remove genes (rows) that do not have a minimum fold change or expression
    variation.
  - Discretize or normalize the data.
- ComparativeMarkerSelection expects non-log-transformed data. Some calculations, such as fold change, will produce incorrect results on log-transformed data.
- If you did not generate the expression data,
check whether preprocessing steps have already been taken before
running the PreprocessDataset module.
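
The threshold/ceiling and variation-filter steps above can be sketched in a few lines of Python. This is only an illustration, not the module's implementation: the function names, the gene names, the default values, and the exact filter rule (maximum/minimum ratio plus maximum-minimum difference) are all assumptions. The values are left on the original scale because the next step expects non-log-transformed data, and the last lines show why a fold change computed on log2 values is misleading.

```python
import math
from statistics import mean

def clamp_and_filter(rows, floor=20.0, ceiling=20000.0,
                     min_fold_change=3.0, min_delta=100.0):
    """Clamp each value to [floor, ceiling], then drop low-variation genes.

    rows maps a gene name to its expression values (one per sample).
    All defaults are illustrative assumptions; floor must be positive.
    """
    kept = {}
    for gene, values in rows.items():
        # Reset values below the floor or above the ceiling.
        clamped = [min(max(v, floor), ceiling) for v in values]
        lo, hi = min(clamped), max(clamped)
        # Assumed variation rule: minimum max/min ratio and max-min difference.
        if hi / lo >= min_fold_change and hi - lo >= min_delta:
            kept[gene] = clamped
    return kept

def fold_change(class_a, class_b):
    """Ratio of class means; only meaningful on non-log data."""
    return mean(class_a) / mean(class_b)

# Hypothetical data: three genes, four samples each.
data = {
    "GENE_A": [10.0, 50.0, 400.0, 900.0],      # varies widely: kept
    "GENE_B": [100.0, 110.0, 120.0, 105.0],    # nearly flat: removed
    "GENE_C": [5.0, 8.0, 30000.0, 25000.0],    # clamped to [20, 20000]: kept
}
filtered = clamp_and_filter(data)

raw_a, raw_b = [1024.0, 1024.0], [128.0, 128.0]
raw_fc = fold_change(raw_a, raw_b)              # ratio of raw means: 8.0
log_fc = fold_change([math.log2(v) for v in raw_a],
                     [math.log2(v) for v in raw_b])  # 10 / 7, about 1.43
```

On the raw values the two classes differ eightfold, while the same ratio taken on log2 values reports roughly 1.43, which is the kind of incorrect result the note above warns about.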
Step 2: ComparativeMarkerSelection
ComparativeMarkerSelection computes differential gene expression.
For each gene, it uses a test statistic to calculate the
difference in gene expression between classes and then computes a p-value to estimate the
significance of the test statistic score.
Because testing tens of thousands of genes simultaneously
increases the possibility of mistakenly identifying a non-marker gene as a marker gene (a false positive),
ComparativeMarkerSelection corrects for multiple hypothesis testing by computing both false discovery
rates (FDR) and family-wise error rates (FWER).
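
To make the idea concrete, here is a minimal sketch of the three stages for a single comparison: a per-gene test statistic, a permutation p-value, and multiple-testing corrections. The text does not say which statistic or which correction procedures the module uses, so the Welch-style t statistic, the Benjamini-Hochberg FDR, and the Bonferroni FWER below are stand-ins chosen for illustration, and all function names and sample values are hypothetical.

```python
import math
import random
from statistics import mean, variance

def t_statistic(a, b):
    """Welch-style two-sample t statistic for one gene."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Degenerate case: no spread within either class.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Two-sided p-value from shuffling the class labels."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        score = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if score >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

def benjamini_hochberg(p_values):
    """FDR-adjusted p-values (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bonferroni(p_values):
    """FWER-adjusted p-values (Bonferroni), capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical gene with clearly separated classes (10 samples each).
class_a = [10.1, 11.3, 9.8, 10.7, 11.0, 10.4, 9.9, 10.8, 11.2, 10.5]
class_b = [5.2, 4.8, 5.9, 5.1, 4.7, 5.5, 5.0, 5.3, 4.9, 5.6]
p_marker = permutation_p_value(class_a, class_b)

# Hypothetical p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(p_values)
fwer = bonferroni(p_values)
```

For the same p-values, the Bonferroni values are never smaller than the Benjamini-Hochberg values, which reflects FWER being the more conservative criterion.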
Considerations
- If the data set includes at least 10 samples per class, use the default value of 1000 permutations
to ensure accurate p-values. If the data set includes fewer than 10 samples in any class, permuting the
samples cannot give an accurate p-value; specify 0 permutations to use asymptotic p-values
instead.
- If the data set includes more than two classes, use the phenotype test parameter
to analyze each class against all others (one-versus-all) or all class pairs
(all pairs).
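
The two considerations above amount to simple rules that can be written down as code. The sketch below is a hypothetical helper, not part of GenePattern: it picks a permutation count from the smallest class size and lists the comparisons implied by each phenotype test mode. The function names, the "rest" placeholder, and the class labels are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def choose_permutations(class_labels, default=1000, min_per_class=10):
    """Return the default count, or 0 (asymptotic p-values) if any class is small."""
    counts = Counter(class_labels)
    return default if min(counts.values()) >= min_per_class else 0

def list_comparisons(class_labels, mode="one-versus-all"):
    """Enumerate the class comparisons for a phenotype test mode."""
    classes = sorted(set(class_labels))
    if mode == "one-versus-all":
        # Each class is compared against all remaining samples.
        return [(c, "rest") for c in classes]
    if mode == "all-pairs":
        return list(combinations(classes, 2))
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical three-class data set; one class has only 8 samples.
labels = ["ALL"] * 12 + ["AML"] * 8 + ["MLL"] * 10
n_permutations = choose_permutations(labels)
one_vs_all = list_comparisons(labels, "one-versus-all")
all_pairs = list_comparisons(labels, "all-pairs")
```

Because one class here has fewer than 10 samples, the helper returns 0, which corresponds to requesting asymptotic p-values.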
Step 3: ComparativeMarkerSelectionViewer
Run the ComparativeMarkerSelectionViewer module to view the results.
The viewer displays the test statistic score, its p-value, two FDR statistics, and three FWER statistics for each gene.
Considerations
- Generally, researchers identify marker genes based on FDR rather than the more conservative FWER.
- Often, marker genes are identified based on an FDR cutoff value of 0.05, which
means that about 1 in 20 (5%) of the genes identified as marker genes are expected to be false positives.
Select Edit>Filter Features>Custom Filter to filter results
based on that criterion (or any other).
- Select File>Save Derived Dataset to create a GCT file that contains
a subset of the expression data.
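
Outside the viewer, the same filter-then-save idea can be sketched as follows. This is an illustrative equivalent rather than what the menu commands do internally: the function names, gene and sample names, and the "at or below the cutoff" rule are assumptions. The output follows the GCT 1.2 layout (a version line, a line with the row and column counts, a header line, then one tab-delimited row per gene).

```python
def filter_by_fdr(fdr_by_gene, cutoff=0.05):
    """Keep gene names whose FDR is at or below the cutoff (assumed rule)."""
    return [gene for gene, fdr in fdr_by_gene.items() if fdr <= cutoff]

def to_gct(sample_names, rows):
    """Serialize {gene: values} as GCT 1.2 text, with 'na' descriptions."""
    lines = [
        "#1.2",
        f"{len(rows)}\t{len(sample_names)}",
        "\t".join(["Name", "Description"] + list(sample_names)),
    ]
    for gene, values in rows.items():
        lines.append("\t".join([gene, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"

# Hypothetical results and expression values.
fdr_by_gene = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.05}
expression = {
    "GENE_A": [20.0, 900.0],
    "GENE_B": [100.0, 120.0],
    "GENE_C": [20.0, 20000.0],
}
markers = filter_by_fdr(fdr_by_gene)
subset = {gene: expression[gene] for gene in markers}
gct_text = to_gct(["sample_1", "sample_2"], subset)
```

Writing `gct_text` to a file with a `.gct` extension would give a derived data set containing only the selected marker genes.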
Reference
Gould, J., Getz, G., Monti, S., Reich, M., and Mesirov, J.P. 2006.
Comparative gene marker selection suite. Bioinformatics 22(15):1924-1925.