Non-negative matrix factorization (NMF) finds a small number of metagenes, each defined as a positive linear combination of the genes in the expression data. It then groups samples into clusters based on the gene expression pattern of these metagenes.
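As a rough illustration of the grouping step, the sketch below assigns each sample to the metagene with the largest coefficient for that sample. This is the assignment rule described in Brunet et al. (2004); whether NMFConsensus uses exactly this rule internally is an assumption here. The matrix `h` (metagenes by samples) and the function name `assign_clusters` are hypothetical.

```python
def assign_clusters(h):
    """Assign each sample to the metagene with the largest coefficient.

    h is a list of rows, one row per metagene, one column per sample.
    Returns one cluster index per sample.
    """
    n_metagenes = len(h)
    n_samples = len(h[0])
    return [
        max(range(n_metagenes), key=lambda k: h[k][j])
        for j in range(n_samples)
    ]


# Hypothetical coefficients: 2 metagenes, 3 samples.
h = [
    [0.9, 0.8, 0.1],
    [0.2, 0.1, 0.7],
]
clusters = assign_clusters(h)
print(clusters)
```

Here the first two samples are dominated by the first metagene and the third sample by the second, so the samples fall into two groups.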
Gene expression data must be in a GCT or RES file.
Example file: all_aml_test.gct.
The gene expression data must contain only positive values. If your data contains negative values, see the NMFConsensus documentation for instructions.
learn more: file formats |
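Before running the analysis, it can help to confirm that a GCT file really contains no negative values. The following is a minimal sketch, assuming the standard tab-delimited GCT layout (a version line, a dimensions line, a header line, then one row per gene with name and description in the first two columns). The function name `has_negative_values` and the sample data are hypothetical, and missing or non-numeric values are not handled.

```python
def has_negative_values(gct_text):
    """Return True if any expression value in GCT-formatted text is negative."""
    lines = gct_text.strip().splitlines()
    # Skip the version line, the dimensions line, and the column header line.
    for line in lines[3:]:
        fields = line.split("\t")
        # The first two fields are the gene name and description.
        if any(float(value) < 0 for value in fields[2:]):
            return True
    return False


# Hypothetical two-gene, two-sample GCT content.
sample_gct = (
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tS1\tS2\n"
    "geneA\tdescA\t10.5\t3.0\n"
    "geneB\tdescB\t-1.2\t7.0\n"
)
print(has_negative_values(sample_gct))
```

To check a real file, read its contents into a string first, for example with `open(path).read()`.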
Preprocess gene expression data to remove platform noise and genes that have little variation. Researchers generally preprocess data before clustering; however, if doing so would remove relevant biological information, skip this step.
learn more: PreprocessDataset |
NMFConsensus uses the basic principle of dimensionality reduction via non-negative matrix factorization (NMF) to find a small number of metagenes, each defined as a positive linear combination of the genes in the expression data. It then groups samples into clusters based on the gene expression pattern of the samples as positive linear combinations of these metagenes. NMFConsensus repeatedly runs the clustering algorithm against perturbations of the gene expression data and creates a consensus matrix to assess the stability of the resulting clusters.
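The idea behind a consensus matrix can be sketched in a few lines. In the sketch below, each run contributes one cluster label per sample, and entry (i, j) of the result is the fraction of runs in which samples i and j landed in the same cluster. This follows the general definition in Brunet et al. (2004) and is not the module's actual code; the function name `consensus_matrix` and the example labels are hypothetical.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs is a list of label lists, one list per clustering run,
    each with one cluster label per sample.
    """
    n_samples = len(runs[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i in range(n_samples):
            for j in range(n_samples):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / len(runs) for count in row] for row in counts]


# Hypothetical labels from three runs over four samples.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first run, labels swapped
    [0, 1, 1, 1],
]
for row in consensus_matrix(runs):
    print(row)
```

Because only co-membership is counted, the arbitrary numbering of clusters within a run does not matter. Entries near 0 or 1 indicate pairs of samples that are consistently separated or consistently grouped, which is what a stable clustering looks like.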
3-4 hours: Running this example on the GenePattern public server takes several hours. The results are provided here for your convenience: NMFConsensus_Results.zip.
To convert data that contains negative values into non-negative form in MATLAB, execute the following statement:
anew=[max(a,0);-min(a,0)];
where a is the original data.
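For readers without MATLAB, the same transformation can be sketched in plain Python. The statement stacks two copies of the data: the first keeps the positive values (negatives become 0), and the second holds the magnitudes of the negative values (positives become 0). The result has twice as many rows and no negative entries. The function name `split_signs` and the sample values are hypothetical.

```python
def split_signs(a):
    """Python counterpart of the MATLAB statement [max(a,0); -min(a,0)].

    a is a list of rows. Returns the positive parts of all rows
    followed by the magnitudes of the negative parts of all rows.
    """
    positive_part = [[max(value, 0) for value in row] for row in a]
    negative_part = [[-min(value, 0) for value in row] for row in a]
    return positive_part + negative_part


# Hypothetical data: 2 genes by 2 samples, with some negative values.
a = [
    [1.5, -2.0],
    [-3.0, 4.0],
]
anew = split_signs(a)
for row in anew:
    print(row)
```

Each original gene therefore appears twice in the output, once for its positive values and once for its negative values.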
learn more: NMFConsensus |
Plots of the results are written to .pdf files. Cluster membership results are written to GCT files. View the result files by clicking on them.
learn more: NMFConsensus |
Brunet, J-P., Tamayo, P., Golub, T.R., and Mesirov, J.P. 2004. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA 101(12):4164–4169.