Hierarchical Clustering
Cluster genes and/or samples based on
how close they are to one another. The result is a tree structure, referred to as a dendrogram.
Step 1: PreprocessDataset
Preprocess gene expression data
to remove platform noise and genes that have little variation.
Researchers generally preprocess data before clustering. However, if doing so would remove relevant biological information, skip this step.
Considerations
- PreprocessDataset can preprocess the data in one or more ways (in this order):
  - Set threshold and ceiling values. Any value lower than the threshold or
    higher than the ceiling is reset to the threshold or ceiling value, respectively.
  - Convert each expression value to the log base 2 of the value.
  - Remove a gene (row) if a given number of its sample values are less than
    a given threshold.
  - Remove genes (rows) that do not have a minimum fold change or expression
    variation.
  - Discretize or normalize the data.
- When using ratios to compare gene expression between samples,
convert the values to log base 2 to
bring up- and down-regulated genes to the same scale.
For example, ratios of 2 and 0.5, indicating two-fold changes for up- and
down-regulated expression, respectively, are converted to +1 and -1.
- If you did not generate the expression data,
check whether preprocessing steps have already been taken before
running the PreprocessDataset module.
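
The first preprocessing steps listed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the PreprocessDataset implementation: the function names, the gene names, and the threshold, ceiling, and fold-change values are hypothetical and are not the module's defaults. The fold-change filter is applied to the log-transformed values, where a fold change of N corresponds to a difference of log2(N).

```python
import math

def clamp(values, threshold, ceiling):
    """Reset values below the threshold or above the ceiling to those bounds."""
    return [min(max(v, threshold), ceiling) for v in values]

def log2_transform(values):
    """Convert each (positive) value to its log base 2."""
    return [math.log2(v) for v in values]

def has_min_fold_change(log_values, min_fold):
    """On log2 data, a fold change of min_fold is a difference of log2(min_fold)."""
    return max(log_values) - min(log_values) >= math.log2(min_fold)

# Hypothetical expression values: one row per gene, one value per sample.
expression = {
    "geneA": [15.0, 400.0, 25000.0],   # varies strongly across samples
    "geneB": [100.0, 110.0, 120.0],    # nearly constant across samples
}

preprocessed = {}
for gene, values in expression.items():
    # Illustrative settings only (assumed, not taken from the module).
    logged = log2_transform(clamp(values, threshold=20.0, ceiling=20000.0))
    if has_min_fold_change(logged, min_fold=3.0):
        preprocessed[gene] = logged

# Ratios of 2 and 0.5 land on the same scale after the log transform.
ratios_log2 = log2_transform([2.0, 0.5])
print(ratios_log2)            # [1.0, -1.0]
print(sorted(preprocessed))   # ['geneA']
```

In this sketch geneB is dropped because its largest and smallest values differ by a factor of only 1.2, while geneA is kept after its extreme values are reset to the threshold and ceiling.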
Step 2: HierarchicalClustering
Run hierarchical clustering on genes and/or samples to create
dendrograms for the clustered genes (*.gtr) and/or
clustered samples (*.atr), as well as a file (*.cdt) that contains
the original gene expression data ordered to reflect the clustering.
Considerations
- Best practice is to normalize (row/column normalize parameters) and
center (row/column center parameters) the data being clustered.
- The CDT output file must be converted to a GCT file before it
can be used as an input file for another GenePattern module (other than
HierarchicalClusteringViewer). For instructions on converting
a CDT file to a GCT file, see Creating Input Files.
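
To make the effect of centering and normalizing concrete, the following Python sketch centers and normalizes each row and then builds a dendrogram by repeatedly merging the two closest clusters. It is a minimal illustration under stated assumptions, not the HierarchicalClustering module's code: it assumes that centering subtracts the row mean, that normalizing scales each row to a sum of squares of 1, that distance is 1 minus the Pearson correlation, and that clusters are joined by average linkage. All function and gene names are hypothetical.

```python
import math

def center(row):
    """Subtract the row mean so the values are centered on zero."""
    mean = sum(row) / len(row)
    return [v - mean for v in row]

def normalize(row):
    """Scale the row so that its sum of squares is 1 (assumed convention)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else list(row)

def distance(a, b):
    """For centered, normalized rows the dot product is the Pearson correlation."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cluster(rows):
    """Average-linkage agglomerative clustering.

    Returns the list of merges (left members, right members, distance)
    and the final leaf order, in which every merged group is contiguous.
    """
    clusters = [(name,) for name in rows]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(rows[a], rows[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges, clusters[0]

# Hypothetical expression rows: geneA and geneB rise together, geneC falls.
raw = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.0, 6.0],
    "geneC": [3.0, 2.0, 1.0],
}
prepared = {gene: normalize(center(row)) for gene, row in raw.items()}
merges, leaf_order = cluster(prepared)

for left, right, d in merges:
    print(left, right, round(d, 3))
print(leaf_order)
```

After centering and normalizing, geneA and geneB become identical profiles even though their raw magnitudes differ, so they are merged first; geneC, which is anti-correlated with both, joins last. The leaf order plays the same role as the row ordering in the CDT file: rows that cluster together end up next to each other.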
Step 3: HierarchicalClusteringViewer
Display a heat map of the clustered gene expression data, with
dendrograms showing how the genes and/or samples were clustered.
Considerations
- Select File>Save Image to save the heat map and dendrograms to an
image file. Supported formats include bmp, eps, jpeg, png, and tiff.