K-means Clustering


Cluster genes and/or samples into a specified number of clusters, k. The result is k clusters, each grown from a randomly selected data point that serves as its initial center.
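To make the idea concrete, here is a minimal sketch of the textbook k-means procedure (Lloyd's algorithm) in plain Python. It is not the module's own code: the function name kmeans, its parameters, and the squared Euclidean distance are assumptions chosen for illustration.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Textbook k-means on a list of equal-length numeric tuples (illustrative only)."""
    rng = random.Random(seed)
    # Initial centers: k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the nearest center
        # (squared Euclidean distance is an assumption of this sketch).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers
```

Calling kmeans(profiles, 2) on a list of expression profiles returns one cluster label per profile plus the final cluster centers. Because the starting centers are random, the seed argument is what makes a run repeatable in this sketch.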

Before you begin

Gene expression data must be in a GCT or RES file.
Example file: all_aml_test.gct.
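For orientation, a GCT file is tab-delimited text: a version line (#1.2), a line with the row and column counts, a header line (Name, Description, then sample names), and one line per gene. The sketch below parses that layout from a string; parse_gct and the gene and sample names in EXAMPLE are hypothetical and are not taken from all_aml_test.gct.

```python
def parse_gct(text):
    """Parse GCT-formatted text into (sample_names, {row_name: [values]})."""
    lines = text.strip().splitlines()
    if lines[0].strip() != "#1.2":
        raise ValueError("expected the GCT version line '#1.2'")
    # Second line: number of data rows, then number of sample columns.
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    # Third line: Name, Description, then one column per sample.
    samples = lines[2].split("\t")[2:]
    rows = {}
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        # fields[1] is the free-text description; it is ignored here.
        rows[fields[0]] = [float(v) for v in fields[2:]]
    if len(samples) != n_cols or len(rows) != n_rows:
        raise ValueError("declared dimensions do not match the data")
    return samples, rows

# Hypothetical two-gene, three-sample dataset, for illustration only.
EXAMPLE = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "geneA\tfirst gene\t10\t20\t30\n"
    "geneB\tsecond gene\t5.5\t6.5\t7.5\n"
)
```

To read a real file, pass the contents of the file as a string, for example the result of open(path).read().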

learn more:
file formats

Step 1: PreprocessDataset

Preprocess gene expression data to remove platform noise and genes that have little variation. Researchers generally preprocess data before clustering; however, if doing so would remove relevant biological information, skip this step.
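As a rough illustration of what removing low-variation genes can mean, the sketch below clips values to a floor and ceiling and then keeps only genes whose clipped values vary enough across samples. This is a generic variation filter, not the PreprocessDataset implementation; the function name and all four threshold values are assumptions made for the example.

```python
def variation_filter(rows, floor=20.0, ceiling=20000.0,
                     min_fold=3.0, min_delta=100.0):
    """Keep genes with enough variation across samples (illustrative only).

    rows maps a gene name to its list of expression values.
    The default thresholds are placeholders, not recommended settings.
    """
    kept = {}
    for name, values in rows.items():
        # Clip to [floor, ceiling] to limit the influence of platform noise.
        clipped = [min(max(v, floor), ceiling) for v in values]
        lo, hi = min(clipped), max(clipped)
        # Require both a relative (fold) and an absolute (delta) change.
        # floor > 0 guarantees lo > 0, so the division is safe.
        if hi / lo >= min_fold and hi - lo >= min_delta:
            kept[name] = clipped
    return kept
```

A gene whose values all fall below the floor is clipped to a flat profile and dropped, which is one way a filter like this can discard low-abundance genes that are nonetheless biologically relevant.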

learn more:
PreprocessDataset

Step 2: KMeansClustering

Run k-means clustering on genes (rows) or samples (columns). The module creates a GCT file for each cluster and a GCT file that organizes all of the expression data by cluster.
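The two kinds of output described here can be pictured as a grouping step: one list of rows per cluster, plus a single ordering that places members of the same cluster next to each other. The sketch below shows that grouping on row names only; split_by_cluster is a hypothetical helper and says nothing about how the module names or writes its files.

```python
def split_by_cluster(names, labels):
    """Group row names by cluster label and build a cluster-ordered list."""
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(label, []).append(name)
    # Concatenate clusters in label order; order within a cluster is kept.
    ordered = [name for label in sorted(clusters) for name in clusters[label]]
    return clusters, ordered
```

For example, names ["g1", "g2", "g3", "g4"] with labels [1, 0, 1, 0] give the clusters {1: ["g1", "g3"], 0: ["g2", "g4"]} and the combined order ["g2", "g4", "g1", "g3"].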

learn more:
KMeansClustering

Step 3: HeatMapViewer

For an overview of the results, use a heatmap to display the expression data organized by cluster.
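Heatmaps are often easier to compare across genes when each row is scaled relative to its own mean before colors are assigned. The sketch below shows one such scaling, a per-row z-score. Whether and how the viewer normalizes rows depends on its settings, so treat this as an assumption about a common convention rather than a description of HeatMapViewer; row_zscores is a hypothetical helper.

```python
import statistics

def row_zscores(values):
    """Scale one row to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A perfectly flat row has no variation to display.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

After this scaling, a positive value means the gene is above its own average in that sample and a negative value means it is below, regardless of the gene's absolute expression level.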

learn more:
HeatMapViewer