SNP Copy Number and Loss of Heterozygosity Estimation
Compute SNP copy number and loss of heterozygosity (LOH) based on
Affymetrix SNP chip data for paired target/normal samples.
In cancer genomics, copy number change is a hallmark of the genetic instability common
to most human cancers, and LOH of tumor suppressor genes is a crucial step in the development of
sporadic and hereditary cancer (Monti, 2005).
Before You Begin
- CEL files from the Affymetrix 500K Array Chip Set (250K Sty, 250K NSP) or
100K Array Chip Set (50K Xba, 50K Hind).
Example file: GISTIC_Hind_subset.zip.
- Optionally, for each CEL file, a TXT file containing the genotype calls for the SNP array.
Example file: GISTIC_Hind_subset.zip.
- A tab-delimited text file (sample information file format) that describes the SNP array. The array must include target/normal paired samples for copy number and LOH determination.
Example file: sample_info_subset.txt.
Step 1: SNPFileCreator
SNPFileCreator converts the CEL files from an array into a GenePattern .snp file.
Raw data for the probes in each SNP probe set are converted to a
single intensity value per SNP using one of four modeling algorithms:
Average Difference, PM/MM Difference Model (dChip, the default), Median Probe, or Trimmed Mean.
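To make the idea of reducing a probe set to a single intensity value concrete, here is a minimal Python sketch of three simple summaries. It assumes that each summary is applied to the per-probe perfect match (PM) minus mismatch (MM) differences and that the trimmed mean drops a fixed fraction from each end; neither assumption is stated on this page, and the function names and trim_fraction parameter are hypothetical. The PM/MM Difference Model (dChip) is fitted across samples and is not shown.

```python
from statistics import mean, median


def probe_differences(pm, mm):
    """Per-probe PM minus MM differences (assumed input to every summary)."""
    return [p - m for p, m in zip(pm, mm)]


def average_difference(pm, mm):
    """Plain mean of the probe differences."""
    return mean(probe_differences(pm, mm))


def median_probe(pm, mm):
    """Median of the probe differences; robust to a single outlying probe."""
    return median(probe_differences(pm, mm))


def trimmed_mean(pm, mm, trim_fraction=0.2):
    """Mean after dropping trim_fraction of the probes from each end.

    trim_fraction is a hypothetical parameter chosen for illustration.
    """
    diffs = sorted(probe_differences(pm, mm))
    k = int(len(diffs) * trim_fraction)
    kept = diffs[k:len(diffs) - k] if k else diffs
    return mean(kept)


# Made-up intensities for one probe set with one outlying probe (9000.0).
pm = [1200.0, 1500.0, 1100.0, 9000.0, 1300.0]
mm = [200.0, 300.0, 250.0, 400.0, 280.0]

print(average_difference(pm, mm))  # pulled upward by the outlier
print(median_probe(pm, mm))
print(trimmed_mean(pm, mm))
```

With these made-up values, the plain mean is dominated by the outlying probe, while the median and trimmed mean stay close to the bulk of the probes, which is the usual reason for choosing a robust summary.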
Processing this example on the GenePattern public server takes 20-30 minutes.
The example source data and resulting SNP file are provided here for your convenience:
GISTIC_Hind_subset.zip, GISTIC_Hind_subset.snp.
Considerations
- SNPFileCreator accepts CEL files from the 500K Array Chip Set (250K Sty, 250K NSP) or
100K Array Chip Set (50K Xba, 50K Hind). Each chip set uses two high-density arrays
to genotype over 500,000 or 100,000 SNPs, respectively, in one experiment.
The module converts the CEL files for one array into a .snp file.
To create a .snp file for a chip set,
use the MergeRows module to combine the .snp files for the two arrays.
- SNPFileCreator can transfer the CEL files to the GenePattern server for processing or
read the files from a network directory. Due to the size of the files, best practice is to
store the CEL files in a network directory and process them from that directory.
- SNPFileCreator writes the generated .snp file to a network directory or to the
GenePattern server. Typically, writing the file to the GenePattern server provides
greater flexibility and makes the file available for use in GenePattern pipelines.
- SNPFileCreator creates a .snp file in one of two formats:
Non Allele-Specific (default) or Allele-Specific.
For each sample, the Non Allele-Specific format contains an intensity value
and a genotype call; the Allele-Specific format contains an
intensity value for allele A, intensity value for allele B, and genotype call.
All GenePattern modules accept the Non Allele-Specific format; many do not yet accept the Allele-Specific format.
- SNPFileCreator uses the May 2004 human genome assembly (hg17) to add
Chromosome and Physical Location columns to the .snp file. By default, it sorts the SNPs
by chromosome and physical location, as required by the IGV module.
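The sort by chromosome and physical location described above can be sketched in Python as follows. This is only an illustration, not the module's code: the row layout (dictionaries with snp, chromosome, and position keys), the SNP identifiers, and the choice to place X and Y after the numbered chromosomes are all assumptions.

```python
def chromosome_rank(chrom):
    """Sort key: numbered chromosomes first, then X and Y, then any other name."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 23, "Y": 24}
    if name.upper() in special:
        return (0, special[name.upper()], "")
    return (1, 0, name)


def sort_snps(rows):
    """Return the rows ordered by chromosome, then by physical location."""
    return sorted(
        rows, key=lambda r: (chromosome_rank(r["chromosome"]), r["position"])
    )


# Hypothetical rows; the identifiers and positions are made up.
rows = [
    {"snp": "SNP_A-3", "chromosome": "X", "position": 500},
    {"snp": "SNP_A-1", "chromosome": "10", "position": 2000},
    {"snp": "SNP_A-2", "chromosome": "2", "position": 9000},
    {"snp": "SNP_A-4", "chromosome": "2", "position": 100},
]

for row in sort_snps(rows):
    print(row["snp"], row["chromosome"], row["position"])
```

Comparing chromosomes numerically rather than as text keeps chromosome 2 ahead of chromosome 10, which a plain string sort would reverse.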
Step 2: XChromosomeCorrect
For gender-specific samples, run the XChromosomeCorrect module to correct intensity values for SNPs on the X chromosome.
For each sample from a male donor, the module doubles the intensity value for SNPs on the X chromosome.
The sample information file must
include a column labeled Gender that contains a value of M or F for each sample.
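The correction itself is simple enough to sketch in a few lines of Python. The data layout here (a list of SNP dictionaries and a sample-to-gender mapping taken from the Gender column) and the sample names are assumptions made for illustration; the actual module works on .snp and sample information files.

```python
def x_chromosome_correct(snps, genders):
    """Double X-chromosome intensities for samples from male donors.

    snps: list of dicts with "chromosome" and "intensities" (sample -> value).
    genders: mapping of sample name to "M" or "F".
    Returns new records; the input is left unchanged.
    """
    corrected = []
    for snp in snps:
        if snp["chromosome"] == "X":
            values = {
                sample: value * 2 if genders[sample] == "M" else value
                for sample, value in snp["intensities"].items()
            }
        else:
            values = dict(snp["intensities"])
        corrected.append({**snp, "intensities": values})
    return corrected


# Hypothetical samples and intensity values.
genders = {"patientA": "M", "patientB": "F"}
snps = [
    {"snp": "SNP_A-1", "chromosome": "X",
     "intensities": {"patientA": 800.0, "patientB": 1600.0}},
    {"snp": "SNP_A-2", "chromosome": "7",
     "intensities": {"patientA": 1500.0, "patientB": 1400.0}},
]

for record in x_chromosome_correct(snps, genders):
    print(record["snp"], record["intensities"])
```

Only the male sample's value on the X chromosome changes; autosomal SNPs and samples from female donors pass through untouched.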
Step 3: CopyNumberDivideByNormals
CopyNumberDivideByNormals computes the raw copy number of each target SNP by dividing its intensity value
by the mean intensity value of all normal SNPs. This calculation is referred to as
copy number normalization or normalization with respect to normals.
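As a rough illustration of normalization with respect to normals, the Python sketch below divides each target intensity by a mean over the normal samples. It assumes the mean is taken per SNP across the normal samples, which is one reading of the description above, and it returns the plain ratio with no further scaling. The data layout and sample names are hypothetical.

```python
from statistics import mean


def divide_by_normals(intensities, normal_samples):
    """Divide each target intensity by the mean of the normals for that SNP.

    intensities: mapping of SNP name to {sample: intensity}.
    normal_samples: names of the normal samples (assumed to have values
    for every SNP).
    Returns a mapping of SNP name to {target sample: ratio}.
    """
    normals = set(normal_samples)
    result = {}
    for snp, by_sample in intensities.items():
        normal_mean = mean(by_sample[s] for s in normals)
        result[snp] = {
            sample: value / normal_mean
            for sample, value in by_sample.items()
            if sample not in normals
        }
    return result


# Made-up intensities for one tumor and two normal samples.
intensities = {
    "SNP_A-1": {"tumor1": 3000.0, "normal1": 1900.0, "normal2": 2100.0},
    "SNP_A-2": {"tumor1": 500.0, "normal1": 1000.0, "normal2": 1000.0},
}

ratios = divide_by_normals(intensities, ["normal1", "normal2"])
print(ratios)
```

Under this reading, a ratio above 1 means the target intensity exceeds the normal mean for that SNP, and a ratio below 1 means it falls short.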
CopyNumberDivideByNormals creates one of two files:
- .cn (default) does not include genotype calls.
- .xcn includes genotype calls.
Step 4: IGV
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
By default, IGV displays all chromosomes. To zoom in on a chromosome, select it from the chromosome tool bar.
Note for Mac Users: The IGV Launcher from GenePattern uses Java Web Start,
which is not recommended for Mac OS X Mountain Lion or later.
We recommend that you download the Mac App from the
IGV download page instead.
You can load GenePattern files into IGV with the
Load from URL feature.
Reference
Monti, S. 2005. Class slides: SNP microarrays and high-density genotyping.
http://www.chip.org/teaching/hst950/slides/class6.pdf.