STAR always maps the reads to the genome first. If splice junction annotation is available, it is used only afterwards to decide which potential splice junctions to accept, with laxer criteria for annotated junctions.
This feature is disabled if STAR.aligner is run in two-pass mode, since that run needs a private copy of the index into which the extra annotation is merged. It is also disabled when generating sorted BAM output (and "wiggle" output), since sorting needs an unpredictable amount of extra memory.
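As a rough illustration of the junction rules in the first paragraph and of the overhang and intron-length rows of the parameter table, the following Python sketch classifies an alignment gap and checks the overhangs of a spliced read. It is a simplification written for this page, not STAR's implementation; the class and function names are hypothetical, and the defaults are the values listed in the table.

```python
from dataclasses import dataclass


@dataclass
class JunctionParams:
    # Hypothetical names mirroring the table rows; defaults copied from the table.
    min_overhang_annotated: int = 3      # "min overhang annotated read"
    min_overhang_not_annotated: int = 5  # "min overhang not annotated read"
    min_intron_length: int = 21          # "min intron length"
    max_intron_length: int = 500_000     # "max intron length"


def classify_gap(gap: int, p: JunctionParams) -> str:
    """Classify a gap between a read and the genome as described in the table."""
    if gap < p.min_intron_length:
        return "deletion"   # too small to be an intron
    if gap > p.max_intron_length:
        return "chimeric"   # too large to be an intron
    return "intron"


def accept_junction(left_overhang: int, right_overhang: int,
                    annotated: bool, p: JunctionParams) -> bool:
    """Accept a spliced mapping only if the read covers enough bases on both
    sides of the junction; annotated junctions use the smaller threshold."""
    needed = p.min_overhang_annotated if annotated else p.min_overhang_not_annotated
    return min(left_overhang, right_overhang) >= needed
```

With the defaults, a read overhanging a junction by 4 bases on one side is accepted for an annotated junction but rejected for a de novo one.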
Name | Description | Allowed values | Default |
---|---|---|---|
input | | | |
star index | A STAR index. Select a prebuilt index or upload your own as a ZIP file (you can make one using STAR.indexer). | selection from dynamic list or valid file | |
reads pair 1 | Unpaired reads, or first mates of paired reads, as files in fastA or fastQ format. You can provide several files. | valid file(s) (required) | |
reads pair 2 | Second mates of paired reads. The files, and the sequences inside the files, must be in the same order as for the first mates. | valid file(s) | |
mapping and reporting of mapped reads | | | |
max reads to align | Set this to map only this many reads from the top of the input (mainly useful for testing). | min = 1 | |
align read end to end | Align reads end to end and count all mismatches. By default STAR instead uses "soft clipping" at both the 5' and 3' ends, that is, it makes a local alignment and ignores the contribution of the read ends to the final score if this improves the score. End-to-end alignment can be useful for ChIP-seq and other DNA sequencing applications with reads that have already been quality-trimmed. | | no |
max number mismatches | Maximum number of mismatches per read. Note that a mate pair is counted as one read. | min = 0 | 10 |
max fraction mismatches | Maximum number of mismatches per read, expressed as a fraction of the mapped length of the read. Note that a mate pair is counted as one read. | min = 0, max = 1 | 0.3 |
min overhang annotated read | Minimum length that a read must map on both sides of a splice junction in order to accept its mapping to a junction that is annotated in the GTF or tab file. | min = 1 | 3 |
min overhang not annotated read | Minimum length that a read must map on both sides of a splice junction in order to accept the de novo discovery of that junction. | min = 1 | 5 |
min intron length | Minimum intron size. A gap in the alignment between a read and the genome that is smaller than this is considered a deletion, not an intron. | min = 1 | 21 |
max intron length | Maximum intron size. If a read aligns to the genome with a larger gap, it is considered a chimeric read. The default value of 500,000 is tuned for mammalian genomes; for plant and yeast genomes you will need to decrease it. | min = 1 | 500000 |
mates max gap | Maximum distance between the two mates of a pair. If the mates map farther apart on the genome, the fragment is considered chimeric. The default value of 500,000 is tuned for mammalian genomes; for plant and yeast genomes you will need to decrease it. | min = 1 | 500000 |
secondary mapping mismatches range | By default, for reads that map to multiple locations on the genome, STAR only reports the mappings with the highest possible score. You can ask STAR to also report secondary mappings with up to this many more mismatches than the primary mappings. | min = 0 | 0 |
max multimapping | Do not report reads that map to more than this many different locations on the genome. | min = 1 | 10 |
min report canonical junction overhang | Criterion for reporting de novo predicted splice junctions in the SJ.tab.out file: for canonical splice sites, at least one read with at least this overhang is needed. | min = 1 | 12 |
min report noncanonical junction overhang | Criterion for reporting de novo predicted splice junctions in the SJ.tab.out file: for noncanonical splice sites, at least 3 reads with at least this overhang are needed. | min = 1 | 30 |
map only reported junctions | By default all mapped reads are output in the SAM/BAM file. Set this to yes to make the SAM/BAM file consistent with the SJ.tab.out file, by outputting only reads that are not spliced or that map to splice junctions that are annotated or have been predicted de novo. | | no |
postprocessing and supplementary output | | | |
two pass | Run STAR in 2-pass mode, that is, run STAR a first time, merge the splice junctions found with the splice junction annotation in the index, and run STAR a second time. See the Description section for more details. | | no |
detect chimeric transcripts | Search for fusion (chimeric) reads and write them to a separate SAM file. | | no |
output unmapped reads | Write the reads that could not be mapped to separate files, in the same format as the input. | | no |
quantify genes | Write a table with the number of reads mapped per gene and a supplementary BAM file with mappings in transcriptome coordinates instead of genome coordinates. | | no |
output wiggle file | Output a "wiggle" file for viewing in genome browsers such as IGV or the UCSC Genome Browser. | | none |
wiggle signal | Which bases to use to generate the signal for the "wiggle" file. By default all bases are used, but you can choose to use only the bases at the 5' end of the first read (useful for CAGE/RAMPAGE) or only the bases of the second mate of read pairs. | | all |
output | | | |
output format | Format of the alignment output file. | | SAM unsorted |
HI flag | Number of "hits" with the highest score to label as "primary" in the SAM/BAM file. If you choose "only one", one of them is labeled "primary" and the others "secondary". Mappings with a lower than highest score are always labeled "secondary". | | only one |
output prefix | The prefix to use for the output file names. | | STAR |
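The mismatch, multimapping and primary/secondary rows of the table interact. The following Python sketch shows one plausible way to combine them for a single read. It is an assumption-laden simplification: it uses the mismatch count as the alignment score (STAR itself uses a full alignment score), applies "max multimapping" after the secondary range filter, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alignment:
    locus: str          # hypothetical locus label, e.g. "chr1:100"
    mismatches: int     # summed over both mates: a mate pair counts as one read
    mapped_length: int  # mapped bases, also summed over both mates


def passes_mismatch_filters(a: Alignment, max_mismatches: int = 10,
                            max_fraction: float = 0.3) -> bool:
    """Both limits must hold ("max number mismatches", "max fraction mismatches")."""
    return (a.mismatches <= max_mismatches
            and a.mismatches <= max_fraction * a.mapped_length)


def report(alignments: List[Alignment], max_multimapping: int = 10,
           secondary_range: int = 0,
           only_one_primary: bool = True) -> List[Tuple[Alignment, str]]:
    """Return the (alignment, label) pairs reported for one read; [] means unreported."""
    kept = [a for a in alignments if passes_mismatch_filters(a)]
    if not kept:
        return []
    best = min(a.mismatches for a in kept)
    # "secondary mapping mismatches range": keep alignments close to the best one
    kept = [a for a in kept if a.mismatches <= best + secondary_range]
    # "max multimapping": drop reads that map to too many loci
    if len(kept) > max_multimapping:
        return []
    kept.sort(key=lambda a: a.mismatches)  # stable sort: ties keep input order
    labeled = []
    primary_given = False
    for a in kept:
        # "HI flag": either one best hit or all best hits are labeled primary
        if a.mismatches == best and (not only_one_primary or not primary_given):
            labeled.append((a, "primary"))
            primary_given = True
        else:
            labeled.append((a, "secondary"))
    return labeled
```

For example, with the defaults a read that maps equally well to two loci is reported twice, once as primary and once as secondary.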
The reads can be provided in fastQ or fastA format and can be spread over several files. For mate pair experiments two sets of input files are needed, and it is important that the files, as well as the reads inside each file, are in the same order in both sets so that STAR can find the two mates of each pair.
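Because mates are matched purely by position, it can help to verify the order before submitting a job. The following Python sketch is a hypothetical pre-flight check, not part of STAR.aligner; it assumes unwrapped fastQ records of exactly 4 lines and mate names that are identical apart from an optional /1 or /2 suffix.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO


def fastq_names(handle: TextIO) -> Iterator[str]:
    """Yield read names from a fastQ stream with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        for _ in range(3):            # skip sequence, "+" line and qualities
            handle.readline()
        name = header[1:].split()[0]  # drop "@" and any comment after whitespace
        if name.endswith(("/1", "/2")):
            name = name[:-2]          # strip old-style mate suffixes
        yield name


def check_mate_order(mate1: TextIO, mate2: TextIO) -> int:
    """Return the number of pairs; raise ValueError at the first mismatch."""
    count = 0
    for count, (n1, n2) in enumerate(
            zip_longest(fastq_names(mate1), fastq_names(mate2)), start=1):
        if n1 != n2:  # also catches one file running out before the other
            raise ValueError(f"pair {count}: {n1!r} does not match {n2!r}")
    return count


if __name__ == "__main__":
    m1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n")
    m2 = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r2/2\nCCCC\n+\nIIII\n")
    print(check_mate_order(m1, m2))
```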
The genome must be provided as an index; the original sequences and annotation are not needed. STAR.aligner has access to a series of prebuilt indexes. Alternatively, you can provide your own index, packed in a ZIP file. GenePattern provides the STAR.indexer module to build an index from a series of fastA files.
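If you build an index outside GenePattern, it has to be packed into a ZIP file before upload. Below is a minimal Python sketch using the standard zipfile module. Placing the index files at the top level of the archive is an assumption, so compare with a ZIP produced by STAR.indexer if in doubt; the function name and paths are hypothetical.

```python
import os
import tempfile
import zipfile


def zip_star_index(index_dir: str, zip_path: str) -> str:
    """Pack every file under index_dir into zip_path, with paths relative
    to index_dir (so the index files sit at the top level of the archive)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(index_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                zf.write(full, arcname=os.path.relpath(full, index_dir))
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        idx = os.path.join(tmp, "my_index")   # hypothetical index directory
        os.mkdir(idx)
        with open(os.path.join(idx, "Genome"), "w") as fh:
            fh.write("placeholder")
        out = zip_star_index(idx, os.path.join(tmp, "my_index.zip"))
        with zipfile.ZipFile(out) as zf:
            print(zf.namelist())
```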
By default STAR.aligner does not search for fusion reads, that is, reads whose parts map to different chromosomes, to different strands of the same chromosome, or to widely distant regions of the same chromosome, and which therefore most likely derive from chimeric transcripts. If you request it, STAR.aligner writes these reads to a separate SAM file, <basename>.Chimeric.out.sam.
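The three fusion criteria above can be written as a simple predicate over two aligned segments of the same read. This Python sketch is an illustration only: the Segment type and the function name are hypothetical, strands are taken relative to the read (for mate pairs, assume the second mate's strand has already been flipped so that both refer to the fragment), and the distance threshold defaults to the "max intron length" value from the table.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    strand: str  # "+" or "-", relative to the read
    start: int   # leftmost mapped position
    end: int     # rightmost mapped position


def looks_chimeric(a: Segment, b: Segment, max_distance: int = 500_000) -> bool:
    """True if two segments of one read fit the fusion criteria described above."""
    if a.chrom != b.chrom:
        return True                       # different chromosomes
    if a.strand != b.strand:
        return True                       # different strands of one chromosome
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap > max_distance             # widely distant on one chromosome
```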
You can request STAR.aligner to write the reads that could not be mapped to <basename>.Unmapped.out.mate1 (and, for paired reads, <basename>.Unmapped.out.mate2), in the same fastA or fastQ format as the input.
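When you request STAR.aligner to quantify genes, it writes 2 supplementary files: a table with the number of reads mapped per gene, and a BAM file with the mappings in transcriptome coordinates instead of genome coordinates.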
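If the per-gene table follows the layout of STAR's own ReadsPerGene.out.tab file (an assumption: four tab-separated columns, namely the gene ID followed by counts for unstranded, first-read-strand and second-read-strand protocols, preceded by summary rows whose names start with N_), it can be loaded with a few lines of Python. The function name and the sample data are hypothetical.

```python
import io
from typing import Dict, TextIO, Tuple


def read_gene_counts(handle: TextIO,
                     column: int = 1) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split a per-gene count table into (summary rows, gene counts).
    column: 1 = unstranded, 2 = first-read strand, 3 = second-read strand
    (assumed layout, see above)."""
    if column not in (1, 2, 3):
        raise ValueError("column must be 1, 2 or 3")
    summary: Dict[str, int] = {}
    genes: Dict[str, int] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue                      # skip blank or malformed lines
        target = summary if fields[0].startswith("N_") else genes
        target[fields[0]] = int(fields[column])
    return summary, genes


if __name__ == "__main__":
    demo = "N_unmapped\t5\t5\t5\nGENE1\t7\t0\t7\n"
    print(read_gene_counts(io.StringIO(demo)))
```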
You can request the output of "wiggle" files, which are useful for visualizing the RNA-seq signal in genome browsers such as the UCSC Genome Browser or IGV. The signal represents the number of reads covering each genomic base. There are separate files for the two strands, and separate files for uniquely mapping and for multimapping reads; in the latter case the contribution of each multimapper is divided by the number of loci it maps to. You can choose between BedGraph and WIG format. STAR.aligner writes output files named <basename>.Signal.Unique.str1.out.bg, <basename>.Signal.Unique.str2.out.bg, <basename>.Signal.UniqueMultiple.str1.out.bg and <basename>.Signal.UniqueMultiple.str2.out.bg for BedGraph, or <basename>.Signal.Unique.str1.out.wig, <basename>.Signal.Unique.str2.out.wig, <basename>.Signal.UniqueMultiple.str1.out.wig and <basename>.Signal.UniqueMultiple.str2.out.wig for WIG. Since generating "wiggle" files requires reads sorted by coordinate, requesting "wiggle" output automatically switches STAR.aligner to sorted BAM output.
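To make the weighting of multimappers concrete, here is a Python sketch that builds per-base signal for the Unique and UniqueMultiple tracks. Assumptions: each read alignment is given as a chromosome, a strand label, its aligned blocks as half-open intervals (introns are not covered), and the number of loci the read maps to; a multimapper appears once per locus; and, as the file name suggests, the UniqueMultiple track also includes the unique reads. All names are hypothetical, and STAR itself writes BedGraph or WIG files rather than Python dictionaries.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

# (chrom, strand, aligned blocks as half-open (start, end) intervals, loci count)
Read = Tuple[str, str, List[Tuple[int, int]], int]
Track = DefaultDict[Tuple[str, int], float]


def signal_tracks(reads: Iterable[Read]) -> DefaultDict[Tuple[str, str], Track]:
    """Per-base signal keyed by (track name, strand), then by (chrom, position)."""
    tracks: DefaultDict[Tuple[str, str], Track] = defaultdict(lambda: defaultdict(float))
    for chrom, strand, blocks, n_loci in reads:
        weight = 1.0 / n_loci          # a multimapper is split evenly over its loci
        for start, end in blocks:
            for pos in range(start, end):
                if n_loci == 1:
                    tracks[("Unique", strand)][(chrom, pos)] += 1.0
                tracks[("UniqueMultiple", strand)][(chrom, pos)] += weight
    return tracks
```

For instance, a base covered by one unique read and by one read mapping to two loci gets 1 in the Unique track and 1.5 in the UniqueMultiple track.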
The STAR software is developed by a team of programmers headed by Alexander Dobin at Cold Spring Harbor Laboratory.
Version | Release date | Description |
---|---|---|
1 | 2016-08-29 | for STAR 2.5.2a |