Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
N characters in --soloAdapterSequence are not counted as mismatches, allowing for multiple adapters (e.g. ddSeq). SJ.out.tab is sym-linked as features.tsv for Solo SJ output. Issue #882: 3rd field is now optional in Solo Gene features.tsv with --soloOutFormatFeaturesGeneField3. Issue #936: Throw an error if an empty whitelist is provided to STARsolo.
Showing 24 changed files with 3,519 additions and 3,420 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,7 +34,7 @@ | |
|
||
\newcommand{\sechyperref}[1]{\hyperref[#1]{Section \ref{#1}. \nameref{#1}}} | ||
|
||
\title{STAR manual 2.7.4a} | ||
\title{STAR manual 2.7.5a} | ||
\author{Alexander Dobin\\ | ||
[email protected]} | ||
\maketitle | ||
|
@@ -177,16 +177,42 @@ \subsection{Basic options.} | |
For bzip2-compressed files, use | ||
\code{\opt{readFilesCommand} \optv{bunzip2 -c}}. | ||
|
||
Multiple samples can be mapped in one job. For single-end reads use a comma separated list (no spaces around commas), e.g. | ||
\opt{readFilesIn} \optv{sample1.fq,sample2.fq,sample3.fq}. For paired-end reads, use comma separated list for read1 /space/ comma separated list for read2, e.g.: \opt{readFilesIn} \optv{sample1read1.fq,sample2read1.fq,sample3read1.fq sample1read2.fq,sample2read2.fq,sample3read2.fq}. | ||
|
||
\end{itemize} | ||
|
||
\subsection{Advanced options.} | ||
There are many advanced options that control STAR mapping behavior. All options are briefly described in the Section \sechyperref{Description_of_all_options}. | ||
|
||
\subsubsection{Mapping multiple files in one run.} | ||
Multiple samples can be mapped in one run with a single output. This is equivalent to concatenating the read files before mapping, except that distinct read groups can be used in \opt{outSAMattrRGline} command to keep track of reads from different files. For single-end reads use a comma separated list (no spaces around commas), e.g.: | ||
|
||
\opt{readFilesIn} \optv{sample1.fq,sample2.fq,sample3.fq} | ||
|
||
For paired-end reads, use a comma separated list | |
for read1, followed by a space, followed by a comma separated list for read2, e.g.: | |
|
||
\opt{readFilesIn}~\optv{s1read1.fq,s2read1.fq,s3read1.fq s1read2.fq,s2read2.fq,s3read2.fq} | ||
|
||
For multiple read files, the corresponding read groups can be supplied with space/comma/space-separated list in \opt{outSAMattrRGline}, e.g. | ||
|
||
\opt{outSAMattrRGline} \optv{ID:sample1 , ID:sample2 , ID:sample3} | ||
|
||
Note that this list is separated by commas surrounded by spaces (unlike \opt{readFilesIn} list). | ||
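
The two list conventions above are easy to mix up, so here is a minimal Python sketch that assembles both argument lists from one sample table. The function name `build_star_read_args` and the dictionary keys `id`, `read1`, `read2` are hypothetical and not part of STAR. The sketch assumes the arguments are passed as a list (e.g. to `subprocess.run`), so the comma separating read groups becomes its own element.

```python
def build_star_read_args(samples, paired=False):
    """Return the STAR argument fragment for several samples.

    samples: list of dicts with hypothetical keys 'id', 'read1' and,
    for paired-end data, 'read2'.
    """
    # readFilesIn: comma separated, no spaces around commas
    args = ["--readFilesIn", ",".join(s["read1"] for s in samples)]
    if paired:
        # the read2 list is a second, space-separated argument
        args.append(",".join(s["read2"] for s in samples))

    # outSAMattrRGline: commas surrounded by spaces, so each comma
    # is a separate element of the argument list
    args.append("--outSAMattrRGline")
    for i, s in enumerate(samples):
        if i > 0:
            args.append(",")
        args.append("ID:" + s["id"])
    return args
```

Joining the returned list with spaces gives the command-line fragment in the form shown in the examples above.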
|
||
Another option for mapping multiple reads files, especially convenient for a very large number of files, is to create a file manifest and supply it in \opt{readFilesManifest} \optv{/path/to/manifest.tsv}. | ||
The manifest file should contain 3 tab-separated columns. For paired-end reads: | |
|
||
\ofilen{read1-file-name $tab$ read2-file-name $tab$ read-group-line} | ||
|
||
For single-end reads, the 2nd column should contain a dash (-): | |
|
||
\ofilen{read1-file-name $tab$ - $tab$ read-group-line} | ||
|
||
Spaces, but not tabs, are allowed in the file names. | |
If read-group-line does not start with ID:, it can only contain one ID field, and ID: will be added to it. | ||
If read-group-line starts with ID:, it can contain several fields separated by $tab$, and all the fields will be copied verbatim into SAM @RG header line. | ||
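
As an illustration of the manifest layout described above, the following Python sketch formats one manifest line per sample. The function name `manifest_lines` and the dictionary keys `read1`, `read2`, `read_group` are hypothetical. The sketch only reflects the rules stated here: three tab-separated columns, a dash in the 2nd column for single-end reads, and no tabs in file names.

```python
def manifest_lines(samples):
    """Format manifest lines: read1 <tab> read2-or-dash <tab> read-group-line.

    samples: list of dicts with hypothetical keys 'read1', 'read_group'
    and, for paired-end data, 'read2'.
    """
    lines = []
    for s in samples:
        # single-end reads: the 2nd column holds a dash
        read2 = s.get("read2") or "-"
        for name in (s["read1"], read2):
            # spaces are allowed in file names, tabs are not
            if "\t" in name:
                raise ValueError("tab in file name: %r" % name)
        lines.append("\t".join([s["read1"], read2, s["read_group"]]))
    return lines
```

Writing the returned lines to a file, one per line, gives a manifest that can then be supplied with \opt{readFilesManifest}.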
|
||
\subsubsection{Using annotations at the mapping stage.} | ||
Since 2.4.1a, the annotations can be included on the fly at the mapping step, without including them at the genome generation step. You can specify \opt{sjdbGTFfile} \optvr{/path/to/ann.gtf} and/or \opt{sjdbFileChrStartEnd} \optvr{/path/to/sj.tab}, as well as \opt{sjdbOverhang}, and any other \opt{sjdb*} options. The genome indices can be generated with or without another set of annotations/junctions. In the latter case the new junctions will added to the old ones. STAR will insert the junctions into genome indices on the fly before mapping, which takes 1~2 minutes. The on the fly genome indices can be saved (for reuse) with \opt{sjdbInsertSave} \optv{All}, into \optvr{\_STARgenome} directory inside the current run directory. | ||
Since 2.4.1a, the annotations can be included on the fly at the mapping step, without including them at the genome generation step. You can specify \opt{sjdbGTFfile} \optvr{/path/to/ann.gtf} and/or \opt{sjdbFileChrStartEnd} \optvr{/path/to/sj.tab}, as well as \opt{sjdbOverhang}, and any other \opt{sjdb*} options. The genome indices can be generated with or without another set of annotations/junctions. In the former case the new junctions will be added to the old ones. STAR will insert the junctions into genome indices on the fly before mapping, which takes 1--2 minutes. The on-the-fly genome indices can be saved (for reuse) with \opt{sjdbInsertSave} \optv{All}, into the \optvr{\_STARgenome} directory inside the current run directory. | |
|
||
\subsubsection{ENCODE options} | ||
An example of ENCODE standard options for long RNA-seq pipeline is given below: | ||
|
@@ -285,7 +311,7 @@ \subsubsection{SAM attributes.} | |
\item[] | ||
\optv{vG} : genomic coordinate of the variant overlapped by the read | |
\item[] | ||
\optv{vW} : 0/1 - alignment does not pass / passes WASP filtering. Requires --waspOutputMode SAMtag | ||
\optv{vW} : WASP filtering tag, see detailed description in Section \ref{section:WASP}. Requires --waspOutputMode SAMtag | |
\item[] | ||
\optv{CR CY UR UY} : STARsolo: sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing, not error corrected | ||
\item[] | ||
|
@@ -474,7 +500,7 @@ \section{Counting number of reads per gene.} | |
|
||
\section{2-pass mapping.} | ||
|
||
For the most sensitive novel junction discovery,I would recommend running STAR in the 2-pass mode. It does not increase the number of detected novel junctions, but allows to detect more splices reads mapping to novel junctions. The basic idea is to run 1st pass of STAR mapping with the usual parameters, then collect the junctions detected in the first pass, and use them as "annotated" junctions for the 2nd pass mapping. | ||
For the most sensitive novel junction discovery, it is recommended to run STAR in the 2-pass mode. It does not significantly increase the number of detected novel junctions, but allows detection of more spliced reads mapping to novel junctions. The basic idea is to run the 1st pass of STAR mapping with the usual parameters, then collect the junctions detected in the first pass, and use them as "annotated" junctions for the 2nd pass mapping. | |
|
||
\subsection{Multi-sample 2-pass mapping.} | ||
For a study with multiple samples, it is recommended to collect 1st pass junctions from all samples. | ||
|
@@ -518,7 +544,7 @@ \section{Detection of personal variants overlapping alignments.} | |
SAM attribute vG outputs the genomic coordinate of the variant, allowing for identification of the variant. | ||
SAM attribute vA outputs which allele is detected in the read: $1$ or $2$ match one of the genotype alleles, $3$ - no match to genotype. | ||
|
||
\section{WASP filtering of allele specific alignments.} | ||
\section{WASP filtering of allele specific alignments.} \label{section:WASP} | ||
This is a re-implementation of the original WASP algorithm by Bryce van de Geijn, Graham McVicker, Yoav Gilad and Jonathan K Pritchard. Please cite the original WASP paper: Nature Methods 12, 1061–1063 (2015) \url{https://www.nature.com/articles/nmeth.3582}. | |
WASP filtering is activated with \opt{waspOutputMode} \optv{SAMtag}, which will add \optv{vW} tag to the SAM output: | ||
\optv{vW:i:1} means alignment passed WASP filtering, and all other values mean it did not pass: | ||
|