SortMeRNA User Manual

Evguenia Kopylova
[email protected]

Feb 2016, version 2.1

Contents
1 Introduction

2 Installation
2.1 Install from tarball release
2.2 Install development version from git
2.3 Install from precompiled code
2.4 Uninstall

3 Databases

4 How to run SortMeRNA
4.1 Index the rRNA database: command ‘indexdb_rna’
4.1.1 Example 1: indexdb_rna using one database
4.1.2 Example 2: indexdb_rna using multiple databases
4.2 A guide to choosing ‘sortmerna’ parameters for filtering and read mapping
4.3 Filter rRNA reads
4.3.1 Example 3: multiple databases and the fastest alignment option
4.3.2 Filtering paired-end reads
4.3.3 Example 4: forward-reverse paired-end reads (2 input files)
4.4 Read mapping
4.4.1 Mapping reads for classification
4.4.2 Example 5: mapping reads against the 16S Greengenes 97% id database with multithreading
4.5 OTU-picking

5 SortMeRNA advanced options

6 Help

7 Citation

1 Introduction
Copyright (C) 2012-2016 Bonsai Bioinformatics Research Group
(LIFL - Université Lille 1), CNRS UMR 8022, INRIA Nord-Europe, France
http://bioinfo.lifl.fr/RNA/sortmerna/
Copyright (C) 2014-2016 Knight Lab
Department of Pediatrics, UCSD School of Medicine, La Jolla, California, USA
https://knightlab.colorado.edu
SortMeRNA is a local sequence alignment tool for filtering, mapping and OTU-picking. The core
algorithm is based on approximate seeds and allows for fast and sensitive analyses of NGS reads.
The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. Additional
applications include OTU-picking and taxonomy assignment, available through QIIME v1.9+
(http://qiime.org). SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or
more rRNA database files, and sorts aligned and rejected reads into two files specified by the user.
SortMeRNA works with Illumina, 454, Ion Torrent and PacBio data, and can produce SAM and
BLAST-like alignments.
For questions & help, please contact:
1. Evguenia Kopylova [email protected]
2. Laurent Noe [email protected]
3. Helene Touzet [email protected]
4. Rob Knight [email protected]
Important: This user manual is strictly for SortMeRNA version 2.1.

2 Installation

2.1 Install from tarball release

1. Download sortmerna-2.1.tar.gz from https://github.com/biocore/sortmerna/releases

2. Extract the source code package into a directory of your choice, enter the sortmerna-2.1
directory and type,
> bash ./build.sh

3. At this point, two executables, indexdb_rna and sortmerna, will be located in the
sortmerna-2.1 directory. To install the executables into the default installation
directory (/usr/local/bin for Linux or /opt/local/bin for Mac), type,
> make install (with root permissions)

4. To begin using SortMeRNA, type ‘indexdb_rna -h’ or ‘sortmerna -h’. Databases must first
be indexed using indexdb_rna.

Figure 1: sortmerna-2.1 directory tree

sortmerna-2.1
    alp
    cmph
    src
    include
    scripts
    tests
    rRNA_databases
        silva-bac-16s-id90.fasta
        ...
    sortmerna
    indexdb_rna

2.2 Install development version from git

1. Clone the sortmerna repository to your local system

> git clone https://github.com/biocore/sortmerna.git

2. Build sortmerna
> cd sortmerna
> bash ./build.sh

2.3 Install from precompiled code

1. Download the latest binary distribution of SortMeRNA from http://bioinfo.lifl.fr/RNA/sortmerna

2. Extract the package into a directory of your choice,
> tar -xvf sortmerna-2.1.tar.gz
> cd sortmerna-2.1

3. To begin using SortMeRNA, type ‘indexdb_rna -h’ or ‘sortmerna -h’. The user must first
index the databases with the command indexdb_rna before running the command
sortmerna.

2.4 Uninstall

If the user installed SortMeRNA using the command ‘make install’, then they can use the
command ‘make uninstall’ to uninstall SortMeRNA (with root permissions).

3 Databases
SortMeRNA comes prepackaged with 8 databases,

representative database   %id   # seq (clustered)   origin                    # seq (original)

silva-bac-16s-id90        90    12798               SILVA SSU Ref NR v.119    464618
silva-arc-16s-id95        95    3193                SILVA SSU Ref NR v.119    18797
silva-euk-18s-id95        95    7348                SILVA SSU Ref NR v.119    51553
silva-bac-23s-id98        98    4488                SILVA LSU Ref v.119       43822
silva-arc-23s-id98        98    251                 SILVA LSU Ref v.119       629
silva-euk-28s-id98        98    4935                SILVA LSU Ref v.119       13095
rfam-5s-id98              98    59513               RFAM                      116760
rfam-5.8s-id98            98    13034               RFAM                      225185

HMMER 3.1b1 and SumaClust v1.0.00 were used to reduce the size of the original databases to the
similarity listed in column 2 (%id) of the table above (see /sortmerna/rRNA_databases/README.txt
for the complete list of steps).
These representative databases were made specifically for fast filtering of rRNA. Approximately the
same number of rRNA reads will be filtered using silva-bac-16s-id90 (12798 rRNA) as using Greengenes
97% (99322 rRNA), but the former will run significantly faster.
%id: each member of a cluster must have at least this % identity with the cluster's representative sequence.

Remark: The user must first index the FASTA database using the command indexdb_rna and
then filter/map reads against the database using the command sortmerna.

4 How to run SortMeRNA

4.1 Index the rRNA database: command ‘indexdb_rna’

The executable indexdb_rna indexes an rRNA database.

To see the man page for indexdb_rna,

>> indexdb_rna -h

Program: SortMeRNA version 2.1, 01/02/2016

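Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab, Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.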
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, [email protected]
Laurent Noe, [email protected]
Helene Touzet, [email protected]

usage: ./indexdb_rna --ref db.fasta,db.idx [OPTIONS]:

--------------------------------------------------------------------------------------------------------
| parameter value description default |
--------------------------------------------------------------------------------------------------------
--ref STRING,STRING FASTA reference file, index file mandatory
(ex. --ref /path/to/file1.fasta,/path/to/index1)
If passing multiple reference sequence files, separate
them by ’:’,
(ex. --ref /path/to/file1.fasta,/path/to/index1:/path/to/file2.fasta,path/to/index2)
[OPTIONS]:
--fast BOOL suggested option for aligning ~99% related species off
--sensitive BOOL suggested option for aligning ~75-98% related species on
--tmpdir STRING directory where to write temporary files
-m INT the amount of memory (in Mbytes) for building the index 3072
-L INT seed length 18
--max_pos INT maximum number of positions to store for each unique L-mer 10000
(setting --max_pos 0 will store all positions)
-v BOOL verbose
-h BOOL help

There are eight rRNA representative databases provided in the ‘sortmerna-2.1/rRNA_databases’
folder. All databases were derived from the SILVA SSU and LSU databases (release 119) and the
RFAM databases using HMMER 3.1b1 and SumaClust v1.0.00. Additionally, the user can index
their own database.

4.1.1 Example 1: indexdb_rna using one database

>> ./indexdb_rna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db -v

Program: SortMeRNA version 2.1, 01/02/2016

Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab, Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, [email protected]
Laurent Noe, [email protected]
Helene Touzet, [email protected]

Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000

Total number of databases to index: 1

Begin indexing file ./rRNA_databases/silva-bac-16s-id90.fasta under index name ./index/silva-bac-16s-db:
Collecting sequence distribution statistics .. done [1.133206 sec]

start index part # 0:

(1/3) building burst tries .. done [23.643256 sec]
(2/3) building CMPH hash .. done [22.306709 sec]
(3/3) building position lookup tables .. done [54.958680 sec]
total number of sequences in this part = 12798
writing kmer data to ./index/silva-bac-16s-db.kmer_0.dat
writing burst tries to ./index/silva-bac-16s-db.bursttrie_0.dat
writing position lookup table to ./index/silva-bac-16s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-bac-16s-db.stats
done.

4.1.2 Example 2: indexdb_rna using multiple databases

Multiple databases can be indexed simultaneously by passing them as a ‘:’ separated list to --ref
(no spaces allowed).
>> ./indexdb_rna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:\
./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:\
./rRNA_databases/silva-arc-16s-id95.fasta,./index/silva-arc-16s-db:\
./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-db:\
./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-db:\
./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s:\
./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:\
./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db
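
Long --ref values like the one above are easy to mistype, so it may help to assemble them programmatically. The following Python sketch only builds the string (each FASTA path joined to its index base name with ‘,’ and the pairs joined with ‘:’). The function name build_ref_argument is hypothetical and not part of SortMeRNA, and the rejection of ‘,’ and ‘:’ inside paths is an assumption based on their use as delimiters.

```python
def build_ref_argument(databases):
    """Join (fasta_path, index_base) pairs into one value for --ref.

    Hypothetical helper, not part of SortMeRNA: it only formats the string
    described in the manual ("fasta,index" pairs separated by ':').
    """
    parts = []
    for fasta_path, index_base in databases:
        for path in (fasta_path, index_base):
            # The manual states that no spaces are allowed; ',' and ':' are
            # rejected here on the assumption that delimiters cannot be escaped.
            if any(ch in path for ch in " ,:"):
                raise ValueError(f"path not usable in --ref: {path!r}")
        parts.append(f"{fasta_path},{index_base}")
    return ":".join(parts)


if __name__ == "__main__":
    ref = build_ref_argument([
        ("./rRNA_databases/silva-bac-16s-id90.fasta", "./index/silva-bac-16s-db"),
        ("./rRNA_databases/silva-bac-23s-id98.fasta", "./index/silva-bac-23s-db"),
    ])
    print(ref)
```

The resulting string can be passed unchanged to both indexdb_rna and sortmerna, since both take --ref in the same form.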

4.2 A guide to choosing ‘sortmerna’ parameters for filtering and read mapping

In SortMeRNA version 1.99 beta and up, users have the option to output sequence alignments for
their matching rRNA reads in the SAM or BLAST-like formats. The appropriate parameter choices
depend on the desired quality of the alignments, and Table 1 presents a guide for most use
cases. In all cases, output alignments are guaranteed to pass the E-value threshold (default
E-value = 1). An E-value of 1 signifies that one random alignment is expected when aligning all
reads against the reference database. The E-value in SortMeRNA is computed for the entire search
space, not per read.
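
To illustrate how an E-value threshold translates into the "Minimal SW score based on E-value" printed in the run logs, the Python sketch below uses the standard Karlin-Altschul relation E = K * m * n * exp(-lambda * S). This is an assumption about the general approach, not a description of SortMeRNA's code: the search-space sizes (total reference length and total read length) are hypothetical inputs, and any length corrections SortMeRNA applies are not covered by this manual.

```python
import math


def expected_random_alignments(score, gumbel_lambda, gumbel_k, ref_len, reads_len):
    """Karlin-Altschul estimate of random alignments scoring >= score."""
    return gumbel_k * ref_len * reads_len * math.exp(-gumbel_lambda * score)


def minimal_score(evalue, gumbel_lambda, gumbel_k, ref_len, reads_len):
    """Smallest integer score whose expected count does not exceed evalue.

    ref_len and reads_len are the total lengths of the reference database and
    of the reads file (the whole search space, not a single read).
    """
    return math.ceil(math.log(gumbel_k * ref_len * reads_len / evalue) / gumbel_lambda)


if __name__ == "__main__":
    # Lambda and K are taken from the Example 3 log; the two lengths are
    # made-up round numbers, so the printed score is illustrative only.
    s = minimal_score(1.0, 0.602397, 0.328927, ref_len=20_000_000, reads_len=30_000_000)
    print(s, expected_random_alignments(s, 0.602397, 0.328927, 20_000_000, 30_000_000))
```

Because the search space covers every read and every reference sequence, the minimal score grows with the size of either input, which is consistent with the different minimal scores reported per database in the logs.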

Table 1: SortMeRNA alignment parameter guide

option                 speed                        description

--num_alignments INT   Very fast for INT = 1        Output the first alignment passing the E-value
                                                    threshold (best choice if only filtering is
                                                    needed)
                       Speed decreases for          Higher INT signifies more alignments will be
                       higher value INT             made & output
                       Very slow for INT = 0        All alignments reaching the E-value threshold
                                                    are reported (this option is not suggested for
                                                    high similarity rRNA databases, due to many
                                                    possible alignments per read causing a very
                                                    large file output)

--best INT             Fast for INT = 1             Only one high-candidate reference sequence
                                                    will be searched for alignments (determined
                                                    heuristically using a Longest Increasing
                                                    Subsequence of seed matches). The single best
                                                    alignment of those will be reported
                       Speed decreases for          Higher INT signifies more alignments will be
                       higher value INT             made, though only the best one will be
                                                    reported
                       Very slow for INT = 0        All high-candidate reference sequences will be
                                                    searched for alignments, though only the best
                                                    one will be reported

4.3 Filter rRNA reads

The executable sortmerna can filter rRNA reads against an indexed rRNA database.

To see the man page for sortmerna,

>> ./sortmerna -h

Program: SortMeRNA version 2.1, 01/02/2016

usage: ./sortmerna --ref db.fasta,db.idx --reads file.fa --aligned base_name_output [OPTIONS]:

-------------------------------------------------------------------------------------------------------------
| parameter value description default |
-------------------------------------------------------------------------------------------------------------
--ref STRING,STRING FASTA reference file, index file mandatory
(ex. --ref /path/to/file1.fasta,/path/to/index1)
If passing multiple reference files, separate
them using the delimiter ’:’,
(ex. --ref /path/to/file1.fasta,/path/to/index1:/path/to/file2.fasta,path/to/index2
--reads STRING FASTA/FASTQ reads file mandatory
--aligned STRING aligned reads filepath + base file name mandatory
(appropriate extension will be added)

[COMMON OPTIONS]:
--other STRING rejected reads filepath + base file name
(appropriate extension will be added)
--fastx BOOL output FASTA/FASTQ file off
(for aligned and/or rejected reads)
--sam BOOL output SAM alignment off
(for aligned reads only)
--SQ BOOL add SQ tags to the SAM file off
--blast STRING output alignments in various Blast-like formats
‘0’ - pairwise
‘1’ - tabular (Blast -m 8 format)
‘1 cigar’ - tabular + column for CIGAR
‘1 cigar qcov’ - tabular + columns for CIGAR
and query coverage
‘1 cigar qcov qstrand’ - tabular + columns for CIGAR,
query coverage and strand
--log BOOL output overall statistics off
--num_alignments INT report first INT alignments per read reaching E-value -1
(--num_alignments 0 signifies all alignments will be output)
or (default)
--best INT report INT best alignments per read reaching E-value 1
by searching --min_lis INT candidate alignments
(--best 0 signifies all candidate alignments will be searched)
--min_lis INT search all alignments having the first INT longest LIS 2
LIS stands for Longest Increasing Subsequence, it is
computed using seeds’ positions to expand hits into
longer matches prior to Smith-Waterman alignment.

--print_all_reads BOOL output null alignment strings for non-aligned reads off
to SAM and/or BLAST tabular files
--paired_in BOOL both paired-end reads go in --aligned fasta/q file off
(interleaved reads only, see Section 4.2.4 of User Manual)
--paired_out BOOL both paired-end reads go in --other fasta/q file off
(interleaved reads only, see Section 4.2.4 of User Manual)
--match INT SW score (positive integer) for a match 2
--mismatch INT SW penalty (negative integer) for a mismatch -3
--gap_open INT SW penalty (positive integer) for introducing a gap 5
--gap_ext INT SW penalty (positive integer) for extending a gap 2
-N INT SW penalty for ambiguous letters (N’s) scored as --mismatch
-F BOOL search only the forward strand off
-R BOOL search only the reverse-complementary strand off
-a INT number of threads to use 1
-e DOUBLE E-value threshold 1
-m INT INT Mbytes for loading the reads into memory 1024
(maximum -m INT is 4096)
-v BOOL verbose off

[OTU PICKING OPTIONS]:

--id DOUBLE %id similarity threshold (the alignment must 0.97
still pass the E-value threshold)
--coverage DOUBLE %query coverage threshold (the alignment must 0.97
still pass the E-value threshold)
--de_novo_otu BOOL FASTA/FASTQ file for reads matching database < %id off
(set using --id) and < %cov (set using --coverage)
(alignment must still pass the E-value threshold)
--otu_map BOOL output OTU map (input to QIIME’s make_otu_table.py) off

[ADVANCED OPTIONS] (see SortMeRNA user manual for more details):

--passes INT,INT,INT three intervals at which to place the seed on the read L,L/2,3
(L is the seed length set in ./indexdb_rna)
--edges INT number (or percent if INT followed by % sign) of 4
nucleotides to add to each edge of the read
prior to SW local alignment
--num_seeds INT number of seeds matched before searching 2
for candidate LIS
--full_search BOOL search for all 0-error and 1-error seed off
matches in the index rather than stopping
after finding a 0-error match (<1% gain in
sensitivity with up four-fold decrease in speed)
--pid BOOL add pid to output file names off

[HELP]:
-h BOOL help
--version BOOL SortMeRNA version number

The user can adjust the amount of memory allocated for loading the reads through the command
option -m. By default, -m is set to 1024 Mbytes, which is enough for 1GB of reads. If the reads file
is larger than 1GB, sortmerna internally divides the file into partial sections of 1GB and processes
one section at a time. Hence, if a user has an input file of 15GB and only 1GB of RAM to store it,
the file will be processed in partial sections using mmap without having to physically split it prior
to execution. Otherwise, the user can increase -m to map larger portions of the file. The limit for -m
is given by typing sortmerna -h.
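
As a rough way to estimate how many partial sections a given reads file will need, the following Python sketch divides the file size by the -m value. It is a sketch under assumptions: one Mbyte is taken to be 2^20 bytes, sections are treated as equal-sized, and any adjustment SortMeRNA makes so that a section ends on a read boundary is ignored. The function name partial_sections is hypothetical; the 4096 upper limit comes from the help text above.

```python
import math


def partial_sections(reads_file_bytes, m_mbytes=1024):
    """Estimate the number of partial sections for a reads file.

    Assumes 1 Mbyte = 2**20 bytes and equal-sized sections; the real
    partitioning may differ slightly.
    """
    if not 1 <= m_mbytes <= 4096:
        raise ValueError("-m must be between 1 and 4096 Mbytes")
    if reads_file_bytes <= 0:
        raise ValueError("reads file must not be empty")
    section_bytes = m_mbytes * 1024 * 1024
    return math.ceil(reads_file_bytes / section_bytes)


if __name__ == "__main__":
    print(partial_sections(35238748))            # reads file size from Example 3
    print(partial_sections(15 * 1024**3))        # a 15GB file with the default -m
    print(partial_sections(15 * 1024**3, 4096))  # the same file with the maximum -m
```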

4.3.1 Example 3: multiple databases and the fastest alignment option

>> time ./sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:\

Program: SortMeRNA version 2.1, 01/02/2016

Computing read file statistics ... done [2.16 sec]

size of reads file: 35238748 bytes
partial section(s) to be executed: 1 of size 35238748 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 1

Begin mmap reads section # 1:

Time to mmap reads and set up pointers [0.11 sec]

Begin analysis of: ./rRNA_databases/silva-bac-16s-id90.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.602397
Gumbel K = 0.328927
Minimal SW score based on E-value = 54
Loading index part 1/1 ... done [4.67 sec]
Begin index search ... done [83.53 sec]
Freeing index ... done [0.87 sec]

Begin analysis of: ./rRNA_databases/silva-bac-23s-id98.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.603075
Gumbel K = 0.330488
Minimal SW score based on E-value = 53
Loading index part 1/1 ... done [3.63 sec]
Begin index search ... done [94.76 sec]
Freeing index ... done [0.41 sec]

Begin analysis of: ./rRNA_databases/silva-arc-16s-id95.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.596230
Gumbel K = 0.322143
Minimal SW score based on E-value = 52
Loading index part 1/1 ... done [1.14 sec]
Begin index search ... done [22.63 sec]
Freeing index ... done [0.14 sec]

Begin analysis of: ./rRNA_databases/silva-arc-23s-id98.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.597749
Gumbel K = 0.325630
Minimal SW score based on E-value = 49
Loading index part 1/1 ... done [0.50 sec]
Begin index search ... done [13.27 sec]
Freeing index ... done [0.06 sec]

Begin analysis of: ./rRNA_databases/silva-euk-18s-id95.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612228
Gumbel K = 0.334926
Minimal SW score based on E-value = 52
Loading index part 1/1 ... done [3.23 sec]
Begin index search ... done [30.28 sec]
Freeing index ... done [0.45 sec]

Begin analysis of: ./rRNA_databases/silva-euk-28s-id98.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612068
Gumbel K = 0.344763
Minimal SW score based on E-value = 53
Loading index part 1/1 ... done [3.43 sec]
Begin index search ... done [35.69 sec]
Freeing index ... done [0.48 sec]

Begin analysis of: ./rRNA_databases/rfam-5s-database-id98.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.616617
Gumbel K = 0.341306
Minimal SW score based on E-value = 51
Loading index part 1/1 ... done [1.77 sec]
Begin index search ... done [13.50 sec]
Freeing index ... done [0.22 sec]

Begin analysis of: ./rRNA_databases/rfam-5.8s-database-id98.fasta

Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.617817
Gumbel K = 0.340589
Minimal SW score based on E-value = 49
Loading index part 1/1 ... done [0.60 sec]
Begin index search ... done [8.78 sec]
Freeing index ... done [0.07 sec]
Total number of reads mapped (incl. all reads file sections searched): 104243
Writing aligned FASTA/FASTQ ... done [1.13 sec]

Writing not-aligned FASTA/FASTQ ... done [0.10 sec]

The option ‘--log’ will create an overall statistics file,

>> cat SRR105861_rRNA.log

Time and date

Command: sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:\

./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:\
./rRNA_databases/silva-arc-16s-id95.fasta,./index/silva-arc-16s-db:\
./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-db:\
./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-db:\
./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s:\
./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:\
./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db\
--reads /Users/jenya/Downloads/SRR106861.fasta --sam --num_alignments 1\
--fastx --aligned SRR105861_rRNA --other SRR105861_non_rRNA.fasta fasta -v
Process pid = 1957
Parameters summary:
Index: ./index/silva-bac-16s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.602397
Gumbel K = 0.328927
Minimal SW score based on E-value = 54
Index: ./index/silva-bac-23s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.603075
Gumbel K = 0.330488
Minimal SW score based on E-value = 53
Index: ./index/silva-arc-16s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.596230
Gumbel K = 0.322143
Minimal SW score based on E-value = 52
Index: ./index/silva-arc-23s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.597749
Gumbel K = 0.325630
Minimal SW score based on E-value = 49
Index: ./index/silva-euk-18s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612228
Gumbel K = 0.334926
Minimal SW score based on E-value = 52
Index: ./index/silva-euk-28s
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612068
Gumbel K = 0.344763
Minimal SW score based on E-value = 53
Index: ./index/rfam-5s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.616617

Gumbel K = 0.341306
Minimal SW score based on E-value = 51
Index: ./index/rfam-5.8s-db
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.617817
Gumbel K = 0.340589
Minimal SW score based on E-value = 49
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 1
Reads file = SRR106861.fasta

Results:
Total reads = 113128
Total reads passing E-value threshold = 104243 (92.15%)
Total reads failing E-value threshold = 8885 (7.85%)
Minimum read length = 59
Maximum read length = 1253
Mean read length = 267
By database:
./rRNA_databases/silva-bac-16s-id90.fasta 25.73%
./rRNA_databases/silva-bac-23s-id98.fasta 64.37%
./rRNA_databases/silva-arc-16s-id95.fasta 0.00%
./rRNA_databases/silva-arc-23s-id98.fasta 0.00%
./rRNA_databases/silva-euk-18s-id95.fasta 0.00%
./rRNA_databases/silva-euk-28s-id98.fasta 0.00%
./rRNA_databases/rfam-5s-database-id98.fasta 2.04%
./rRNA_databases/rfam-5.8s-database-id98.fasta 0.00%
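
The counters in the Results block of the log above can be read back and cross-checked with a few lines of Python. This is a minimal sketch that assumes the log lines have exactly the "Total reads ... = N" shape shown above; the function names are hypothetical and the parser is not part of SortMeRNA.

```python
import re


def parse_total_reads(log_text):
    """Collect the 'Total reads ...' counters from a --log file."""
    counts = {}
    for line in log_text.splitlines():
        match = re.match(r"\s*(Total reads[^=]*?)\s*=\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def percent(part, total):
    """Format part/total as a percentage with two decimals, as in the log."""
    return f"{100.0 * part / total:.2f}%"


if __name__ == "__main__":
    sample = """Results:
    Total reads = 113128
    Total reads passing E-value threshold = 104243 (92.15%)
    Total reads failing E-value threshold = 8885 (7.85%)
    """
    counts = parse_total_reads(sample)
    total = counts["Total reads"]
    print(percent(counts["Total reads passing E-value threshold"], total))
    print(percent(counts["Total reads failing E-value threshold"], total))
```

Recomputing the percentages this way is a quick check that the passing and failing counts add up to the total number of reads.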

4.3.2 Filtering paired-end reads

When writing aligned and non-aligned reads to FASTA/Q files, the situation sometimes arises where
one read of a pair aligns and the other does not. Since SortMeRNA looks at each read
individually, by default the two reads will be written to separate files. That is, the read that aligned
will go into the --aligned FASTA/Q file and its mate that did not align will go into the --other
FASTA/Q file.
This splits some paired reads across the output files, which is not optimal
for users who require paired order of the reads for downstream analyses.
For users who wish to keep the order of their paired-end reads, two options are available. If one
read aligns and the other does not, then
(1) --paired_in will put both reads into the file specified by --aligned
(2) --paired_out will put both reads into the file specified by --other
The first option, --paired_in, is optimal for users who want all reads in the --other file to be
non-rRNA. However, there is a small chance that non-rRNA reads will also be put into
the --aligned file.
The second option, --paired_out, is optimal for users who want only rRNA reads in the --aligned
file. However, there is a small chance that rRNA reads will also be put into the --other
file.
If neither of these two options is added to the sortmerna command, then aligned and non-aligned
reads will be written to the --aligned and --other files respectively, possibly breaking the order of
a set of paired reads between the two output files.
It is important to note that regardless of the options used, the --log file will always report the
true number of reads classified as rRNA (not the number of reads in the --aligned file).
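
The routing rules described in this section can be summarised as a small decision function. The Python sketch below is illustrative only: route_pair is a hypothetical name, and treating --paired_in and --paired_out as mutually exclusive is an assumption, not something stated in this manual.

```python
def route_pair(fwd_aligned, rev_aligned, paired_in=False, paired_out=False):
    """Return the output file ('aligned' or 'other') for each read of a pair.

    fwd_aligned / rev_aligned say whether each read passed the E-value
    threshold. The result is a (forward_destination, reverse_destination) tuple.
    """
    if paired_in and paired_out:
        # Assumption: the two options are not meant to be combined.
        raise ValueError("choose at most one of paired_in / paired_out")
    if fwd_aligned == rev_aligned:
        # Both reads agree, so the pair is never split.
        destination = "aligned" if fwd_aligned else "other"
        return destination, destination
    if paired_in:
        return "aligned", "aligned"
    if paired_out:
        return "other", "other"
    # Default: each read is handled individually, which splits the pair.
    return ("aligned" if fwd_aligned else "other",
            "aligned" if rev_aligned else "other")


if __name__ == "__main__":
    print(route_pair(True, False))                   # default behaviour
    print(route_pair(True, False, paired_in=True))   # keep pair in --aligned
    print(route_pair(True, False, paired_out=True))  # keep pair in --other
```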

4.3.3 Example 4: forward-reverse paired-end reads (2 input files)

FASTQ forward reads                  FASTQ reverse reads

@SEQUENCE ID 1/1                     @SEQUENCE ID 1/2
ACTT..                               GTAC..
+                     pair # 1       +
QUALITY 1/1                          QUALITY 1/2
@SEQUENCE ID 2/1                     @SEQUENCE ID 2/2
GTTA..                               CCAC..
+                     pair # 2       +
QUALITY 2/1                          QUALITY 2/2
..                                   ..

Figure 2: Forward and reverse reads in paired-end sequencing format

FASTQ paired-end reads

@SEQUENCE ID 1/1
ACTT..
+
QUALITY 1/1
@SEQUENCE ID 1/2      pair # 1
GTAC..
+
QUALITY 1/2
..

Figure 3: Paired-end read format accepted by SortMeRNA

SortMeRNA accepts only 1 file as input for the reads. If a user has two input files, as is the case for
forward and reverse paired-end reads (see Figure 2), they may use the merge-paired-reads.sh
script found in the ‘sortmerna/scripts’ folder to interleave the paired reads into the format of
Figure 3.

The command for merge-paired-reads.sh is the following,

> bash ./merge-paired-reads.sh forward-reads.fastq reverse-reads.fastq outfile.fastq

Now, the user may input outfile.fastq to SortMeRNA for analysis.
Similarly, for unmerging the paired reads back into two separate files, use the command,
> bash ./unmerge-paired-reads.sh merged-reads.fastq forward-reads.fastq reverse-reads.fastq

Important: unmerge-paired-reads.sh should only be used if one of the options --paired_in or
--paired_out was used during filtering. Otherwise it may give incorrect results if a read pair
was split during alignment (one read aligned and the other did not).
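
For readers who want to see what interleaving and de-interleaving amount to, the following Python sketch reproduces the idea on lists of lines. It is not the implementation of merge-paired-reads.sh or unmerge-paired-reads.sh; it assumes every FASTQ record occupies exactly four lines and that both input files list the pairs in the same order. All function names are hypothetical.

```python
def fastq_records(lines):
    """Group lines into 4-line FASTQ records (id, sequence, '+', quality)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must have a multiple of 4 lines")
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def interleave(forward_lines, reverse_lines):
    """Alternate forward and reverse records, as in Figure 3."""
    forward = fastq_records(forward_lines)
    reverse = fastq_records(reverse_lines)
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse files differ in number of reads")
    merged = []
    for fwd_record, rev_record in zip(forward, reverse):
        merged.extend(fwd_record)
        merged.extend(rev_record)
    return merged


def deinterleave(merged_lines):
    """Split an interleaved file back into forward and reverse reads.

    Only meaningful if no pair was split during filtering, i.e. the input
    still alternates forward, reverse, forward, reverse, ...
    """
    records = fastq_records(merged_lines)
    if len(records) % 2 != 0:
        raise ValueError("odd number of reads: at least one pair is incomplete")
    forward = [line for record in records[0::2] for line in record]
    reverse = [line for record in records[1::2] for line in record]
    return forward, reverse


if __name__ == "__main__":
    fwd = ["@SEQ1/1", "ACTT", "+", "IIII", "@SEQ2/1", "GTTA", "+", "IIII"]
    rev = ["@SEQ1/2", "GTAC", "+", "IIII", "@SEQ2/2", "CCAC", "+", "IIII"]
    merged = interleave(fwd, rev)
    print("\n".join(merged))
    assert deinterleave(merged) == (fwd, rev)
```

The odd-count check in deinterleave shows why the warning above matters: if filtering split a pair, the alternating order is lost and positional de-interleaving would pair unrelated reads.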

4.4 Read mapping

4.4.1 Mapping reads for classification

Although SortMeRNA is very sensitive with the small rRNA databases distributed with the source
code, these databases are not optimal for classification, since alignments with 75-90% identity
will often be returned (there are only several thousand rRNA in most of the databases, compared to the
original SILVA or Greengenes databases containing millions of rRNA). Classification at the species
level generally considers alignments at 97% identity and above, so a larger database is suggested if
species classification is the main goal.
Moreover, SortMeRNA is a local alignment tool, so it is also important to look at the query coverage
% for each alignment. In the SAM output format, neither %id nor query coverage is reported.
If the user needs these values, the Blast tabular format with the CIGAR + query coverage
option (--blast '1 cigar qcov') should be used.
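
Once the tabular output with CIGAR and query coverage is available, alignments can be screened by %id and query coverage with a short script. The Python sketch below assumes the twelve standard BLAST -m 8 columns (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score) followed by a CIGAR column and a query coverage column expressed in percent. The position and units of the two extra columns are assumptions; check them against an actual output file. The sample rows are invented for illustration.

```python
def filter_hits(lines, min_id=97.0, min_qcov=97.0):
    """Keep alignments whose %id and query coverage reach the thresholds.

    Returns (query_id, reference_id, percent_identity, query_coverage) tuples.
    Column positions 13 (CIGAR) and 14 (qcov) are assumed, see the text above.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        columns = line.split("\t")
        if len(columns) < 14:
            # Row without CIGAR/qcov columns: skip rather than guess.
            continue
        percent_identity = float(columns[2])
        query_coverage = float(columns[13])
        if percent_identity >= min_id and query_coverage >= min_qcov:
            kept.append((columns[0], columns[1], percent_identity, query_coverage))
    return kept


if __name__ == "__main__":
    # Invented rows in the assumed 14-column layout.
    rows = [
        "read1\tref42\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-80\t350\t200M\t100",
        "read2\tref7\t82.0\t120\t20\t2\t5\t124\t300\t419\t1e-20\t110\t4S120M76S\t60",
    ]
    for hit in filter_hits(rows):
        print(hit)
```

The default thresholds mirror the 97% level mentioned above for species-level classification; both can be lowered when a coarser assignment is acceptable.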

4.4.2 Example 5: mapping reads against the 16S Greengenes 97% id database with multithreading

This example will generate SAM and BLAST tabular output files. Alignments are classified as
significant based on the E-value cutoff (default 1). SortMeRNA’s E-value takes into consideration
the full size of the reference database as well as the query file, so it is higher than
BLAST’s E-value for the same alignment (e.g. equivalent to a BLAST E-value of 1e-5).
>> sortmerna --ref 97_otus_gg_13_8.fasta,./index/97_otus_gg_13_8\
--reads SRR106861.fasta --blast '1 cigar qcov' --sam --log --aligned SRR106861_gg_rRNA -a 20 -v

Program: SortMeRNA version 2.1, 01/02/2016

Computing read file statistics ... done [0.44 sec]

Begin mmap reads section # 1:

Time to mmap reads and set up pointers [0.10 sec]

Begin analysis of: 97_otus_gg_13_8.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.600470
Gumbel K = 0.327880
Minimal SW score based on E-value = 57
Loading index part 1/1 ... done [10.76 sec]
Begin index search ... done [23.75 sec]
Freeing index ... done [1.44 sec]
Total number of reads mapped (incl. all reads file sections searched): 29089
Writing alignments ... done [7.71 sec]

This is almost the same number of 16S rRNA reads as identified by SortMeRNA using the smaller provided
database (25.71% of reads here versus 25.73% for silva-bac-16s-id90 in Example 3),

>> cat SRR106861_gg_rRNA.log

Date and time

Command: sortmerna --ref 97_otus_gg_13_8.fasta,./index/97_otus_gg_13_8\

--reads SRR106861.fasta --blast ’1 cigar qcov’ --sam --log --aligned SRR106861_gg_rRNA -a 20 -v
Process pid = 44246
Parameters summary:
Index: ./index/97_otus_gg_13_8
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.600470
Gumbel K = 0.327880
Minimal SW score based on E-value = 57
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 20
Reads file = SRR106861.fasta

Results:
Total reads = 113128
Total reads passing E-value threshold = 29089 (25.71%)
Total reads failing E-value threshold = 84039 (74.29%)
Minimum read length = 59
Maximum read length = 1253
Mean read length = 267
By database:
97_otus_gg_13_8.fasta 25.71%

4.5 OTU-picking

SortMeRNA is implemented in QIIME’s closed-reference and open-reference OTU-picking workflows.
Readers are referred to QIIME’s tutorials for an in-depth discussion of these methods:
http://qiime.org/tutorials/otu_picking.html.

5 SortMeRNA advanced options

--num_seeds INT

The threshold number of seeds required to match in the primary seed-search filter before moving
on to the secondary seed-cluster filter. More specifically, the threshold number of seeds required
before searching for a longest increasing subsequence (LIS) of the seeds’ positions between the read
and the closest matching reference sequence. By default, this is set to 2 seeds.
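
To make the LIS step concrete, the Python sketch below computes a longest increasing subsequence with the standard patience-sorting method. It assumes seed hits are given as (read position, reference position) pairs and that the LIS is taken over reference positions after sorting by read position. This is an illustration of the general technique, not SortMeRNA's actual code, and the function names are hypothetical.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one strictly increasing subsequence of maximal length."""
    tail_index = []   # tail_index[k]: index of the smallest tail of a run of length k+1
    tail_value = []   # tail_value[k] == values[tail_index[k]], kept sorted for bisect
    previous = [None] * len(values)
    for i, value in enumerate(values):
        k = bisect_left(tail_value, value)
        if k == len(tail_value):
            tail_index.append(i)
            tail_value.append(value)
        else:
            tail_index[k] = i
            tail_value[k] = value
        previous[i] = tail_index[k - 1] if k > 0 else None
    # Walk back from the tail of the longest run to recover the subsequence.
    result = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        result.append(values[i])
        i = previous[i]
    return result[::-1]


def collinear_seed_chain(seed_hits, num_seeds=2):
    """Return the LIS of reference positions, or [] below the seed threshold."""
    if len(seed_hits) < num_seeds:
        return []
    ordered = sorted(seed_hits)  # sort by read position
    return longest_increasing_subsequence([ref_pos for _, ref_pos in ordered])


if __name__ == "__main__":
    hits = [(0, 100), (18, 118), (36, 50), (54, 136), (72, 154), (90, 60)]
    print(collinear_seed_chain(hits))
```

Seeds whose reference positions fall outside the increasing chain (50 and 60 in the usage example) are the ones inconsistent with a single collinear alignment.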

--passes INT,INT,INT

In the primary seed-search filter, SortMeRNA moves a seed of length L (parameter of indexdb_rna)
across the read using three passes. If at the end of a pass the threshold number of seeds (defined
by --num_seeds) did not match the reference database, SortMeRNA attempts to find more seeds
by using another pass with a smaller interval between seed placements along the read. In
default mode, these intervals are set to L, L/2, 3 for Pass 1, 2 and 3, respectively. Usually, if the
read is highly similar to the reference database, the threshold number of seeds will be found in the
first pass.
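
The effect of the three intervals can be seen by listing where seeds would start on a read. The Python sketch below assumes that seeds start at position 0 and must fit entirely within the read; it does not model whether a later pass skips positions already tried in an earlier one. Both function names are hypothetical.

```python
def default_pass_intervals(seed_length):
    """Default --passes values: L, L/2, 3."""
    return (seed_length, seed_length // 2, 3)


def seed_start_positions(read_length, seed_length, interval):
    """Start positions of seeds placed every `interval` nucleotides."""
    last_start = read_length - seed_length
    return list(range(0, last_start + 1, interval))


if __name__ == "__main__":
    L = 18
    read_length = 59  # the minimum read length in the Example 3 log
    for number, interval in enumerate(default_pass_intervals(L), start=1):
        starts = seed_start_positions(read_length, L, interval)
        print(f"Pass {number} (interval {interval}): {len(starts)} seeds")
```

With L = 18 the intervals are 18, 9 and 3, matching the "Pass 1 = 18, Pass 2 = 9, Pass 3 = 3" lines in the run logs. Each later pass places more seeds on the read, at the cost of more index lookups.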

--edges INT(%)

The number (or percentage if followed by %) of nucleotides to add to each edge of the alignment
region on the reference sequence before performing Smith-Waterman alignment. By default, this is
set to 4 nucleotides.
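
A small Python sketch of how the padding could be applied to the alignment region follows. Two points are assumptions rather than documented behaviour: the percentage form is interpreted as a percentage of the read length, and the padded region is clamped to the bounds of the reference sequence. The function name extend_region is hypothetical.

```python
def extend_region(ref_start, ref_end, ref_length, read_length, edges="4"):
    """Pad the [ref_start, ref_end) region on the reference by --edges.

    edges is the option value as a string, e.g. "4" or "10%".
    ASSUMPTION: "N%" means N percent of the read length.
    """
    if edges.endswith("%"):
        padding = int(read_length * float(edges[:-1]) / 100)
    else:
        padding = int(edges)
    # Clamp so the padded region never leaves the reference sequence.
    return max(0, ref_start - padding), min(ref_length, ref_end + padding)


if __name__ == "__main__":
    print(extend_region(100, 200, 1500, 100))         # default of 4 nucleotides
    print(extend_region(100, 200, 1500, 100, "10%"))  # percentage form
    print(extend_region(2, 1499, 1500, 100))          # clamped at both ends
```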

--full_search FLAG

During the index traversal, if a seed match is found with 0-errors, SortMeRNA will stop searching
for further 1-error matches. This heuristic is based upon the assumption that 0-error matches are
more significant than 1-error matches. Turning this heuristic off with the --full_search flag may
increase sensitivity (often by less than 1%) but with up to a four-fold decrease in speed.

--pid FLAG

The PID of the running sortmerna process is added to the output file names to avoid
overwriting output when the same --aligned STRING base name is provided for different runs.

6 Help
Any issues or bug reports should be reported to https://github.com/biocore/sortmerna/issues
or by e-mail to the authors (see the list of e-mails in Section 1 of this document). Comments and
suggestions are always appreciated!

7 Citation
If you use SortMeRNA, please cite:
Kopylova E., Noé L. and Touzet H., “SortMeRNA: Fast and accurate filtering of ribosomal RNAs
in metatranscriptomic data”, Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

