AAF User Manual

Huan Fan, Anthony Ives, Yann Surget-Groba and Chuck Cannon

[email protected]

Feb 2016

Table of Contents
Requirements
Installation
Usage and options
1) aaf_phylokmer.py
2) aaf_distance.py
3) aaf_tip.py
4) nonparametric_bootstrap.py
5) parametric_bootstrap.R
Tutorial with dummy dataset
Description of output files
Parameter Selection
a. Optimal k
b. Filter or not
c. Tip trimming (optional)
d. Bootstrap
Reference

Introduction
AAF (alignment and assembly-free) is a free software package that reconstructs phylogenies from
next-generation sequencing data without assembly or alignment. It takes the raw sequencing reads of
all samples, generates a distance matrix based on the proportion of k-mers shared between each pair
of samples, and reconstructs a phylogeny from that distance matrix.
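
To make the idea of a distance based on shared k-mers concrete, here is a minimal Python sketch on two toy sequences. It is not the AAF implementation: the helper names are hypothetical, the proportion is taken relative to the smaller k-mer set, and the transform d = -ln(p)/k is an assumption that this manual does not state. Real data would also need k-mers counted from reads on both strands.

```python
import math

def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_proportion(kmers_a, kmers_b):
    """Number of shared k-mers divided by the size of the smaller k-mer set."""
    n_shared = len(kmers_a & kmers_b)
    n_total = min(len(kmers_a), len(kmers_b))
    return n_shared / n_total

def distance(p_shared, k):
    """Assumed transform (not given in this manual): d = -ln(p) / k."""
    return -math.log(p_shared) / k

k = 5
a = kmers("ACGTACGTTGCAAGCT", k)
b = kmers("ACGTACGTTGCTAGCT", k)  # differs from the first sequence at one base
p = shared_proportion(a, b)
print(p, distance(p, k))
```

A single substitution removes every k-mer that overlaps it, which is why a larger k makes the shared proportion drop faster for the same number of mutations.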

AAF is mainly designed for large eukaryotic genomes. We therefore divided the reconstruction
process into two major steps: 1) k-mer counting and 2) distance calculation and phylogeny
reconstruction. Two separate Python scripts handle these steps: 1) aaf_phylokmer.py and
2) aaf_distance.py. The AAF package contains three more optional scripts: one for trimming excess
tips of the phylogeny caused by sequencing error and incomplete coverage (aaf_tip.py), and two for
bootstrapping the constructed phylogeny (nonparametric_bootstrap.py and parametric_bootstrap.R).
The rest of the manual introduces their usage and options in turn.

We have included two tutorials in this manual. One uses a dummy dataset with 10 species and short
genomes; the other uses a real dataset of 21 tropical tree genomes, as described in the AAF paper.
The first shows how to organize the data and run the scripts. The second demonstrates issues that
can arise with large, real datasets, with a special focus on selecting parameters such as k and the
filtering threshold. A separate section details the reasoning behind optimal parameter selection.

For the most recent version of AAF, please visit https://sourceforge.net/projects/aaf-phylogeny/

Requirements
AAF can be used on a UNIX system (Linux, OS X...) with Python 2.6 or a higher 2.x version (NOT
Python 3.0+) and the g++/gcc compilers. Biopython (http://biopython.org/wiki/Main_Page) is required
for the non-parametric bootstrap, and R (http://cran.r-project.org/) and the R package 'ape' are
required for the parametric bootstrap.

Installation
0. Download the most recent version from http://sourceforge.net/projects/aaf-phylogeny and
decompress the archive.

1. Compile kmer_count(x) and kmer_merge as follows. "path_to_AAF" stands for your path to the
AAF folder generated by decompressing AAF.tar.gz.
a. path_to_AAF/AAF$ cd phylokmer
b. path_to_AAF/AAF/phylokmer$ make
c. Add kmer_count(x) and kmer_merge to your PATH or working directory
2. Compile fitch_kmerX, consense and treedist
a. path_to_AAF/AAF$ cd phylip_src
b. path_to_AAF/AAF/phylip_src$ make all
c. Add fitch_kmerX and consense to your PATH or working directory

Usage and options

(See tutorials below for examples):

1) aaf_phylokmer.py
Usage: aaf_phylokmer.py [options]

Options:
--version show program's version number and exit

-h, --help show this help message and exit

-k KLEN k-mer length, default = 25

-t NTHREADS number of threads to use, default = 1

-n FILTER k-mer filtering threshold, default = 1

-f SEQFORMAT format of input files, FA|FQ, default = FA

-o OUTFILE output file, default = phylokmer.dat.gz

-d DATADIR directory containing the data, default = data/

-G MEMSIZE total memory limit (in GB), default = 4

-W withKmer include k-mers in the shared k-mer table

-s only print commands, do not run them

Detailed description of options:

-k KLEN: k-mer size. A larger k decreases the probability that two identical k-mers come from
different parts of the genome (k-mer homoplasy) but increases the probability that a k-mer contains
sequencing errors or multiple evolutionary events such as substitutions or indels. See the
Parameter Selection section for more details. Set at 25 by default.

-t NTHREADS: number of threads to use. Depends on how many cores are available on your machine.
Set at 1 by default.

-n FILTER: how many times a k-mer needs to occur in a sample to be counted as present. This serves
as a filter for singletons, which could be the result of sequencing error. See the Parameter
Selection section for more details. Set at 1 by default.

-f SEQFORMAT: format of the sequence files, FA or FQ. The default is set as FA.

-o OUTFILE: output filename. If you would like your output file to be compressed, provide a name
that ends with .gz. Otherwise it will not be compressed. The default output file is phylokmer.dat.gz,
which is compressed.

-d DATADIR: directory containing the data. Users should strictly follow the data structure required by
AAF. Sequence files for each sample need to be in one directory named after that sample. Therefore,
there will be N directories for N samples and the name of the directories will be the names displayed in
the final phylogenetic tree. All the sample folders should be placed into the same directory, which will
be your data directory requested by aaf_phylokmer.py. Accepted extensions for sequence files
include: .fa(sta)(.gz), .fq(.gz), and .fastq(.gz). See the “data” directory in the package as an example.

-G MEMSIZE: the total memory allowance. Each kmer_count thread has G/t memory allowance. Set at
4G by default.

-W WITHKMER: to include k-mers in the shared k-mer table. When the final goal is to construct a
phylogeny, we do not need to know the specific patterns of each k-mer. Therefore by default in the
shared k-mer table only the frequencies of k-mers are kept. However if there’s downstream analysis of
k-mers with a certain pattern, k-mers need to be kept. Use -W to keep the k-mers.

-s SIM: This will print out the commands that are going to run without executing them.
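
Because the -d option depends on a strict directory layout (one folder per sample, each holding that sample's sequence files), it can help to check the layout before starting a long run. The following Python sketch is one way to do that; it is not part of the AAF package, and the function names are hypothetical. The extension list is taken from the description of -d above.

```python
import os

# Extensions listed in this manual: .fa(sta)(.gz), .fq(.gz), .fastq(.gz)
ACCEPTED = (".fa", ".fasta", ".fq", ".fastq")

def is_sequence_file(name):
    """True if the file name has an accepted extension, optionally gzipped."""
    if name.endswith(".gz"):
        name = name[:-3]
    return name.endswith(ACCEPTED)

def check_data_dir(data_dir):
    """Map each sample directory to its sequence files; an empty list flags a problem."""
    layout = {}
    for sample in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, sample)
        if os.path.isdir(path):
            layout[sample] = sorted(f for f in os.listdir(path) if is_sequence_file(f))
    return layout

# Only runs when a data/ directory exists in the current working directory.
if __name__ == "__main__" and os.path.isdir("data"):
    for sample, files in check_data_dir("data").items():
        print(sample, files if files else "NO SEQUENCE FILES FOUND")
```

The keys of the returned dictionary are the directory names, which are also the names that will appear on the tips of the final tree.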

2) aaf_distance.py
Usage: aaf_distance.py [options] -i <input filename>

Options:
--version show program's version number and exit

-h, --help show this help message and exit

-i IPTF input file, default = phylokmer.dat.gz

-t NTHREADS number of threads to use, default = 1

-G MEMSIZE max memory to use (in GB), default = 1

-o OTPF prefix of the output files, default = aaf

-f COUNTF k-mer diversity file, default = phylokmer.dat.wc

Detailed description of options:

-i IPTF: input file. The shared k-mer table generated from aaf_phylokmer.py. The default file is
phylokmer.dat(.gz)

-t NTHREADS: number of threads to use. Depends on how many cores are available on your machine.
Set at 1 by default.

-G MEMSIZE: the total memory allowance in GB. Set at 1G by default.

-o OTPF: prefix of the output files, including the distance matrix(.dist) and the phylogenetic tree(.tre).
Default is set as “aaf”.

-f COUNTF: wc file generated from kmer_count. This file contains the k-mer diversity of each sample.
The default is phylokmer.dat.wc

3) aaf_tip.py
Usage: aaf_tip.py [options] -i <input tree file> -k <kmer size> --tip <information for tip correction>

Options:
--version show program's version number and exit

-h, --help show this help message and exit

-i IPTF tree file to be trimmed

-k KLEN k-mer size used for constructing the input tree

--tip=TIP_FILE tip setting file, default = tip_file_test.txt

-n k-mer filtering was on for tree construction

-f COUNTF k-mer diversity file, default = phylokmer.dat.wc

Detailed description of options:

-i IPTF: input tree file. The tree file whose tips you would like to trim.

-k KLEN: the k that was used to construct the input tree.

--tip TIP_FILE: Trimming the excess tips caused by incomplete coverage and sequencing errors
requires additional information on the average coverage, read length and sequencing error rate of
each sample. Put this information into a tab-delimited text file in the format of tip_info_test.txt.
See the suggestions on estimating coverage and sequencing error in the Parameter Selection section.

-n: add it to the command if filter was used during the tree construction.

-f COUNTF: wc file generated from kmer_count. This file contains the k-mer diversity of each sample.
The default is phylokmer.dat.wc

4) nonparametric_bootstrap.py
Usage: nonparametric_bootstrap.py [options]

Options:
-h, --help show this help message and exit

-k KLEN k-mer length, default = 25

-t NTHREADS number of threads to use, default = 1

-n FILTER k-mer filtering threshold, default = 1

-f SEQFORMAT format of input files, FA|FQ, default = FA

-o OUTFILE k-mer table name, default = phylokmer.dat.gz

-d DATADIR directory containing the data, default = data/

-G MEMSIZE total memory limit (in GB), default = 4

--S1=STAGE1 number of resampling of the reads, default = 0

--S2=STAGE2 number of resampling of each total kmer table, default = 0

-s only print commands, do not run them

Detailed description of options:

-k KLEN: k-mer size. Set at 25 by default.

-t NTHREADS: number of threads to use. Depends on how many cores are available on your machine.
Set at 1 by default.

-n FILTER: how many times a k-mer needs to be in the sample to be counted as present. Set at 1 by
default.

-f SEQFORMAT: Format of the sequence files, FA or FQ. The default is set as FA.

-o OUTFILE: file name of the merged k-mer table. If you would like your k-mer table to be
compressed, provide a name that ends with .gz. Otherwise it will not be compressed. The default output
file is phylokmer.dat.gz, which is compressed.

-d DATADIR: directory containing the data.

-G MEMSIZE: the total memory allowance. Each kmer_count thread has G/t memory allowance. Set at
4G by default.

--S1: number of times to resample the reads for each sequence file. This is the first stage of our
two-stage bootstrap. Its result shows the variance caused by sequencing error and incomplete
coverage. Set at 0 by default, which means skipping the first stage and only resampling the k-mer
table.

--S2: number of times to resample the total k-mer table generated from one instance of resampling of
the reads. If --S1 is set to be 0, the resampling is on the real k-mer table generated from the original
data. Set at 0 by default, which means skipping this step.
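
The interaction between --S1 and --S2 can be sketched as two nested resampling loops. The Python code below is a toy illustration under stated assumptions, not the code in nonparametric_bootstrap.py: the "k-mer table" is reduced to a sorted list of distinct k-mers, and all function names are hypothetical.

```python
import random

def resample(items, rng):
    """Draw len(items) items with replacement."""
    return [rng.choice(items) for _ in items]

def count_kmers(reads, k):
    """Toy k-mer table: one row per distinct k-mer found in the reads."""
    table = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            table.add(read[i:i + k])
    return sorted(table)

def two_stage_bootstrap(reads, k, s1, s2, seed=0):
    """Stage 1 resamples reads s1 times; stage 2 resamples each table s2 times.

    With s1 == 0 the original reads are used; with s2 == 0 the table is kept as is.
    """
    rng = random.Random(seed)
    read_sets = [resample(reads, rng) for _ in range(s1)] if s1 > 0 else [reads]
    tables = []
    for read_set in read_sets:
        table = count_kmers(read_set, k)
        if s2 == 0:
            tables.append(table)
        else:
            tables.extend(resample(table, rng) for _ in range(s2))
    return tables

reads = ["ACGTACGT", "TTGCAAGC", "GGATCCAA"]
print(len(two_stage_bootstrap(reads, 4, s1=2, s2=2)))  # 2 read resamples x 2 table resamples
```

In this sketch each resampled table would then go through distance calculation and tree building, and the resulting trees would be summarized in a consensus tree.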

5) parametric_bootstrap.R
When the nonparametric bootstrap takes too long on large datasets, switch to the parametric
bootstrap. This R script estimates the variances in the two steps. It requires:
info file: the file containing read length, sequencing error and coverage that is also used by
aaf_tip.py, default = tip_info_test.txt
nshare file: the file containing the number of shared k-mers generated by aaf_distance.py (ends
with _nshare.csv), default = test_nshare.csv
nreadboot: number of replicates, default = 10
k: k-mer length used in previous steps, default = 21
i.filter: filter threshold used, default = 1

Tutorial with dummy dataset

1) Decompress the pipeline and the test data, available from the AAF project page
(https://sourceforge.net/projects/aaf-phylogeny/):
$ tar xvfz AAF.tar.gz

2) Move to the phylokmer directory and compile kmer_count, kmer_countx, and kmer_merge
path_to_AAF/AAF$ cd phylokmer
path_to_AAF/AAF/phylokmer$ make
path_to_AAF/AAF/phylokmer$ cp kmer_count kmer_countx kmer_merge ../

3) Compile fitch_kmerX
path_to_AAF/AAF/phylokmer$ cd ../phylip_src
path_to_AAF/AAF/phylip_src$ make all
path_to_AAF/AAF/phylip_src$ cp fitch_kmerX consense ../
path_to_AAF/AAF/phylip_src$ cd ..

4) k-mer counting
path_to_AAF/AAF/$ python aaf_phylokmer.py -k 21 -d data -G 2

5) Constructing the phylogenetic tree

path_to_AAF/AAF/$ python aaf_distance.py -i phylokmer.dat.gz -o test -t 2 -G 2 -f phylokmer.dat.wc

6) Tip correction (optional)

path_to_AAF/AAF/$ python aaf_tip.py -i test.tre -k 21 --tip tip_info_test.txt -f phylokmer.dat.wc

7) Non-parametric bootstrap (optional)

path_to_AAF/AAF/$ python nonparametric_bootstrap.py -k 21 -t 2 -d data --S1 2 --S2 2

8) Parametric bootstrap (optional)

Set your working directory to the AAF folder and change the parameters in the “set parameters” section,
including nboot, k, filter, info.file and n.table.file.
Within R console or terminal:
> source("parametric_bootstrap.R")
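
Steps 4 to 7 of the tutorial can be collected into one small driver so that the shared settings (k = 21, the data folder, the "test" prefix) stay consistent between scripts. The Python sketch below only prints the commands, copied from the steps above; the driver itself and its names are hypothetical and not part of the AAF package. Note that the AAF scripts require Python 2, so "python" must resolve to a 2.x interpreter when the commands are actually run.

```python
import shlex

# Commands copied from tutorial steps 4-7; run them from the AAF folder.
TUTORIAL_STEPS = [
    ("k-mer counting",
     ["python", "aaf_phylokmer.py", "-k", "21", "-d", "data", "-G", "2"]),
    ("tree construction",
     ["python", "aaf_distance.py", "-i", "phylokmer.dat.gz", "-o", "test",
      "-t", "2", "-G", "2", "-f", "phylokmer.dat.wc"]),
    ("tip correction (optional)",
     ["python", "aaf_tip.py", "-i", "test.tre", "-k", "21",
      "--tip", "tip_info_test.txt", "-f", "phylokmer.dat.wc"]),
    ("non-parametric bootstrap (optional)",
     ["python", "nonparametric_bootstrap.py", "-k", "21", "-t", "2",
      "-d", "data", "--S1", "2", "--S2", "2"]),
]

def render(steps):
    """Return one '# label' line plus one shell-quoted command line per step."""
    return ["# {}\n{}".format(label, " ".join(shlex.quote(arg) for arg in cmd))
            for label, cmd in steps]

for block in render(TUTORIAL_STEPS):
    print(block)
```

To execute the steps instead of printing them, each argument list could be passed to subprocess.check_call, which stops at the first failing step.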

Description of output files

1) phylokmer.dat.gz (aaf_phylokmer.py): This output file will be inside the data folder. It starts
with header information including k-mer size, filter frequency and sample list. After the header is a
table with frequencies of a given k-mer from each sample on one line in alphabetical order of the
sample names.
2) phylokmer.dat.wc (aaf_phylokmer.py): Inside the data folder. It contains the total k-mer
diversity for each sample per line in alphabetical order of the sample names.
3) test_nshare.csv (aaf_distance.py): In the current working directory. It contains the number of
shared k-mers for each pair of samples. This file is needed for the parametric bootstrap.
4) test.tre (aaf_distance.py): In the current working directory. It is the phylogeny you want!
5) test.dist (aaf_distance.py): In the current working directory. It is the distance matrix upon
which fitch_kmerX infers the phylogeny (test.tre in this case).
6) tip_test.tre (aaf_tip.py): In the current working directory. It is the tree after tip correction.
7) consensus_trees_read_nonparametric (nonparametric_bootstrap.py): In the current working
directory. This file contains all the trees that were generated after each resampling of the reads.
8) consensus_trees_total_nonparametric (nonparametric_bootstrap.py): In the current working
directory. This file contains all the trees that were generated after both resampling of the reads and the
k-mer table counted from those reads.
9) consensus_trees_table_nonparametric (nonparametric_bootstrap.py): In the current working
directory. This file contains all the trees that were generated after each resampling of the real k-mer
table calculated from the original reads when the resampling of reads is skipped.
10) consensus_read_nonparametric.tre (nonparametric_bootstrap.py): In the current working
directory. This is the consensus tree generated from consensus_trees_read_nonparametric by consense
in PHYLIP.
11) consensus_total_nonparametric.tre (nonparametric_bootstrap.py): In the current working
directory. This is the consensus tree generated from consensus_trees_total_nonparametric by consense
in PHYLIP.
12) consensus_table_nonparametric.tre (nonparametric_bootstrap.py): In the current working
directory. This is the consensus tree generated from consensus_trees_table_nonparametric by consense
in PHYLIP.
13) consensus_outfile_read_nonparametric (nonparametric_bootstrap.py): In the current working
directory. This file contains the bootstrap ratio for the branches in consensus_read_nonparametric.tre
14) consensus_outfile_total_nonparametric (nonparametric_bootstrap.py): In the current working
directory. This file contains the bootstrap ratio for the branches in consensus_total_nonparametric.tre
15) consensus_outfile_table_nonparametric (nonparametric_bootstrap.py): In the current working
directory. This file contains the bootstrap ratio for the branches in consensus_table_nonparametric.tre
16) consensus_*_*_parametric (parametric_bootstrap.R): In the current working directory. See
descriptions of their nonparametric counterparts.
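
Since phylokmer.dat.wc lists k-mer diversity per line in alphabetical order of the sample names, the values can be paired back with the sample names as in the Python sketch below. This rests on an assumption that is not confirmed by this manual: that each line starts with a single integer. The function name and the example numbers are hypothetical, and Python's sort order may differ from the one AAF uses if sample names mix upper and lower case.

```python
def pair_diversity(sample_names, wc_lines):
    """Pair alphabetically sorted sample names with per-line k-mer diversity values.

    Assumes (unverified) that the first token on each non-empty line is an integer.
    """
    values = [int(line.split()[0]) for line in wc_lines if line.strip()]
    names = sorted(sample_names)
    if len(values) != len(names):
        raise ValueError("expected one value per sample")
    return dict(zip(names, values))

# Hypothetical sample names and values, for illustration only.
example = pair_diversity(["sp2", "sp1", "sp3"], ["1200\n", "1350\n", "990\n"])
print(example)
```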

Parameter Selection
Here we provide some guidelines in parameter selection using the dataset with 21 tropical trees as an
example.

a. Optimal k
As we described in the manuscript, the selection of k is a trade-off between avoiding multiple
mutations on one k-mer (which favors shorter k) and decreasing the chance of k-mer homoplasy
(which favors longer k). For the primate dataset in the manuscript, we plotted the theoretical
predictions of the proportion of shared k-mers, ph, calculated from the observed frequency
distribution of k-mers, together with the ph calculated without homoplasy (Fig. 2D), to help
visualize the effect of different choices of k. This procedure led to a selection of k that
corresponded to an accurate phylogeny. The figure therefore serves as a good indicator of the
optimal k, and the choice can be further checked by constructing phylogenies with k-mer lengths
larger than the optimum and confirming that the topology stays consistent.
To plot the ph vs. k figure for your dataset, here is a checklist for the genome information that is
needed:
i. Sample names
ii. Coverage
iii. Genome size
iv. GC content
v. d (genetic distance)
vi. Qk

We are aware that not all of this information may be available, so we provide coarse calculation
methods for some of the categories.

ii. Coverage
There are multiple ways of estimating the sequencing coverage of your next-gen sequencing data.
(1) If the genome size is known, coverage = total bp / genome size. (2) If the genome size is
unknown, we can estimate the coverage by plotting the k-mer frequency distribution: “if a large
fraction of k-mers occur c times, we can estimate the sequencing coverage to be approximately c and
derive an estimate of the genome size from c and the total length of the reads” (Marcais and
Kingsford 2011). Here c is the k-mer coverage. To get the base pair coverage, you need to correct c
using base_coverage = c * read_length / (read_length - k-mer_size + 1) (see
https://groups.google.com/forum/#!topic/bgi-soap/xKS39Nz4SCE). (3) When the coverage is low or the
sequencing error rate is high, there will be no clear peak in the k-mer frequency distribution at c.
This is actually the case for all the tropical tree species in our dataset except Ficus vasculosa
(FV). A coarse estimate of the k-mer coverage is then the total number of k-mers (including
multiple copies of the same k-mer) divided by the k-mer diversity (the number of k-mers that show
up at least once). Some assemblers (such as velvet and SOAPdenovo) report an estimate of k-mer
coverage as well.
Coverage information is also needed for tip correction.
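
The three coverage estimates above, and the rough genome size estimate derived from coverage, are simple ratios. The Python sketch below writes them out as functions; the function names and the example numbers are hypothetical and chosen only to show the arithmetic.

```python
def coverage_from_genome_size(total_bp, genome_size):
    """(1) Known genome size: coverage = total bp / genome size."""
    return total_bp / genome_size

def base_coverage(kmer_coverage, read_length, k):
    """(2) Convert k-mer coverage c to base pair coverage."""
    return kmer_coverage * read_length / (read_length - k + 1)

def coarse_kmer_coverage(total_kmers, kmer_diversity):
    """(3) No clear peak: total k-mer count divided by number of distinct k-mers."""
    return total_kmers / kmer_diversity

def genome_size_from_coverage(total_bp, coverage):
    """Rough genome size: total bp divided by coverage."""
    return total_bp / coverage

# Hypothetical example: k-mer coverage peak at c = 8, 100 bp reads, k = 25.
print(base_coverage(8, 100, 25))
print(coverage_from_genome_size(5e9, 5e8))
```

The correction factor read_length / (read_length - k + 1) is always greater than 1, because a read of length L yields only L - k + 1 k-mers.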

iii. Genome size

You can look up the genome size in databases such as the Plant DNA C-values Database
(http://data.kew.org/cvalues/) or the Animal Genome Size Database (http://www.genomesize.com/). If
it is not available for your species, you can make a rough estimate by dividing the total number of
base pairs by the coverage.

iv. GC content
There are many tools to calculate the GC content of your samples. In the AAF package we have
provided our own, gc.py in the utils folder. Biopython needs to be preinstalled.
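
For readers without Biopython, GC content can also be computed with the Python standard library alone. The sketch below is an independent illustration, not the gc.py shipped in the utils folder; the function name is hypothetical, it handles FASTA input only (plain or gzipped), and it ignores ambiguous bases such as N.

```python
import gzip

def gc_content(fasta_path):
    """Fraction of G and C among the A, C, G, T bases of a FASTA file (plain or .gz)."""
    opener = gzip.open if fasta_path.endswith(".gz") else open
    gc = total = 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip header lines
            seq = line.strip().upper()
            gc += seq.count("G") + seq.count("C")
            total += sum(seq.count(base) for base in "ACGT")
    return gc / total if total else 0.0
```

Calling gc_content on one file per sample gives the per-sample values needed for the checklist above.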

v. d (Genetic distance)
The genetic distance of the group (average number of mutations per base pair) is used to set the scale of
the vertical axis in the ph vs. k figure. Because the figure is used to find k on the horizontal axis, the
conclusions from this figure about selecting k are mostly independent of the selection of d, so this
selection does not need to be very accurate. A reasonable strategy is to guess d, or use the default 0.1,
to select k. The subsequent phylogeny construction will give a good estimate of d from the distance
matrix, which then could be used to plot the figure.

vi. Qk
There is more than one way of calculating the frequency distribution of k-mers. One of the easiest
is to turn on the --stats option while counting k-mers with jellyfish (Marcais and Kingsford 2011).
However, the maximum k that jellyfish can handle is 31. For k > 31, use kmer_countx to count the
k-mers, then calculate the frequency distribution of k-mers from the pkdat files (the output of
kmer_count (k ≤ 25) and kmer_countx (k > 25)) using pkdat2hist.py in the utils folder, which is
provided in the tutorial folder.
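
To show what a k-mer frequency distribution contains, the Python sketch below builds one from a few toy reads: for each multiplicity, it reports how many distinct k-mers occur that many times. It is an illustration only, with hypothetical function names; it does not read pkdat files and is no substitute for jellyfish or pkdat2hist.py on real data.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def frequency_distribution(counts):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return dict(sorted(Counter(counts.values()).items()))

reads = ["ACGTAC", "CGTACG", "TTTTTT"]
hist = frequency_distribution(kmer_counts(reads, 4))
print(hist)  # {1: 2, 2: 2, 3: 1}
```

In a histogram from real reads, the multiplicity at which the main peak sits is the k-mer coverage c discussed in the Coverage section.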

After gathering all the information, we generated the ph vs. k figure for the 21 tropical trees dataset
(Fig. S6) using the R code phVSk.R in the utils folder. The trend for all the red lines (estimated ph
based on the Qk for each species) stabilized for k ≥ 25, and the difference between the red lines and the
black dashed line continued to decrease with larger k. Therefore, we constructed phylogenies for k from
25 to 31, and because the phylogenies were identical for k ≥ 27 (Fig. 7), we selected 27 as the optimal k
for the tropical trees dataset. The same phylogenetic topology was also obtained when k-mers were
filtered to remove singletons. For k greater than 31, the topology within the Ficus group showed some
small changes. We suspect that this is due to the loss of sensitivity to evolutionary changes when
selecting k-mer lengths too long, especially for relatively small genomes (as the Ficus group has
genome sizes less than half those of the other species).
To plot the ph vs. k figure for your dataset, simply replace the genome information for the
tropical trees with the information for your own dataset at the beginning of the R script phVSk.R.

b. Filter or not
In deciding whether or not to filter k-mers (i.e., to include k-mers only if they occur at least
twice in a taxon), it is necessary to weigh the loss of information through false k-mers caused by
sequencing errors if there is no filtering against the loss of information through removing true
singletons if there is filtering (Fig. 5 in the manuscript). If coverage varies widely among taxa
within a dataset, it is best to base the decision on the taxon with the lowest coverage, because
filtering low-coverage taxa has a large negative consequence (Fig. 5). For the tropical trees, we
chose not to filter because more than half of the species have coverage less than 5.
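
The effect of the filtering threshold can be seen on a toy count table. In the Python sketch below, the function name and the counts are hypothetical; the threshold plays the role of the -n option, so n = 1 keeps everything and n = 2 drops singletons.

```python
from collections import Counter

def present_kmers(counts, n=1):
    """k-mers counted as present: those seen at least n times in the sample."""
    return {kmer for kmer, count in counts.items() if count >= n}

# Hypothetical counts: two well-covered k-mers and two singletons.
counts = Counter({"ACGT": 5, "CGTA": 4, "GTAC": 1, "TACG": 1})
unfiltered = present_kmers(counts, n=1)
filtered = present_kmers(counts, n=2)
print(len(unfiltered), len(filtered))  # 4 2
```

The filter cannot tell an error-derived singleton from a true k-mer that was sequenced only once, which is why it costs the most information in low-coverage samples.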

c. Tip trimming (optional)

Information needed for tip trimming/correction includes coverage, read length, and sequencing
error rate. You should be able to get the performance of your sequencing platform from your
sequencing company. For example, Illumina claims that the error rate for the Genome Analyzer is
about 1% (http://res.illumina.com/documents/products/datasheets/datasheet_genomic_sequence.pdf).
For the format of the required file, see tip_info_tt.txt in the tropical_trees folder, which
contains the tip information for the tropical trees.

d. Bootstrap
i. Nonparametric vs. parametric bootstraps
Nonparametric bootstrapping can be computationally intensive when the dataset is large (>100G in
total). If it takes too long, switch to the parametric bootstrap. Also, with large genomes the
bootstrap values tend to stay at 100%.

ii. Correction factor

Set to 2. The correction factor in the R script comes from Equation 11, which estimates the
variation caused by sequencing error and incomplete coverage. Detailed simulations showed that the
formula sometimes underestimates and sometimes overestimates the true standard deviation in the
distance between taxa by as much as 50%. This occurs because of the complexities of accounting
mathematically for the correlations among k-mers that occur on the same reads. Setting the
correction factor to 2 provides a conservative bootstrap (i.e., one that will not improperly
inflate the support for nodes) by ensuring that the true variation caused by sequencing error and
incomplete coverage is never underestimated.

[Figure S6 plot: ph (vertical axis, 0.0 to 1.0) against k (horizontal axis, 15 to 35), d = 0.1]

Figure S6. Theoretical predictions of the proportion of shared k-mers, ph, calculated from the observed
frequency distribution of k-mers, Qk, for the tropical trees dataset ranging in genome size from 250M to
2Gbp assuming the true distance between taxa is d = 0.1 (divergence time 94Mya).
Reference:
Marcais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences
of k-mers. Bioinformatics 27: 764–770.
