Vanilla 1.2 (April 15, 2001)
Copyright (c) 1999-2001 Korbinian Strimmer
This package may be distributed under the terms of the GNU General Public License.
Vanilla is a collection of command line programs written in Java to access some selected features in PAL, a Java library for molecular evolution and phylogenetics (see http://www.cebl.auckland.ac.nz/pal-project/ for details on PAL). Note that Vanilla does not provide all functionality available in PAL.
In fact, for the vast majority of tasks you still need to write your own programs! In that respect, this package is provided in the hope that it may serve as a useful guide and introduction to using PAL in your own programs.
Vanilla 1.2 is only a minor update from Vanilla 1.1 to accommodate the changes from PAL 1.1 to PAL 1.2. It runs best in a shell environment (Unix, Windows, MacOS X) and requires Java 1.1 or better (see Download and Installation section).
Vanilla includes programs to compute maximum-likelihood distances for nucleotide and amino acid data, to estimate maximum-likelihood branch lengths on trees (incl. clock trees and dated tips), to compare trees statistically (e.g., Shimodaira-Hasegawa) and topologically (Robinson-Foulds), and to infer demographic parameters from trees (based on the coalescent), as well as utility programs to reformat and modify alignments.
If you wish to cite this package please use a phrase like "the Vanilla 1.2 frontend to PAL 1.2", or similar, and refer to
Vanilla applications offer plain and simple text-only interfaces and are intended to be run in a shell environment. Be warned: there is no graphical user interface, and Vanilla is definitely not Mac/Windows-like.
Applications with a PHYLIP-style character user interface (self-explanatory menus):
  GENERAL OPTIONS
   d   Sequence data type?                 Nucleotides
   e   Estimate model parameters?          No
   k   Compute observed or ML distances?   ML distances
  SUBSTITUTION MODEL
   m   Model of substitution?              F84 (Felsenstein 1984, PHYLIP)
   t   PHYLIP Ts/Tv parameter?             1.1 (expected Ts/Tv ratio)
   f   Nucleotide frequencies?             As observed in data set
   w   Model of rate heterogeneity?        Uniform rate
  Quit [q], confirm [y], or change [menu] settings:

Computes pairwise maximum-likelihood distance matrices for nucleotide, amino acid, and two-state data under a variety of substitution models (GTR, TN, HKY, F84, F81, JC; Dayhoff, JTT, MTREV24, BLOSUM, VT, WAG, CPREV), optionally incorporating rate variation over sites (uniform rate or discrete Gamma model). MLDIST also computes observed distances and can obtain approximate estimates of unknown model parameters, e.g., the Ts/Tv ratio. The default file name for the sequences is "indata" (PHYLIP or CLUSTAL W format).
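As a rough illustration of the difference between observed and ML distances, the sketch below computes the observed distance (proportion of differing sites) and its correction under JC, the simplest of the nucleotide models listed above, for which the maximum-likelihood distance has a closed form. It is written in Python purely for brevity and does not use any Vanilla or PAL code; the function names are hypothetical, and skipping sites with gaps or ambiguity codes is an assumption of this sketch. The other models and the Gamma model generally need numerical optimization and are not shown.

```python
import math


def observed_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguity code are skipped
    (an assumption of this sketch, not necessarily what MLDIST does).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc_distance(p):
    """Jukes-Cantor distance for an observed proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = observed_distance("ACGTACGT", "ACGTACGA")
print(p, jc_distance(p))
```

The corrected distance is always at least as large as the observed one, because it accounts for multiple substitutions at the same site.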
  GENERAL OPTIONS
   d   Sequence data type?                 Nucleotides
   e   Estimate model parameters?          No
   o   Optimize branch lengths?            Yes
   b   Branch lengths constraints?         Clock-like tree with dated tips
   i   Write out site log-likelihoods?     No
   x   Tree comparison test?               None
  SUBSTITUTION MODEL
   m   Model of substitution?              F84 (Felsenstein 1984, PHYLIP)
   t   PHYLIP Ts/Tv parameter?             2.0 (expected Ts/Tv ratio)
   f   Nucleotide frequencies?             As observed in data set
   w   Model of rate heterogeneity?        Uniform rate
  Quit [q], confirm [y], or change [menu] settings:

Computes likelihoods of trees (in "intree") under a variety of substitution models (see MLDIST). The likelihood can be computed using the user-provided branch lengths or, alternatively, branch lengths can be optimized as well (for unconstrained trees, clock trees, and clock trees with dated tips). If more than one tree is present in an "intree" file, the trees can be statistically compared (using the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and expected Akaike weights). Note, however, that MLTREE presently does not do any tree searches.
  GENERAL OPTIONS
   d   Sequence data type?                 Nucleotides
   n   Number of output data sets?         1
   s   Number of sites?                    1000
  SUBSTITUTION MODEL
   m   Model of substitution?              F84 (Felsenstein 1984, PHYLIP)
   t   PHYLIP Ts/Tv parameter?             1.1 (expected Ts/Tv ratio)
   f   Nucleotide frequencies?             Not yet specified
   w   Model of rate heterogeneity?        Uniform rate
  Quit [q], confirm [y], or change [menu] settings:

Generates artificial data sets for a tree (in "intree") and a model of sequence evolution (available models are the same as in MLTREE and MLDIST).
  GENERAL OPTIONS
   u   Reconstruction method?              Least-squares (on user trees)
   o   Optimise branch lengths?            Yes
   b   Branch lengths constraints?         None
   l   Chi-square distance?                Weighted (Fitch-Margoliash 1967)
  Quit [q], confirm [y], or change [menu] settings:

Tree methods dealing with distance matrices ("indist"). If user-specified tree topologies (in "intree") are available, weighted and unweighted least-squares (LS) branch lengths can be estimated (for unconstrained trees, clock trees, and clock trees with dated tips). In addition, NJ and UPGMA trees can be computed from the distance matrix.
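To give an idea of what building a tree from a distance matrix involves, here is a minimal sketch of UPGMA (average-linkage clustering), the simpler of the two methods named above. It is a standalone Python illustration, not the PAL implementation; the function name, the list-of-lists matrix, and the Newick output via "%g" formatting are all assumptions of this sketch.

```python
def upgma(names, dist):
    """Build a UPGMA tree from taxon names and a symmetric distance matrix.

    Returns a Newick string with branch lengths.
    """
    # Each cluster is stored as (newick_subtree, number_of_tips, node_height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances, keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        sub_a, size_a, h_a = clusters.pop(a)
        sub_b, size_b, h_b = clusters.pop(b)
        height = dab / 2.0
        newick = "(%s:%g,%s:%g)" % (sub_a, height - h_a, sub_b, height - h_b)
        del d[(a, b)]
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        clusters[next_id] = (newick, size_a + size_b, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


print(upgma(["A", "B", "C"], [[0, 2, 4], [2, 0, 4], [4, 4, 0]]))
```

UPGMA always yields a clock-like (ultrametric) tree, whereas NJ makes no such assumption.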
  GENERAL OPTIONS
   v   Sequence output format?             PHYLIP Interleaved
   g   Drop gaps from input data?          No
   z   Constant/non-informative sites?     Keep sites
   j   Jumble sequence order?              No
   h   Bootstrap sites?                    No
   n   Number of output data sets?         1
  Quit [q], confirm [y], or change [menu] settings:

Reads a data set (in "indata") and reformats it into various formats (PHYLIP Sequential, Interleaved, Plain, Clustal W). Also removes sites with gaps, removes constant and non-informative sites, jumbles sequence order, and produces bootstrapped data sets.

Applications with a command line interface:
To get a list of available options, simply call the program without any parameters. On a Macintosh you will be asked to enter the command line parameters.
Usage: treecomp intree1 intree2
Reads two trees, each from a different file, and computes the Robinson-Foulds score (a topological distance measure based on splits). If the two trees are identical, a score of zero is returned.
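The idea behind a split-based distance can be sketched as follows: every internal branch divides the taxa into two groups (a split), and the score counts the splits found in one tree but not in the other. The Python sketch below is a standalone illustration and not the PAL code. It assumes trees are given as nested tuples of taxon names rather than NH files, treats them as unrooted, and uses hypothetical function names. Note that some programs report half of this count; which convention treecomp uses is not stated here.

```python
def leaf_set(tree):
    """All taxon names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Set of non-trivial splits of a tree, treated as unrooted.

    Each split is represented by the side that does not contain a fixed
    reference taxon, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if reference in below else below
        # Splits separating fewer than two taxa from the rest are trivial.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree1) ^ splits(tree2))


print(robinson_foulds((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))
```

Because the trees are compared as unrooted, two trees that differ only in the position of the root get a score of zero.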
Usage: demograph intree [epsilon] outfile
Estimates demographic parameters (effective population size, growth rate) by maximizing the coalescent prior for a single given clocklike tree. Also computes the classic skyline plot, a non-parametric estimate of population size through time (O. G. Pybus et al. 2000. Genetics 155:1429-1437), and its extension, the generalized skyline plot (K. Strimmer and O. G. Pybus. 2001. Submitted). Epsilon is the smoothing parameter of the generalized skyline plot.
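For orientation, the classic skyline plot can be sketched in a few lines: while k lineages are present for a time interval of length g, the population size estimate for that interval is g * k * (k - 1) / 2. The Python sketch below is a standalone illustration, not the demograph code. It assumes a strictly bifurcating clock tree with all tips sampled at time zero, takes the heights of the internal nodes as input, and uses a hypothetical function name; the epsilon-based pooling of the generalized skyline plot is not shown.

```python
def classic_skyline(node_heights):
    """Classic skyline estimates from the internal node heights of a clock tree.

    Returns one (interval_start, interval_end, estimate) tuple per coalescent
    interval, ordered from the present into the past. Estimates are in the
    same units as the node heights.
    """
    heights = sorted(node_heights)
    n_tips = len(heights) + 1  # a bifurcating tree has n - 1 internal nodes
    estimates = []
    previous = 0.0
    for i, height in enumerate(heights):
        k = n_tips - i               # lineages present during this interval
        length = height - previous   # waiting time until the next coalescence
        estimates.append((previous, height, length * k * (k - 1) / 2.0))
        previous = height
    return estimates


for start, end, size in classic_skyline([0.1, 0.3, 0.6]):
    print(start, end, size)
```

Each interval gets its own estimate, which makes the classic plot noisy when intervals are short; this is the motivation for a smoothing parameter such as epsilon.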
Usage: concat infile1 infile2 .. outfile
Concatenates a set of sequence alignments.
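Conceptually, concatenation joins the sequences of each taxon end to end. The Python sketch below illustrates this on in-memory alignments, represented here as dictionaries mapping taxon names to sequences. It is not the Vanilla code: the function name is hypothetical, and requiring identical taxon names in every alignment is an assumption of this sketch (how concat itself treats mismatched names is not described here).

```python
def concatenate(alignments):
    """Join several alignments (dicts of name -> sequence) taxon by taxon."""
    if not alignments:
        raise ValueError("need at least one alignment")
    names = list(alignments[0])  # keep the taxon order of the first alignment
    for alignment in alignments:
        if set(alignment) != set(names):
            raise ValueError("all alignments must contain the same taxa")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError("sequences within an alignment must be equally long")
    return {name: "".join(alignment[name] for alignment in alignments)
            for name in names}


gene1 = {"human": "ACGT", "chimp": "ACGA"}
gene2 = {"human": "TT", "chimp": "TC"}
print(concatenate([gene1, gene2]))
```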
Usage: showtree treefile [outfile]
Reads treefile (NH format), generates an ASCII picture of the tree, and computes the distance matrix induced by the branch lengths. If no outfile is specified, the ASCII picture of the tree is printed to stdout.
Usage: reroot node treefile outfile
Reads a treefile (NH format) and reroots the tree, making node the new root.
Usage: splitcodon infile outfile
Separates the 1st, 2nd, and 3rd codon positions of an alignment into separate files.
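Splitting by codon position amounts to taking every third site, starting at site 1, 2, or 3. The Python sketch below shows this on an in-memory alignment (a dictionary mapping taxon names to sequences). It is a standalone illustration rather than the Vanilla code; the function name is hypothetical, and it assumes the alignment is in frame, i.e. that the first site is a first codon position.

```python
def split_codon_positions(alignment):
    """Return three alignments holding the 1st, 2nd, and 3rd codon positions."""
    return [{name: seq[offset::3] for name, seq in alignment.items()}
            for offset in range(3)]


first, second, third = split_codon_positions({"human": "ATGCCA", "chimp": "ATGCCG"})
print(first, second, third)
```

Analyzing the three partitions separately is a common way to allow for the different substitution rates at the three codon positions.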
Hint: In many shell environments typing "v_" followed by a tab will
provide you with a list of all Vanilla applications.
Vanilla requires Java 1.1 (or better). It is strongly recommended to install a Java Virtual Machine with a JIT ("just in time") compiler. On Linux, installing Sun Java 1.3 for Linux is recommended because it includes a very fast runtime system (based on a so-called hot-spot compiler). On the Macintosh, please install a recent version of the Macintosh Runtime for Java (MRJ). However, probably the best way to run Vanilla on a Mac is to install MacOS X and then install the UNIX version.
This version of Vanilla relies on PAL 1.2. PAL is (c) 1999-2001 by the PAL Core Development Team and is licensed under the GNU General Public License. Sources and documentation for PAL are separately available from the PAL web page at http://www.cebl.auckland.ac.nz/pal-project/.
Note that neither a Java compiler nor a download of the PAL sources is necessary, as the Vanilla package already contains all necessary classes in precompiled form.
Vanilla 1.2 is distributed in two variants (for convenience only, functionality and sources are 100% identical across platforms):
Due to the arrival of Mac OS X (=Unix), a separate Macintosh version is no longer supported.
After uncompressing the archive with appropriate tools, the following simple directory structure will be created on all platforms:
  vanilla-1.2
     |
     +-- bin
     |    +-- unix
     |    +-- win
     |    +-- mac
     +-- data
     +-- doc
     +-- classes
     |    +-- vanilla.zip
     |    +-- pal.zip
     +-- src

The "bin" folder contains the Vanilla "binaries", i.e. shell scripts (Unix), batch files (Windows), or JBindery files (Macintosh, no longer supported). The "data" directory contains some example data sets. The "doc" folder contains this page and some other documentation. The "classes" folder contains the precompiled Java class files for Vanilla and for PAL, and the "src" folder contains the Vanilla source code.
To be able to run the Vanilla programs, the system needs to be told where the binaries and the Java bytecode sit on the hard drive. On Unix and Windows this is done by setting the PATH and CLASSPATH variables appropriately; on a Mac the Java classes need to be put in a special directory. If in doubt, please ask your local Unix/Windows/Mac expert for help:
%sh, bash
VDIR=$HOME/vanilla-1.2
export PATH=$VDIR/bin/unix:$PATH
export CLASSPATH=$VDIR/classes/vanilla.zip:$VDIR/classes/pal.zip:$CLASSPATH
Vanilla can also be compiled into native code (as can PAL) using the GNU gcj compiler; makefiles are included with the distributions of both Vanilla and PAL.
I thank Oliver Pybus and Andrew Rambaut for discussion. The name Vanilla was suggested
by Alexei Drummond. The prefixing trick (application names) is due to Catherine
Letondal. This work is supported by an Emmy-Noether-Fellowship of the DFG
to K.S.