Overview:
This list is intended to facilitate the comparison of open-source software for analyzing mass spectrometry data. It covers R packages and some other software, with links to the home pages and a short description of each tool's features.
R packages
CRAN (http://cran.r-project.org)
Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Miscellaneous | Authors |
MALDIquant | GPL (≥3) | 1.19 | raw data (mass and intensity); tab, csv, Bruker Daltonics *flex-series format, Ciphergen XML, mzXML, mzML, msd, imzML, Analyze 7.5 and CDF (via MALDIquantForeign) | SNIP, TopHat, Convex Hull, Moving Median | local maxima over SNR*noise (noise estimation by MAD or Friedman's Super Smoother) | intensity transformation and smoothing; total-ion-current/probability-quotient-normalization/median calibration | landmark peaks that occur in most spectra are identified first; subsequently, a warping function is computed for each spectrum by fitting a local regression to the matched reference peaks | trim spectra, monoisotopic peak detection, peak labeling, diverse plots for calibrated mass spectra and peaks, merge technical replicates, peak filtering, intensity matrix creation, create and plot MSI slices | Sebastian Gibb |
baseline | GPLv2 | 1.2 | raw data (2-column matrix) | Asymmetric Least Squares, Fill peaks, Iterative Restricted Least Squares, Low-pass FFT filter, Median window, Modified polynomial fitting, Simultaneous Peak Detection and Baseline Correction, Robust Baseline Estimation, Rolling ball | local maxima | NA | NA | GUI available | Kristian Hovde Liland and Bjørn-Helge Mevik |
enviPick | GPLv3 | 1.3 | mzXML | NA | Data partitioning, EIC clustering, peak extraction by penalizing intensity reversions | NA | NA | | Martin Loos |
MSeasy | GPLv2 | 5.3.3 | ASCII, Agilent *.D; mzXML and CDF via xcms | NA | local maxima (highest point in a moving window of size 7) | NA | unsupervised clustering (partitional and hierarchical algorithms with different distance metrics and link methods) | focus on GC/MS; GUI available; file output for NIST/ARISTO search | Elodie Courtois, Yann Guitton, Florence Nicole |
Spectrino | GPL (≥2) | 2.0 | tab | removal of constant threshold and replacing negative intensities with zero | NA | normalization to 1e6 | binning by rounding to the nearest integer or by splitting the intensity proportionally between the two closest integers | trim spectra, average spectra, spectra grouping/rearranging, visualisation, GUI (Windows only), connections to Java, Python, ... possible | Teodor Krastev |
Peaks | LGPL | 0.2 | vector of intensities | SNIP | gaussian deconvolution | NA | NA | | Miroslav Morhac |
Bioconductor mass spectrometry packages (http://bioconductor.org/packages/release/BiocViews.html#MassSpectrometry)
Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Miscellaneous | Authors |
MassSpecWavelet | LGPL (≥2) | 1.34.0 | vector of intensities | NN; additional Savitzky-Golay Algorithm | continuous wavelet | NA | NA | | Pan Du, Warren Kibbe, Simon Lin |
MSnbase | Artistic-2.0 | 1.18.0 | mgf, netCDF, mzData, mzML, mzXML (mostly via mzR) | NA | local maxima (see MALDIquant) | sum, max, quantiles, vsn | NA | a lot of annotations are possible; methods for cleaning spectra; quantitation (labeled and labelfree), integration of identification data | Laurent Gatto |
PROcess | Artistic-2.0 | 1.46.0 | 2-column matrix (1st column: m/z, 2nd column: intensities) | local minimum (or user-defined quantile) + LOESS | local maximum (optional smoothing (moving average)) | median of all TIC | intersection graphs | | Xiaochun Li |
TargetSearch | GPL (≥ 2) | 1.26.0 | NetCDF | divide spectrum into subparts, calculate standard deviation; values a user-defined percentage above the standard deviation are treated as true signal | smooth spectrum, determine sign changes; using PPC | equalize max peak intensity | NA | GC/MS | Alvaro Cuadros-Inostroza, Jan Lisec, Henning Redestig, Matt Hannah |
xcms | GPL (≥ 2) | 1.46.0 | NetCDF/mzXML/mzData/mzML files | constant threshold; Savitzky-Golay Algorithm or none (depending on PD method) | centroid base wavelet (for LC/MS); continuous wavelet (using MassSpecWavelet, for MS) | NA | construct a master peak list and align by best match; heuristic clustering; nearest peak | database search possible; write support for mzData and NetCDF; a lot of functions for LC/MS | Colin A. Smith, Ralf Tautenhahn, Steffen Neumann, Paul Benton |
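Several entries above normalize by the total ion current (TIC). As a rough illustration, the sketch below rescales each spectrum so that its TIC equals the median TIC of all spectra. This is only one plausible reading of the "median of all TIC" entry, not a description of what PROcess actually does; the function name is hypothetical.

```python
from statistics import median


def normalize_to_median_tic(spectra):
    """Rescale each spectrum (a list of intensities) so that its total ion
    current (sum of intensities) equals the median TIC over all spectra."""
    if not spectra:
        return []
    tics = [sum(s) for s in spectra]
    target = median(tics)
    normalized = []
    for s, t in zip(spectra, tics):
        if t == 0:
            # An all-zero spectrum cannot be rescaled; return a copy unchanged.
            normalized.append(list(s))
        else:
            factor = target / t
            normalized.append([x * factor for x in s])
    return normalized


# Hypothetical toy data: three spectra with TICs 2, 4 and 8 (median 4).
print(normalize_to_median_tic([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]))
# prints [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
```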
Other R packages (not on CRAN/Bioconductor)
Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
DanteR | GPL (≥2) | 0.2 | Excel, SQLite DB, Access, CSV, tab-delimited text | NA | NA | eigenvalues, linear regression, LOESS, quantile | NA | NA | impute data (k-nearest-neighbor, row means, ANOVA, ...); interactive plots, histogram, QQ, boxplots, 3D plots, Venn diagram, PCA plots; please see full feature list at the DanteR website | Tom Taverner and Ashoka Polpitiya |
MASDA | GPL (≥2) | 0.6 | CSV | PROcess (local minimum + LOESS), LOWESS, Friedman's super smoother, cubic smooth spline | sign change in first derivative | (intensities - offset)/scale [different combinations of mean, median, sd, mad, range etc. for offset and scale] | hierarchical clustering | ANOVA, Kruskal-Wallis | | Wouter Meuleman |
PPC | GPLv2 | 1.02 | CSV | NN | local maximum above noise estimated by Friedman's super smoother | log-transformation + linear transformation (10th percentile becomes 0; 90th becomes 1) | hierarchical clustering | nearest shrunken centroids (PPC) | | Balasubramanian Narasimhan, R. Tibshirani, T. Hastie |
ProSpect | KI | 0.3.6 | CSV | LOESS, rsmooth | finding regions of interest (significance) | NA | NA | NA | | Andreas Quandt, Tan Chuen Seng, Alexander Ploner, Stefano Calza, Yudi Pawitan |
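The normalization listed for PPC (log-transformation followed by a linear transformation that maps the 10th percentile to 0 and the 90th to 1) can be sketched as follows. The percentile definition (linear interpolation between order statistics) and all names are assumptions for illustration; the package itself may compute percentiles differently.

```python
import math


def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is a fraction in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def log_percentile_rescale(intensities):
    """Log-transform strictly positive intensities, then rescale linearly so
    that the 10th percentile becomes 0 and the 90th percentile becomes 1."""
    logged = [math.log(x) for x in intensities]
    ordered = sorted(logged)
    p10 = percentile(ordered, 0.10)
    p90 = percentile(ordered, 0.90)
    if p90 == p10:
        raise ValueError("10th and 90th percentiles coincide; cannot rescale")
    return [(v - p10) / (p90 - p10) for v in logged]


# Hypothetical intensities spanning ten orders of magnitude.
scaled = log_percentile_rescale([10.0 ** k for k in range(11)])
```

Values below the 10th percentile come out negative and values above the 90th exceed 1; the transformation does not clip.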
Obsolete R packages
Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
caMassClass | GPLv3 | 1.9 | CSV, mzXML | see PROcess | see PROcess, additional: faster (uses C), different use of SNR, no AUC | adjust peak height to min-max: min=0, max=1 for each spectrum; avr-std: mean=0, unit variance; med-mad: median=0, unit median absolute deviation | based on Peakminer algorithm (Virginia Prostate Center), bins peaks with similar mass (±constant value) | LogitBoost from caTools; lda and qda from MASS; rpart from rpart | | Jarek Tuszynski |
msProcess | GPLv2 | 1.0.6 | Ciphergen XML, vectors of m/z and intensity values | determine local minima and apply one of the following R functions: loess.smooth (default), spline, supsmu, approx, cummin, msSmoothMRD (msProcess, wavelet based) | local maxima; local maxima higher than estimated background (msPeaksSearch); continuous wavelets; discrete wavelets | TIC; Standard Normal Variate (SNV) transformation; max intensity or count quantification | hierarchical clustering; cluster by distance (smaller than threshold); vote; mrd (histogram smoothing) (for details see msProcess documentation) | NA | in silico spectrometer; a lot of denoising functions; additional data packages: msBreast, msDilution, msProstate | Lixin Gong, William Constantine, Yu Alex Chen |
pkDACLASS | LGPL | 1.0 | 2-column dataframe | see PROcess | monoisotopic peak detection (poisson-distribution+EM-algorithm) | NA | round non-integer masses to integers, using the decimal fraction to weight their intensity | using randomForest | contains some datasets | Juliet Ndukum, Mourad Atlas, Susmita Datta |
rTOFsPRO | GPL (≥2) | 1.4.1 | lists generated by WMBrukerParser | estimate baseline by a linear, exponential or gaussian model; subtract a constant value | peak detection on the average spectrum (to use high-precision peak detection you have to contact the authors) | smoothing (moving average) | align peaks against peak list of the average spectrum (to use a global align+binning you have to contact the authors) | NA | everything is controlled by text files => very difficult interface | Dariya Malyarenko, Maureen Tracy, William Cooke |
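The integer-mass binning described for pkDACLASS (and, similarly, for Spectrino in the CRAN table) splits each peak's intensity between the two neighbouring integer masses according to the decimal fraction of its mass. The sketch below shows one plausible reading of that description; it is not taken from either package, and the function name and data layout are hypothetical.

```python
import math
from collections import defaultdict


def bin_to_integer_masses(peaks):
    """Distribute each (mass, intensity) pair over the two closest integer
    masses, weighted by how close the mass is to each of them."""
    bins = defaultdict(float)
    for mass, intensity in peaks:
        lower = math.floor(mass)
        frac = mass - lower  # decimal fraction of the mass, in [0, 1)
        bins[lower] += intensity * (1 - frac)
        if frac > 0:
            bins[lower + 1] += intensity * frac
    return dict(bins)


# Hypothetical peak list: a quarter of the first peak's intensity goes to mass 101.
print(bin_to_integer_masses([(100.25, 8.0), (101.0, 2.0)]))
# prints {100: 6.0, 101: 4.0}
```

Unlike plain rounding, this weighting keeps the total intensity unchanged while reducing the effect of small mass shifts near a bin boundary.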
R packages for importing mass spectrometry data files
Package | License | Version | File Formats | Miscellaneous | Authors |
MALDIquantForeign | GPLv3 | 0.12 | tab, csv, Bruker Daltonics *flex series format, Ciphergen XML, mzXML, mzML, msd, imzML, Analyze 7.5, CDF | | Sebastian Gibb |
mzID | GPLv2 | 1.8.0 | mzIdentML | | Thomas Lin Pedersen |
mzR | Artistic-2.0 | 2.4 | mzXML, mzData, mzML, mzIdentML, NetCDF | | Bernd Fischer, Steffen Neumann, Laurent Gatto |
readBrukerFlexData | GPLv3 | 1.8.2 | fid files of Bruker Daltonics *flex series | | Sebastian Gibb |
readMzXmlData | GPLv3 | 2.8.1 | mzXML | | Sebastian Gibb |
R packages for Mass Spectrometry Imaging
Package | License | Version | Miscellaneous | Authors |
MALDIquant | GPL (≥3) | 1.19 | | Sebastian Gibb |
Cardinal | Artistic-2.0 | 1.2.0 | | Kyle D. Bemis |
R packages for calculation of isotopic pattern/distribution
Package | License | Version | Authors |
enviPat | GPLv2 | 2.0 | Martin Loos, Christian Gerber |
BRAIN | GPLv2 | 1.16 | Piotr Dittwald |
Rdisop | GPL | 1.30 | Anton Pervukhin, Steffen Neumann |
Obsolete R packages for calculation of isotopic pattern/distribution
Package | License | Version | Authors | Miscellaneous |
isopat | GPLv2 | 1.0 | Martin Loos | superseded by enviPat |
R packages to detect isotopic patterns
Package | License | Version | Miscellaneous | Authors |
nontarget | GPLv2 | 1.7 | also screening for peaks related by different adducts and/or homologue series | Martin Loos |
IPPD | GPL (≥2) | 1.18.0 | | Martin Slawski |
R packages for annotation of mass spectrometry data
Package | License | Version | Miscellaneous | Authors |
CAMERA | GPLv2 | 1.26 | annotation of peaklist generated by xcms | Carsten Kuhl, Ralf Tautenhahn, Steffen Neumann |
R packages to handle mass spectrometry libraries
Package | License | Version | Miscellaneous | Authors |
RMassBank | Artistic-2.0 | 1.12.0 | preparation of MS/MS spectra for a MassBank submission | Michael Stravs, Emma Schymanski, Steffen Neumann, Erik Mueller |
Non-R tools
Application | Programming Language | Operating Systems | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors |
eMZed | Python | L, M, W | GPLv3 | 2.22.2 | | | | | | | LC-MS development framework based on OpenMS/TOPP and xcms | Patrick Kiefer and Uwe Schmitt |
Mass-Up | Java/R | L, W | GPL | 1.0.7 | CSV, mzML, mzXML | see MALDIquant | see MALDIquant and see MassSpecWavelet | see MALDIquant | Forward algorithm or see MALDIquant | PCA, SVM | various tools and plots for Quality Control, Biomarker Discovery; Hierarchical Clustering, Biclustering | Florentino Fdez-Riverola, Daniel Glez-Peña, Miguel Reboiro-Jato, José Luís Capelo-Martínez, Hugo López-Fernández |
massXpert | C++ | L, M, W | GPLv3 | 6.0.2 | | | | | | | various tools for polymer editing/simulation/calculation; please visit: http://msxpertsuite.org/wiki/pmwiki.php/Main/Massxpert | Filippo Rusconi |
mineXpert2 | C++ | L, M, W | GPLv3 | 6.0.2 | | | | | | | full-depth data visualization and mining of MS^n mass spectrometric data; please visit: http://msxpertsuite.org/wiki/pmwiki.php/Main/Minexpert2 | Filippo Rusconi |
mMass | Python | L, M, W | GPLv3 | 5.5.0 | mzData, mzXML, mzML, ASCII, CSV, fid (Bruker Daltonics' compassXport has to be installed (W only)) | median of all intensities minus median of absolute deviations (additionally, you can add a relative offset and smooth the baseline); gaussian smoothing | local maximum above (relative and absolute) intensity threshold | intensity/max_intensity (range: 0-1) | NA | NA | deisotoping function, connections to a lot of protein databases, batch processing, please see also: complete feature list | Martin Strohalm |
MZmine2 | Java | L, M, W | GPLv2 | | | | | | | | | Matej Orešič et al. (full list) |
OpenChrom | Java | L, M, W | EPL | 1.1.0 | NetCDF, mzXML, CSV, D (Agilent Technologies), Bruker Daltonics *flex-series format, own file format *.chrom | moving minimum, SNIP | zero of first derivative of TIC signal | NA | NA | NA | batch processing; smoothing filter: Savitzky-Golay; extendable by plugins; database based identification possible (as plugin, NIST-DB) | Philip Wenig |
OpenMS/TOPP | C++ | L, M, W | LGPL | | | | | | | | | Knut Reinert, Oliver Kohlbacher, Andreas Hildebrandt and many others (full list) |
ProteoWizard | C++ | L, M, W | Apache v2 | 3.0 | mzXML, mzML, mzIdentML, ... (only on Windows a lot of different vendor specific raw formats: AB Sciex, Agilent, Bruker, Thermo, Waters) | | | | | | framework for rapid development of data analysis tools; supports various methods for accessing metadata, plotting, smoothing, peak picking, etc. | Robert Burke, Matt Chambers, Brendan MacLean and many others (full list) |
Non-R tools for importing mass spectrometry data files
Application | License | Version | File Formats | Miscellaneous | Authors |
pymzML | LGPL | 0.7.6 | mzML | Python 2.6.5/Python 3 | Till Bald, Johannes Barth, Anna Niehues, Michael Specht, Michael Hippler, Christian Fufezan |
Abbreviations:
AUC | area under the curve |
BC | baseline correction |
DN | denoising |
PA | peak alignment |
PD | peak detection |
SNR | signal to noise ratio |
TIC | total ion current/total ion count |
GC/MS | gas chromatography/mass spectrometry |
LC/MS | liquid chromatography/mass spectrometry |
MS | mass spectrometry |
IMS | imaging mass spectrometry |
NA | not available |
NN | not needed |
L | Linux |
M | Mac OS X |
W | Microsoft Windows |