
Overview:

This list is intended to make it easier to compare open source software for analyzing mass spectrometry data. It covers R packages and some other tools, with links to their home pages and a short description of their features.

R packages

CRAN (http://cran.r-project.org)

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Miscellaneous | Authors
MALDIquant | GPL (≥3) | 1.19 | raw data (mass and intensity); tab, csv, Bruker Daltonics *flex-series format, Ciphergen XML, mzXML, mzML, msd, imzML, Analyze 7.5 and CDF (via MALDIquantForeign) | SNIP, TopHat, Convex Hull, Moving Median | local maxima over SNR*noise (noise estimation by MAD or Friedman's Super Smoother) | intensity transformation and smoothing; total-ion-current/probability-quotient-normalization/median calibration | first, landmark peaks that occur in most spectra are identified; subsequently, a warping function is computed for each spectrum by fitting a local regression to the matched reference peaks | trim spectra, monoisotopic peak detection, peak labeling, diverse plots for calibrated mass spectra and peaks, merge technical replicates, peak filtering, intensity matrix creation, create and plot MSI slices | Sebastian Gibb
baseline | GPLv2 | 1.2 | raw data (2-column matrix) | Asymmetric Least Squares, Fill peaks, Iterative Restricted Least Squares, Low-pass FFT filter, Median window, Modified polynomial fitting, Simultaneous Peak Detection and Baseline Correction, Robust Baseline Estimation, Rolling ball | local maxima | NA | NA | GUI available | Kristian Hovde Liland and Bjørn-Helge Mevik
enviPick | GPLv3 | 1.3 | mzXML | NA | Data partitioning, EIC clustering, peak extraction by penalizing intensity reversions | NA | NA | | Martin Loos
MSeasy | GPLv2 | 5.3.3 | ASCII, Agilent *.D; mzXML and CDF via xcms | NA | local maxima (highest point in a moving window of size 7) | NA | unsupervised clustering (partitional and hierarchical algorithms with different distance metrics and link methods) | focus on GC/MS; GUI available; file output for NIST/ARISTO search | Elodie Courtois, Yann Guitton, Florence Nicole
Spectrino | GPL (≥2) | 2.0 | tab | removal of constant threshold and replacing negative intensities with zero | NA | normalization to 1e6 | binning by rounding to the nearest integer or by splitting the intensity proportionally between the two closest integers | trim spectra, average spectra, spectra grouping/rearranging, visualisation, GUI (Windows only), connections to Java, Python, ... possible | Teodor Krastev
Peaks | LGPL | 0.2 | vector of intensities | SNIP | gaussian deconvolution | NA | NA | | Miroslav Morhac
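
The MALDIquant row describes peak detection as local maxima whose intensity exceeds SNR*noise, with the noise estimated by MAD. The following is a minimal Python sketch of that idea, not MALDIquant's actual code. The function names, the window size, the default SNR of 3 and the 1.4826 scaling constant are assumptions made for illustration, and the input is assumed to be baseline-corrected already.

```python
from statistics import median


def mad_noise(intensities):
    """Estimate the noise level as the scaled median absolute deviation (MAD)."""
    center = median(intensities)
    deviations = [abs(x - center) for x in intensities]
    # 1.4826 makes the MAD comparable to a standard deviation for
    # normally distributed data (an assumption of this sketch).
    return 1.4826 * median(deviations)


def detect_peaks(intensities, half_window=2, snr=3.0):
    """Return indices of local maxima that lie above snr * noise.

    A point counts as a local maximum if it is the largest value within
    +/- half_window neighbouring points. Points closer than half_window
    to either end of the spectrum are not considered.
    """
    threshold = snr * mad_noise(intensities)
    peaks = []
    for i in range(half_window, len(intensities) - half_window):
        window = intensities[i - half_window:i + half_window + 1]
        if intensities[i] == max(window) and intensities[i] > threshold:
            peaks.append(i)
    return peaks


spectrum = [1, 2, 1, 2, 30, 2, 1, 2, 1, 25, 1, 2, 1]
print(detect_peaks(spectrum))
```

With a flat-topped peak this sketch would report every point of the plateau; a real implementation would need a rule for such ties.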

Bioconductor mass spectrometry packages (http://bioconductor.org/packages/release/BiocViews.html#MassSpectrometry)

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Miscellaneous | Authors
MassSpecWavelet | LGPL (≥2) | 1.34.0 | vector of intensities | NN; additional Savitzky-Golay Algorithm | continuous wavelet | NA | NA | | Pan Du, Warren Kibbe, Simon Lin
MSnbase | Artistic-2.0 | 1.18.0 | mgf, netCDF, mzData, mzML, mzXML (mostly via mzR) | NA | local maxima (see MALDIquant) | sum, max, quantiles, vsn | NA | a lot of annotations are possible; methods for cleaning spectra; quantitation (labeled and label-free), integration of identification data | Laurent Gatto
PROcess | Artistic-2.0 | 1.46.0 | 2-column matrix (1st column: m/z; 2nd column: intensities) | local minimum (or user-defined quantile) + LOESS | local maximum (optional smoothing (moving average)) | median of all TIC | intersection graphs | | Xiaochun Li
TargetSearch | GPL (≥2) | 1.26.0 | NetCDF | divide spectrum into subparts, calculate standard deviation; a user-defined percentage above the standard deviation becomes true signal | smooth spectrum, determine sign changes; using PPC | equalize max peak intensity | NA | GC/MS | Alvaro Cuadros-Inostroza, Jan Lisec, Henning Redestig, Matt Hannah
xcms | GPL (≥2) | 1.46.0 | NetCDF/mzXML/mzData/mzML files | constant threshold; Savitzky-Golay Algorithm or none (depending on PD method) | centroid-based wavelet (for LC/MS); continuous wavelet (using MassSpecWavelet, for MS) | NA | construct a master peak list and align by best match; heuristic clustering; nearest peak | database search possible; write support for mzData and NetCDF; a lot of functions for LC/MS | Colin A. Smith, Ralf Tautenhahn, Steffen Neumann, Paul Benton
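
The PROcess row lists its normalization as "median of all TIC". One plausible reading is that every spectrum is rescaled so that its total ion current equals the median TIC of the whole set. The Python sketch below illustrates that reading only; it is not taken from PROcess, and the function name and the handling of empty spectra are assumptions.

```python
from statistics import median


def normalize_to_median_tic(spectra):
    """Rescale each spectrum so its TIC equals the median TIC of all spectra.

    `spectra` is a list of intensity lists. Spectra with a TIC of zero are
    returned unchanged, because they cannot be rescaled (a choice made for
    this sketch).
    """
    tics = [sum(intensities) for intensities in spectra]
    target = median(tics)
    normalized = []
    for intensities, tic in zip(spectra, tics):
        if tic > 0:
            normalized.append([x * target / tic for x in intensities])
        else:
            normalized.append(list(intensities))
    return normalized


spectra = [[1, 1, 2], [2, 2, 4], [4, 4, 8]]
for row in normalize_to_median_tic(spectra):
    print(row, sum(row))
```

After the call, all three example spectra have the same TIC, so intensity differences caused only by the total amount of signal are removed.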

Other R packages (not on CRAN/Bioconductor)

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
DanteR | GPL (≥2) | 0.2 | Excel, SQLite DB, Access, CSV, tab-delimited text | NA | NA | eigenvalues, linear regression, LOESS, quantile | NA | NA | impute data (k-nearest-neighbor, row means, ANOVA, ...); interactive plots, histogram, QQ, boxplots, 3D plots, Venn diagram, PCA plots; please see full feature list at the DanteR website | Tom Taverner and Ashoka Polpitiya
MASDA | GPL (≥2) | 0.6 | CSV | PROcess (local minimum + LOESS), LOWESS, Friedman's super smoother, cubic smooth spline | sign change in first derivative | (intensities - offset)/scale [different combinations of mean, median, sd, mad, range etc. for offset and scale] | hierarchical clustering | ANOVA, Kruskal-Wallis | | Wouter Meuleman
PPC | GPLv2 | 1.02 | CSV | NN | local maximum above noise estimated by Friedman's super smoother | log-transformation + linear transformation (10th percentile becomes 0; 90th becomes 1) | hierarchical clustering | nearest shrunken centroids (PPC) | | Balasubramanian Narasimhan, R. Tibshirani, T. Hastie
ProSpect | KI | 0.3.6 | CSV | LOESS, rsmooth | finding regions of interest (significance) | NA | NA | NA | | Andreas Quandt, Tan Chuen Seng, Alexander Ploner, Stefano Calza, Yudi Pawitan
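
The PPC row describes its normalization as a log-transformation followed by a linear transformation that maps the 10th percentile to 0 and the 90th percentile to 1. The sketch below shows one way to write this in Python. It is an illustration under assumptions, not the PPC package's code: the function names are hypothetical, percentiles are computed by linear interpolation, and all intensities are assumed to be strictly positive.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q between 0 and 1) of sorted values."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def percentile_normalize(intensities, low=0.10, high=0.90):
    """Log-transform, then rescale so the low percentile is 0 and the high is 1."""
    logged = [math.log(x) for x in intensities]  # requires x > 0
    ordered = sorted(logged)
    p_low = percentile(ordered, low)
    p_high = percentile(ordered, high)
    span = p_high - p_low
    if span == 0:
        raise ValueError("10th and 90th percentile are identical")
    return [(value - p_low) / span for value in logged]


intensities = [2 ** k for k in range(11)]
print([round(v, 3) for v in percentile_normalize(intensities)])
```

Values below the 10th percentile become negative and values above the 90th exceed 1, which keeps the transformation linear on the log scale rather than clipping.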

Obsolete R packages

Package | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
caMassClass | GPLv3 | 1.9 | CSV, mzXML | see PROcess | see PROcess, additional: faster (uses C), different use of SNR, no AUC | adjust peak height to min-max: min=0, max=1 for each spectrum; avr-std: mean=0, unit variance; med-mad: median=0, unit median absolute deviation | based on Peakminer algorithm (Virginia Prostate Center), bins peaks with similar mass (±constant value) | LogitBoost from caTools; lda and qda from MASS; rpart from rpart | | Jarek Tuszynski
msProcess | GPLv2 | 1.0.6 | Ciphergen XML, vectors of m/z and intensity values | determine local minima and apply one of the following R functions: loess.smooth (default), spline, supsmu, approx, cummin, msSmoothMRD (msProcess, wavelet based) | local maxima; local maxima higher than estimated background (msPeaksSearch); continuous wavelets; discrete wavelets | TIC; Standard Normal Variate (SNV) transformation; max intensity or count quantification | hierarchical clustering; cluster by distance (smaller than threshold); vote; mrd (histogram smoothing) (for details see msProcess documentation) | NA | in silico spectrometer; a lot of denoising functions; additional data packages: msBreast, msDilution, msProstate | Lixin Gong, William Constantine, Yu Alex Chen
pkDACLASS | LGPL | 1.0 | 2-column dataframe | see PROcess | monoisotopic peak detection (Poisson distribution + EM algorithm) | NA | round non-integer masses to integers and use the decimal fraction to weight their intensity | using randomForest | contains some datasets | Juliet Ndukum, Mourad Atlas, Susmita Datta
rTOFsPRO | GPL (≥2) | 1.4.1 | lists generated by WMBrukerParser | estimate baseline by a linear, exponential or gaussian model; subtract a constant value | peak detection on the average spectrum (to use high-precision peak detection you have to contact the authors) | smoothing (moving average) | align peaks against peak list of the average spectrum (to use a global align+binning you have to contact the authors) | NA | everything is controlled by text files => very difficult interface | Dariya Malyarenko, Maureen Tracy, William Cooke
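
The pkDACLASS row describes alignment by rounding non-integer masses to integers and using the decimal fraction to weight the intensity. One possible interpretation, sketched below in Python, splits each intensity between the two neighbouring integer masses in proportion to how close the measured mass is to each. This is an assumption about what the description means, not pkDACLASS's implementation, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def bin_to_integer_masses(masses, intensities):
    """Distribute each intensity over the two nearest integer masses.

    A peak at mass m with fractional part f contributes (1 - f) of its
    intensity to floor(m) and f of its intensity to floor(m) + 1, so the
    total intensity is preserved.
    """
    binned = defaultdict(float)
    for mass, intensity in zip(masses, intensities):
        lower = math.floor(mass)
        fraction = mass - lower
        binned[lower] += intensity * (1.0 - fraction)
        if fraction > 0:
            binned[lower + 1] += intensity * fraction
    return dict(sorted(binned.items()))


print(bin_to_integer_masses([100.25, 100.75, 102.0], [8, 4, 5]))
```

Because every spectrum ends up on the same integer mass grid, peaks from different spectra can be compared position by position without a separate matching step.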

R packages for importing mass spectrometry data files

Package | License | Version | File Formats | Miscellaneous | Authors
MALDIquantForeign | GPLv3 | 0.12 | tab, csv, Bruker Daltonics *flex series format, Ciphergen XML, mzXML, mzML, msd, imzML, Analyze 7.5, CDF | | Sebastian Gibb
mzID | GPLv2 | 1.8.0 | mzIdentML | | Thomas Lin Pedersen
mzR | Artistic-2.0 | 2.4 | mzXML, mzData, mzML, mzIdentML, NetCDF | | Bernd Fischer, Steffen Neumann, Laurent Gatto
readBrukerFlexData | GPLv3 | 1.8.2 | fid files of Bruker Daltonics *flex series | | Sebastian Gibb
readMzXmlData | GPLv3 | 2.8.1 | mzXML | | Sebastian Gibb

R packages for Mass Spectrometry Imaging

Package | License | Version | Miscellaneous | Authors
MALDIquant | GPL (≥3) | 1.19 | | Sebastian Gibb
Cardinal | Artistic-2.0 | 1.2.0 | | Kyle D. Bemis

R packages for calculation of isotopic pattern/distribution

Package | License | Version | Authors
enviPat | GPLv2 | 2.0 | Martin Loos, Christian Gerber
BRAIN | GPLv2 | 1.16 | Piotr Dittwald
Rdisop | GPL | 1.30 | Anton Pervukhin, Steffen Neumann

Obsolete R packages for calculation of isotopic pattern/distribution

Package | License | Version | Authors | Miscellaneous
isopat | GPLv2 | 1.0 | Martin Loos | superseded by enviPat

R packages to detect isotopic patterns

Package | License | Version | Miscellaneous | Authors
nontarget | GPLv2 | 1.7 | also screening for peaks related by different adducts and/or homologue series | Martin Loos
IPPD | GPL (≥2) | 1.18.0 | | Martin Slawski

R packages for annotation of mass spectrometry data

Package | License | Version | Miscellaneous | Authors
CAMERA | GPLv2 | 1.26 | annotation of peak lists generated by xcms | Carsten Kuhl, Ralf Tautenhahn, Steffen Neumann

R packages to handle mass spectrometry libraries

Package | License | Version | Miscellaneous | Authors
RMassBank | Artistic-2.0 | 1.12.0 | preparation of MS/MS spectra for a MassBank submission | Michael Stravs, Emma Schymanski, Steffen Neumann, Erik Mueller

Non-R tools

Application | Programming Language | Operating Systems | License | Version | Input Data | Baseline Correction | Peak Detection | Normalization | Peak Alignment | Classification | Miscellaneous | Authors
eMZed | Python | L, M, W | GPLv3 | 2.22.2 | | | | | | | LC-MS development framework based on OpenMS/TOPP and xcms | Patrick Kiefer and Uwe Schmitt
Mass-Up | Java/R | L, W | GPL | 1.0.7 | CSV, mzML, mzXML | see MALDIquant | see MALDIquant and see MassSpecWavelet | see MALDIquant | Forward algorithm or see MALDIquant | PCA, SVM | various tools and plots for Quality Control, Biomarker Discovery; Hierarchical Clustering, Biclustering | Florentino Fdez-Riverola, Daniel Glez-Peña, Miguel Reboiro-Jato, José Luís Capelo-Martínez, Hugo López-Fernández
massXpert | C++ | L, M, W | GPLv3 | 6.0.2 | | | | | | | various tools for polymer editing/simulation/calculation; please visit: http://msxpertsuite.org/wiki/pmwiki.php/Main/Massxpert | Filippo Rusconi
mineXpert2 | C++ | L, M, W | GPLv3 | 6.0.2 | | | | | | | full-depth data visualization and mining of MS^n mass spectrometric data; please visit: http://msxpertsuite.org/wiki/pmwiki.php/Main/Minexpert2 | Filippo Rusconi
mMass | Python | L, M, W | GPLv3 | 5.5.0 | mzData, mzXML, mzML, ASCII, CSV, fid (Bruker Daltonics' compassXport has to be installed (W only)) | median of all intensities minus median of absolute deviations (additionally you can add a relative offset and smooth the baseline); gaussian smoothing | local maximum above (relative and absolute) intensity threshold | intensity*1/max_intensity (range: 0-1) | NA | NA | deisotoping function, connections to a lot of protein databases, batch processing, please see also: complete feature list | Martin Strohalm
MZmine2 | Java | L, M, W | GPLv2 | | | | | | | | | Matej Orešič et al (full list)
OpenChrom | Java | L, M, W | EPL | 1.1.0 | NetCDF, mzXML, CSV, D (Agilent Technologies), Bruker Daltonics *flex-series format, own file format *.chrom | moving minimum, SNIP | zero of first derivative of TIC signal | NA | NA | NA | batch processing; smoothing filter: Savitzky-Golay; extendable by plugins; database-based identification possible (as plugin, NIST-DB) | Philip Wenig
OpenMS/TOPP | C++ | L, M, W | LGPL | | | | | | | | | Knut Reinert, Oliver Kohlbacher, Andreas Hildebrandt and many others (full list)
ProteoWizard | C++ | L, M, W | Apache v2 | 3.0 | mzXML, mzML, mzIdentML, ... (only on Windows a lot of different vendor specific raw formats: AB Sciex, Agilent, Bruker, Thermo, Waters) | | | | | | framework for rapid development of data analysis tools; supports various methods for accessing metadata, plotting, smoothing, peak picking, etc. | Robert Burke, Matt Chambers, Brendan MacLean and many others (full list)
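
The OpenChrom row names "moving minimum" as one of its baseline correction methods. As a rough illustration of the general technique, the Python sketch below takes the minimum intensity in a sliding window around each point as the baseline and subtracts it. OpenChrom's own implementation may differ; the function names and the window size are assumptions.

```python
def moving_minimum_baseline(intensities, half_window=2):
    """Estimate the baseline as the minimum within +/- half_window points.

    Near the ends of the signal the window is truncated to the available
    points.
    """
    n = len(intensities)
    baseline = []
    for i in range(n):
        start = max(0, i - half_window)
        stop = min(n, i + half_window + 1)
        baseline.append(min(intensities[start:stop]))
    return baseline


def subtract_baseline(intensities, half_window=2):
    """Return the intensities with the moving-minimum baseline removed."""
    baseline = moving_minimum_baseline(intensities, half_window)
    return [x - b for x, b in zip(intensities, baseline)]


signal = [5, 6, 5, 20, 6, 5, 7, 5]
print(subtract_baseline(signal, half_window=1))
```

The window has to be wider than the broadest peak; otherwise the minimum inside the window still lies on the peak and part of the peak is removed along with the baseline.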

Non-R tools for importing mass spectrometry data files

Application | License | Version | File Formats | Miscellaneous | Authors
pymzML | LGPL | 0.7.6 | mzML | Python 2.6.5/Python 3 | Till Bald, Johannes Barth, Anna Niehues, Michael Specht, Michael Hippler, Christian Fufezan

Abbreviations:

AUC: area under the curve
BC: baseline correction
DN: denoising
PA: peak alignment
PD: peak detection
SNR: signal to noise ratio
TIC: total ion current/total ion count
GC/MS: gas chromatography/mass spectrometry
LC/MS: liquid chromatography/mass spectrometry
MS: mass spectrometry
IMS: imaging mass spectrometry
NA: not available
NN: not needed
L: Linux
M: Mac OS X
W: Microsoft Windows