Statistical Analysis of Gene Expression Data:
An Introduction

Summer 2002

Seminar course at the Department of Statistics, University of Munich

Organised by the Statistics and Computational Biology Group


Contact: Korbinian Strimmer
Ludwig Fahrmeir

To register please send email to

Time: Every Monday at 17:15 during the summer term 2002 (exception: Friday, 17.5.2002).
Place: Seminar room, 1st floor, Ludwigstr. 33.
"Schein": Requires full attendance and the presentation of a talk.



The seminar is intended as an introduction to the analysis of gene expression data. We will survey the most important methodology and review some influential applications of this new biotechnology. The course is organized in 12 lectures, each of which deals with a specific topic in gene expression analysis. Please see below for the reading list - each article has a link and can be downloaded from the net (you can also obtain a paper copy from the organizers).

Talks can be in German or English. All participants are encouraged to read all the papers relevant for each session - we want to discuss the papers together!


Time Table:

At the first meeting (15.4.2002) we will discuss the overall organization and assign volunteer speakers for the following 12 sessions. All participants are encouraged to present a talk. On Monday 3.6.2002 we welcome Dr. Christian Gieger from IBM LION Bioscience AG as a guest speaker.

Day         Topic                                     Speaker
15.4.2002   Overview and Introduction                 Korbinian Strimmer
22.4.2002   Biology and Technology (Lecture 1)        Florian Burckhardt
29.4.2002   Expression Indices (Lecture 2)            Korbinian Strimmer
06.5.2002   Normalisation (Lecture 3)                 Jan Wolfertz
13.5.2002   Differential Expression I (Lecture 4)     Rainer Opgen-Rhein
*17.5.2002  Differential Expression II (Lecture 5)    Gangolf Jobb
27.5.2002   Differential Expression III (Lecture 6)   Roland Wolf
03.6.2002   Activities at IBM LION Bioscience AG      Christian Gieger (invited guest)
10.6.2002   ANOVA (Lecture 7)                         Samson Adebayo
17.6.2002   Clustering I (Lecture 9)                  Astrid Zierer
24.6.2002   Clustering II (Lecture 10)                Korbinian Strimmer
01.7.2002   Data and Dimension Reduction (Lecture 8)  Martina Messow
08.7.2002   Classification I (Lecture 11)             Anne-Laure Socher-Boulesteix
15.7.2002   Classification II (Lecture 12)            Ludwig Fahrmeir

 * on Friday, 10am, same place (Monday, 20 May, is a bank holiday!)


Reading list:

These are the papers we plan to discuss in each session:

  1. Biology and Technology
  2. Expression Indices
    • Li, C., and W.H. Wong. 2001. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. PNAS 98:31-36.
    • Naef, F., D.A. Lim, N. Patil, and M.O. Magnasco. 2001. From features to expression: high-density oligonucleotide array analysis revisited. Preprint.
    • Irizarry, R.A., B. Hobbs, F. Collin, Y.S. Beazer-Barclay, K.J. Antonellis, U. Scherf, and T.P. Speed. 2002. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Preprint.
  3. Normalisation
  4. Differential Expression I
    • Chen, Y., E.R. Dougherty, and M.L. Bittner. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics 2:364-374.
    • Newton, M.A., C.M. Kendziorski, C.S. Richmond, F.R. Blattner, and K.W. Tsui. 2001. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comp. Biol. 8:37-52.
    • Ting Lee, M.L., F.C. Kuo, G.A. Whitmore, and J. Sklar. 2000. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. PNAS 97:9834-9839.
  5. Differential Expression II
    • Dudoit, S., Y.-H. Yang, M.J. Callow, and T.P. Speed. 2002. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12(1). (preprint)
    • Goss Tusher, V., R. Tibshirani, and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98:5116-5121.
    • Lönnstedt, I., and T.P. Speed. 2002. Replicated microarray data. Statistica Sinica 12(1). (preprint)
  6. Differential Expression III
    • Efron, B., R. Tibshirani, J.D. Storey, and V. Tusher. 2001. Empirical Bayes analysis of a microarray experiment. JASA 96:1151-1160.
    • Efron, B., J.D. Storey, and R. Tibshirani. 2001. Microarrays, empirical Bayes methods, and false discovery rates. Technical Report 2001-23B/217 (Dept. of Statistics, Stanford).
    • Efron, B. 2001. Robbins, empirical Bayes, and microarrays. Technical Report 2001-30B/219 (Dept. of Statistics, Stanford).
  7. ANOVA
    • Kerr, M.K., M. Martin, and G.A. Churchill. 2000. Analysis of variance for gene expression microarray data. J. Comp. Biol. 7:819-837. (preprint)
    • Kerr, M.K., and G.A. Churchill. 2001. Statistical design and the analysis of gene expression microarray data. Genet. Res. Camb. 77:123-128.
    • Kerr, M.K., C.A. Afshari, L. Bennett, P. Bushel, J. Martinez, N.J. Walker, and G.A. Churchill. 2002. Statistical analysis of a gene expression microarray experiment with replication. Statistica Sinica 12(1). (preprint)
  8. Clustering I
    • Tamayo, P., D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E.S. Lander, and T.R. Golub. 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. PNAS 96:2907-2912.
    • Yeung, K.Y., C. Fraley, A. Murua, A.E. Raftery, and W.L. Ruzzo. 2001. Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977-987.
    • Fraley, C., and A.E. Raftery. 1998. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal 41:578-588.
  9. Clustering II
    • Eisen, M.B., P.T. Spellman, P.O. Brown, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. PNAS 95:14863-14868.
    • Herrero, J., A. Valencia, and J. Dopazo. 2001. A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17:126-136.
    • Kerr, M.K., and G.A. Churchill. 2001. Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. PNAS 98:8961-8965.
  10. Data and Dimension Reduction
    • Alter, O., P.O. Brown, and D. Botstein. 2000. Singular value decomposition for genome-wide expression data processing and modelling. PNAS 97:10101-10106.
    • Yeung, K.Y., and W.L. Ruzzo. 2001. Principal component analysis for clustering gene expression data. Bioinformatics 17:763-774.
    • Hastie, T., R. Tibshirani, M.B. Eisen, A. Alizadeh, R. Levy, L. Staudt, W.C. Chan, D. Botstein, and P. Brown. 2000. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology 1(2):research0003.1-0003.21.
  11. Classification I
    • Golub, T.R., D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander. 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531-537.
    • Slonim, D.K., P. Tamayo, J.P. Mesirov, T.R. Golub, and E.S. Lander. 2000. Class prediction and discovery using gene expression data. Proceedings RECOMB IV:263-271. (reprint)
    • Yeang, C.-H., S. Ramaswamy, P. Tamayo, S. Mukherjee, R.M. Rifkin, M. Angelo, M. Reich, E.S. Lander, J. Mesirov, and T. Golub. 2001. Molecular classification of multiple tumor types. Bioinformatics 17:S316-S322.
  12. Classification II

Most of these papers are downloadable from within the network of the University of Munich. (You may need to configure your browser properly - see the proxy guide at the LRZ). If you have problems or prefer a paper copy of an article, please contact us.


More references and links:

We only have time to discuss a few selected papers - other interesting articles dealing with microarray analysis are listed at the following places:

Gene expression analysis workshops (with Bioconductor)



Overviews of the large amount of software available for gene expression analysis are attempted, e.g., here (Stanford) and here (Y.F. Leung). Note that many of these programs are commercial.

Fortunately, most of the state-of-the-art methods for gene expression analysis are also available (and sometimes even exclusively so!) in the form of packages for the R system. R is an open-source clone of S-Plus, and is a widely used and very versatile statistics package. R is freely available for Windows, Macintosh and Linux/Unix systems.

We therefore strongly recommend analyzing gene expression data with R. We have compiled a fairly exhaustive list of R packages for gene expression analysis. One of these deserves particular mention: the Bioconductor project (Harvard) unifies and merges several previously independent R packages and provides tools for the analysis of both cDNA and Affymetrix arrays.

Last modified:
May 28, 2002