Update: the 2020/21 version of this course is available now (MATH38161 lecture notes).

Academic Year 2019/20, Term 1
Department of Mathematics, The University of Manchester

The course starts on 23rd September 2019 and runs until 10th December 2019. The first computer lab is on 1st October 2019 and the first tutorial on 8th October 2019. Lectures and computer labs are held in the Alan Turing Building (ATB), tutorials at University Place.

Teaching staff:

Lecturer: Korbinian Strimmer (Office hour: Monday 3:30pm-4:30pm, ATB 2.221)
Tutors: Ioanna Nikolopoulou, Konstantin Siroki

Frequently asked questions, feedback and email:

If you have any suggestions, comments or corrections (e.g. typos in the notes), you are most welcome to contact the lecturer directly by email. However, please remember that this is a large class (90 students), so first check the MATH38161 FAQ to see whether your question has already been asked and answered!

Please note that there will be no personal tuition by email. To get feedback, please attend the tutorial sessions and ask the tutors and the lecturer your questions in person. For further questions the lecturer is available at the end of each lecture and during the office hour on Monday afternoon.

This course encourages good practice for work-life balance and mental health. In particular, a study break on weekends is encouraged. Correspondingly, no email will be answered on weekends.

Overview and syllabus:

Prerequisites: This course assumes students are familiar with the foundations of probability, statistical learning (e.g. maximum likelihood, Bayes) and matrix calculus. Furthermore, experience in statistical programming and data analysis using R is expected.

For an outline of this course unit see the online module description for MATH38161: Multivariate statistics and machine learning or download the course description as PDF.

In this course strong emphasis is put on computation. All methods introduced and discussed in the lectures will be tried and tested on the computer. In the fortnightly computer labs we will work in R Studio, using R for statistical data analysis and R Markdown for project reporting. Students are strongly encouraged to install R and the R Studio software on their personal computers.

Dates and location:

The course takes place at the following dates and locations:

| Session | Time slot (location) | Term week |
|---|---|---|
| Lectures | Monday 9am-10am (ATB G205) and Tuesday 9am-10am (ATB G107) | 1-5 and 7-12 |
| Computer labs | Tuesday 10am-11am (ATB G105) | 2, 4, 7, 9, 11 |
| Tutorials | Tuesday 10am-11am (University Place 2.220) | 3, 5, 8, 10, 12 |
| Office hour | Monday 3:30pm-4:30pm (ATB 2.221) | 1-5 and 7-12 |

There are no lectures / tutorials / labs in term week 6 ("reading week").

Coursework and exam:

There is one take-home coursework worth 20% that consists of data analysis in R and writing of a corresponding statistical report, preferably in R Markdown. The written exam (2 hours) is worth the remaining 80% and is concerned with theory and methods.

| Assessment | Date | Term week |
|---|---|---|
| Coursework (20%) | Announced: Tuesday 19 November 2019. Submission: Tuesday 3 December 2019, 12 noon. Please hand in the coursework at the ATB reception! | 9 and 11 |
| Written exam (80%) | 21 January 2020, 2pm-4pm (2 hours), Renold C16 | exam period |

The current coursework task will be made available on Blackboard two weeks before the deadline. The expected time to complete the coursework is 10h.

For revision, the exam questions and the coursework tasks of the previous year (2018/19) are available on Blackboard.

Workload:

MATH38161 is a 10 credit module, so completing it requires about 100 hours of study time. As a guideline, the expected workload (104h) breaks down as follows:

| Type | Purpose | Study hours |
|---|---|---|
| Contact time | | Total: 32h |
| | Lectures (new material) | 10 x 2h = 20h |
| | Lectures (revision) | 1 x 2h = 2h |
| | Example classes | 5 x 1h = 5h |
| | Software lab | 5 x 1h = 5h |
| Self-study | | Total: 60h |
| | Pre/post lectures work | 10 x 2h = 20h |
| | Pre/post tutorials work | 10 x 2h = 20h |
| | Exam revision | 20h |
| Assessment | | Total: 12h |
| | Coursework | 10h |
| | Exam | 2h |

Course material:

Course material can be retrieved from Blackboard. This includes i) course notes, ii) the example sheets, iii) the instructions for the computer labs, iv) coursework instructions, and v) previous exam questions and coursework tasks.

See also the corresponding MATH38161 UoM library reading list.

In addition, it is essential to study the material further using a textbook. The following books are recommended to accompany this module:

  1. Härdle and Simar. 2015. Applied multivariate statistical analysis. 4th edition.
  2. Hastie, Tibshirani and Friedman. 2009. The elements of statistical learning: data mining, inference, and prediction. Springer.
  3. James, Witten, Hastie and Tibshirani. 2013. An introduction to statistical learning with applications in R. Springer.
  4. Marden, J.I. 2015. Multivariate Statistics: Old School.
  5. Rogers, S. and M. Girolami. 2017. A first course in machine learning (2nd Edition). Chapman and Hall / CRC.

For learning R markdown please study the following references:

  1. The R markdown homepage.
  2. R Studio. 2014. R markdown reference guide.
  3. Shalizi. 2016. Using R markdown for class reports.
  4. Xie, Allaire and Grolemund. 2019. R markdown: the definitive guide.

Further suggested readings to refresh knowledge in statistics, matrices and R programming are:

  1. Dekking et al. 2005. A modern introduction to probability and statistics: understanding why and how. Springer.
  2. Petersen and Pedersen. 2012. The matrix cookbook. TU Denmark.
  3. R Core Team. 2018. An introduction to R. The R Foundation.
  4. Peng. 2016. R programming for data science. Leanpub.

Additional (advanced) reference books for probabilistic machine learning are:

  1. Murphy. 2012. Machine learning: a probabilistic perspective. MIT Press.
  2. Bishop. 2006. Pattern recognition and machine learning. Springer.

Lecture timetable and contents:

There will be 10 weeks of lectures and 1 week of revision. The course is divided into five parts, each two weeks long and dealing with a different area of multivariate statistics and machine learning. The topics discussed are listed below to facilitate further study:

| Term week | Lecture (Date) | Content |
|---|---|---|
| 1, 2 | 1-4 (23 Sept to 1 Oct) | Background in matrix calculus: matrix notation, matrix calculations, eigenvalues, singular values, spectral decomposition, rank, condition etc.; Multivariate random variables and distributions: basic multivariate statistics, multivariate normal distribution and properties, further multivariate distributions (categorical, multinomial, Dirichlet, Wishart); Estimation in large sample and small sample settings: estimation of covariance using likelihood and regularised/shrinkage estimation. |
| 3, 4 | 5-8 (7 Oct to 15 Oct) | Transformations and dimension reduction: variable transformations, location-scale transformation, corresponding transformation of mean, variance and probability density, coloring transformation, Mahalanobis transformation, whitening transformations (ZCA, PCA, Cholesky and variations), Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA). |
| 5, 7 | 9-12 (21 Oct to 5 Nov) | Unsupervised learning / structure discovery: algorithmic / heuristic approaches to clustering: K-means, PAM, hierarchical clustering, measuring uncertainty, model-based clustering: Gaussian mixture models, EM algorithm, graphical models. |
| 8, 9 | 13-16 (11 Nov to 19 Nov) | Supervised learning / prediction and classification: Diagonal, Linear, and Quadratic Discriminant Analysis (DDA, LDA, QDA) and regularised versions for high-dimensional data analysis, cross-validation, feature selection and variable importance, linear prediction. |
| 10, 11 | 17-20 (25 Nov to 3 Dec) | Nonlinear and nonparametric models / machine learning models: Anscombe data sets, nonlinear regression (polynomial, splines, loess), decision trees, random forest, overview of further important nonparametric approaches (Gaussian processes, neural networks). |
| 12 | 21-22 (9 Dec to 10 Dec) | Revision lectures |

The corresponding lecture notes are available on Blackboard. The automated lecture capture system is active for this module, so all lectures can be revisited online.

Computer labs timetable and contents:

| Term week | Lab (Date) | Topic |
|---|---|---|
| 2 | 1 (1 Oct) | Overview of R Studio, introduction to R Markdown, exploring multivariate normal density and estimation of covariances. |
| 4 | 2 (15 Oct) | Simulation of multivariate normal data, comparison of whitening procedures, PCA and dimension reduction. |
| 7 | 3 (5 Nov) | Unsupervised learning using K-means, Gaussian mixture model and hierarchical clustering methods. |
| 9 | 4 (19 Nov) | Supervised learning / classification with QDA and LDA and shrinkage LDA / DDA, cross-validation, comparison with GGMs / hierarchical clustering, constructing an efficient high-dimensional classifier, feature selection, conditional independence graph. |
| 11 | 5 (3 Dec) | DatasauRus dozen data sets, nonlinear regression, random forest, feature selection for wine data. |

The material for each computer lab is available on Blackboard.

Tutorials timetable:

| Term week | Tutorial (Date) | Topic |
|---|---|---|
| 3 | 1 (8 Oct) | Multivariate random variables |
| 5 | 2 (22 Oct) | Unsupervised learning |
| 8 | 3 (12 Nov) | Supervised learning |
| 10 | 4 (26 Nov) | Nonlinearity |
| 12 | 5 (10 Dec) | Exam Q & A |

The example class sheets are available on Blackboard.