Update: the 2020/21 version of this course is available now (MATH38161 lecture notes).
Academic Year 2019/20, Term 1
Department of Mathematics, The University of Manchester
Teaching staff:
Lecturer: Korbinian Strimmer (Office hour: Monday 3:30-4:30, ATB 2.221)
Tutors: Ioanna Nikolopoulou, Konstantin Siroki
Frequently asked questions, feedback and email:
If you have any suggestions, comments or corrections (e.g. typos in the notes), you are most welcome to contact the lecturer directly by email. However, as this is a large class (90 students), please first check the MATH38161 FAQ to see whether your question has already been asked and answered!
Please note that there will be no personal tuition by email. To get feedback please attend the tutorial sessions and ask questions in person to the tutors and the lecturer. For further questions the lecturer is available at the end of each lecture and during the office hour on Monday afternoon.
This course encourages good practice for work-life balance and mental health. In particular, students are advised to take a study break at weekends. Correspondingly, no emails will be answered at weekends.
Overview and syllabus:
Prerequisites: This course assumes students are familiar with the foundations of probability, statistical learning (e.g. maximum likelihood, Bayes) and matrix calculus. Furthermore, experience in statistical programming and data analysis using R is expected.
For an outline of this course unit see the online module description for MATH38161: Multivariate statistics and machine learning or download the course description as PDF.
This course puts strong emphasis on computation. All methods introduced and discussed in the lectures will be tried and tested on the computer. In the fortnightly computer labs we will work in R Studio, using R for statistical data analysis and R Markdown for project reporting. Students are strongly encouraged to install R and the R Studio software on their personal computers.
Dates and location:
The course takes place at the following dates and locations:
Session | Time slot (location) | Term week |
---|---|---|
Lectures: | Monday 9am-10am (ATB G205) and Tuesday 9am-10am (ATB G107) | 1-5 and 7-12 |
Computer labs: | Tuesday 10am-11am (ATB G105) | 2, 4, 7, 9, 11 |
Tutorials: | Tuesday 10am-11am (University Place 2.220) | 3, 5, 8, 10, 12 |
Office hour: | Monday 3:30pm-4:30pm (ATB 2.221) | 1-5, 7-12 |
There are no lectures / tutorials / labs in term week 6 ("reading week").
Coursework and exam:
There is one take-home coursework worth 20%, which consists of data analysis in R and writing a corresponding statistical report, preferably in R Markdown. The written exam (2 hours) is worth the remaining 80% and is concerned with theory and methods.
Assessment | Date | Term week |
---|---|---|
Coursework (20%): | Announced: Tuesday 19 November 2019. Submission: Tuesday 3 December 2019, 12 noon. Please hand in the coursework at the ATB reception! | 9 and 11 |
Written exam (80%): | 21 January 2020, 2pm-4pm (2 hours), Renold C16 | exam period |
The current coursework task will be made available on Blackboard two weeks before the submission deadline. The expected amount of time to complete the coursework is 10h.
For revision, the exam question and the coursework tasks of the previous year (2018/19) are available on Blackboard.
Workload:
MATH38161 is a 10-credit module and correspondingly completion of this module requires about 100 hours of study time. As a guideline, the breakdown of the expected workload (104h) is as follows:
Type | Purpose | Study hours |
---|---|---|
Contact time | Total | 32h |
Contact time | Lectures (new material) | 10 x 2h = 20h |
Contact time | Lectures (revision) | 1 x 2h = 2h |
Contact time | Example classes | 5 x 1h = 5h |
Contact time | Software lab | 5 x 1h = 5h |
Self-study | Total | 60h |
Self-study | Pre/post lectures work | 10 x 2h = 20h |
Self-study | Pre/post tutorials work | 10 x 2h = 20h |
Self-study | Exam revision | 20h |
Assessment | Total | 12h |
Assessment | Coursework | 10h |
Assessment | Exam | 2h |
Course material:
Course material can be retrieved from Blackboard. This includes i) course notes, ii) the example sheets, iii) the instructions for the computer labs, iv) coursework instructions, and v) previous exam questions and coursework tasks.
See also the corresponding MATH38161 UoM library reading list.
In addition, it is essential to study the material further using a textbook. The following books are recommended to accompany this module:
- Härdle and Simar. 2015. Applied multivariate statistical analysis. 4th edition.
- Hastie, Tibshirani and Friedman. 2009. The elements of statistical learning: data mining, inference, and prediction. Springer.
- James, Witten, Hastie and Tibshirani. 2013. An introduction to statistical learning with applications in R. Springer.
- Marden, J.I. 2015. Multivariate Statistics: Old School.
- Rogers, S. and M. Girolami. 2017. A first course in machine learning (2nd Edition). Chapman and Hall / CRC.
For learning R Markdown, please study the following references:
- The R markdown homepage.
- R Studio. 2014. R markdown reference guide.
- Shalizi. 2016. Using R markdown for class reports.
- Xie, Allaire and Grolemund. 2019. R markdown: the definitive guide.
Further suggested readings to refresh knowledge in statistics, matrices and R programming are:
- Dekking et al. 2005. A modern introduction to probability and statistics: understanding why and how. Springer.
- Petersen and Pedersen. 2012. The matrix cookbook. TU Denmark.
- R Core Team. 2018. An introduction to R. The R Foundation.
- Peng. 2016. R programming for data science. Leanpub.
Additional (advanced) reference books for probabilistic machine learning are:
- Murphy. 2012. Machine learning: a probabilistic perspective. MIT Press.
- Bishop. 2006. Pattern recognition and machine learning. Springer.
Lecture timetable and contents:
There will be 10 weeks of lectures and 1 week of revision. The course is divided into five parts, each two weeks long and dealing with a different area in multivariate statistics and machine learning. The topics discussed are listed below to facilitate further study:
Term week | Lecture (Date) | Content |
---|---|---|
1, 2 | 1-4 (23 Sept to 1 Oct) | Background in matrix calculus: matrix notation, matrix calculations, eigenvalues, singular values, spectral decomposition, rank, condition etc.; Multivariate random variables and distributions: basic multivariate statistics, multivariate normal distribution and properties, further multivariate distributions (categorical, multinomial, Dirichlet, Wishart); Estimation in large sample and small sample settings: estimation of covariance using likelihood and regularised/shrinkage estimation. |
3, 4 | 5-8 (7 Oct to 15 Oct) | Transformations and dimension reduction: variable transformations, location-scale transformation, corresponding transformation of mean, variance and probability density, coloring transformation, Mahalanobis transformation, whitening transformations (ZCA, PCA, Cholesky and variations), Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA). |
5, 7 | 9-12 (21 Oct to 5 Nov) | Unsupervised learning / structure discovery: Algorithmic / heuristic approaches to clustering: K-means, PAM, hierarchical clustering, measuring uncertainty, model-based clustering: Gaussian mixture models, EM algorithm, graphical models. |
8, 9 | 13-16 (11 Nov to 19 Nov) | Supervised learning / prediction and classification: Diagonal, Linear, and Quadratic Discriminant Analysis (DDA, LDA, QDA) and regularised versions for high-dimensional data analysis, cross-validation, feature selection and variable importance, linear prediction. |
10, 11 | 17-20 (25 Nov to 3 Dec) | Nonlinear and nonparametric models / machine learning models: Anscombe data sets, nonlinear regression (polynomial, splines, loess), decision trees, random forest, overview of further important nonparametric approaches (Gaussian processes, neural networks). |
12 | 21-22 (9 Dec to 10 Dec) | Revision lectures |
Corresponding lecture notes are available on Blackboard. The automated lecture capture system is active for this module so all lectures can be revisited online.
Computer labs timetable and contents:
Term week | Lab (Date) | Topic |
---|---|---|
2 | 1 (1 Oct) | Overview of R Studio, introduction to R Markdown, exploring multivariate normal density and estimation of covariances. |
4 | 2 (15 Oct) | Simulation of multivariate normal data, comparison of whitening procedures, PCA and dimension reduction. |
7 | 3 (5 Nov) | Unsupervised learning using K-means, Gaussian mixture model and hierarchical clustering methods. |
9 | 4 (19 Nov) | Supervised learning / classification with QDA and LDA and shrinkage LDA / DDA, cross-validation, comparison with GGMs / hierarchical clustering, constructing an efficient high-dimensional classifier, feature selection, conditional independence graph. |
11 | 5 (3 Dec) | DatasauRus dozen data sets, nonlinear regression, random forest, feature selection for wine data. |
The material for each computer lab is available on Blackboard.
Tutorials timetable:
Term week | Tutorial (Date) | Topic |
---|---|---|
3 | 1 (8 Oct) | Multivariate random variables |
5 | 2 (22 Oct) | Unsupervised learning |
8 | 3 (12 Nov) | Supervised learning |
10 | 4 (26 Nov) | Nonlinearity |
12 | 5 (10 Dec) | Exam Q & A |
The example class sheets are available on Blackboard.