Academic Year 2018/19, Term 2
School of Mathematics, The University of Manchester
Teaching staff:
Lecturer: Korbinian Strimmer (Office hour: Friday 3-4pm, ATB 2.221)
Academic tutors: Georgi Boshnakov and Robert Gaunt
Student tutors: Rajenki Das, Bindu Vekaria, Jack McKenzie, Chunyu Wang and Zili Zhang
Student assistant: Beatriz Costa Gomes
Frequently asked questions, feedback and email:
If you have any suggestions, comments or corrections (e.g. typos in the notes), you are most welcome to contact the lecturer directly by email. However, please remember that this is a very large class (approx. 300 students!), so please do check the MATH20802 FAQ first to see whether your question has already been asked and answered!
Please also note that the size of the class does not allow for personal tuition via email. To get feedback, please attend the tutorial sessions and ask questions in person to your tutor; this will also benefit the other students in your class. For further questions the lecturer is available at the end of each lecture and during the office hour on Friday afternoon.
Finally, this course encourages good practice for mental health and work-life balance. In particular, note that no email will be answered on weekends.
Overview and syllabus:
For an outline of this course unit see MATH20802: Statistical Methods or download the course description as PDF.
Dates and location:
The course starts on 31 January 2019 and runs until 10 May 2019. The tutorials start in week 3, on 12 February 2019. The course takes place at the following dates and locations:
Session | Time slot (location) | Term week |
---|---|---|
Lectures: | Thursday 5pm-6pm (Crawford House TH1) and Friday 12noon-1pm (Stopford TH1) | 1-11 |
Example classes: | Tuesday 3pm-4pm (Boshnakov, Vekaria), Tuesday 5pm-6pm (Boshnakov, Zhang), Thursday 11am-12noon (Strimmer, Das), Thursday 4pm-5pm (Strimmer, McKenzie), Friday 9am-10am (Gaunt, Wang) (Alan Turing G209) | 3, 4, 6, 8, 9, 11 |
Computer labs: | Groups and times as above for example classes (Alan Turing G105) | 5, 10 |
In-class test: | Groups and times as above for example classes (Alan Turing G105). The test will be an online assessment on Blackboard (40 minutes). | 7 |
Revision week: | As above - revision lectures and Q & A classes | 12 |
In-class test and exam:
The in-class test is an online assessment on Blackboard and will take place in week 7 (worth 20%) in Alan Turing G105 during the usual example class / computer lab hours. The written exam (2 hours) is worth the remaining 80%.
Assessment | Date | Term week |
---|---|---|
In-class test (20%): | 12 March 2019 to 15 March 2019 (40 minutes) | 7 |
Written exam (80%): | 31 May 2019, 2pm-4pm (2 hours) | Exam period |
Course material:
Course material can be retrieved from Blackboard. This includes i) the scanned handwritten slides from the lectures, ii) the typed lecture notes (these will be written during the course of the term), iii) the example sheets and iv) the instructions for the computer labs. Furthermore, the automated lecture capture system is active for this module, so videos of the lectures can be revisited online.
In addition to the above, it is essential to study the material using a textbook. The following are recommended to accompany this module (all can be downloaded as PDF):
- Cox, D.R. 2006. Principles of Statistical Inference. Cambridge University Press.
- Shalizi, C.R. 2019. Advanced Data Analysis from an Elementary Point of View. Cambridge University Press (to appear).
- Wood, S. 2014. Core Statistics. Cambridge University Press.
Lecture contents:
There will be 11 weeks of lectures and 1 week of revision. Below you can find the topics discussed in each week to facilitate further study (this table is updated at the end of each week):
Term week | Content | Links and Keywords |
---|---|---|
1 | Lecture 1: Introduction to the module content: information and likelihood, linear model (regression), Bayesian learning, application in R. Overview of data science - probabilistic inference vs. other schools (machine learning), difference between probability and statistics (= randomness vs. uncertainty, intrinsic property vs. description, ontology vs. epistemology), overview of probabilistic modeling, model fit by minimising KL divergence. Lecture 2: Shannon entropy, application to the discrete uniform model, definition of the Kullback-Leibler divergence (relative entropy), its properties, KL divergence between discrete distributions and the link to the chi-squared statistic, application to two univariate normals, likelihood function, maximum likelihood as the large-sample limit of minimising KL divergence / cross entropy. | See scanned slides for lectures 1 and 2 (available on Blackboard). Examinable topics: randomness, uncertainty, entropy (information theory), differential entropy, Kullback-Leibler divergence, cross entropy, likelihood function. Not relevant for exam but still interesting: book - The Master Algorithm, epistemology, Bregman divergence, f-divergence. |
2 | Lecture 3: Maximum likelihood point estimates, bias, variance and MSE (mean squared error), log-likelihood function, score function (for scalar and vector-valued parameters), MLE for the Binomial, exponential and normal models, MLE of the variance in the normal model is biased, properties of MLEs, invariance, relationship to the least squares (LS) estimator in the normal case. Lecture 4: Optimality properties, consistency, Cramer-Rao bound, MLE as minimally sufficient statistic, MLE as optimal summariser of the information in the data about a model, observed Fisher information matrix (for vector-valued parameters), quadratic approximation of the log-likelihood function around the MLE, relationship of observed information to inverse variance. | See notes for lectures 3 and 4 (available on Blackboard). Examinable topics: mean squared error, maximum likelihood estimation, score function, observed Fisher information. Not relevant for exam but still interesting: Cramer-Rao bound, sufficient statistic. |
3 | Lecture 5: Observed Fisher information for the estimate of a proportion (Binomial model), observed Fisher information matrix for the normal model, asymptotic normal distribution of MLEs, construction of corresponding symmetric normal confidence intervals, expected Fisher information, expected Fisher information as local approximation of KL divergence, expected Fisher information of the normal model. Lecture 6: (Squared) Wald statistic and corresponding asymptotic distribution, normal example, counterexample (uniform distribution) with non-regular likelihood function (not differentiable at the MLE, hence no observed Fisher information and no asymptotics), likelihood-based confidence intervals. | See notes for lectures 5 and 6 (available on Blackboard). Examinable topics: expected Fisher information, confidence intervals using normal approximation. Not relevant for exam but still interesting: information geometry, higher order likelihood inference. |
4 | Lecture 7: Refresher on confidence intervals: coverage probability, frequentist interpretation, calculation of critical values for symmetric normal-based intervals; Wilks' likelihood ratio statistic and its asymptotic chi-squared distribution, computation of likelihood-based confidence intervals using critical values from the chi-squared distribution, comparison of normal and likelihood CIs using the exponential model, normal CI as likelihood CI for the quadratic approximation of the log-likelihood. Lecture 8: Discussion of the importance of R and Python for statistics and data science. Demonstration in R of how to numerically calculate the MLE and the observed Fisher information, and how to produce corresponding normal and likelihood-based confidence intervals. | See notes for lectures 7 and 8 (available on Blackboard). Examinable topics: Wilks' likelihood ratio statistic, chi-squared distribution, confidence intervals using likelihood. Not relevant for exam but still extremely important (for many more links see notes!): R project for statistical computing, R for data science. |
5 | Lecture 9: Correspondence of likelihood confidence intervals with likelihood ratio tests, generalised likelihood ratio test (GLRT), discussion of model selection using GLRTs, two-sample test. Lecture 10: Wrap-up of likelihood methods for estimation and inference. | See notes for lectures 9 and 10 (available on Blackboard). Examinable topics: likelihood ratio test. Not relevant for exam but still interesting: Akaike information criterion, uniformly most powerful test. |
6 | Lecture 11: General regression problem, objectives in regression, linear model, multiple regression, minimising the residual sum of squares (RSS), ordinary least squares (OLS) estimator of the regression coefficients. Lecture 12: Regression as a special case of supervised learning, hierarchy of regression models, background in multivariate statistics, covariance and correlation matrices and their properties. | See notes for lectures 11 and 12 (available on Blackboard). Examinable topics: regression analysis, linear regression, residual sum of squares, ordinary least squares, covariance matrix. Not relevant for exam but still interesting: supervised learning. |
7 | Lecture 13: Multivariate normal distribution, maximum likelihood estimates of the mean vector and covariance matrix, three different notations (using data vectors, components and the data matrix), OLS estimator interpreted via covariance matrices, plug-in estimate, correlation among predictors vs. marginal correlations between response and predictors. Lecture 14: Marginal regression, OLS estimator derived by maximum likelihood, OLS estimator derived as best linear predictor, multiple correlation coefficient, R2 coefficient, variance decomposition, OLS estimator derived by conditioning, law of total variance. | See notes for lectures 13 and 14 (available on Blackboard). Examinable topics: multivariate normal distribution, coefficient of determination, mean squared prediction error, conditioning, law of total variance, explained and unexplained variation in regression. |
8 | Lecture 15: Training data and test data, prediction interval, geometric interpretation of regression as orthogonal projection, definition of variable importance, discussion of variable importance measures, marginal correlation, decomposition of the squared multiple correlation coefficient. Lecture 16: Regression t-scores, testing for vanishing regression coefficients, understanding the output of computer functions for linear regression, equivalence of ranking by t-scores to ranking by partial correlation, further approaches to variable selection (heuristic search, lasso regression, mutual information), wrap-up. | See notes for lectures 15 and 16 (available on Blackboard). Examinable topics: variable importance measures, marginal correlation, decomposition of R-squared, regression t-scores, testing for vanishing regression coefficients, partial correlation. Not relevant for exam but still interesting: feature selection, information gain, mutual information, lasso regression. |
9 | Lecture 17: Overview of Bayesian statistics, interpretation of probability, history, foundations of Bayesian learning, prior distribution, posterior distribution, principle of minimal information update, entropy learning. Lecture 18: Probabilistic programming languages, Bayesian estimator, credible intervals, Beta-Binomial model, asymptotics, prior as pseudo-data, linear shrinkage, shrinkage intensity. | See notes for lectures 17 and 18 (available on Blackboard). Examinable topics: probability interpretations, Bayesian probability, Bayesian inference, Beta distribution. Not relevant for exam but still interesting: history of Bayesian statistics, Cox's theorem, Bayes linear method. |
10 | Lecture 19: Conjugate priors, frequentist properties of Bayes estimators, Normal-Normal model to estimate the mean, Stein paradox, discussion of priors (weakly informative priors, empirical Bayes priors, uninformative priors). Lecture 20: Shrinkage estimation, James-Stein estimator, Inverse Gamma (IG) distribution, IG-Normal model to estimate the variance, Jeffreys prior. | See notes for lectures 19 and 20 (available on Blackboard). Examinable topics: conjugate prior, Inverse Gamma distribution, Stein paradox, Jeffreys prior. Not relevant for exam but still interesting: admissible decision rule. |
EASTER BREAK | | |
11 | Lecture 21: Brief revisit of Bayesian statistics, specification of a model = prior + likelihood, optimality of Bayes inference, model comparison using the Bayes factor, link to the (generalised) likelihood ratio statistic, Occam's razor. Lecture 22: Decision threshold, sensitivity and specificity, false discovery rates (FDR), local FDR and tail-area-based FDR, q-values, multiple testing, FDR vs. FNDR (false non-discovery rates). | See notes for lectures 21 and 22 (available on Blackboard). The topics discussed in this week are for your pleasure only and not relevant for the exam: Bayes factor, Occam's razor, false discovery rate. |
12 | Revision lectures | |
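As an illustration of the likelihood material in weeks 2-5, the following is a minimal sketch of the kind of computation demonstrated in lecture 8. The in-course demonstrations use R; this Python translation and all function names in it are illustrative, not course code. It computes the MLE, the observed Fisher information, and 95% normal-approximation and likelihood-based confidence intervals for the rate of an exponential model:

```python
import math

def exp_loglik(theta, x):
    """Log-likelihood of an Exponential model with rate theta for data x."""
    return len(x) * math.log(theta) - theta * sum(x)

def _bisect(g, lo, hi, iters=200):
    """Find a root of g in [lo, hi] by bisection (g must change sign)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def exp_mle_and_cis(x, z=1.959964, chi2_crit=3.841459):
    """MLE of the exponential rate with 95% normal and likelihood CIs."""
    n = len(x)
    theta_hat = n / sum(x)            # closed-form MLE: 1 / sample mean
    obs_info = n / theta_hat ** 2     # observed Fisher information J(theta_hat)
    se = 1.0 / math.sqrt(obs_info)    # standard error from inverse information
    normal_ci = (theta_hat - z * se, theta_hat + z * se)

    # Likelihood CI: all theta whose deviance 2*(l(theta_hat) - l(theta))
    # stays below the chi-squared critical value (Wilks' statistic);
    # the two endpoints are located by bisection on either side of the MLE.
    l_max = exp_loglik(theta_hat, x)
    g = lambda t: 2.0 * (l_max - exp_loglik(t, x)) - chi2_crit
    lik_ci = (_bisect(g, theta_hat * 1e-3, theta_hat),
              _bisect(g, theta_hat, theta_hat * 1e3))
    return {"mle": theta_hat, "se": se,
            "normal_ci": normal_ci, "lik_ci": lik_ci}
```

Note that the normal interval is symmetric about the MLE by construction, while the likelihood interval for the exponential rate is asymmetric, with a longer right tail.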
Please note that the links to Wikipedia given above are provided for convenience and should not be considered a definitive resource! For further study please revisit the notes and read the suggested textbooks!
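The Bayesian material in weeks 9-10 ("prior as pseudo-data", "linear shrinkage") can likewise be checked numerically. The short Python sketch below (illustrative, not course code) performs the conjugate Beta-Binomial update and verifies that the posterior mean is a linear shrinkage of the MLE toward the prior mean, with shrinkage intensity given by the prior pseudo-counts relative to the total:

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after Binomial data, given a Beta(a, b) prior."""
    return a + successes, b + trials - successes

def posterior_mean_as_shrinkage(a, b, successes, trials):
    """Posterior mean written as linear shrinkage of the MLE toward the prior mean."""
    prior_mean = a / (a + b)
    mle = successes / trials
    lam = (a + b) / (a + b + trials)   # shrinkage intensity: pseudo-counts / total counts
    return lam * prior_mean + (1.0 - lam) * mle
```

For example, a Beta(2, 2) prior combined with 7 successes in 10 trials yields a Beta(9, 5) posterior, whose mean 9/14 lies between the prior mean 1/2 and the MLE 0.7.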
Computer labs:
There will be two computer labs, one in week 5 and one in week 10. The instructions for the computer labs will be available online on Blackboard.
Example classes:
There are six example classes, taking place in term weeks 3, 4, 6, 8, 9 and 11. The corresponding example sheets will be available on Blackboard at noon on Tuesday, and the solutions will be published at noon on Friday. In term week 12 (revision week) the tutorials will be Q & A sessions for the exam.