Statistical Modelling and Analysis of Time-series Data

Lecturers

Professor Sumeetpal Singh, School of Mathematics and Applied Statistics, University of Wollongong
Dr David Gunawan, School of Mathematics and Applied Statistics, University of Wollongong
Professor Ba-Ngu Vo, Department of Electrical & Computer Engineering, Curtin University, Western Australia (Guest lecturer)

Synopsis

The goal of this subject is to fit statistical models, which are almost invariably complex, to data that arrives sequentially over time. Upon fitting a suitable model, one can proceed to forecast future data values, or estimate quantities that are not explicitly observed. Numerous applied areas make extensive use of the statistical modelling and computational methodology that this course covers. For example, data that arrives sequentially over time is common in engineering, finance, machine learning and environmental statistics.

Statistical models that are used to describe data that arrives sequentially over time are called time-series models. The Autoregressive Moving Average model is one such example. It is widely applied due to its simplicity, and also because it is easy to compute with.

However, the focus of this course is on two vastly more flexible classes of time-series models that incorporate hidden (or latent) state variables to more accurately describe real physical processes. These are, respectively, the state-space model and its generalisation, the hidden Markov model.

This course will cover various instances of these models, which are motivated by their applications. In order to fit the model to the data, it is necessary to first compute the conditional probability distributions of all the unobserved variables of the model. This is one of the most challenging aspects in practice. This course will mathematically characterise these time-varying distributions and provide practical solutions for computing them. Building on this, it will then provide practical solutions for fitting the model to the data, and for using the fitted model to estimate the unobserved variables of the physical process being studied.

This course will cover both Bayesian and frequentist methods.

Course Overview

Week 1: Introduction to state-space models with examples.

  • Non-Bayesian (optimal linear estimation) and Bayesian estimation methodology for state-space models.
  • Optimal estimation for state-space models without distributional assumptions.
  • Optimal estimation for the state-space model under Gaussian distributional assumptions: the Kalman filter and Kalman smoother.
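
As an informal illustration of the Week 1 material, the following is a minimal sketch of a Kalman filter for a scalar linear-Gaussian state-space model. The model (x_t = a*x_{t-1} + w_t, y_t = x_t + v_t), the function name and all parameter names are assumptions chosen for this example; they are not taken from the course materials.

```python
def kalman_filter(ys, a, q, r, m0, p0):
    """Scalar Kalman filter (illustrative sketch, hypothetical names).

    Assumed model: x_t = a * x_{t-1} + w_t,  y_t = x_t + v_t,
    with w_t ~ N(0, q), v_t ~ N(0, r) and x_0 ~ N(m0, p0).
    Returns the filtered means and variances of x_t given y_1..y_t.
    """
    means, variances = [], []
    m, p = m0, p0
    for y in ys:
        # Predict: push the previous posterior through the state equation.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update: condition the prediction on the new observation.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


# Example: a constant state (q = 0) observed three times in unit-variance noise.
means, variances = kalman_filter([1.0, 1.0, 1.0], a=1.0, q=0.0, r=1.0, m0=0.0, p0=1.0)
```

In this example the posterior variance shrinks as 1/(1 + t), which matches the familiar conjugate Gaussian update for a fixed unknown mean.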

Week 2: Introduction to hidden Markov models.

  • Applications; Inference/estimation objectives; Theory for exactly computing the conditional probability distributions for data analysis.
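
For a hidden Markov model with finitely many states, the conditional distributions can be computed exactly by a prediction/update recursion. The sketch below is illustrative only: the function name, argument layout and the two-state example are assumptions made for this example, not course code.

```python
import math


def hmm_filter(init, trans, emit, observations):
    """Exact filtering for a finite-state HMM (illustrative sketch).

    init[i]     : P(X_1 = i)
    trans[i][j] : P(X_{t+1} = j | X_t = i)
    emit[i][y]  : P(Y_t = y | X_t = i)
    Returns the filtering distributions P(X_t | y_1..y_t) for each t
    and the log-likelihood log p(y_1..y_T).
    """
    n = len(init)
    filters, loglik = [], 0.0
    pred = list(init)  # predictive distribution P(X_t | y_1..y_{t-1})
    for y in observations:
        # Update: multiply by the observation likelihood and normalise (Bayes rule).
        unnorm = [pred[i] * emit[i][y] for i in range(n)]
        norm = sum(unnorm)  # equals p(y_t | y_1..y_{t-1})
        loglik += math.log(norm)
        filt = [u / norm for u in unnorm]
        filters.append(filt)
        # Predict: propagate the filter through the transition matrix.
        pred = [sum(filt[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return filters, loglik


# Hypothetical two-state example with binary observations.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
filters, loglik = hmm_filter(init, trans, emit, [0, 0])
```

The normalising constants accumulated in the loop give the likelihood of the observations, which is the quantity that maximum likelihood estimation later works with.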

Week 3: Monte Carlo computational methods for hidden Markov models.

  • Principles of importance sampling: The optimal proposal; Convergence (via the central limit theorem).
  • Principles of sequential importance sampling: Design of algorithms; Resampling preserves unbiasedness and its verification via the tower property of conditional expectations; Design of proposals; Mean-square error analysis; Controlling the accumulation of the approximation error over time.
  • Numerical demonstrations.
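
To give a flavour of sequential importance sampling with resampling, here is a minimal sketch of a bootstrap particle filter, which proposes from the state dynamics and resamples multinomially at every step. It assumes a scalar linear-Gaussian model (x_t = a*x_{t-1} + w_t, y_t = x_t + v_t) purely so the output can be compared with a Kalman filter; the function and parameter names are hypothetical.

```python
import math
import random


def bootstrap_particle_filter(ys, a, q, r, m0, p0, n_particles=1000, seed=0):
    """Bootstrap particle filter (illustrative sketch, hypothetical names).

    Assumed model: x_t = a * x_{t-1} + w_t,  y_t = x_t + v_t,
    with w_t ~ N(0, q), v_t ~ N(0, r) and x_0 ~ N(m0, p0).
    Returns particle estimates of E[x_t | y_1..y_t].
    """
    rng = random.Random(seed)
    particles = [rng.gauss(m0, math.sqrt(p0)) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propose from the state dynamics (the "bootstrap" proposal).
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Weight by the observation density N(y; x, r), working in logs for stability.
        logw = [-0.5 * (y - x) ** 2 / r for x in particles]
        shift = max(logw)
        w = [math.exp(lw - shift) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # Self-normalised importance sampling estimate of the filtered mean.
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # Multinomial resampling to limit weight degeneracy over time.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means


# One observation y = 1 with a = q = r = p0 = 1 and m0 = 0;
# the exact (Kalman) filtered mean in this case is 2/3.
estimate = bootstrap_particle_filter([1.0], 1.0, 1.0, 1.0, 0.0, 1.0, n_particles=5000, seed=1)
```

The bootstrap proposal is the simplest choice rather than the optimal one; better proposals and other resampling schemes typically reduce the variance of the estimates.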

Week 4: Calibrating the hidden Markov model for real data. (Building on algorithms from earlier weeks.)

  • Maximum likelihood estimation for the hidden Markov model.
  • Estimating the gradient of the log likelihood via sequential importance sampling and its use within maximum likelihood estimation.
  • Implementing the expectation-maximisation algorithm using sequential importance sampling.
  • Applications, including economics and engineering.

Prerequisites

We assume students have knowledge of the following topics:

  • Probability and Random Processes covering
    • Sample space; events; probability measure; probability axioms; conditional probability; probability chain rule; independence; Bayes rule.
    • Random variables (discrete and continuous): probability mass function; probability density function; cumulative distribution function; transformation of random variables.
    • Bivariate and multivariate random variables, in particular: conditional probability mass function; conditional probability density function; conditional expectation; marginals; change of variables (Jacobian); properties of the Gaussian distribution.
    • Definition of a random process (discrete time only); finite-order distributions; autocorrelation function; Markov chains.
  • Statistics
    • Maximum likelihood estimation; least squares; principles of Monte Carlo estimation (e.g. Bootstrap).

This is typical subject material for an undergraduate Mathematics degree with a Probability and Statistics major.

  • This course is designed to be a ‘core’ module for graduate students in Mathematics who are further specialising in probability and statistics. For example, it is suited for students wishing to pursue a PhD in mathematical/computational statistics, as well as for those desiring an industry career, for example, in mathematical finance or data science.
  • The content of this course is designed for the Honours level, but without assuming knowledge of specialist topics. However, it assumes very good knowledge of the fundamentals of probability and multivariate statistics, as reflected in the pre-enrolment quiz.

Assessment

  • Weekly take-home assignments (50% total)
  • Final take-home exam (50%)

Attendance requirements

TBA

Resources/pre-reading

No pre-reading required.

Not sure if you should sign up for this course?

Take this QUIZ to self-evaluate and get a measure of the key foundational knowledge required.

Professor Sumeetpal Singh, School of Mathematics and Applied Statistics, University of Wollongong

Professor Singh was formerly a Professor and the Head of the Signal Processing Group in the Department of Engineering, University of Cambridge. He joined the School of Mathematics, University of Wollongong, in 2023 as the Tibra Foundation Chair in Mathematical Sciences. He was also formerly a Fellow of the Alan Turing Institute (UK), and a Fellow and Director of Studies of Churchill College, Cambridge. His research areas include Computational Statistics (Sequential Monte Carlo, Markov Chain Monte Carlo); Bayesian Statistics; Probabilistic Machine Learning. He has held major grants in the UK; organised several research-intensive programmes (e.g. Newton Institute, Cambridge); and serves his discipline as a member of the editorial board of journals in Statistics and Engineering, the EPSRC peer review college, and committees of major conferences. He contributes to mathematical enrichment activities for school children.

Dr David Gunawan, School of Mathematics and Applied Statistics, University of Wollongong

I am a Senior Lecturer in statistics at the University of Wollongong (UOW). I have a strong interest in Bayesian computation and its applications to solve real-world problems, such as economic inequality and poverty measurement, health, cognitive psychology, finance, and the environment. Since 2020, I have published 32 articles in top journals in statistics, econometrics, and related fields, such as Journal of Econometrics, Journal of Business and Economic Statistics, Journal of Computational and Graphical Statistics. I have been the chief investigator of two ARC grants. I was an Associate Investigator at the ARC Centre of Excellence for Mathematical and Statistical Frontiers from 2017 to 2021.