# Statistical Modelling and Analysis of Time-series Data

#### Lecturers

Professor Sumeetpal Singh, School of Mathematics and Applied Statistics, University of Wollongong
Dr David Gunawan, School of Mathematics and Applied Statistics, University of Wollongong
Professor Ba-Ngu Vo, Department of Electrical & Computer Engineering, Curtin University, Western Australia (Guest lecturer)

#### Synopsis

The goal of this subject is to fit statistical models, which are almost invariably complex, to data that arrives sequentially over time. Upon fitting a suitable model, one can proceed to forecast future data values, or estimate quantities that are not explicitly observed. Numerous applied areas make extensive use of the statistical modelling and computational methodology that this course covers. For example, data that arrives sequentially over time is common in engineering, finance, machine learning and environmental statistics.

Statistical models that are used to describe data that arrives sequentially over time are called time-series models. The Autoregressive Moving Average (ARMA) model is one such example. It is widely applied because of its simplicity and because it is easy to compute with.
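As an illustration of that computational simplicity, the sketch below simulates an AR(1) process (a special case of ARMA) and recovers its coefficient by least squares. This is a minimal illustrative example, not course material; the model and parameter values are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) process: x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, sigma^2).
# phi_true, sigma and T are illustrative choices, not values from the course.
phi_true, sigma, T = 0.8, 1.0, 5000
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + sigma * rng.standard_normal()

# Least-squares (conditional maximum likelihood) estimate of phi:
# regress x_t on x_{t-1}.
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(phi_hat)
```

With a long enough series, the least-squares estimate lands close to the true coefficient, which is part of why ARMA-type models are so convenient to work with.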

However, the focus of this course is on two vastly more flexible classes of time-series models that incorporate hidden (or latent) state variables to more accurately describe real physical processes. These are, respectively, the state-space model and its generalisation, the hidden Markov model.

This course will cover various instances of these models, motivated by their applications. In order to fit the model to the data, it is necessary to first compute the conditional probability distributions of all the unobserved variables of the model. This is one of the most challenging aspects in practice: this course will mathematically characterise these time-varying distributions and provide practical solutions for computing them. Building on this, it will then provide practical solutions for fitting the model to the data, and for using the fitted model to estimate the unobserved variables of the physical process being studied.

This course will cover both Bayesian and frequentist methods.

#### Course Overview

Week 1: Introduction to state-space models with examples.

• Non-Bayesian (optimal linear estimation) and Bayesian estimation methodology for state-space models.
• Optimal estimation for state-space models without distributional assumptions.
• Optimal estimation for the state-space model under Gaussian distributional assumptions: the Kalman filter and Kalman smoother.

Week 2: Introduction to hidden Markov models.

• Applications; Inference/estimation objectives; Theory for exactly computing the conditional probability distributions for data analysis.
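For a finite-state hidden Markov model, the conditional (filtering) distributions mentioned above can be computed exactly by a forward recursion. The sketch below uses an assumed two-state toy model; all matrices and numbers are illustrative.

```python
import numpy as np

def hmm_filter(y, A, B, pi):
    """Exact filtering for a finite-state HMM with discrete observations.

    A[i, j] = P(x_t = j | x_{t-1} = i)   (transition matrix)
    B[j, k] = P(y_t = k | x_t = j)       (emission matrix)
    pi[j]   = P(x_0 = j)                 (initial distribution)
    Returns the filtering distributions P(x_t | y_{0:t}), one per row."""
    alpha = pi * B[:, y[0]]
    alpha /= alpha.sum()
    filt = [alpha]
    for obs in y[1:]:
        alpha = (alpha @ A) * B[:, obs]   # predict, then correct
        alpha /= alpha.sum()              # normalise to a distribution
        filt.append(alpha)
    return np.vstack(filt)

# Toy two-state example (all numbers illustrative)
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
y = [0, 0, 1, 1, 0]
filt = hmm_filter(y, A, B, pi)
print(filt)
```

Each row of the output is a probability distribution over the hidden states given the observations up to that time; this exact recursion is what becomes intractable in general state spaces, motivating the Monte Carlo methods of Week 3.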

Week 3: Monte Carlo computational methods for hidden Markov models.

• Principles of importance sampling: The optimal proposal; Convergence (via the central limit theorem).
• Principles of sequential importance sampling: Design of algorithms; Resampling preserves unbiasedness and its verification via the tower property of conditional expectations; Design of proposals; Mean-square error analysis; Controlling the accumulation of the approximation error over time.
• Numerical demonstrations.
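The ingredients listed above — a proposal, importance weights, and resampling to control weight degeneracy — combine into the bootstrap particle filter. The sketch below is one minimal instance, using the prior dynamics as the proposal and multinomial resampling, on an assumed nonlinear model; it is illustrative, not the course's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative nonlinear state-space model:
#   x_t = 0.5 x_{t-1} + sin(x_{t-1}) + w_t,  w_t ~ N(0, 1)
#   y_t = x_t + v_t,                         v_t ~ N(0, 0.5)
T, N = 100, 1000
sig_w, sig_v = 1.0, np.sqrt(0.5)

# Simulate a trajectory and noisy observations of it
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + np.sin(x[t - 1]) + sig_w * rng.standard_normal()
y = x + sig_v * rng.standard_normal(T)

# Bootstrap particle filter: propose from the dynamics, weight by the
# observation likelihood, estimate, then resample.
particles = rng.standard_normal(N)
est = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = (0.5 * particles + np.sin(particles)
                     + sig_w * rng.standard_normal(N))
    logw = -0.5 * ((y[t] - particles) / sig_v) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.dot(w, particles)      # filtered-mean estimate of x_t
    # Multinomial resampling (unbiasedness is the tower-property argument above)
    particles = rng.choice(particles, size=N, p=w)
```

Resampling after every step keeps the particle weights from degenerating, which is exactly the "controlling the accumulation of the approximation error over time" issue in the bullet above.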

Week 4: Calibrating the hidden Markov model for real data. (Building on algorithms from earlier weeks.)

• Maximum likelihood estimation for the hidden Markov model.
• Estimating the gradient of the log likelihood via sequential importance sampling and its use within maximum likelihood estimation.
• Implementing the expectation-maximisation algorithm using sequential importance sampling.
• Applications, including economics and engineering.
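One way the Week 3 and Week 4 pieces fit together: a particle filter also delivers an estimate of the log-likelihood, which can then be maximised over the model parameter. The sketch below does this by grid search for a single autoregressive parameter, using common random numbers across the grid; the model, grid, and all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data from a linear-Gaussian state-space model with unknown phi (illustrative):
#   x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, 1);   y_t = x_t + v_t,  v_t ~ N(0, 1)
phi_true, T = 0.7, 200
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)

def pf_loglik(phi, y, N=1000, seed=0):
    """Bootstrap-particle-filter estimate of the log-likelihood log p(y | phi).

    A fixed seed gives common random numbers across phi values, which
    smooths the estimated likelihood surface for the grid search."""
    r = np.random.default_rng(seed)
    particles = r.standard_normal(N)
    ll = 0.0
    for t in range(len(y)):
        if t > 0:
            particles = phi * particles + r.standard_normal(N)
        logw = -0.5 * (y[t] - particles) ** 2 - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())     # estimate of log p(y_t | y_{1:t-1})
        w /= w.sum()
        particles = r.choice(particles, size=N, p=w)
    return ll

# Grid-search maximum likelihood estimate of phi
grid = np.linspace(0.0, 0.95, 20)
phi_hat = grid[np.argmax([pf_loglik(p, y) for p in grid])]
```

In practice one would replace the grid search with gradient-based optimisation or the EM algorithm, as in the bullets above; the grid keeps the sketch short.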

#### Prerequisites

We assume students have knowledge of the following topics:

• Probability and Random Processes, covering:
• Sample space; events; probability measure; probability axioms; conditional probability; probability chain rule; independence; Bayes' rule.
• Random variables (discrete and continuous): probability mass function; probability density function; cumulative distribution function; transformation of random variables.
• Bivariate and multivariate, in particular: conditional probability mass function; conditional probability density function; conditional expectation; marginals; change of variables (Jacobian); properties of the Gaussian distribution.
• Definition of a random process (discrete time only); finite-order distributions; autocorrelation function; Markov chains.
• Statistics
• Maximum likelihood estimation; least squares; principles of Monte Carlo estimation (e.g. Bootstrap).

This is typical subject material for an undergraduate Mathematics degree with a Probability and Statistics major.

• This course is designed to be a ‘core’ module for graduate students in Mathematics who are further specialising in probability and statistics. It is suited, for example, to students wishing to pursue a PhD in mathematical/computational statistics, as well as to those seeking an industry career in mathematical finance or data science.
• The content of this course is designed for the Honours level, but without assuming knowledge of specialist topics. However, it assumes very good knowledge in the fundamentals of probability and multivariate statistics, as reflected in the pre-enrolment quiz.

#### Assessment

• Weekly take home assignments (50% total)
• Final take home exam (50%)

TBA