Topics in Data Science & Machine Learning

Lecturer

Professor Yiming Ying, School of Mathematics and Statistics, University of Sydney

Synopsis

This summer course explores the principles of machine learning with the goal of equipping students with essential mathematical and statistical tools. It covers classical topics such as classification and regression algorithms, as well as modern topics such as deep neural networks, stochastic online learning for big data, and statistical learning theory, emphasising the foundational concepts and mathematical methodologies of machine learning.

While prior programming experience is not required, familiarity with MATLAB or Python is beneficial. Through a combination of theoretical lectures and practical tutorials, students will develop analytical skills necessary for tackling real-world machine learning challenges. Upon completion, participants will possess a solid understanding of machine learning principles and mathematical techniques for data analysis, empowering them to pursue further study or apply their knowledge in academic or professional settings.

Course Overview

Week 1:

  • Basic concepts and examples in machine learning: supervised and unsupervised learning, the concept of generalisation with linear models and neural networks, and model selection;
  • quick review of eigenvalue decomposition, singular value decomposition, and convex optimization (Lagrange multiplier theory);
  • statistical description of data, such as normalisation, correlation, and independence.
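As a small preview of the Week 1 linear-algebra review, the sketch below (Python with NumPy; the random data is purely illustrative) shows the link between the eigenvalue decomposition of a covariance matrix and the singular value decomposition of the standardised data matrix:

```python
import numpy as np

# Toy data matrix: 5 samples, 3 features (values are illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Standardise each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalue decomposition of the (symmetric) covariance matrix
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Singular value decomposition of the data matrix itself
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

# The squared singular values, scaled by 1/(n - 1), match the eigenvalues
print(np.allclose(sorted(S**2 / (X.shape[0] - 1)), sorted(eigvals)))
```

This equivalence is why PCA (which returns in Week 4) can be computed either from the covariance matrix or directly from the SVD of the centred data.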

Week 2: 

  • Kernel methods: support vector machines (large margin, dual formulation, quadratic programming, PSD kernels) and the representer theorem;
  • Bayesian learning and generalisation.
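To make the PSD-kernel idea concrete, here is a minimal sketch (Python with NumPy; the bandwidth and data are arbitrary choices for illustration) checking that a Gaussian kernel produces a positive semidefinite Gram matrix:

```python
import numpy as np

# Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2);
# gamma is an illustrative bandwidth choice
def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
K = rbf_kernel(X)

# A valid kernel yields a symmetric PSD Gram matrix:
# all eigenvalues are (numerically) non-negative
eigvals = np.linalg.eigvalsh(K)
print(np.all(eigvals > -1e-10))
```

Positive semidefiniteness of the Gram matrix is exactly the condition that makes the dual SVM problem a well-posed quadratic program.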

Week 3:

  • Modelling and regression (ML and MAP solutions, LASSO models);
  • statistical learning theory (VC dimension, covering numbers, Rademacher complexity, excess risk analysis).
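One concrete point of contact between the MAP and LASSO material is the soft-thresholding operator, which gives the LASSO solution in the special case of an orthonormal design (a minimal sketch in Python with NumPy; the coefficient values are made up):

```python
import numpy as np

# Soft-thresholding: for an orthonormal design, the LASSO (equivalently,
# MAP with a Laplace prior) shrinks each least-squares coefficient toward
# zero by lam and sets small ones exactly to zero
def soft_threshold(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

ols = np.array([2.5, -0.3, 0.0, 1.1])   # illustrative OLS coefficients
print(soft_threshold(ols, lam=0.5))
```

The exact zeros in the output illustrate why the L1 penalty performs variable selection, in contrast to the uniform shrinkage of ridge regression.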

Week 4: 

  • Unsupervised learning (PCA, GMM clustering model and EM algorithm, feature selection, density estimation);
  • stochastic online learning for big data.
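The stochastic online learning topic can be previewed with plain stochastic gradient descent on a least-squares problem, processing one sample per step (an illustrative sketch in Python with NumPy; the true weights, step size, and noise level are arbitrary):

```python
import numpy as np

# Synthetic regression data: y = x^T w_true + small noise
rng = np.random.default_rng(2)
w_true = np.array([1.0, -2.0])
n = 2000
X = rng.normal(size=(n, 2))
y = X @ w_true + 0.01 * rng.normal(size=n)

# Online SGD: one sample per step with a decaying step size
w = np.zeros(2)
for t in range(n):
    eta = 0.1 / np.sqrt(t + 1)           # step size eta_t = 0.1 / sqrt(t + 1)
    grad = (X[t] @ w - y[t]) * X[t]      # gradient of 0.5 * (x_t^T w - y_t)^2
    w -= eta * grad

print(np.round(w, 2))   # close to w_true
```

Each update touches a single sample, so the cost per step is independent of the dataset size, which is the point of the online approach for big data.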

Prerequisites

Basics of linear algebra, multivariable calculus, and probability and statistics (concepts of probability, random variables, distributions)

Assessment

  • 3 assignments (20% each)
  • Take-home exam (40%)

Attendance requirements

TBA

Resources/pre-reading

  1. The Matrix Cookbook, Kaare Brandt Petersen and Michael Syskind Pedersen. https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
  2. Mathematics for Machine Learning, Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong (Chapters 2, 4, 5, and 6). https://mml-book.github.io
  3. Convex Optimization, Stephen Boyd and Lieven Vandenberghe (Chapter 1 and the first three sections of Chapters 2 and 3). https://web.stanford.edu/~boyd/cvxbook/

 

Not sure if you should sign up for this course?

Take this QUIZ to self-assess the key foundational knowledge required.

Yiming Ying, School of Mathematics and Statistics, University of Sydney

Yiming Ying is a Professor in the School of Mathematics and Statistics at the University of Sydney. Before that, he was a tenured Professor in the Department of Mathematics and Statistics at SUNY Albany and a Lecturer in Computer Science at the University of Exeter. He received his PhD in mathematics from Zhejiang University in 2002 and completed his postdoctoral training in machine learning in Hong Kong and the UK. His research interests include Machine Learning, Statistical Learning Theory, and Trustworthy AI.