WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Survey on Time Series Change Points

Posted on
Tags: Change Points

This note is based on the survey paper, Aminikhanghahi, S., & Cook, D. J. (2017). A Survey of Methods for Time Series Change Point Detection. Knowledge and Information Systems, 51(2), 339–367.

Change point detection (CPD) is the problem of finding abrupt changes in data when a property of the time series changes. Segmentation, edge detection, event detection and anomaly detection are similar concepts which are occasionally applied as well as change point detection.

Different from change point estimation, which tries to model and interpret known changes in time series rather than identifying that a change has occurred.


Online detection

  • $\epsilon$-real time algorithm: an online algorithm which needs at least $\epsilon$ data samples
  • $\infty$-real time algorithm: offline
  • 1-real time algorithm: completely online


change detection methods need to be designed in a computationally efficient manner so that they can scale to massive data sizes.

one way to compare the computational cost of the algorithms is finding the algorithm is parametric or nonparametric since nonparametric approaches have demonstrated greater success for massively large datasets. Also the computational cost of parametric methods is higher than nonparametric approaches and does not scale as well with the size of the dataset.

  • Anytime algorithm (or interruptible algorithm) allows the execution to be interrupted at any time and output the best possible solution obtained so far.
  • Contract algorithm: trade off computation time for solution quality but is given the allowable run time in advance as a type of contract agreement. It receives its allowable execution time as a specified parameter. If a contract algorithm is interrupted before the allocated time is completed, it might not yield any useful results.


The output of CPD can be

  • binary classifier: yes or no
  • varying level of precision, change point occurs within $x$ time units.
  • the time of the next change point (or the times of all change points)

The performance metrics can be

  • accuracy
  • sensitivity
  • G-mean: $\sqrt{\text{sensitivity}\times \text{specificity}}$
  • precision
  • F-measure
  • ROC: TP vs FP
  • Precision-Recall Curve (PR curve)
  • Mean absolute error (MAE)
  • Mean squared error (MSE)
  • Mean signed difference (MSD)
  • Root mean squared error (RMSE)
  • Normalized root mean squared error (NRMSE)

Supervised Methods

Method 1: if the number of states is specified, the changing point detection algorithm is trained to find each state boundary. A sliding window moves through the data, considering each possible division between two data points as a possible change point. [not very clear]

Method 2: treat as a binary class problem, the change point represents one class

Method 3: virtual classifier. [interesting!!]

Unsupervised Methods

Likelihood Ratio Methods

analyze the probability distributions of data before and after a candidate change point, and identify the candidate as a change point if the two distributions are significantly different.

or reduce the problem of CPD into time series-based outlier detection

recent studies introduce more flexible nonparametric variations by estimating the ratio of probability densities directly without needing to perform density estimation

  • model the density ratio by a nonparametric Gaussian kernel
  • then approximate a dissimilarity measure, such as KL divergence, or Pearson divergence

other choice, such as the semi-parametric log-likelihood change detector

Subspace Model Methods

calculate the gap between subspaces and use it as a measure of the change in the time series sequence.

Probabilistic Methods

(online) Bayesian change point detection

Kernel Based Methods

use unsupervised kernel-based test statistic to test the homogeneity of data in time series past and present sliding windows.


Graph Based Methods

derive the graph from a distance or a generalized dissimilarity on the sample space, with time series observations as nodes and edges connecting observations based on their distance.

Clustering Methods

treat it as a clustering problem with a known or unknown number of clusters, such that observations within clusters are iid, and observations between adjacent clusters are not.


  • one important issue: detection delay
  • open problem: algorithm robustness
  • open issue: evaluate the significance of the detected change point
  • an ongoing challenge: non-stationary time series.

Published in categories Note