WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Change Points

Tags: Change Points

Introduction

  1. Page (1954, 1955) gave the classical formulation
  2. Shiryaev (1963) and Lorden (1971) then developed it further

One is concerned with sequential detection of a change-point, which represents a disruption in a continuous production process.

  1. problems in fixed samples
  2. a return to sequential detection, motivated by problems involving parallel streams of data subject to disruptions in some fraction of them
  3. applications to finance are not discussed

Page’s problem

Suppose $X_1,\ldots,X_m$ are independent observations.

  • for $j\le K$, $X_j$ has the distribution $F_0$
  • for $j > K$, $X_j$ has the distribution $F_1$

where $F_i$ may be completely specified or may depend on unknown parameters.

Page’s solution and Barnard’s Suggestion

For sequential detection, Page (1954) suggested the stopping rule

\[N_0 = \min\{t:S_t-\min_{0\le k\le t}S_k\ge b\}\,,\]

where $S_t$ is the $t$-th cumulative sum (CUSUM) of scores $Z(X_i)$, that is, $S_t=\sum_{i=1}^t Z(X_i)$ with $S_0=0$.
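The rule can be evaluated in a single pass by tracking the running sum and its running minimum. Below is a minimal sketch, assuming the scores $Z(X_i)$ have already been computed and that $S_0=0$; the function name `page_cusum_stop` is introduced here for illustration only.

```python
def page_cusum_stop(scores, b):
    """Return the first t (1-indexed) with S_t - min_{0<=k<=t} S_k >= b.

    Returns None if the threshold is never reached.
    """
    s = 0.0            # running cumulative sum S_t, with S_0 = 0
    running_min = 0.0  # min over 0 <= k <= t of S_k
    for t, z in enumerate(scores, start=1):
        s += z
        running_min = min(running_min, s)
        if s - running_min >= b:
            return t
    return None


# Negative scores before the change, positive scores after it.
print(page_cusum_stop([-1.0, -1.0, 2.0, 2.0, 2.0], b=4.0))  # prints 4
```

Because only the current sum and the current minimum are kept, the cost per observation is constant.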

Barnard (1959) discussed graphical methods for implementing Page’s sequential procedure and suggested a modified procedure for the case of normally distributed random variables with a mean value subject to change from an initial value of 0. With $S_t = \sum_{i=1}^t X_i$ and $S_0=0$, Barnard suggested the stopping rule

\[N = \min\left\{t:\max_{0\le k < t}\frac{\vert S_t-S_k\vert}{\sigma(t-k)^{1/2}}\ge b\right\}\,.\]
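Unlike Page's rule, this statistic has no obvious constant-time update, so a direct implementation rescans all candidate change-points $k$ at every step. The sketch below does exactly that, assuming $S_0=0$ and a known $\sigma$; the function name `barnard_stop` is hypothetical.

```python
import math


def barnard_stop(xs, b, sigma=1.0):
    """Return the first t (1-indexed) at which
    max_{0<=k<t} |S_t - S_k| / (sigma * sqrt(t - k)) >= b, or None."""
    prefix = [0.0]  # prefix[k] = S_k, with S_0 = 0
    for t, x in enumerate(xs, start=1):
        prefix.append(prefix[-1] + x)
        s_t = prefix[t]
        # Scan every candidate change-point k < t.
        stat = max(
            abs(s_t - prefix[k]) / (sigma * math.sqrt(t - k))
            for k in range(t)
        )
        if stat >= b:
            return t
    return None


print(barnard_stop([0.0, 0.0, 3.0, 3.0], b=4.0))  # prints 4
```

The scan costs $O(t)$ at step $t$, hence $O(t^2)$ overall, which is fine for illustration but may be too slow for long streams.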

Note that if $F_0$ and $F_1$ denote normal distributions with unit variance and mean values $\mu_0=0$ and $\mu_1=\delta$, respectively, the log-likelihood ratio at $n>K$ is

\[\delta \sum_{i=K+1}^n(X_i-\delta/2).\]

Replacing $K$ by $k$ and $n$ by $t$, maximization w.r.t. $\delta$ and $k < t$ leads to $\max_{0\le k < t}(S_t-S_k)^2/[2(t-k)]$, so Barnard’s suggestion (with $\sigma=1$) can be described as stopping as soon as the generalized likelihood ratio statistic exceeds a suitable threshold.
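The maximization over $\delta$ can be written out explicitly. For a fixed candidate change-point $k < t$, with $S_t=X_1+\cdots+X_t$ and unit variance, the log-likelihood ratio is a quadratic in $\delta$:

\[\ell(\delta,k)=\delta(S_t-S_k)-\frac{\delta^2}{2}(t-k),\qquad \hat\delta_k=\frac{S_t-S_k}{t-k},\qquad \ell(\hat\delta_k,k)=\frac{(S_t-S_k)^2}{2(t-k)}.\]

Hence $\vert S_t-S_k\vert/(t-k)^{1/2}\ge b$ holds exactly when $\ell(\hat\delta_k,k)\ge b^2/2$, so the two thresholds are related by a simple reparametrization.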

Shiryaev’s and Lorden’s contributions

Shiryaev (1963) considered the case of completely specified $F_0$ and $F_1$. He assumed that $K$ is random and used optimal stopping theory to describe an exact solution to a well-formulated Bayesian version of the problem, and he computed the Bayes solution in a continuous time formulation involving Brownian motion.

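  • loss for stopping at time $n$: $\mathbf{1}\{K>n\}+C(n-K)^+$
  • approximation: under a geometric prior on $K$, in the limit where the probability of a change in any bounded interval is vanishingly small, the Bayes rule becomes

\[N_1 = \min\left\{ t:\sum_{k=0}^{t-1}\prod_{j=k+1}^t\frac{dF_1}{dF_0}(X_j)\ge B \right\}\]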
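A minimal sketch of this rule, assuming the statistic in $N_1$ is the Shiryaev-Roberts sum $R_t=\sum_{k=0}^{t-1}\prod_{j=k+1}^t (dF_1/dF_0)(X_j)$. Under that assumption it satisfies the recursion $R_t=(1+R_{t-1})\,L_t$ with $R_0=0$, where $L_t$ is the likelihood ratio of $X_t$. The names `shiryaev_roberts_stop` and `normal_lr` are introduced here for illustration, and the normal mean-shift likelihood ratio is just one possible choice of $F_0$, $F_1$.

```python
import math


def normal_lr(x, delta):
    """Likelihood ratio dF_1/dF_0 at x for N(delta, 1) versus N(0, 1)."""
    return math.exp(delta * (x - delta / 2.0))


def shiryaev_roberts_stop(lrs, B):
    """Return the first t (1-indexed) with R_t >= B, or None.

    lrs is the sequence of likelihood ratios L_1, L_2, ...
    R_t is updated by R_t = (1 + R_{t-1}) * L_t, starting from R_0 = 0.
    """
    r = 0.0
    for t, lr in enumerate(lrs, start=1):
        r = (1.0 + r) * lr
        if r >= B:
            return t
    return None


# R_1 = 0.5, R_2 = 3, R_3 = 8, R_4 = 18
print(shiryaev_roberts_stop([0.5, 2.0, 2.0, 2.0], B=10.0))  # prints 4
```

For data `xs` with a hypothesized shift `delta`, one would call `shiryaev_roberts_stop((normal_lr(x, delta) for x in xs), B)`.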

Lorden (1971) took a maximum likelihood approach, in the case of two completely specified distributions, leading to the stopping rule

\[N_2 = \min\left\{t : \max_{0\le k\le t}\sum_{j=k+1}^t\log[(dF_1/dF_0)(X_j)]\ge b\right\}\,.\]
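The connection to Page's rule can be made explicit. Writing $\ell_j=\log[(dF_1/dF_0)(X_j)]$ and $W_t$ for the maximum inside $N_2$ (both symbols are introduced here for illustration), the term $k=t$ contributes an empty sum equal to 0, which gives

\[W_t=\max_{0\le k\le t}\sum_{j=k+1}^t \ell_j=\max(0,\,W_{t-1}+\ell_t),\qquad W_0=0.\]

Equivalently, with $S_t=\sum_{j=1}^t\ell_j$ and $S_0=0$, one has $W_t=S_t-\min_{0\le k\le t}S_k$, so $N_2$ is Page's rule $N_0$ with the score $Z=\log(dF_1/dF_0)$.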

Having observed $X_1,\ldots,X_m$, suppose we are interested in testing the hypothesis that there is no change-point. The statistic suggested by Page was

\[\max_{0\le k\le m}\sum_{j=k+1}^m [\delta(X_j-\delta/2)]\,,\]

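which is the log-likelihood ratio statistic for a known shift $\delta$ in a unit-variance normal mean, maximized over the possible change-points $k$.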
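For a fixed sample this maximum can be computed by accumulating the scores from the right-hand end. The sketch below assumes a known $\delta$ and unit variance, and also returns the maximizing $k$; the function name `page_fixed_sample_stat` is hypothetical.

```python
def page_fixed_sample_stat(xs, delta):
    """Return (max over 0<=k<=m of sum_{j=k+1}^m delta*(x_j - delta/2), argmax k).

    k = m corresponds to the empty sum (no change), whose value is 0.
    """
    m = len(xs)
    best, best_k = 0.0, m
    tail = 0.0
    # Walk k from m-1 down to 0; xs[k] is X_{k+1} in 1-indexed notation.
    for k in range(m - 1, -1, -1):
        tail += delta * (xs[k] - delta / 2.0)
        if tail > best:
            best, best_k = tail, k
    return best, best_k


print(page_fixed_sample_stat([0.0, 0.0, 1.0, 1.0], delta=1.0))  # prints (1.0, 2)
```

In this toy input the maximizing $k=2$ means the first two observations are attributed to $F_0$ and the last two to $F_1$.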

Hypothesis Testing When a Nuisance Parameter Is Only Present under the Alternative

“semi-linear” regression example:

\[Y_i=\alpha+\beta f_i(\theta)+e_i\,,\]

where $f_i$ is nonlinear in $\theta$, and $\theta$ can be multidimensional.

The hypothesis to be tested is that $\beta = 0$; under this hypothesis the parameter $\theta$ has no meaning. The special case $f_i(\theta) = (x_i-\theta)^+$ is in the spirit of a change-point problem, where the change occurs in the slope of a linear regression.
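To make the difficulty concrete: for each fixed $\theta$ the model is an ordinary linear regression, but $\theta$ is unknown, so a natural idea is to scan a grid of $\theta$ values and look at the best fit. The sketch below does this for the hinge case $f_i(\theta)=(x_i-\theta)^+$. The scan itself, the grid, and the use of the reduction in residual sum of squares as the criterion are assumptions made here for illustration, not a procedure stated in these notes; the names `hinge` and `slope_scan` are hypothetical.

```python
def hinge(x, theta):
    """The regressor (x - theta)^+."""
    return max(x - theta, 0.0)


def slope_scan(xs, ys, thetas):
    """For each theta, regress y on (x - theta)^+ with an intercept.

    Returns (theta, gain) for the theta with the largest reduction in
    residual sum of squares relative to the intercept-only model (beta = 0).
    """
    n = len(xs)
    y_bar = sum(ys) / n
    best_theta, best_gain = None, -1.0
    for theta in thetas:
        f = [hinge(x, theta) for x in xs]
        f_bar = sum(f) / n
        sxx = sum((fi - f_bar) ** 2 for fi in f)
        if sxx == 0.0:
            continue  # regressor is constant, so beta is not identifiable here
        sxy = sum((fi - f_bar) * (yi - y_bar) for fi, yi in zip(f, ys))
        gain = sxy ** 2 / sxx  # RSS(beta = 0) - RSS(fitted beta)
        if gain > best_gain:
            best_theta, best_gain = theta, gain
    return best_theta, best_gain


# Noiseless toy data: y = 1 + 2 * (x - 3)^+
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0 + 2.0 * hinge(x, 3.0) for x in xs]
theta_hat, gain = slope_scan(xs, ys, [1.0, 2.0, 3.0, 4.0, 5.0])
print(theta_hat)  # prints 3.0
```

Because the maximum is taken over $\theta$, its null distribution is not that of a single regression statistic, which is what makes this testing problem nonstandard.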

DNA/Protein Sequence Analysis


Published in categories Note