M-estimator
Let $M_n$ be random functions and let $M$ be a fixed function of $\theta$ such that for every $\varepsilon>0$,

$$\sup_{\theta\in\Theta}|M_n(\theta)-M(\theta)|\xrightarrow{p}0\,,\qquad \sup_{\theta:\,d(\theta,\theta_0)\ge\varepsilon}M(\theta)<M(\theta_0)\,.$$

Then any sequence of estimators $\hat\theta_n$ with $M_n(\hat\theta_n)\ge M_n(\theta_0)-o_P(1)$ converges in probability to $\theta_0$.
The first condition says that the sequence $M_n$ converges to a nonrandom map $M:\Theta\to\overline{\mathbb{R}}$, and the second condition requires that this map attains its maximum at a unique point $\theta_0$: only parameters close to $\theta_0$ may yield a value of $M(\theta)$ close to the maximum value $M(\theta_0)$. Thus, $\theta_0$ should be a well-separated point of maximum of $M$; a counterexample is shown in the following figure.
Pay attention to the $o_P(1)$ in the inequality: it is shorthand for a sequence of random variables that converges to zero in probability.
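For a concrete sketch (assuming NumPy; the normal-median example is only an illustration), take $M_n(\theta)=-\frac{1}{n}\sum_i|X_i-\theta|$ and $M(\theta)=-\mathrm{E}|X-\theta|$ for a standard normal sample. The limit $M$ has a well-separated maximum at the population median $\theta_0=0$, and a grid maximizer of $M_n$ moves toward it as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_grid = np.linspace(-3, 3, 1201)
theta_0 = 0.0  # population median of the standard normal

def M_n(theta, x):
    # Random criterion M_n(theta) = -(1/n) * sum_i |x_i - theta|;
    # its maximizer (here over a grid) is essentially the sample median.
    return -np.mean(np.abs(x[:, None] - theta[None, :]), axis=0)

for n in (50, 500, 5000):
    x = rng.normal(loc=theta_0, scale=1.0, size=n)
    theta_hat = theta_grid[np.argmax(M_n(theta_grid, x))]
    print(f"n = {n:5d}   theta_hat = {theta_hat:+.4f}")
```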
Let $\Psi_n$ be random vector-valued functions and let $\Psi$ be a fixed vector-valued function of $\theta$ such that for every $\varepsilon>0$,

$$\sup_{\theta\in\Theta}\|\Psi_n(\theta)-\Psi(\theta)\|\xrightarrow{p}0\,,\qquad \inf_{\theta:\,d(\theta,\theta_0)\ge\varepsilon}\|\Psi(\theta)\|>0=\|\Psi(\theta_0)\|\,.$$

Then any sequence of estimators $\hat\theta_n$ such that $\Psi_n(\hat\theta_n)=o_P(1)$ converges in probability to $\theta_0$.
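As a sketch of this estimating-equation version (assuming NumPy and SciPy; the exponential model is chosen only for illustration), take $\psi_\lambda(x)=1/\lambda-x$, so that $\Psi_n(\lambda)=1/\lambda-\bar X_n$, while the fixed map $\Psi(\lambda)=1/\lambda-1/\lambda_0$ has its unique zero at the true rate $\lambda_0$; a root of $\Psi_n$ then recovers $\lambda_0$:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
lam_0 = 2.0  # true rate of the exponential sample

def Psi_n(lam, x):
    # Psi_n(lambda) = (1/n) * sum_i psi_lambda(x_i) with psi_lambda(x) = 1/lambda - x;
    # the population version Psi(lambda) = 1/lambda - 1/lam_0 vanishes only at lam_0.
    return 1.0 / lam - x.mean()

for n in (20, 200, 2000, 20000):
    x = rng.exponential(scale=1.0 / lam_0, size=n)
    # Z-estimator: a zero of the random map Psi_n, found by bracketing.
    lam_hat = brentq(Psi_n, 1e-3, 1e3, args=(x,))
    print(f"n = {n:6d}   lambda_hat = {lam_hat:.4f}")
```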
Let $X_1,\ldots,X_n$ be a sample from some distribution $P$, and let a random and a "true" criterion function be of the form:
$$\Psi_n(\theta)=\frac{1}{n}\sum_{i=1}^n\psi_\theta(X_i)=\mathbb{P}_n\psi_\theta\,,\qquad \Psi(\theta)=P\psi_\theta\,.$$

Assume that the estimator $\hat\theta_n$ is a zero of $\Psi_n$ and converges in probability to a zero $\theta_0$ of $\Psi$. Because $\hat\theta_n\xrightarrow{p}\theta_0$, expand $\Psi_n(\hat\theta_n)$ in a Taylor series around $\theta_0$. Assume for simplicity that $\theta$ is one-dimensional; then
$$0=\Psi_n(\hat\theta_n)=\Psi_n(\theta_0)+(\hat\theta_n-\theta_0)\dot\Psi_n(\theta_0)+\tfrac{1}{2}(\hat\theta_n-\theta_0)^2\ddot\Psi_n(\tilde\theta_n)\,,$$

where $\tilde\theta_n$ is a point between $\hat\theta_n$ and $\theta_0$. This can be rewritten as
$$\sqrt{n}(\hat\theta_n-\theta_0)=\frac{-\sqrt{n}\,\Psi_n(\theta_0)}{\dot\Psi_n(\theta_0)+\tfrac{1}{2}(\hat\theta_n-\theta_0)\ddot\Psi_n(\tilde\theta_n)}\,.$$

In the one-dimensional case this leads to

$$\sqrt{n}(\hat\theta_n-\theta_0)\rightsquigarrow N\left(0,\ \frac{P\psi_{\theta_0}^2}{(P\dot\psi_{\theta_0})^2}\right),$$

and in the $k$-dimensional case

$$\sqrt{n}(\hat\theta_n-\theta_0)\rightsquigarrow N_k\left(0,\ (P\dot\psi_{\theta_0})^{-1}\,P\psi_{\theta_0}\psi_{\theta_0}^T\,\big((P\dot\psi_{\theta_0})^{-1}\big)^T\right),$$

where the invertibility of the matrix $P\dot\psi_{\theta_0}$ is a condition.
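Continuing the exponential sketch from above (again assuming NumPy), the sandwich formula can be checked by simulation: with $\psi_\lambda(x)=1/\lambda-x$ one gets $P\psi_{\lambda_0}^2=\operatorname{Var}X=1/\lambda_0^2$ and $P\dot\psi_{\lambda_0}=-1/\lambda_0^2$, so the limiting variance of $\sqrt{n}(\hat\lambda_n-\lambda_0)$ should be $\lambda_0^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
lam_0, n, reps = 2.0, 2000, 4000

# For psi_lambda(x) = 1/lambda - x and X ~ Exp(rate = lam_0):
#   P psi_{lam_0}^2  = Var(X) = 1 / lam_0**2
#   P psidot_{lam_0} = -1 / lam_0**2
# so the sandwich variance (P psi^2) / (P psidot)^2 equals lam_0**2.
sandwich_var = (1.0 / lam_0**2) / (1.0 / lam_0**2) ** 2

x = rng.exponential(scale=1.0 / lam_0, size=(reps, n))
lam_hat = 1.0 / x.mean(axis=1)         # closed-form zero of Psi_n
z = np.sqrt(n) * (lam_hat - lam_0)     # should look like N(0, lam_0**2)

print(f"sandwich variance   : {sandwich_var:.3f}")
print(f"Monte Carlo variance: {z.var():.3f}")
```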
The function $\theta\mapsto\operatorname{sign}(x-\theta)$ is not Lipschitz, so the Lipschitz condition is apparently still stronger than necessary.
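As a worked instance (a sketch, assuming $P$ has distribution function $F$ with positive density $f$ at $\theta_0$), take the median, $\psi_\theta(x)=\operatorname{sign}(x-\theta)$: although $\theta\mapsto\psi_\theta(x)$ jumps at $\theta=x$, the averaged map is smooth,

$$\Psi(\theta)=P\operatorname{sign}(X-\theta)=1-2F(\theta)\,,\qquad \dot\Psi(\theta_0)=-2f(\theta_0)\,,\qquad P\psi_{\theta_0}^2=1\,,$$

so with $\dot\Psi(\theta_0)$ playing the role of $P\dot\psi_{\theta_0}$, the sandwich formula gives the classical limit for the sample median, $\sqrt{n}(\hat\theta_n-\theta_0)\rightsquigarrow N\big(0,\,1/(4f(\theta_0)^2)\big)$.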