Link-free vs. Semiparametric
This note is based on Li (1991) and Ma and Zhu (2012).
SIR for Dimension Reduction
Consider the model
\begin{equation} y = f(\beta_1\x, \beta_2\x,\ldots,\beta_K\x,\epsilon)\label{eq:model} \end{equation}
where the $\beta$’s are unknown row vectors. We can view this model as the projection of the $p$-dimensional explanatory variable $\x$ onto the $K$-dimensional subspace, $(\beta_1\x,\ldots,\beta_K\x)$.
Alternatively, \eqref{eq:model} describes the situation where the response variable $Y$ depends on the $p$-dimensional variable $X$ only through a $K$-dimensional subspace. The unknown $\beta_i$’s, which span this space, are called effective dimension reduction directions (EDR-directions). Their span is called the effective dimension reduction space (EDR-space). The aim is to estimate the basis vectors of this space, for which neither the length nor the direction can be identified. Only the space in which they lie is identifiable.^{1}
When $K$ is small, we may achieve the goal of data reduction by estimating the $\beta$’s efficiently. Any linear combination of the $\beta$’s is an effective dimension-reduction (e.d.r.) direction, and the linear space $B$ generated by the $\beta$’s is called the e.d.r. space.
As a toy example, consider $\beta_1=(1,2,0)$ and $\beta_2=(0,0,1)$; then any linear combination $c_1\beta_1+c_2\beta_2$ is a point (direction) in the subspace (e.d.r. space) spanned by $(1,2,0)$ and $(0,0,1)$.
Let $\Sigma_{\x\x}$ be the covariance matrix of $\x$, and consider the standardized version of $\x$, $\z=\Sigma_{\x\x}^{-1/2}[\x-\E\x]$. Then we have
\[y = f(\eta_1\z,\ldots,\eta_K\z,\epsilon)\,,\]where $\eta_k=\beta_k\Sigma_{\x\x}^{1/2}$. Any vector in the linear space generated by the $\eta_k$’s is a standardized e.d.r. direction.
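The standardization $\z=\Sigma_{\x\x}^{-1/2}[\x-\E\x]$ can be sketched on a sample as follows. This is a minimal illustration, not code from Li (1991): the function name `standardize`, the simulated data, and the use of the sample mean and sample covariance in place of $\E\x$ and $\Sigma_{\x\x}$ are all assumptions made here.

```python
import numpy as np

def standardize(X):
    """Return Z with rows z_i = Sigma^{-1/2} (x_i - mean), plus Sigma^{-1/2}.

    X has one observation per row. Sigma is the sample covariance, and its
    inverse square root is built from the symmetric eigendecomposition.
    """
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Sigma^{-1/2} is symmetric, so right-multiplying rows is equivalent
    # to left-multiplying column vectors.
    return Xc @ Sigma_inv_sqrt, Sigma_inv_sqrt

# Hypothetical correlated data: 500 observations of a 3-dimensional x.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(500, 3)) @ A
Z, S = standardize(X)
```

After this step the sample covariance of `Z` is the identity matrix up to rounding, which is what makes the eigenvectors of $\cov[\E(\z\mid y)]$ directly interpretable as standardized directions.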
One-component models ($K=1$)
 the generalized linear model
 the Box-Cox transformation model
Multi-component models ($K>1$)
 general form $g(\beta_1\x,\ldots,\beta_K\x)$
 additive form: $g_1(\beta_1\x)+\cdots+g_K(\beta_K\x)$, as in projection pursuit regression (PPR)
To evaluate the effectiveness of an estimated e.d.r. direction, use an affine-invariant criterion: the squared multiple correlation coefficient between the projected variable $b\x$ and the ideally reduced variables $\beta_1\x,\ldots,\beta_K\x$.
\[R^2(b) = \underset{\beta\in B}{\max}\frac{(b\Sigma_{\x\x}\beta')^2}{b\Sigma_{\x\x}b'\cdot\beta\Sigma_{\x\x}\beta'}\]where $b$ is the estimated e.d.r. direction, and $B$ is the true e.d.r. space.
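The maximum over $\beta\in B$ has a closed form: in the inner product induced by $\Sigma_{\x\x}$, $R^2(b)$ is the squared cosine between $b$ and its projection onto the span of the $\beta_k$’s. The sketch below uses that fact; the function name `r_squared` and the choice $\Sigma_{\x\x}=I$ are assumptions made for illustration, and the basis is the toy example $\beta_1=(1,2,0)$, $\beta_2=(0,0,1)$.

```python
import numpy as np

def r_squared(b, B, Sigma):
    """Squared multiple correlation between b x and (beta_1 x, ..., beta_K x).

    b     : candidate direction, shape (p,)
    B     : rows are the basis vectors beta_k, shape (K, p)
    Sigma : covariance matrix of x, shape (p, p)
    """
    b = np.asarray(b, dtype=float)
    B = np.atleast_2d(np.asarray(B, dtype=float))
    cross = B @ Sigma @ b      # entries beta_k Sigma b'
    gram = B @ Sigma @ B.T     # entries beta_j Sigma beta_k'
    # Projection formula: cross' gram^{-1} cross / (b Sigma b')
    return float(cross @ np.linalg.solve(gram, cross) / (b @ Sigma @ b))

B = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
Sigma = np.eye(3)

# beta_1 + beta_2 lies in the e.d.r. space, so the criterion equals 1.
r_in = r_squared(np.array([1.0, 2.0, 1.0]), B, Sigma)
# (2, -1, 0) is orthogonal to both basis vectors, so the criterion equals 0.
r_out = r_squared(np.array([2.0, -1.0, 0.0]), B, Sigma)
```

Because the value depends on $B$ only through its span, any other basis of the same e.d.r. space gives the same $R^2(b)$.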
Theorem
Condition: For any $b$ in $\IR^p$, there exist constants $c_0,c_1,\ldots,c_K$ such that $\E(b\x\mid\beta_1\x,\ldots,\beta_K\x)=c_0+c_1\beta_1\x+\cdots+c_K\beta_K\x$.
Under this condition and model \eqref{eq:model}, the centered inverse regression curve $\E(\x\mid y)-\E(\x)$ is contained in the linear subspace spanned by $\beta_k\Sigma_{\x\x}$ $(k=1,\ldots,K)$.
consequence

The eigenvectors $\eta_k$ $(k=1,\ldots,K)$ associated with the $K$ largest eigenvalues of $\cov[\E(\z\mid y)]$ are the standardized e.d.r. directions.

One can quantify how far the inverse regression curve $\E(\z\mid y)$ is from the standardized e.d.r. space when the condition is violated.
The procedure is similar to the case $K=1$, which was covered in SIR and Its Implementation.
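For reference, the procedure can be sketched as: standardize $\x$, slice the range of $y$, average $\z$ within each slice, and take the leading eigenvectors of the weighted covariance of the slice means. The code below is a minimal sketch under assumptions made here (the function name `sir`, equal-count slices, and the simulated cubic link are illustrative choices, not taken from Li (1991)).

```python
import numpy as np

def sir(X, y, n_slices=10, K=1):
    """Sketch of sliced inverse regression.

    Returns (beta_hat, eigvals): beta_hat has K rows, each an estimated
    e.d.r. direction on the original x scale; eigvals are the eigenvalues
    of the estimated cov[E(z | y)], in decreasing order.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    S = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = Xc @ S                                     # standardized x

    # Slice on the ordered responses and accumulate the weighted
    # covariance of the slice means of z.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    w, v = np.linalg.eigh(M)          # eigenvalues in ascending order
    eta = v[:, ::-1][:, :K].T         # top-K eigenvectors as rows
    beta_hat = eta @ S                # beta_k = eta_k Sigma^{-1/2}
    return beta_hat, w[::-1]

# Hypothetical one-component model with an unknown (cubic) link.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(2000, 5))
y = (X @ beta_true) ** 3 + rng.normal(size=2000)

beta_hat, eigvals = sir(X, y, n_slices=10, K=1)
# Squared cosine between estimated and true direction (x has identity
# covariance here, so this approximates R^2(b)).
cos2 = (beta_hat[0] @ beta_true) ** 2 / (
    (beta_hat[0] @ beta_hat[0]) * (beta_true @ beta_true))
```

The link function is never used by `sir`; only the ordering of `y` matters. Only the direction of `beta_hat[0]` is meaningful, since its length and sign are not identifiable.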
A Semiparametric Approach to Dimension Reduction
Literature
identifying the central subspace
 sliced average variance estimation
 directional regression
 kernel inverse regression
 CANCOR analysis
but they rely on one or both of the following conditions
 the linearity condition: $\E(\x\mid\x^T\beta)$ is a linear function of $\x$
 the constant variance condition: $\cov(\x\mid \x^T\beta)$ is a constant matrix
others:
 the Fourier transformation method, which requires estimating the joint pdf of $\x$; this is typically infeasible in a high-dimensional setting.
 dMAVE, which adapts minimum average variance estimation (MAVE)
 sliced regression (SR)
Existing methods thus impose either the two conditional moment conditions above or distributional assumptions on the covariate vector, in one form or another.
identifying the central mean subspace
 OLS, which assumes that $\x$ satisfies the linearity condition
 average derivative estimation, which requires $\x$ to be continuous
 nonlinear least squares
 minimum average variance estimation
 sliced regression
 principal Hessian directions, which requires $\x$ to satisfy both the linearity condition and the constant variance condition
 minimizing a Kullback-Leibler distance
Proposal
Casting the dimension-reduction problem in a semiparametric framework turns it into a semiparametric estimation problem, so powerful semiparametric estimation and inference tools become applicable.
Advantage:
 relaxation of the linearity condition and the constant variance condition.
Estimating the central subspace via the semiparametric approach
Let $\x$ be a $p\times 1$ covariate vector and $Y$ a univariate response variable. The goal of sufficient dimension reduction is to seek a matrix $\bbeta$ such that
\[\begin{equation} F(y\mid\x)=F(y\mid\x^T\bbeta)\,, \quad\text{for }y\in\IR\,. \label{eq:semiparam} \end{equation}\]The column space of $\bbeta$ satisfying $\eqref{eq:semiparam}$ is called a dimension-reduction subspace. Since the dimension-reduction subspace is not unique, the primary interest is the central subspace, which is defined as the intersection of all dimension-reduction subspaces, provided that the intersection itself is a dimension-reduction subspace.
The likelihood of one random observation $(\x,Y)$ is
\[\eta_1(\x)\eta_2(Y,\x^T\bbeta)\]where $\eta_1$ (the marginal density of $\x$) and $\eta_2$ (the conditional density of $Y$ given $\x^T\bbeta$) are infinite-dimensional nuisance parameters, while $\bbeta$ is the finite-dimensional parameter of interest. Estimating $\bbeta$ is therefore a semiparametric estimation problem.
Influence functions can be viewed as normalized elements of the so-called nuisance tangent space orthogonal complement $\Lambda^\perp$. Deriving this orthogonal complement yields a general class of estimating equations, indexed by arbitrary functions $g(Y,\x^T\bbeta)$.
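As a worked step, here is one member of such a class and why it has mean zero. This is a sketch rather than the paper's full derivation: $a(\x)$ is an arbitrary function of $\x$ introduced here for illustration, and the product form is the class as I recall it from Ma and Zhu (2012), so it is worth checking against the paper.

\[\E\big[\{g(Y,\x^T\bbeta)-\E(g\mid\x^T\bbeta)\}\{a(\x)-\E(a\mid\x^T\bbeta)\}\big]=0\,.\]

Conditioning on $\x$ and using $\eqref{eq:semiparam}$ gives $\E\{g(Y,\x^T\bbeta)\mid\x\}=\E(g\mid\x^T\bbeta)$, so the first factor has conditional mean zero given $\x$ and the expectation of the product vanishes. Neither the linearity condition nor the constant variance condition is used in this argument.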