Link-free vs. Semiparametric
This note is based on Li (1991) and Ma and Zhu (2012).
SIR for Dimension Reduction
Consider the model
\begin{equation} y = f(\beta_1\x, \beta_2\x,\ldots,\beta_K\x,\epsilon)\label{eq:model} \end{equation}
where the $\beta$’s are unknown row vectors. We can view this model as the projection of the $p$-dimensional explanatory variable $\x$ onto the $K$-dimensional subspace $(\beta_1\x,\ldots,\beta_K\x)$.
Alternatively, \eqref{eq:model} describes the situation where the response variable $Y$ depends on the $p$-dimensional variable $X$ only through a $K$-dimensional subspace. The unknown $\beta_i$’s, which span this space, are called effective dimension reduction directions (EDR-directions), and their span is called the effective dimension reduction space (EDR-space). The aim is to estimate the basis vectors of this space; neither their length nor their direction can be identified, only the space in which they lie.
When $K$ is small, we may achieve the goal of data reduction by estimating the $\beta$’s efficiently. Any vector in the linear space $B$ generated by the $\beta$’s is called an effective dimension-reduction (e.d.r.) direction, and $B$ itself is called the e.d.r. space.
As a toy example, consider $\beta_1=(1,2,0)$ and $\beta_2=(0,0,1)$; then any linear combination $c_1\beta_1+c_2\beta_2$ is a direction in the subspace (the e.d.r. space) spanned by $(1,2,0)$ and $(0,0,1)$.
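To make the later steps concrete, here is a minimal simulation sketch that generates data from a two-component instance of \eqref{eq:model} using the toy directions above; the link function, noise level, and sample size are arbitrary illustrative choices, not taken from Li (1991).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 3

# the two e.d.r. directions from the toy example, stored as row vectors
beta1 = np.array([1.0, 2.0, 0.0])
beta2 = np.array([0.0, 0.0, 1.0])

x = rng.normal(size=(n, p))            # covariates, one row per observation
eps = rng.normal(scale=0.1, size=n)    # noise

# y depends on x only through the two projections beta1 x and beta2 x;
# the link f is an arbitrary illustrative choice
y = (x @ beta1) / (0.5 + (x @ beta2 + 1.5) ** 2) + eps
```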
Let $\Sigma_{\x\x}$ be the covariance matrix of $\x$, and consider the standardized version of $\x$, $\z=\Sigma_{\x\x}^{-1/2}[\x-\E\x]$. Then we have
\[y = f(\eta_1\z,\ldots,\eta_K\z,\varepsilon)\,,\]where $\eta_k=\beta_k\Sigma_{\x\x}^{1/2}$. Any vector in the linear space generated by the $\eta_k$’s is a standardized e.d.r. direction.
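Continuing the simulated example, a minimal sketch of this standardization, with $\Sigma_{\x\x}$ replaced by the sample covariance:

```python
from numpy.linalg import eigh

# sample analogue of Sigma_xx and its symmetric square roots
Sigma = np.cov(x, rowvar=False)
vals, vecs = eigh(Sigma)
Sigma_sqrt = vecs @ np.diag(vals ** 0.5) @ vecs.T
Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

z = (x - x.mean(axis=0)) @ Sigma_inv_sqrt   # standardized covariates
eta1 = beta1 @ Sigma_sqrt                   # standardized e.d.r. directions,
eta2 = beta2 @ Sigma_sqrt                   # since eta_k z = beta_k (x - E x)
```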
One-component models ($K=1$)
- the generalized linear model
- the Box-Cox transformation model
Multicomponent model ($K>1$)
- general form $g(\beta_1\x,\ldots,\beta_K\x)$
- additivity form: $g_1(\beta_1\x)+\cdots+g_K(\beta_K\x)$. (PPR)
To evaluate the effectiveness of an estimated e.d.r. direction, an affine invariant criterion is used: the squared multiple correlation coefficient between the projected variable $b\x$ and the ideally reduced variables $\beta_1\x,\ldots,\beta_K\x$,
\[R^2(b) = \underset{\beta\in B}{\max}\frac{(b\Sigma_{\x\x}\beta')^2}{b\Sigma_{\x\x}b'\cdot\beta\Sigma_{\x\x}\beta'}\]where $b$ is the estimated e.d.r. direction and $B$ is the true e.d.r. space.
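The maximization over $\beta\in B$ has a closed form, since it is a maximal squared correlation over a $K$-dimensional span. A minimal sketch, with the true directions stored as the rows of a $K\times p$ matrix and $\Sigma_{\x\x}$ estimated by the sample covariance:

```python
def r_squared(b, B, Sigma):
    """R^2(b): squared multiple correlation between b x and the ideal
    projections beta_k x, maximized in closed form over the span of
    the rows of B (the true e.d.r. directions)."""
    num = (b @ Sigma @ B.T) @ np.linalg.solve(B @ Sigma @ B.T, B @ Sigma @ b)
    return num / (b @ Sigma @ b)

B_true = np.vstack([beta1, beta2])
r_squared(beta1 + 0.1 * rng.normal(size=p), B_true, Sigma)   # close to 1
```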
Theorem
Condition: For any $b$ in $\IR^p$, there exist constants $c_0,c_1,\ldots,c_K$ such that $\E(b\x\mid\beta_1\x,\ldots,\beta_K\x)=c_0+c_1\beta_1\x+\cdots+c_K\beta_K\x$.
Under this condition, the centered inverse regression curve $\E(\x\mid y)-\E(\x)$ is contained in the linear subspace spanned by $\beta_k\Sigma_{\x\x}$, $k=1,\ldots,K$.
consequences
- The eigenvectors $\eta_k$ ($k=1,\ldots,K$) associated with the largest $K$ eigenvalues of $\cov[\E(\z\mid y)]$ are the standardized e.d.r. directions.
- One can quantify how far the inverse regression curve $\E(\z\mid y)$ is from the standardized e.d.r. space when the condition is violated.
The procedure is similar to the case $K=1$, which was covered in SIR and Its Implementation.
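For general $K$, here is a minimal SIR sketch following the consequences above (equal-count slicing on $y$; the number of slices is an arbitrary choice), continuing the simulated example:

```python
def sir_directions(x, y, K, n_slices=10):
    """Sliced inverse regression (Li, 1991): a minimal sketch.
    Returns K estimated e.d.r. directions as rows, identified only up to
    the span they generate."""
    n, p = x.shape

    # standardize x
    Sigma = np.cov(x, rowvar=False)
    vals, vecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    z = (x - x.mean(axis=0)) @ Sigma_inv_sqrt

    # slice the data into groups with roughly equal counts, ordered by y
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # weighted covariance of the slice means of z: an estimate of cov[E(z | y)]
    M = np.zeros((p, p))
    for idx in slices:
        m = z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # top-K eigenvectors are the standardized directions; map back to x scale
    _, evecs = np.linalg.eigh(M)
    eta = evecs[:, ::-1][:, :K].T      # K x p standardized e.d.r. directions
    return eta @ Sigma_inv_sqrt        # estimated beta directions (rows)

B_hat = sir_directions(x, y, K=2)
[r_squared(b, B_true, Sigma) for b in B_hat]   # each should be fairly close to 1
```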
A Semiparametric Approach to Dimension Reduction
Literature
identifying the central space
- sliced average variance estimation
- directional regression
- kernel inverse regression
- CANCOR analysis
but they rely on certain conditions:
- $\E(\x\mid\x^T\beta)$ is a linear function of $\x^T\beta$ (the linearity condition)
- $\cov(\x\mid \x^T\beta)$ is a constant matrix (the constant variance condition)
others:
- the Fourier transformation method requires one to estimate the joint pdf of $\x$, which is typically infeasible in a high-dimensional setting
- dMAVE, which adapts minimum average variance estimation (MAVE)
- SR.
Existing methods impose either the above two conditional moment conditions or distributional assumptions on the covariate vector in one form or another.
identifying the central mean space
- OLS, which assumes $\x$ satisfies the linearity condition
- average derivative estimation, which requires $\x$ to be continuous
- nonlinear least squares
- minimum average variance estimation
- sliced regression
- principal Hessian directions, which requires $\x$ to satisfy both the linearity condition and the constant variance condition.
- minimizing a Kullback-Leibler distance.
Proposal
Casting dimension reduction in the semiparametric framework turns dimension-reduction problems into semiparametric estimation problems, so powerful semiparametric estimation and inference tools become applicable.
advantages:
- relaxation of the linearity condition and the constant variance condition.
estimating the central subspace via the semiparametric approach
Let $\x$ be a $p\times 1$ covariate vector and $Y$ a univariate response variable. The goal of sufficient dimension reduction is to seek a matrix $\bbeta$ such that
\[\begin{equation} F(y\mid\x)=F(y\mid\x^T\bbeta)\,, \text{for }y\in\IR\,. \label{eq:semiparam} \end{equation}\]The column space of $\bbeta$ satisfying $\eqref{eq:semiparam}$ is called a dimension-reduction subspace. Since the dimension-reduction subspace is not unique, the primary interest is the central subspace, defined as the intersection of all dimension-reduction subspaces, provided that the intersection itself is a dimension-reduction subspace.
The likelihood of one random observation $(\x,Y)$ is
\[\eta_1(\x)\eta_2(Y,\x^T\bbeta)\]where $\eta_1$ and $\eta_2$ are infinite-dimensional nuisance parameters, while $\bbeta$ is the finite-dimensional parameter of interest. The factorization follows from writing the joint density as the marginal density of $\x$ times the conditional density of $Y$ given $\x$, which by \eqref{eq:semiparam} depends on $\x$ only through $\x^T\bbeta$.
Influence functions can be viewed as normalized elements of the orthogonal complement $\Lambda^\perp$ of the so-called nuisance tangent space. Deriving this orthogonal complement yields a general class of estimating equations, indexed by arbitrary functions $g(Y,\x^T\bbeta)$.
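As a rough illustration of what such an estimating equation can look like in practice, here is a sketch of evaluating an empirical estimating function of the form $\E_n[\{g(Y,\x^T\bbeta)-\hat\E(g\mid\x^T\bbeta)\}\{\x-\hat\E(\x\mid\x^T\bbeta)\}]$, with the conditional expectations replaced by Nadaraya-Watson estimates. The Gaussian kernel, the fixed bandwidth, and this particular simplified form are my own choices for illustration and not the exact construction of Ma and Zhu (2012).

```python
def estimating_function(beta, x, y, g, bandwidth=0.3):
    """Empirical estimating function
    E_n[{g(Y, x'beta) - E(g | x'beta)}{x - E(x | x'beta)}],
    with the conditional expectations estimated by Nadaraya-Watson
    smoothing on the reduced covariates x'beta.  Roots over beta
    (e.g. found by minimizing the norm of this function) estimate
    a basis of the central subspace."""
    n = x.shape[0]
    u = x @ beta                       # n x d matrix of reduced covariates
    gy = g(y, u)                       # user-chosen index function g(Y, x'beta)

    # Gaussian-kernel weights between observations in the reduced space
    d2 = np.sum((u[:, None, :] - u[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    w /= w.sum(axis=1, keepdims=True)

    g_hat = w @ gy                     # E-hat(g | x'beta) at each observation
    x_hat = w @ x                      # E-hat(x | x'beta) at each observation

    # p-vector; approximately zero (in expectation) at the true beta
    return (x - x_hat).T @ (gy - g_hat) / n

# example: g(Y, u) = Y, evaluated at a basis of the true central subspace
beta0 = np.vstack([beta1, beta2]).T    # p x 2 basis matrix
estimating_function(beta0, x, y, g=lambda y, u: y)
```

No linearity or constant variance condition on $\x$ is needed for this construction; the price is nonparametric estimation of the conditional expectations given $\x^T\bbeta$, which involve only the low-dimensional reduced covariates.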