Reluctant Interaction Modeling
This note is based on Yu, G., Bien, J., & Tibshirani, R. (2019). Reluctant Interaction Modeling. ArXiv:1907.08414 [Stat].
Motivated by large-scale problem sizes, the paper adopts a very simple guiding principle:
One should prefer a main effect over an interaction if all else is equal.
Consider the following two-way interaction model:
\[\begin{equation} Y=X^T\beta^* + Z^T\gamma^* + \varepsilon\,,\label{eq:1} \end{equation}\]where $X\in\bbR^p$ is a $p$-dimensional random vector of main effects, and $Z=(X_1X_1, X_1X_2,\ldots,X_pX_p)\in\bbR^{(p^2+p)/2}$ is the random vector of all pairwise interactions of $X$.
All pairs lasso (APL) becomes infeasible to compute as $p$ gets large, since it takes $O(np^2)$ space.
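To make the size of $Z$ concrete, here is a minimal NumPy sketch that builds all pairwise interactions (squares included) for a small design matrix and then counts the columns at a larger problem size. The helper name `all_pairs` and the sizes `n`, `p` are my own illustrative choices, not from the paper.

```python
import numpy as np

def all_pairs(X):
    """Return the n x (p^2+p)/2 matrix of all products X_j * X_k with j <= k."""
    n, p = X.shape
    j, k = np.triu_indices(p)          # index pairs (j, k) with j <= k
    return X[:, j] * X[:, k]

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
Z = all_pairs(X)
print(Z.shape)                         # (5, 10), since (4^2 + 4) / 2 = 10

# Storage needed for Z in double precision at a larger (hypothetical) size
n, p = 1000, 10_000
q = (p * p + p) // 2                   # number of interaction columns
print(q, "columns,", 8 * n * q / 1e9, "GB")
```

The column order produced by `np.triu_indices` matches the ordering $(X_1X_1, X_1X_2, \ldots, X_pX_p)$ used in the model above.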
The paper introduces a computationally viable approach to interaction modeling, called sprinter (for sparse reluctant interaction modeling). (I have not found a package for it yet.)
- one should prefer main effects over interactions given similar prediction performance. The authors emphasize that this principle is distinct from (although reminiscent of) the common heredity principle (really?). Sprinter is a multiple-stage method: in the first stage, it tries to capture as much of the variability in the response as possible without resorting to interactions; in the second stage, it includes only interactions that capture signal that cannot be captured by main effects (similar to MARS?).
- faster than APL
- finite-sample theoretical properties of sprinter
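The staged idea in the first bullet can be sketched as follows. This is only a simplified illustration and not the authors' implementation: ordinary least squares stands in for the lasso, the screening rule (absolute correlation with the stage-one residual) is my reading of "signal that cannot be captured by main effects", and the names `reluctant_fit` and `n_keep` are hypothetical.

```python
import numpy as np

def reluctant_fit(X, y, n_keep=5):
    """Hypothetical sketch of a reluctant staged fit (OLS stands in for the lasso)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Stage 1: explain as much of y as possible with main effects only.
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta

    # Stage 2: score every interaction by |correlation| with the residual,
    # i.e. by the signal that the main effects could not capture.
    j, k = np.triu_indices(p)
    Z = X[:, j] * X[:, k]
    Zc = Z - Z.mean(axis=0)
    score = np.abs(Zc.T @ resid) / (np.linalg.norm(Zc, axis=0) * np.linalg.norm(resid))
    top = np.argsort(score)[::-1][:n_keep]

    # Refit on the main effects plus the few retained interactions.
    D = np.hstack([Xc, Zc[:, top]])
    coef, *_ = np.linalg.lstsq(D, yc, rcond=None)
    pairs = [(int(a), int(b)) for a, b in zip(j[top], k[top])]
    return pairs, coef

# Toy data: one main effect and one interaction (0-based columns 1 and 2).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 6))
y = X[:, 0] + 2.0 * X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(500)
pairs, coef = reluctant_fit(X, y, n_keep=3)
print(pairs[0])   # top-ranked interaction pair
```

Only `n_keep` interaction columns enter the final fit, which is where the savings over APL would come from; a memory-conscious version would compute the scores in blocks rather than materializing all of `Z`.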
Related Methods
- hierarchy assumption: an interaction effect is in the model only if either (or both) of the main effects corresponding to the interaction are in the model
- interaction pursuit (IP): first seek a subset of the original $p$ variables that are involved in the nonzero interactions, and then restrict attention to interactions between these selected variables.
  - works best when the interactions are concentrated among a small set of original variables; it struggles when there is no such concentration.
- other screening-based methods:
  - select interactions based on the partial correlation between the response and each interaction
  - screen interactions based on the three-way joint cumulant between the response and the two main effects that make up an interaction
A reluctance principle
Consider a simple model
\[Y = X_1+X_2+X_1\times X_2\,,\]where $X_1 = 1_A$ and $X_2 = 1_B$. Suppose with high probability $A\subseteq B$, so that $X_1X_2=1_A1_B=1_{A\cap B}\approx 1_A$.
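A small simulation illustrates the overlap. The events are my own hypothetical choice: $A=\{U<0.3\}$ and $B=\{U<0.6\}$ for a uniform $U$, so that $A\subseteq B$ holds exactly rather than merely with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=10_000)
X1 = (U < 0.3).astype(float)   # indicator of A
X2 = (U < 0.6).astype(float)   # indicator of B, with A a subset of B
Y = X1 + X2 + X1 * X2

# Because A is contained in B, the interaction coincides with X1 ...
print(np.array_equal(X1 * X2, X1))      # True

# ... so a main-effects-only model reproduces Y exactly: Y = 2*X1 + X2.
print(np.allclose(Y, 2 * X1 + X2))      # True
```

In this setting the interaction term adds nothing beyond the main effects, so the reluctance principle would favor the main-effects-only description.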
The paper claims that, although both the proposed principle and the well-known hierarchy assumption simplify the search for interactions by focusing on certain main effects, the reluctance principle does not explicitly tie an interaction to its corresponding main effects.
In model \eqref{eq:1}, if $X$ and $Z$ were uncorrelated (is that even possible?), this would be a unique decomposition. However, there can be “overlap” between these two signal terms.
Let $X^T\vartheta^*$ be the part of $Z^T\gamma^*$ that can be explained by a linear combination of $X$, i.e.,
\[\vartheta^* :=\arg\min_{\vartheta\in\bbR^p} \Var(Z^T \gamma^* -X^T\vartheta) = \Cov(X)^{-1}\Cov(X,Z) \gamma^* = \Sigma^{-1}\Phi \gamma^*\,,\]thus \eqref{eq:1} can be written as
\[Y = X^T(\beta^* + \vartheta^* )+W^T \gamma^* + \varepsilon\,,\]where $W=Z-\Phi^T\Sigma^{-1}X$ collects the “pure” interaction effects that cannot be captured by linear combinations of $X$.
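The decomposition can be checked numerically, with sample covariances standing in for the population quantities. The data-generating choices below (exponential features so that $X$ and $Z$ are correlated, a random $\gamma^*$) are hypothetical; only the formulas for $\vartheta^*$ and $W$ come from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.exponential(size=(n, p))            # skewed, so X and Z are correlated
j, k = np.triu_indices(p)
Z = X[:, j] * X[:, k]                       # all pairwise interactions
gamma = rng.standard_normal(Z.shape[1])     # hypothetical gamma*

# Sample versions of Sigma = Cov(X) and Phi = Cov(X, Z)
Xc, Zc = X - X.mean(axis=0), Z - Z.mean(axis=0)
Sigma = Xc.T @ Xc / n
Phi = Xc.T @ Zc / n

theta = np.linalg.solve(Sigma, Phi @ gamma)  # Sigma^{-1} Phi gamma
W = Z - X @ np.linalg.solve(Sigma, Phi)      # row i holds W^T for sample i

# The interaction signal splits into a linear part and a "pure" part ...
print(np.allclose(Z @ gamma, X @ theta + W @ gamma))   # True

# ... and the pure part is (sample-)uncorrelated with every main effect.
Wc = W - W.mean(axis=0)
print(np.allclose(Xc.T @ Wc / n, 0.0, atol=1e-8))      # True
```

The second check is what makes the rewritten model identifiable in the sense discussed above: after removing the linear part, $W$ carries no signal that a linear combination of $X$ could absorb.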