# Features Annealed Independence Rules


This note is based on Fan, J., & Fan, Y. (2008). High-dimensional classification using features annealed independence rules. The Annals of Statistics, 36(6), 2605–2637.

The difficulty of high-dimensional classification is intrinsically caused by the presence of many noise features that do not contribute to reducing the misclassification rate.

The paper claims that feature selection is necessary for high-dimensional classification problems. When the independence rule is applied to the selected features, the resulting Features Annealed Independence Rules (FAIR) overcome both the interpretability issue and the noise accumulation.

Consider the independence classification rule, which classifies a new feature vector $\x$ into class 1 if

$$
\delta(\x) = (\mu_1 - \mu_2)^T \D^{-1} (\x - \mu) > 0,
$$

where $\mu = (\mu_1+\mu_2) / 2$ and $\D = \diag(\Sigma)$.

The sample version is

$$
\hat\delta(\x) = (\bar\x_1 - \bar\x_2)^T \hat\D^{-1} (\x - \bar\x),
$$

where

$$
\bar\x_k = \frac{1}{n_k} \sum_{i \in \text{class } k} \x_i, \quad k = 1, 2, \qquad \bar\x = (\bar\x_1 + \bar\x_2) / 2,
$$

and

$$
\hat\D = \diag(S_1^2, \ldots, S_p^2), \qquad S_j^2 = \frac{(n_1 - 1) S_{1j}^2 + (n_2 - 1) S_{2j}^2}{n_1 + n_2 - 2},
$$

where $S_{kj}^2$ is the sample variance of the $j$-th feature in class $k$.
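The sample independence rule above can be sketched in NumPy as follows; the function name `independence_rule` and its two-matrix interface are my own choices, not from the paper.

```python
import numpy as np

def independence_rule(X1, X2):
    """Fit the sample independence rule from two training matrices
    (rows = observations, columns = features)."""
    n1, n2 = X1.shape[0], X2.shape[0]
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    xbar = (xbar1 + xbar2) / 2
    # Pooled per-feature variances (the diagonal of the pooled covariance).
    s2 = ((n1 - 1) * X1.var(axis=0, ddof=1) +
          (n2 - 1) * X2.var(axis=0, ddof=1)) / (n1 + n2 - 2)

    def classify(x):
        # Classify into class 1 if the discriminant score is positive.
        score = np.sum((xbar1 - xbar2) * (x - xbar) / s2)
        return 1 if score > 0 else 2

    return classify
```

Note that only the diagonal variances are used: the off-diagonal entries of $\Sigma$ never enter the score.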

*Once the same covariance is assumed for both classes, why not use the full pooled covariance matrix?*

To extract salient features, the authors appeal to the two-sample $t$-test statistics. The two-sample $t$-statistic for feature $j$ is defined as

$$
T_j = \frac{\bar x_{1j} - \bar x_{2j}}{\sqrt{S_{1j}^2 / n_1 + S_{2j}^2 / n_2}}, \quad j = 1, \ldots, p;
$$

then FAIR classifies $\x$ into class 1 if

$$
\sum_{j=1}^{p} 1\{|T_j| > b\} \, \frac{(\bar x_{1j} - \bar x_{2j})(x_j - \bar x_j)}{S_j^2} > 0,
$$

where the threshold $b$ controls how many features are retained.

FAIR works as follows: first sort the features by the absolute values of their $t$-statistics in descending order, then use only the top $m$ features to classify the data.
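This two-step procedure can be sketched as follows; `fair_classifier` is a hypothetical name, and here $m$ is passed explicitly rather than implied by a threshold on $|T_j|$.

```python
import numpy as np

def fair_classifier(X1, X2, m):
    """FAIR sketch: keep the m features with the largest two-sample
    |t|-statistics, then apply the independence rule to them."""
    n1, n2 = X1.shape[0], X2.shape[0]
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    s21 = X1.var(axis=0, ddof=1)
    s22 = X2.var(axis=0, ddof=1)
    # Two-sample t-statistic, one per feature.
    t = (xbar1 - xbar2) / np.sqrt(s21 / n1 + s22 / n2)
    # Indices of the m features with the largest |t| (descending order).
    keep = np.argsort(-np.abs(t))[:m]
    # Pooled per-feature variances and overall mean for the kept features.
    s2 = ((n1 - 1) * s21 + (n2 - 1) * s22) / (n1 + n2 - 2)
    xbar = (xbar1 + xbar2) / 2

    def classify(x):
        score = np.sum((xbar1[keep] - xbar2[keep])
                       * (x[keep] - xbar[keep]) / s2[keep])
        return 1 if score > 0 else 2

    return classify
```

With `m = p` this reduces to the plain sample independence rule; shrinking `m` anneals away the noise features.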

I reproduce the simulation below, but obtain smaller misclassification rates, perhaps because of different parameters, even though I followed the setting described in the paper.
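As a self-contained sketch of such a simulation, the following uses a sparse mean shift with parameters of my own choosing ($p = 1000$, $n = 30$ per class, 10 informative features with unit shift), not necessarily those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k, shift = 1000, 30, 10, 1.0   # my choices, not the paper's setting

mu = np.zeros(p)
mu[:k] = shift                       # only the first k features differ

X1 = rng.normal(size=(n, p)) + mu    # class 1 training sample
X2 = rng.normal(size=(n, p))         # class 2 training sample

# Fit FAIR with m = k retained features.
xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
s21, s22 = X1.var(axis=0, ddof=1), X2.var(axis=0, ddof=1)
t = (xbar1 - xbar2) / np.sqrt(s21 / n + s22 / n)
keep = np.argsort(-np.abs(t))[:k]
s2 = ((n - 1) * s21 + (n - 1) * s22) / (2 * n - 2)
xbar = (xbar1 + xbar2) / 2

def classify(x):
    score = np.sum((xbar1[keep] - xbar2[keep])
                   * (x[keep] - xbar[keep]) / s2[keep])
    return 1 if score > 0 else 2

# Evaluate on fresh test data, 200 points per class.
T1 = rng.normal(size=(200, p)) + mu
T2 = rng.normal(size=(200, p))
err = (sum(classify(x) != 1 for x in T1)
       + sum(classify(x) != 2 for x in T2)) / 400
print(f"misclassification rate: {err:.3f}")
```

The observed error depends heavily on the shift size and on how many informative features survive the $t$-selection, which is one plausible source of the discrepancy with the paper's reported rates.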