# WeiYa's Work Yard

## Reproduce Auto-modeling on Many-Normal-Means

Lijun Wang (2022.07.31)

$\newcommand\bbP}\notag$

Here is my attempt on reproducing the results of auto-modeling on the many-Normal-means example.

The main algorithm is

• Take another perspective of (3.3), it is just to minimize $G_{\hat\bbP}(\theta, \lambda)$. Thus, we can directly take advantage with existing nonlinear programming solvers, such as Ipopt (Interior Point Optimizer).
• For (3.4) and (3.5), since $\theta_2,\cdots,\theta_m\ge 0$, then
$\pi^{(n)}(\theta,\lambda) = \sum_{k=2}^m \lambda_k\vert \theta_k\vert = \sum_{k=2}^m \lambda_k\theta_k$

and hence

$\frac{\partial \pi^{(n)}(\theta,\lambda)}{\partial \theta} = % \begin{bmatrix} % 0 \\ % \lambda_2\\ % \vdots\\ % \lambda_m\\ % 0\\ % \vdots\\ % 0 % \end{bmatrix} \begin{bmatrix} 0 & \lambda_2 & \lambda_3 & \cdots & \lambda_m & 0 & \cdots & 0 \end{bmatrix}^T$

And note that $\lambda_i\ge 0$, it follows that $$\lambda_i = \max(0, d_i), i=2,\ldots,m$$ where $d_i$ is the $i$-th element of $$E_{(X, Y)\sim \bbP}[\frac{\partial L(\theta\mid X, Y)}{\partial \theta}] - \frac 1n\sum_{i=1}^n\frac{\partial L(\theta\mid x_i, y_i)}{\partial \theta}$$ For the many-Normal-means example,

\begin{align*} L_i \triangleq L(\theta\mid y_i) &= -\log \sum_{k=1}^m \alpha_k \exp\left(-\frac{(y_i-\eta_k)^2}{2}\right)\\ &= -\log \sum_{k=1}^m \theta_{m+k} \exp\left(-\frac{(y_i-\sum_{j=1}^k\theta_j)^2}{2}\right)\,, \end{align*}

then

$$\frac{\partial L_i}{\partial \theta_\ell} = \begin{cases} -\frac{\sum_{k=\ell}^m\theta_{m+k}\exp\left(-\frac{(y_i-\sum_1^k\theta_j)^2}{2}\right)(y_i-\sum_1^k\theta_j)}{\sum_{k=1}^m\theta_{m+k}\exp\left(-\frac{(y_i-\sum_1^k\theta_j)^2}{2}\right)} & \ell\le m\\ -\frac{\exp\left(-\frac{(y_i-\sum_1^{\ell-m}\theta_j)^2}{2}\right)}{\sum_{k=1}^{m}\theta_{m+k}\exp\left(-\frac{(y_i-\sum_1^k\theta_j)^2}{2}\right)}& \ell\ge m+1 \end{cases}$$ For simplicity, I skip the step of approximation of $\bbP$ by bootstrap samples, and instead directly use the true $\bbP$.

My pseudo Julia code is as follows,

function auto_modeling()
for i = 1:N
θold = θ
λold = λ
λ = sol_λ_given_θ(y, θ, ...)
θ = sol_θ_given_λ(y, λ, ...)
if (‖ θold - θ ‖ < tol) and (‖ λold - λ ‖ < tol)
break
end
end
end
function sol_λ_given_θ(y, θ, ...)
# Equations (3) (4) (5)
end
function sol_θ_given_λ(y, λ, ...)
model = Model(Ipopt.Optimizer)
# express the optimization problem min G in language of JuMP and Ipopt
optimize!(model)
end


As for g-modeling, call deconv function from the R package deconvolveR.

Like in the paper, I repeat 200 times, then report the average mean square error. The results are as follows:

Compared to the results in the paper,

• Except for the results of g-modeling, others are close to the reported results in the paper.
• In my experiments, g-modeling can outperform auto-modeling, but I did not deliberately select its parameters. The paper also did not discuss how they chose the parameters for this method. So g-modeling might be better than the reported performance in the paper.
• Currently, the program is relatively slow, and the computational burden is mainly Step 3 in the coordinate descent algorithm. Are there any speed-up strategies?