\[\newcommand{\bbP}{\mathbb{P}}\notag\]

Lijun Wang (2022.07.31)

Here is my attempt at reproducing the results of auto-modeling on the many-Normal-means example.

The main algorithm is:

- Viewed from another perspective, (3.3) simply minimizes $G_{\hat\bbP}(\theta, \lambda)$. Thus, we can directly take advantage of existing nonlinear programming solvers, such as Ipopt (Interior Point Optimizer).
- For (3.4) and (3.5), since $\theta_2,\cdots,\theta_m\ge 0$, then

\[\pi^{(n)}(\theta,\lambda) = \sum_{i=2}^m \lambda_i\vert\theta_i\vert = \sum_{i=2}^m \lambda_i\theta_i\,,\]

and hence

\[\frac{\partial \pi^{(n)}(\theta,\lambda)}{\partial \theta} = \begin{bmatrix} 0 & \lambda_2 & \lambda_3 & \cdots & \lambda_m & 0 & \cdots & 0 \end{bmatrix}^T\,.\]Since $\lambda_i\ge 0$, it follows that \(\lambda_i = \max(0, d_i),\ i=2,\ldots,m\,,\) where $d_i$ is the $i$-th element of \(E_{(X, Y)\sim \bbP}\left[\frac{\partial L(\theta\mid X, Y)}{\partial \theta}\right] - \frac 1n\sum_{i=1}^n\frac{\partial L(\theta\mid x_i, y_i)}{\partial \theta}\,.\) For the many-Normal-means example,
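The $\lambda$-update is then a componentwise thresholding. A minimal Julia sketch, with a helper signature of my own (it takes the score-difference vector $d$ directly, rather than computing it from $y$ and $\theta$):

```julia
# Sketch of the λ-step (hypothetical signature, not the paper's code):
# given d = E_P[∂L/∂θ] - (1/n)Σ_i ∂L(θ|x_i,y_i)/∂θ, of length 2m,
# only the penalized components θ_2,…,θ_m receive a nonzero multiplier.
function sol_λ_given_θ(d::Vector{Float64}, m::Int)
    λ = zeros(length(d))
    λ[2:m] .= max.(0.0, d[2:m])   # λ_i = max(0, d_i), i = 2,…,m
    return λ
end
```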

\[\begin{align*} L_i \triangleq L(\theta\mid y_i) &= -\log \sum_{k=1}^m \alpha_k \exp\left(-\frac{(y_i-\eta_k)^2}{2}\right)\\ &= -\log \sum_{k=1}^m \theta_{m+k} \exp\left(-\frac{(y_i-\sum_{j=1}^k\theta_j)^2}{2}\right)\,, \end{align*}\]then

\(\frac{\partial L_i}{\partial \theta_\ell} = \begin{cases} -\frac{\sum_{k=\ell}^m\theta_{m+k}\exp\left(-\frac{(y_i-\sum_{j=1}^k\theta_j)^2}{2}\right)(y_i-\sum_{j=1}^k\theta_j)}{\sum_{k=1}^m\theta_{m+k}\exp\left(-\frac{(y_i-\sum_{j=1}^k\theta_j)^2}{2}\right)} & \ell\le m\\ -\frac{\exp\left(-\frac{(y_i-\sum_{j=1}^{\ell-m}\theta_j)^2}{2}\right)}{\sum_{k=1}^{m}\theta_{m+k}\exp\left(-\frac{(y_i-\sum_{j=1}^k\theta_j)^2}{2}\right)}& \ell\ge m+1 \end{cases}\) For simplicity, I skip the step of approximating $\bbP$ by bootstrap samples and instead directly use the true $\bbP$.
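As a sanity check of these derivatives, here is a direct Julia implementation of $L_i$ and its gradient (the helper names `lossL` and `gradL` are mine); it can be verified against central finite differences:

```julia
# θ[1:m] are the increments whose cumulative sums give η_k;
# θ[m+1:2m] are the mixture weights α_k.
function lossL(θ, y, m)
    η = cumsum(θ[1:m])   # η_k = Σ_{j≤k} θ_j
    return -log(sum(θ[m+k] * exp(-(y - η[k])^2 / 2) for k in 1:m))
end

function gradL(θ, y, m)
    η = cumsum(θ[1:m])
    w = [θ[m+k] * exp(-(y - η[k])^2 / 2) for k in 1:m]
    s = sum(w)
    g = zeros(2m)
    for ℓ in 1:m
        g[ℓ] = -sum(w[k] * (y - η[k]) for k in ℓ:m) / s   # case ℓ ≤ m
    end
    for k in 1:m
        g[m+k] = -exp(-(y - η[k])^2 / 2) / s              # case ℓ ≥ m+1
    end
    return g
end
```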

My pseudo Julia code is as follows:

```julia
# Coordinate descent: alternate between λ and θ until both stabilize
function auto_modeling()
    for i = 1:N
        θold = θ
        λold = λ
        λ = sol_λ_given_θ(y, θ, ...)   # update λ with θ fixed
        θ = sol_θ_given_λ(y, λ, ...)   # update θ with λ fixed
        if norm(θold - θ) < tol && norm(λold - λ) < tol
            break
        end
    end
end

function sol_λ_given_θ(y, θ, ...)
    # Equations (3.4) and (3.5)
end

function sol_θ_given_λ(y, λ, ...)
    model = Model(Ipopt.Optimizer)
    # express the optimization problem `min G` in the language of JuMP and Ipopt
    optimize!(model)
end
```
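The `sol_θ_given_λ` step can be fleshed out with JuMP + Ipopt roughly as below. This is a sketch under my own assumptions: the objective is the average loss plus the penalty $\sum_{i=2}^m\lambda_i\theta_i$, and I add a simplex constraint on the mixture weights; it is not the exact code used for the experiments.

```julia
using JuMP, Ipopt

# θ[1:m]: increments of the means; θ[m+1:2m]: mixture weights
function sol_θ_given_λ(y, λ, m)
    n = length(y)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, θ[1:2m])
    @constraint(model, [i in 2:2m], θ[i] >= 0)        # θ_2,…,θ_m ≥ 0; weights ≥ 0
    @constraint(model, sum(θ[i] for i in m+1:2m) == 1)  # weights sum to 1 (my assumption)
    @NLobjective(model, Min,
        (1 / n) * sum(-log(sum(θ[m+k] *
                exp(-(y[i] - sum(θ[j] for j in 1:k))^2 / 2) for k in 1:m))
            for i in 1:n) +
        sum(λ[i] * θ[i] for i in 2:m))
    optimize!(model)
    return value.(θ)
end
```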

As for g-modeling, call the `deconv` function from the R package `deconvolveR`.

Like in the paper, I repeat the experiment 200 times and report the average mean squared error. The results are as follows:

Compared to the results in the paper:

- Except for g-modeling, the results are close to those reported in the paper.
- In my experiments, g-modeling can outperform auto-modeling, although I did not deliberately tune its parameters. The paper also does not discuss how the parameters for this method were chosen, so g-modeling might be better than the performance reported in the paper.
- Currently, the program is relatively slow, and the computational burden lies mainly in Step 3 of the coordinate descent algorithm. Are there any speed-up strategies?