# WeiYa's Work Yard

##### Posted on Oct 09, 2022

This post is based on the following references.

Kingma and Ba (2017) propose Adam, whose name is derived from adaptive moment estimation. It is designed to combine the advantages of AdaGrad, which works well with sparse gradients, and RMSProp, which works well in online and non-stationary settings.
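The update rules can be sketched in plain Python for a single scalar parameter; hyperparameter names and defaults follow the paper's Algorithm 1 (this is a minimal illustration, not a production implementation):

```python
import math

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (sketch of Algorithm 1)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
```

After enough steps, `theta` settles near the minimizer at 0.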

Reddi et al. (2018) analyzed the convergence issue of Adam and pointed out that it can be fixed by endowing the algorithm with "long-term memory" of past gradients. They proposed a variant of Adam, called AMSGrad, which not only fixes the convergence issue but also often leads to improved empirical performance.

PyTorch enables this algorithm via the `amsgrad` option of [`torch.optim.Adam`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).
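A minimal usage sketch (the model and data here are placeholders; the relevant part is `amsgrad=True`):

```python
import torch

# Toy model and data, only to have something to optimize.
model = torch.nn.Linear(2, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

x = torch.randn(8, 2)
y = torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()  # AMSGrad update instead of the plain Adam update
```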