Personalized Federated Learning with Robust and Sparse Regressions
Federated learning is an emerging topic due to its advantage in collaborative learning with distributed data.
the paper proposes a personalized federated learning method to address the robust regression problem:
- learn the regression weights by minimizing a Huber loss with a sparse fused penalty (see the Huber loss reminder after this list)
- design the personalized federated learning for robust and sparse regression (PerFL-RSR) algorithm to solve the estimation problem in the federated system efficiently
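for reference, the Huber loss with threshold $\delta$ (standard definition; the symbol $\delta$ is my notation):

\[H_\delta(r) = \begin{cases} \frac{1}{2}r^2 & \vert r\vert \le \delta\\ \delta\vert r\vert - \frac{1}{2}\delta^2 & \vert r\vert > \delta\end{cases}\]

it is quadratic near zero (efficient under light-tailed noise) and linear in the tails, which bounds the influence of outliers.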
In an FL system, a large number of clients collaboratively train a machine learning model coordinated by a central server.
Rather than transferring all the data to the central server as in a traditional distributed learning system, FL allows the clients to keep their local data and thus gain a basic level of data privacy.
However, due to the heterogeneous data-generating mechanisms among the clients, two new challenges emerge that might cause performance deterioration in FL:
- statistical heterogeneity
- system heterogeneity
many recent studies address data heterogeneity:
- feature normalization: Huang and Belongie (2017), Choi et al. (2021)
- model weight regularization: Karimireddy et al. (2020)
- improving aggregation: Reddi et al. (2020), Wang et al. (2020)
however, the above methods focus on training a globally shared model, i.e., they assume a common global model for all clients. this assumption might not be appropriate, especially when data distributions differ significantly among clients
traditional FL with the global model-sharing assumption would thus sacrifice the generalizability of the model
to balance generalization and personalization, personalized federated learning (PFL) is proposed
Rather than aggregating updates into one global model as in FL, in PFL the server needs to learn the relationship or similarity between local models and then generate a personalized model for each client.
many recent studies for PFL:
- fine-tuning: Fallah, Mokhtari, and Ozdaglar (2020)
- multi-task learning (MTL): Smith et al. (2017), Huang et al. (2021)
- clustered federated learning: Ghosh et al. (2020), Sattler, Muller, and Samek (2020)
- parameter decoupling: Arivazhagan et al. (2019)
- knowledge distillation: Li and Wang (2019)
this work combines the idea of MTL and clustered FL
- MTL achieves personalization by studying the similarity between local client models
- clustered FL achieves personalization through inherent partitions of all local client models
system heterogeneity becomes a considerable bottleneck in FL, especially when there are a large number of local clients in the network, such as learning over mobile phones, wearable devices, and autonomous vehicles.
- because of the different conditions of the clients, e.g., the network connection and power status of the devices, it may be impractical to involve all the clients in each communication round
- thus, the paper considers designing a federated learning algorithm that allows a low client participation rate per communication round
in the paper, they focus on personalized federated learning for robust regression problems
- the study is motivated by the fact that many real-world datasets are contaminated with heavy-tailed noise and abnormal values, especially in federated learning systems with massive data sources
- they also consider a high-dimensional regression setting, where the high dimensionality necessitates sparsity recovery in the model
in this work, the objective is to develop a personalized federated learning method for robust and sparse regression.
main contributions:
- balance the tradeoff of personalization and generalization for the robust sparse regression problem under the federated learning paradigm
- propose a novel learning loss, which consists of the Huber loss for robustness, the client-wise fusion regularizer for personalization, and the sparse regularizer for sparsity recovery
- develop an alternating direction method of multipliers (ADMM) based algorithm in the federated server-client system to solve the proposed loss, called PerFL-RSR
- it addresses system heterogeneity and communication efficiency by randomly sampling clients for each server update (a schematic sketch follows this list)
- establish the convergence theory for the proposed PerFL-RSR algorithm
- establish the consistency properties for the proposed estimator
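a minimal schematic of the partial-participation server loop (my sketch, not the paper's PerFL-RSR updates: the actual algorithm solves ADMM subproblems and fuses the client models through the penalty, which is elided here):

```python
import numpy as np

def huber_grad(r, delta=1.345):
    """Gradient of the Huber loss w.r.t. the residuals r (elementwise)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def local_update(X, y, beta, lr=0.1):
    """One local Huber-gradient step on a client; a stand-in for the
    paper's local ADMM subproblem."""
    r = X @ beta - y
    return beta - lr * X.T @ huber_grad(r) / len(y)

def partial_participation_loop(data, n_rounds=50, participation=0.2, seed=0):
    """Each round the server samples only a fraction of the M clients,
    mimicking the low participation rate the paper allows; `data` is a
    list of (X_m, y_m) pairs, one per client."""
    rng = np.random.default_rng(seed)
    M, p = len(data), data[0][0].shape[1]
    betas = [np.zeros(p) for _ in range(M)]
    for _ in range(n_rounds):
        sampled = rng.choice(M, size=max(1, int(participation * M)), replace=False)
        for m in sampled:
            X, y = data[m]
            betas[m] = local_update(X, y, betas[m])
        # the server-side fusion/sparsity step of PerFL-RSR would go here
    return betas
```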
Federated MTL:
- MOCHA algorithm (Smith et al., 2017)
- FedAMP (Huang et al., 2021): propose a message-passing mechanism to solve loss with pairwise regularization terms among all clients
- both are designed for convex objectives and are not applicable to the proposed non-convex loss
clustered FL assumes that there are homogeneous groups of clients in terms of local data distributions
- IFCA (Ghosh et al., 2020) learns $K$ global models on the server, and then each client chooses the model with the smallest local loss (a sketch of this selection step follows this list)
- the server needs to communicate $K$ times more information
- prior knowledge of the number of groups $K$ is required
- Sattler, Muller, and Samek (2020):
- authors proposed an algorithm with a post-processing step of clustering
- use recursive bi-partitioning clustering, which increases computation and communication costs
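the IFCA selection step is simple enough to sketch (my illustration of the idea with a squared-error local loss, not the reference implementation):

```python
import numpy as np

def choose_model(X, y, global_models):
    """IFCA-style cluster assignment: the client evaluates all K global
    models on its local data and keeps the one with the smallest loss."""
    losses = [np.mean((X @ beta - y) ** 2) for beta in global_models]
    return int(np.argmin(losses))
```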
robust regressions:
- median-of-means (sketch after this list)
- quantile regression
- Huber loss
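e.g., the standard median-of-means construction for a univariate mean:

```python
import numpy as np

def median_of_means(x, n_blocks=10, seed=0):
    """Split the sample into blocks, average each block, and take the
    median of the block means; the median step bounds the influence of
    heavy-tailed draws that would wreck the plain sample mean."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return np.median([b.mean() for b in blocks])
```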
consider a federation of $M$ clients
- each client owns a local dataset $\cD_m = \{x_{mi}, y_{mi}\}_{i=1}^{n_m}$, $m=1,\ldots,M$
- assume client-specific linear models: $y_{mi} = x_{mi}^\top \beta_m^\star + \epsilon_{mi}$
adopt a regularization term on the scalar-wise (coordinate-by-coordinate) pairwise differences $\vert \beta_{mj} - \beta_{m'j}\vert$ for $j\in [p]$.
then the PFL solution is obtained by solving
\[\argmin_{\beta_1,\ldots,\beta_M\in \IR^p} \frac 1M\sum_{m=1}^M l_m(\beta_m) + \sum_{j=1}^p\sum_{m\le m'} p_\lambda(\vert \beta_{mj} - \beta_{m'j}\vert)\]
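a direct (unoptimized) evaluation of this objective, taking $l_m$ to be the average Huber loss on client $m$ and MCP as a stand-in for the non-convex $p_\lambda$ (the paper's exact penalty choice is an assumption here):

```python
import numpy as np

def huber(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * a - 0.5 * delta ** 2)

def mcp(t, lam=0.5, gamma=3.0):
    """MCP penalty p_lambda(|t|), a common non-convex choice:
    lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, constant after."""
    a = np.abs(t)
    return np.where(a <= gamma * lam, lam * a - a ** 2 / (2 * gamma),
                    0.5 * gamma * lam ** 2)

def pfl_objective(betas, data, lam=0.5):
    """(1/M) * sum_m l_m(beta_m) + sum_j sum_{m<m'} p_lambda(|beta_mj - beta_m'j|),
    with l_m the average Huber loss on client m's local data."""
    M = len(betas)
    fit = np.mean([np.mean(huber(y - X @ b)) for (X, y), b in zip(data, betas)])
    fusion = sum(mcp(betas[m] - betas[m2], lam=lam).sum()
                 for m in range(M) for m2 in range(m + 1, M))
    return fit + fusion
```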