Giovanni Bricconi


Federated Learning


I would like to learn more about machine learning, and I recently came across an interesting ACM journal: Transactions on Intelligent Systems and Technology.

The latest issue (https://dl.acm.org/toc/tist/2022/13/5) contains many papers on federated learning. I have decided to take some notes so as not to forget them. Luckily, many of these papers can be downloaded for free if you are interested.

First of all, what is federated learning? Usually, you have a set of labeled input data, and you apply an algorithm to train a model. For instance, you can train a neural network with this data.

This setting implies that one single entity has access to all the training data, but sometimes this is not possible. For instance, the law can forbid sharing patient data across different hospitals. Also, some countries have laws prohibiting the export of personal information outside their territory, or laws protecting consumers’ privacy.

So it is not possible to concentrate all the data in a single place to train a global high-quality model. Models trained just with local data can be less effective, and it is worth finding a way to overcome this limitation.

The approach described in these articles still requires a central coordinator, but it also requires each entity holding the training data to perform model training. From time to time the model parameters are sent from the local entities to the central coordinator. The coordinator updates a global model and sends the new averaged parameters back to each participant. The process is repeated again and again until the training is completed.

Notice that in this way the entities share just parameters and not personal information, overcoming the legal restrictions.
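To make the loop concrete, here is a minimal sketch of the weighted-averaging scheme (the idea behind what is usually called FedAvg). The function names and the toy one-parameter “model” are my own inventions for illustration, not taken from any of the papers:

```python
import numpy as np

def local_update(params, grad_fn, lr=0.1, epochs=5):
    """Each participant refines a copy of the global parameters on its own data."""
    p = params.copy()
    for _ in range(epochs):
        p = p - lr * grad_fn(p)
    return p

def federated_round(global_params, clients):
    """The coordinator collects locally trained parameters and averages them,
    weighted by how many samples each participant holds."""
    updates = [local_update(global_params, grad_fn) for grad_fn, _ in clients]
    sizes = np.array([n for _, n in clients], dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * u for w, u in zip(weights, updates))

# Toy example: each "client" pulls the parameter toward its own local optimum
# (1.0 and 3.0); the second client holds three times as much data.
clients = [
    (lambda p: p - 1.0, 100),  # gradient of 0.5 * (p - 1)^2
    (lambda p: p - 3.0, 300),  # gradient of 0.5 * (p - 3)^2
]
theta = np.array([0.0])
for _ in range(50):
    theta = federated_round(theta, clients)
# theta converges to the sample-weighted optimum: 0.25*1 + 0.75*3 = 2.5
```

Only `theta` and the local copies travel between the parties; the raw “data” (here hidden inside the gradient functions) never leaves its owner.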

Some authors report that federated learning was originally proposed by Google for the use case of Gboard query suggestion (https://arxiv.org/abs/1610.02527), but I have not read that article.

The first paper I read is: Dimitris Stripelis, Paul M. Thompson, and José Luis Ambite. 2022. Semi-Synchronous Federated Learning for Energy-Efficient Training and Accelerated Convergence in Cross-Silo Settings. ACM Trans. Intell. Syst. Technol. 13, 5, Article 78 (October 2022), 29 pages. https://doi.org/10.1145/3524885. It describes synchronous and asynchronous approaches to Federated Learning, and proposes a new algorithm to share model parameters across the participants.

From what I understand, with synchronous learning each participant trains its model for a fixed number of epochs and sends an update to the coordinator. When the coordinator has all the updates, it changes the global model, which is sent back to the participants. The process then repeats.

If the participants have different computing power, some fast and some slow, the training proceeds at the pace of the slowest, and the more powerful sites are underused.

With asynchronous learning, each participant works at its own pace and communicates with the central coordinator more often, each time an epoch is completed. More communication means more energy consumed, and the paper gives some insight into this problem. Another problem is that slower participants continue training on an old model, while the coordinator has a better version, computed using the faster participants’ updates.

With their proposal, each participant trains until a fixed time limit is reached; then the updated models are collected, averaged, and shared across all participants. The paper also explains how the parameters are averaged.
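As I understand it, the scheme could be sketched like this. The function names, the learning rate, and the per-step costs below are invented for illustration; the paper's actual aggregation rule is more refined:

```python
import numpy as np

def semi_sync_round(global_params, clients, time_budget):
    """Every participant trains until the shared wall-clock limit, then the
    coordinator averages whatever each one managed to compute.
    `clients` is a list of (grad_fn, step_cost_seconds, n_samples) tuples."""
    updates, sizes = [], []
    for grad_fn, step_cost, n_samples in clients:
        steps = int(time_budget // step_cost)  # fast sites fit in more steps
        p = global_params.copy()
        for _ in range(steps):
            p = p - 0.1 * grad_fn(p)
        updates.append(p)
        sizes.append(n_samples)
    w = np.array(sizes, dtype=float) / sum(sizes)
    return sum(wi * ui for wi, ui in zip(w, updates))

# Both toy clients pull toward the same optimum (2.0), but the fast one
# completes 100 local steps in the budget while the slow one completes 4.
fast = (lambda p: p - 2.0, 0.01, 200)
slow = (lambda p: p - 2.0, 0.25, 200)
theta = semi_sync_round(np.array([0.0]), [fast, slow], time_budget=1.0)
```

Neither site sits idle waiting for the other, which is the point: the synchronization barrier is on time, not on a fixed number of epochs.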

Another interesting point I have learned is the IID/non-IID assumption. IID stands for Independent and Identically Distributed. If all the participants have IID data, it makes sense to train a common model with all of them. If this is not the case, and one participant’s data is distributed differently, model convergence can suffer.
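A quick way to see the difference is to partition a toy label set in the two ways. The label-skew construction below is a common device for simulating non-IID silos in experiments, not something taken from these specific papers:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)  # toy labels for a 10-class problem

# IID split: shuffle, then cut into equal parts; every participant sees
# roughly the same label distribution.
iid_parts = np.array_split(rng.permutation(labels), 5)

# Non-IID (label-skew) split: each participant holds only two classes,
# e.g. a hospital that only ever sees two kinds of cases.
non_iid_parts = [labels[np.isin(labels, [2 * i, 2 * i + 1])] for i in range(5)]
```

In the IID case every local gradient points roughly toward the same global optimum; in the label-skew case each participant pulls the shared model toward its own classes, which is what hurts convergence.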

Another interesting paper is: Trung Kien Dang, Xiang Lan, Jianshu Weng, and Mengling Feng. 2022. Federated Learning for Electronic Health Records. ACM Trans. Intell. Syst. Technol. 13, 5, Article 72 (October 2022), 17 pages. https://doi.org/10.1145/3514500.

It provides a survey of existing applications of Federated Learning on Electronic Health Records, to predict in-hospital mortality and acute kidney injury in intensive care units.

Different algorithms are compared, and one can follow the references and get a sense of their relative performance. It also describes CIIL (Cyclic Institutional Incremental Learning): a setting where each party trains a model with its data and passes the parameters on to the next hospital. The federated learning approach, with a central entity assembling a common model, outperforms the CIIL approach.

A few paragraphs above I wrote that it is fine to share model parameters, as they do not contain personal information. But is this really true? Can the central server guess personal information from parameter updates? The paper Xue Jiang, Xuebing Zhou, and Jens Grossklags. 2022. SignDS-FL: Local Differentially Private Federated Learning with Sign-based Dimension Selection. ACM Trans. Intell. Syst. Technol. 13, 5, Article 74 (October 2022), 22 pages. https://doi.org/10.1145/3517820 gives some insight into this point.

The idea is to trade model accuracy for privacy: the model update will not be as accurate as it could be and will convey only part of the information. The global model will converge more slowly.

A random variable decides which model dimensions to share with the central coordinator, and at each round only those are sent instead of the full update. In the paper, only k of the smallest or largest parameter updates are shared.
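This is my simplified reading of the idea, as a sketch: the real SignDS-FL mechanism makes the dimension selection itself locally differentially private, which the code below does not attempt. The function name and the coin-flip rule are mine:

```python
import numpy as np

def select_dims(update, k, rng):
    """Share only k coordinates of a model update: a random sign decides
    whether the k largest or the k smallest (most negative) entries are
    kept, so the server sees only a sparse slice of the update."""
    sign = rng.choice([-1, 1])
    order = np.argsort(sign * update)   # ascending in the chosen direction
    dims = order[-k:]                   # the k most extreme coordinates
    sparse = np.zeros_like(update)
    sparse[dims] = update[dims]
    return sparse

rng = np.random.default_rng(42)
update = rng.normal(size=20)            # a fake 20-dimensional model update
shared = select_dims(update, k=3, rng=rng)
```

The coordinator still receives useful gradient information (the most extreme coordinates carry most of the signal), but far less than the full vector.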

The last paper I have read so far is: Bixiao Zeng, Xiaodong Yang, Yiqiang Chen, Hanchao Yu, and Yingwei Zhang. 2022. CLC: A Consensus-based Label Correction Approach in Federated Learning. ACM Trans. Intell. Syst. Technol. 13, 5, Article 75 (October 2022), 23 pages. https://doi.org/10.1145/3519311

Not all participants will have the same ability to assign labels correctly to the training data. Additionally, with federated learning, these samples will not be available to the other parties, due to privacy restrictions, so it is not feasible to review them centrally before training the model.

Some of these samples may simply be mislabeled, but removing them outright may also prevent the model from learning new patterns. However, using them early in the training can slow down the learning process or lead to model overfitting.

The method proposed in the article keeps some samples outside the training set during the initial learning epochs. Thresholds are computed and used to decide whether to admit these samples at later stages, or even to correct the labels assigned to them.
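A toy version of the hold-out step might look like this. The mean-plus-one-standard-deviation threshold is my own placeholder, not the consensus rule of the paper, and the numbers are made up:

```python
import numpy as np

def split_by_loss(losses, threshold):
    """Samples whose loss exceeds the threshold are held out of early
    training; later they can be re-admitted, or have their labels
    corrected, once the model is more trustworthy."""
    return np.asarray(losses) <= threshold

# One sample is suspiciously hard to fit -- a hint its label may be wrong.
losses = np.array([0.2, 0.3, 0.25, 2.1, 0.28])
threshold = losses.mean() + losses.std()
keep_mask = split_by_loss(losses, threshold)
```

The intuition is the usual one: mislabeled samples tend to have stubbornly high loss, so an outlier test on the per-sample loss is a cheap first filter before any label correction.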

Written by Giovanni

August 30, 2022 at 2:25 pm

Posted in Varie
