Draft:Federated Machine Learning

Federated machine learning (or federated learning) ^[1],^[2] is a machine learning framework that helps multiple parties or organizations use data and build and apply models collaboratively, while complying with requirements for user privacy protection, data security, data confidentiality and government regulations (such as GDPR ^[3]). A federated machine learning (FML) solution is often implemented as a distributed machine learning algorithm and system ^[4].

Typically, when different parties work together to build a collaborative model, they need to contribute various portions of their own data. The models shall possess the following properties ^[6]:

Federated machine learning can be applied in many industries where privacy protection is a concern. Examples include collective training of smart mobile functionalities ^[7], financial services, logistics, supply chain management, telecommunications solutions and operators, client-cloud systems and healthcare.

1) Horizontal Federated Learning ^[8]: Horizontal federated learning is also known as sample-based federated learning. It covers scenarios in which the data sets share the same feature space but contain different samples (Figure 1).

For example, two mobile users of the same system may share the same set of sensors on their phones. Thus, their data have similar features; in many cases, their feature spaces are the same. An example is Google, which proposed a horizontal federated learning solution for Android phone model updates ^[9]. In that framework, a single user using an Android phone updates the model parameters locally and uploads the parameters to the Android cloud, thus jointly training the centralised model together with other data owners. A secure aggregation scheme is proposed to protect the privacy of aggregated user updates under their federated learning framework ^[10].
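The local-update-then-aggregate loop described above can be illustrated with a minimal sketch of weighted model averaging, in the spirit of the model-averaging approach in ^[1]. Everything here is an assumption made for illustration: the toy linear model, the function names `local_update`, `federated_average` and `train`, the learning rate and the sample data. It is not Google's implementation, and it omits the secure aggregation step ^[10], so the server in this sketch sees each client's parameters in the clear.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one client


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float) -> List[float]:
    """One gradient step of the model y = w*x + b on a client's own data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(updates: Sequence[Sequence[float]],
                      sizes: Sequence[int]) -> List[float]:
    """Average client parameters, weighted by each client's sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


def train(clients: Sequence[Sequence[Sample]], rounds: int = 500,
          lr: float = 0.05) -> List[float]:
    """Run several rounds; only parameters, never raw samples, are shared."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data, lr) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical clients: same feature space (x), different samples of y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(clients)
```

Weighting by sample count means a client with more data has proportionally more influence on the shared model; with a single local step per round, this is equivalent to gradient descent on the pooled data, even though the data are never pooled.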

2) Vertical Federated Learning ^[11]: Vertical federated learning is also known as feature-based federated learning. It covers cases in which two data sets share the same sample ID space but differ in their feature spaces (Figure 2).

For example, consider two different companies in the same city, where one is a bank and the other is an e-commerce company. Their user sets are likely to contain most of the residents of the area, so the intersection of their user spaces is large. However, since the bank records the user’s revenue and expenditure behaviour and credit rating, while the e-commerce company retains the user’s browsing and purchasing history, their feature spaces are very different. Vertical federated learning is the process of aggregating these different features and computing the training loss and gradients in a privacy-preserving manner, so that a model is built collaboratively with data from both parties.
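The data layout in this example can be made concrete with a small sketch. It only shows the two structural steps, aligning the shared sample IDs and combining per-party contributions to a prediction; it offers no privacy protection at all. A real vertical federated learning system would replace the plain set intersection and the plain addition with privacy-preserving protocols. The user IDs, feature values, weights and function names below are all hypothetical.

```python
from typing import Dict, List, Sequence

Features = Dict[str, List[float]]  # user ID -> that party's own features

# Hypothetical toy data: the bank and the shop hold different features
# for overlapping sets of users.
bank: Features = {"u1": [52.0, 0.75], "u2": [31.0, 0.5], "u3": [78.0, 0.25]}
shop: Features = {"u2": [12.0], "u3": [3.0], "u4": [25.0]}


def align_ids(a: Features, b: Features) -> List[str]:
    """Return the user IDs present in both parties' data (plain, not private)."""
    return sorted(set(a) & set(b))


def partial_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Each party computes its share of a linear score from its own features."""
    return sum(f * w for f, w in zip(features, weights))


def joint_scores(ids: Sequence[str], a: Features, a_weights: Sequence[float],
                 b: Features, b_weights: Sequence[float]) -> Dict[str, float]:
    """Combine the two partial scores; raw features are never concatenated."""
    return {
        uid: partial_score(a[uid], a_weights) + partial_score(b[uid], b_weights)
        for uid in ids
    }


common = align_ids(bank, shop)
scores = joint_scores(common, bank, [1.0, 2.0], shop, [0.5])
```

Only users known to both parties contribute to the joint model, which is why a large overlap in the sample ID space is the defining condition for this setting.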

3) Federated Transfer Learning (FTL) ^[12]: Federated transfer learning applies to scenarios in which the two data sets differ not only in samples but also in feature spaces (Figure 3). Consider two institutions: one is a bank located in one area, and the other is an e-commerce company located in another area.

Due to geographical restrictions, the user groups of the two institutions have only a small intersection. In addition, because the two businesses differ, only a small portion of the feature spaces of the two parties overlaps. In this case, transfer learning techniques ^[13] are applied to provide solutions for the entire sample and feature spaces under a federation. Specifically, a common representation between the two feature spaces is learned using the limited common sample set, and is later applied to obtain predictions for samples that have features from only one side.
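The three categories described above differ only in how much the parties' sample ID spaces and feature spaces overlap, which can be sketched as a simple decision rule. The Jaccard overlap measure, the 0.5 threshold and the function names are assumptions chosen for illustration; the source defines the categories qualitatively and does not prescribe any numeric cut-off.

```python
from typing import Hashable, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_setting(ids_a: Set[Hashable], ids_b: Set[Hashable],
                    feats_a: Set[str], feats_b: Set[str],
                    threshold: float = 0.5) -> str:
    """Map sample and feature overlap onto the three federated settings.

    The threshold is an illustrative assumption, not a standard value.
    """
    samples_overlap = jaccard(ids_a, ids_b) >= threshold
    features_overlap = jaccard(feats_a, feats_b) >= threshold
    if features_overlap and not samples_overlap:
        return "horizontal"   # same features, different samples
    if samples_overlap and not features_overlap:
        return "vertical"     # same samples, different features
    if not samples_overlap and not features_overlap:
        return "transfer"     # little overlap in either space
    return "overlapping"      # both spaces largely shared


# Hypothetical bank and e-commerce company in different areas.
setting = suggest_setting(
    {1, 2, 3, 4, 5}, {5, 6, 7, 8, 9},
    {"age", "income"}, {"age", "clicks", "cart"},
)
```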

Led by Professor Qiang Yang ^[14], Chief AI Officer (CAIO) of WeBank, the Federated AI Ecosystem (FedAI) ^[15] is a global open initiative for the research, development and deployment of federated learning technologies in applications with strong data privacy concerns, such as banking and healthcare. FedAI is working with the IEEE to establish a Federated Machine Learning Standard (IEEE P3652.1) ^[16]. It has published source code for the building blocks of federated learning, the Federated AI Technology Enabler (FATE) ^[17], as well as tutorial materials ^[18] online. AI researchers and engineers can use these open resources to help build more complex and capable privacy-preserving machine learning technologies that comply with stricter laws governing AI.

References

1. ^H. Brendan McMahan, Eider Moore, Daniel Ramage, and Blaise Agüera y Arcas. Federated learning of deep networks using model averaging. CoRR, abs/1602.05629, 2016.
2. ^Qiang Yang, Yang Liu, Tianjian Chen & Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019
3. ^Regulation (EU) 2016/679 of the European Parliament and of the council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec (general data protection regulation). Technical report, European Union, 2016
4. ^Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita & Bor-Yiing Su. Scaling Distributed Machine Learning with the Parameter Server. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, pp. 583-598, 2014.
5. ^Qiang Yang, Yang Liu, Tianjian Chen & Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019
6. ^Qiang Yang, Yang Liu, Tianjian Chen & Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019
7. ^https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
8. ^Qiang Yang, Yang Liu, Tianjian Chen & Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019
9. ^https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
10. ^Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS’17). ACM, New York, NY, USA, 1175–1191.
11. ^Qiang Yang, Yang Liu, Tianjian Chen & Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019
12. ^Qiang Yang, Yang Liu, Tianjian Chen & Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019
13. ^Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering vol. 22, no. 10, pp. 1345–1359, 2010.
14. ^https://en.wikipedia.org/wiki/Qiang_Yang
15. ^https://www.fedai.org/
16. ^https://sagroups.ieee.org/3652-1/
17. ^https://github.com/webankfintech/fate
18. ^https://www.fedai.org/#/conferences/link_aaai2019
19. ^https://www.fedai.org/#/conferences/link_aaai2019
20. ^https://aaai.org/Conferences/AAAI-19/invited-speakers/#yang
