The Laboratoire Instrumentation Intelligente, Distribuée et Embarquée (LIIDE) aims to develop a combined hardware and software platform for designing the instrumentation functionalities of the future. The laboratory jointly develops 1) the hardware component, targeting versatile and modular electronic boards, together with the software required for their operation, to cover a wide range of sensor technologies; and 2) innovative artificial intelligence functionalities for distributed measurement and for frugal, distributed learning.
The laboratory is anchored in a rich environment centred on digital instrumentation for control, monitoring, and diagnostics. The department to which it belongs relies on a wide range of sensors (optical fibres, piezoelectric sensors, eddy-current probes, X-rays) as well as on state-of-the-art experimental platforms. Applications focus mainly on Non-Destructive Evaluation (NDE) and Structural Health Monitoring (SHM).
Federated learning was introduced in 2016 by Google [1] as a new machine learning paradigm where multiple entities (clients) collaborate in solving a machine learning problem under the coordination of a central server. Each client trains a local model using its private data, and only model parameters are exchanged between the clients and the server, without exposing clients' private data [2].
Based on how the training data is partitioned in the sample and feature space, federated learning can be categorised into horizontal, vertical, and federated transfer learning [3]. Horizontal federated learning (HFL) refers to a federated learning configuration where clients share the same feature space, but have different samples. In vertical federated learning (VFL), clients share similar samples, but hold different features [4]. The aim of this internship is to study the key challenges of VFL and the solutions to designing privacy-preserving and effective VFL.
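As a toy illustration of the two partitioning schemes (the dataset and split below are hypothetical, chosen only to make the row/column distinction concrete): in HFL the dataset is split by rows (same features, different samples), while in VFL it is split by columns (same samples, different features).

```python
import numpy as np

# Hypothetical dataset: 6 samples (rows) x 4 features (columns).
X = np.arange(24).reshape(6, 4)

# Horizontal FL: clients share the feature space but hold different samples (row split).
hfl_client_a, hfl_client_b = X[:3, :], X[3:, :]

# Vertical FL: clients share the samples but hold different features (column split).
vfl_client_a, vfl_client_b = X[:, :2], X[:, 2:]

print(hfl_client_a.shape)  # each HFL client keeps all 4 features
print(vfl_client_a.shape)  # each VFL client keeps all 6 samples
```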
The need for VFL is clear and growing: companies and institutions from different fields own data with different, yet complementary, attributes of overlapping samples, and must therefore collaborate to develop a more effective model without sharing private data. Most existing work focuses on HFL, while VFL remains much less investigated. The design of VFL is fundamentally different from that of HFL [5]; therefore, existing solutions for HFL may not be applicable to VFL.
The objective of this internship is to investigate and analyse VFL. We aim to understand the challenges inherent in VFL and to identify solutions that address privacy concerns in VFL design (e.g., privacy-preserving entity resolution [6], homomorphic encryption, secure multi-party computation) as well as improve the effectiveness of the trained model.
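To give a flavour of one of the privacy techniques mentioned above, the sketch below shows additive secret sharing, a basic building block of secure multi-party computation: each client splits its private value into random shares that sum to the value, so an aggregator only ever learns the total. This is a deliberately simplified, unauthenticated toy (the modulus, values, and party counts are illustrative assumptions), not a production protocol.

```python
import secrets

Q = 2**61 - 1  # large prime modulus (illustrative choice for this toy example)

def share(value: int, n_parties: int) -> list[int]:
    """Split an integer into n_parties random additive shares modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

# Two clients secret-share their private values among three computing parties.
u1, u2 = 42, 17
s1, s2 = share(u1, 3), share(u2, 3)

# Each computing party locally sums the shares it receives...
partials = [(a + b) % Q for a, b in zip(s1, s2)]
# ...and combining the partial sums reveals only the aggregate, not u1 or u2.
total = sum(partials) % Q
print(total)  # equals (u1 + u2) % Q
```

Individual shares are uniformly random, so no single party learns anything about a client's value; only the final aggregate is revealed.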
The internship will proceed as follows:
Conduct a literature review on VFL to identify the fundamental considerations for designing VFL (e.g., data partitions, training protocols, learning algorithms) and its key challenges;
Study state-of-the-art solutions for improving the privacy and effectiveness of VFL, and determine the optimal trade-off between the two objectives;
Conduct an empirical evaluation of the solutions;
Implement a software component for simulating a VFL environment and testing different VFL configurations, which can be integrated into our federated learning platform.
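As a minimal sketch of what one round-based VFL simulation might look like (all data, model choices, and the feature split below are hypothetical, and the exchange is shown unencrypted for clarity — a real protocol would add entity resolution and homomorphic encryption as in [6]): two clients hold vertical slices of the features and train local linear models, while a coordinating server holding the labels aggregates their partial scores and sends back only residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vertically partitioned data: 8 shared samples, features split 3/2 between two clients.
n = 8
X_a = rng.normal(size=(n, 3))  # client A's private features
X_b = rng.normal(size=(n, 2))  # client B's private features
# Labels held by the coordinating server, generated from a hidden linear rule.
y = (X_a @ np.array([1.0, -1.0, 0.5]) + X_b @ np.array([2.0, -1.0]) > 0).astype(float)

w_a, w_b = np.zeros(3), np.zeros(2)  # each client's local model parameters
lr = 0.1

for _ in range(100):
    # Each client computes a partial score on its own features only.
    z_a, z_b = X_a @ w_a, X_b @ w_b
    # The server aggregates partial scores and computes the logistic residual.
    p = 1.0 / (1.0 + np.exp(-(z_a + z_b)))
    residual = p - y
    # Only the residual is sent back; raw features and labels never change hands.
    w_a -= lr * X_a.T @ residual / n
    w_b -= lr * X_b.T @ residual / n
```

Even in this stripped-down form, the sketch exposes the design questions the internship targets: what information the exchanged messages leak, and how encryption or secret sharing could protect them without degrading the trained model.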
[1] McMahan, B., Moore, E., Ramage, D., Hampson, S. and y Arcas, B.A., 2017, April. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273-1282). PMLR.
[2] Kairouz, P., McMahan, H.B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A.N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R. and D’Oliveira, R.G., 2021. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2), pp.1-210.
[3] Yang, Q., Liu, Y., Chen, T. and Tong, Y., 2019. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2), pp.1-19.
[4] Cheng, Y., Liu, Y., Chen, T. and Yang, Q., 2020. Federated learning for privacy-preserving AI. Communications of the ACM, 63(12), pp.33-36.
[5] Liu, Y., Kang, Y., Zou, T., Pu, Y., He, Y., Ye, X., Ouyang, Y., Zhang, Y.Q. and Yang, Q., 2022. Vertical federated learning. arXiv preprint arXiv:2211.12814.
[6] Hardy, S., Henecka, W., Ivey-Law, H., Nock, R., Patrini, G., Smith, G. and Thorne, B., 2017. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677.
The candidate should be in the final year of an engineering school or of a master's programme (Bac+5) in a field related to machine learning/AI, and wish to conduct research and development in an emerging yet impactful field, in a collaborative environment. The intern will work in a team of researchers, post-docs, and PhD students who are actively investigating various challenges and aspects of federated learning. The candidate should have knowledge of machine learning and optimisation, and be skilled in Python programming and in using various machine learning libraries and frameworks.