General information
Organisation
The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.
Drawing on its widely acknowledged expertise, and thanks to its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.
The CEA is established in ten centers spread throughout France.
Reference
2023-30158
Unit description
Based in Paris-Saclay, CEA List is one of the four institutes under CEA Tech, the technological research branch of CEA. Specializing in intelligent digital systems, it contributes to enhancing the competitiveness of businesses through technology development and transfer.
The expertise and skills cultivated by the 800 research engineers and technicians at CEA List enable the institute to support over 200 French and international companies each year in applied research projects. These projects are based on four programs and nine technological platforms. Since 2003, 21 start-ups have been created as a result of these efforts. Designated as a "Carnot Institute" since 2006, CEA List is currently recognized as the "Digital Technologies Carnot Institute".
The Laboratory of Semantic Analysis of Texts and Images (LASTI) is a team of around 25 people, including researchers, engineers, and doctoral students. Its research activities focus on technologies for describing and understanding multimedia content (images, text, speech) and multilingual documents, especially at a large scale. The scientific challenges include:
– Developing efficient and robust algorithms for the analysis, extraction, classification, and semantic analysis of multimedia content.
– Reconstructing or fusing heterogeneous data in order to interpret scenes or documents.
– Creating methods and tools for constructing, formalizing, and organizing resources and knowledge.
Position description
Category
Engineering science
Contract
Internship
Job title
Classification of text-based cybersecurity attacks (M/F)
Subject
The emergence of AI-generated cybersecurity attacks has paved the way for a new era of digital threats. AI-generated text-based cyber attacks are a new breed of threat in which AI is used to create and carry out malicious activities (phishing, spear phishing, fake news, disinformation, social manipulation, etc.). These attacks leverage text generation models to create convincing and contextually relevant textual content. Their primary goal is to deceive individuals, systems and even nations, with various harmful consequences. In this context, it is imperative to understand the threats posed by such attacks and to develop innovative strategies to mitigate them. The aim of this internship is to develop AI techniques for detecting different types of text-based cyber attacks in general, and AI-generated attacks in particular, in order to equip network experts with precise tools for identifying patterns of misuse and malicious behavior.
Contract duration (months)
6
Job description
Technically, the internship involves the fields of machine learning (ML) and natural language processing (NLP), and more specifically natural language generation (NLG) and classification techniques. In collaboration with CEA research engineers, the aim will be to train classification models capable of recognizing different types of text-based cyber attacks and of distinguishing text-based attacks authored by humans from those generated by AI or by a specific generative model. This internship is meant to be an introduction to research, with the goal of publishing a scientific article if the results are conclusive. The implemented models may also be used to participate in a shared task such as AuTexTification (https://sites.google.com/view/autextification/home) or CLIN33 (https://sites.google.com/view/shared-task-clin33/home), or in a challenge such as MLMAC (https://mlmac.io/).
This work may be followed by a PhD in a broader context.
Applicant Profile
Engineering degree and/or Master 2 (M2) degree in computer science with a strong interest in artificial intelligence and natural language processing.
Required skills:
• Working environment: Linux
• Knowledge of text classification techniques
• Background in natural language generation and language modeling
• Familiarity with pre-trained language models and large language models
• Basic knowledge of the cybersecurity field
• Programming: Python + PyTorch/TensorFlow
Position location
Site
Saclay
Job location
France, Ile-de-France, Essonne (91)
Location
Palaiseau
Candidate criteria
Languages
French (Fluent)
Prepared diploma
Bac+5 - Engineering school degree
PhD opportunity
Yes
Requester
Position start date
01/04/2024