Text-based cybersecurity attacks classification M/F

Vacancy details

General information


The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
• defence and security,
• nuclear energy (fission and fusion),
• technological research for industry,
• fundamental research in the physical sciences and life sciences.

Drawing on its widely acknowledged expertise, and thanks to its 16,000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.

The CEA is established in ten centers spread throughout France.



Unit description

Based in Paris-Saclay, CEA List is one of the four institutes under CEA Tech, the technological research branch of CEA. Specializing in intelligent digital systems, it contributes to enhancing the competitiveness of businesses through technology development and transfer.
The expertise and skills of the 800 research engineers and technicians at CEA List enable the institute to support over 200 French and international companies each year in applied research projects. These projects are based on four programs and nine technological platforms. Since 2003, 21 start-ups have been created as a result of these efforts. Designated as a "Carnot Institute" since 2006, CEA List is currently recognized as the "Digital Technologies Carnot Institute".
The Laboratory of Semantic Analysis of Texts and Images (LASTI) is a team of around 25 people, including researchers, engineers, and doctoral students. Its research activities focus on technologies for describing and understanding multimedia content (images, text, speech) and multilingual documents, especially at a large scale. The scientific challenges include:
– Developing efficient and robust algorithms for the analysis and extraction of multimedia content, their classification, and semantic analysis
– Reconstructing or fusing heterogeneous data in order to interpret scenes or documents
– Creating methods and tools for constructing, formalizing, and organizing resources and knowledge.

Position description


Engineering science



Job title

Text-based cybersecurity attacks classification M/F


The emergence of AI-generated cybersecurity attacks has opened a new era of digital threats. AI-generated text-based cyber attacks are a new class of threat in which AI is used to create and carry out malicious activities (phishing, spear phishing, fake news, disinformation, social manipulation, etc.). These attacks leverage text generation models to produce convincing and contextually relevant textual content. Their primary goal is to deceive individuals, systems and even nations, with various harmful consequences. In this context, it is imperative to understand the threats posed by such attacks and to develop innovative strategies to mitigate them. This internship aims to develop AI techniques that detect text-based cyber attacks in general, and AI-generated attacks in particular, in order to equip network experts with precise tools for identifying patterns of misuse and malicious behavior.

Contract duration (months)


Job description

Technically, the internship involves the fields of machine learning (ML) and natural language processing (NLP), and more specifically natural language generation (NLG) and classification techniques. In collaboration with CEA research engineers, the aim will be to train classification models capable of recognizing different types of text-based cyber attacks and of distinguishing text-based attacks authored by humans from those generated by AI or by a specific generative model. This internship is meant to be an introduction to research, with the goal of publishing a scientific article if the results obtained are conclusive. The implemented models may also be used to participate in a shared task such as AuTexTification (https://sites.google.com/view/autextification/home) or CLIN33 (https://sites.google.com/view/shared-task-clin33/home), or in a challenge such as MLMAC (https://mlmac.io/).

This work may be followed by a PhD in a broader context.

Applicant Profile

Engineering degree and/or Master 2 (M2) degree in computer science with a strong interest in artificial intelligence and natural language processing.

Required skills:

Working environment: Linux
Knowledge of text classification techniques
Background in natural language generation and language modeling
Familiarity with pre-trained language models and large language models
Basic knowledge of the cybersecurity field
Programming: Python + PyTorch/TensorFlow


In accordance with the commitments made by the CEA in favor of the integration of people with disabilities, this job is open to everyone.

Position location



Job location

France, Ile-de-France, Essonne (91)



Candidate criteria


French (Fluent)

Prepared diploma

Bac+5 - Engineering school degree

PhD opportunity



Position start date