AI Privacy Risks and Mitigations in Large Language Models
SUPPORT POOL OF EXPERTS PROGRAMME
By Isabel BARBERÁ
As part of the SPE programme, the EDPB may commission contractors to provide reports and tools on
specific topics.
The views expressed in the deliverables are those of their authors and they do not necessarily reflect
the official position of the EDPB. The EDPB does not guarantee the accuracy of the information
included in the deliverables. Neither the EDPB nor any person acting on the EDPB’s behalf may be held
responsible for any use that may be made of the information contained in the deliverables.
Some excerpts may be redacted or removed from the deliverables as their publication would
undermine the protection of legitimate interests, including, inter alia, the privacy and integrity of an
individual regarding the protection of personal data in accordance with Regulation (EU) 2018/1725
and/or the commercial interests of a natural or legal person.
Disclaimer by the Author: The examples and references to companies included in this report are provided for illustrative purposes only and
do not imply endorsement or suggest that they represent the sole or best options available. While this report strives to provide thorough
and insightful information, it is not exhaustive. The technology analysis reflects the state of the art as of March 2025 and is based on
extensive research, referenced sources, and the author's expertise. For transparency reasons, the author wants to inform the reader that an LLM system has been used for the exclusive purpose of improving the readability and formatting of parts of the text.
The risk management methodology outlined in this document is designed to help developers and
users systematically identify, assess, and mitigate privacy and data protection risks, supporting the
responsible development and deployment of LLM systems.
This guidance also supports the requirements of GDPR Article 25 (data protection by design and by default) and Article 32 (security of processing) by offering technical and organizational measures to help ensure an appropriate level of security and data protection. However, the guidance is not intended to
replace a Data Protection Impact Assessment (DPIA) as required under Article 35 of the GDPR. Instead,
it complements the DPIA process by addressing privacy risks specific to LLM systems, thereby
enhancing the robustness of such assessments.
Below is an overview of the document’s structure and the topics covered in each section:
2. Background
This section introduces Large Language Models, how they work, and their common applications. It also
discusses performance evaluation measures, helping readers understand the foundational aspects of
LLM systems.
5. Data Protection and Privacy Risk Assessment: Risk Estimation & Evaluation
Guidance on how to analyse, classify and assess privacy risks is provided here, with criteria for
evaluating both the probability and severity of risks. This section explains how to derive a final risk
evaluation to prioritize mitigation efforts effectively.
2. Background
What Are Large Language Models?
Large Language Models (LLMs) represent a transformative advancement in artificial intelligence. These
general purpose models are trained on extensive datasets, which often encompass publicly available
content, proprietary datasets, and specialized domain-specific data. Their applications are diverse,
ranging from text generation and summarization to coding assistance, sentiment analysis, and more.
Some LLMs are multimodal, capable of processing and generating multiple data modalities such as images, audio, or video.
The development of LLMs has been marked by key technological milestones that have shaped their
evolution. Early advancements in the 1960s and 1970s included rule-based systems like ELIZA, which
laid foundational principles for simulating human conversation through predefined patterns. In 2017,
the introduction of transformer architectures (see Figure 2) in the seminal paper "Attention Is All You
Need"1 revolutionized the field by enabling efficient handling of contextual relationships within text
sequences. Subsequent developments, such as OpenAI’s GPT series and Google’s BERT (see Figure 3),
have set benchmarks for natural language processing (NLP)2, culminating in models like GPT-4, LaMDA3, and DeepSeek-V34 (see Figure 4) that integrate multimodal capabilities.
1 A. Vaswani et al., 'Attention Is All You Need' (2023) https://arxiv.org/pdf/1706.03762
2 Wikipedia, 'Natural language processing' (2025) https://en.wikipedia.org/wiki/Natural_language_processing
3 E. Collins and Z. Ghahramani, 'LaMDA: our breakthrough conversation technology' (2021) https://blog.google/technology/ai/lamda/
4 GitHub, 'DeepSeek' (n.d.) https://github.com/deepseek-ai/DeepSeek-V3
5 Wikipedia, 'Transformer (deep learning architecture)' (2025) https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
6 Artificial Intelligence Stack Exchange, 'Why does the transformer do better than RNN and LSTM in long-range context dependencies?' (2020) https://ai.stackexchange.com/questions/20075/why-does-the-transformer-do-better-than-rnn-and-lstm-in-long-range-context-depen
7 A. Gu and T. Dao, 'Mamba: Linear-Time Sequence Modeling with Selective State Spaces' (2024) https://arxiv.org/pdf/2312.00752; B. Peng et al., 'RWKV: Reinventing RNNs for the Transformer Era' (2023) https://arxiv.org/pdf/2305.13048
8 Y. Liu et al., 'Understanding LLMs: A Comprehensive Overview from Training to Inference' (2024) https://arxiv.org/pdf/2401.02038v2
Text is cleaned and normalized by removing inconsistencies (e.g., special characters) and
irrelevant content, ensuring uniformity in the training data.
Text data is broken into smaller units called tokens, which can be words, subwords, or even individual characters. Tokenization algorithms transform unstructured text into manageable sequences for computational processing.
Tokens are converted into numerical IDs that represent their vocabulary position. These IDs are then transformed into word embeddings9—dense vector representations that capture semantic similarities and relationships between words. For instance, semantically related words like "king" and "queen" will occupy nearby positions in the embedding space, as illustrated in the sketch below.
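The sketch below illustrates these preprocessing steps with the Hugging Face transformers library, using the publicly available GPT-2 tokenizer and embedding matrix as an example; any other checkpoint would behave similarly, and the sample sentence is purely illustrative.

```python
# Minimal sketch: tokenization and embedding lookup with Hugging Face transformers.
# Assumes the publicly available "gpt2" checkpoint; other models work similarly.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "The king greeted the queen."
# 1. Tokenization: text is split into subword tokens.
tokens = tokenizer.tokenize(text)
# 2. Token IDs: each token is mapped to its position in the vocabulary.
ids = tokenizer.convert_tokens_to_ids(tokens)
# 3. Embeddings: IDs are looked up in the model's embedding matrix,
#    producing dense vectors that capture semantic relationships.
with torch.no_grad():
    embeddings = model.get_input_embeddings()(torch.tensor([ids]))

print(tokens)            # subword tokens
print(ids)               # vocabulary positions
print(embeddings.shape)  # (1, number_of_tokens, embedding_dimension)
```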
3. Transformer Architecture:10
Transformer architectures can be categorized into three main types: encoder-only, encoder-
decoder, and decoder-only. While encoder-only architectures were foundational in earlier models,
they are generally not used in the latest generation of LLMs. Most state-of-the-art LLMs today use decoder-only architectures, while encoder-decoder models are still used in tasks like translation and instruction tuning.
Encoder:11
The encoder takes the input text and converts it into a contextualized representation by
analyzing relationships between words. Key elements include:
o Token embeddings: Tokens are transformed into numerical vectors that capture their
meaning.
o Positional encodings: Since the transformer processes words in parallel, positional
encodings are added to token embeddings to represent the order of words, preserving
the structure of the input.
o Attention mechanisms: The encoder evaluates the importance of each word relative to
others in the input sequence, capturing dependencies and context. For example, it
helps distinguish between “park” as a verb and “park” as a location based on the
surrounding text.
o Feed-Forward Network: A series of transformations are applied to refine the
contextualized word representations, preparing them for subsequent stages.
9 V. Zhukov, 'A Guide to Understanding Word Embeddings in Natural Language Processing (NLP)' (2023) https://ingestai.io/blog/word-embeddings-in-nlp
10 See footnote 1
11 GeeksforGeeks, 'Architecture and Working of Transformers in Deep Learning' (2025) https://www.geeksforgeeks.org/architecture-and-working-of-transformers-in-deep-learning/
Decoder:12
The decoder generates text by predicting one token at a time. It builds upon the encoder’s
output (if used) and the sequence of tokens already generated. Key elements include:
o Input: Combines encoder outputs with tokens generated so far.
o Attention mechanisms:13 Ensures each token considers previously generated tokens to
maintain coherence and context.
o Feed-Forward Network (FFN):14 This layer refines the token representations to ensure
they are relevant and coherent.
o Masked attention: During training, future tokens are hidden from the model, ensuring it predicts outputs step by step without "cheating" (see the sketch after this list).
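The following minimal sketch shows masked (causal) self-attention in isolation; it is illustrative only, as production LLMs use multi-head attention with learned query, key and value projections.

```python
# Minimal sketch of masked (causal) self-attention in a decoder block.
# Illustrative only: real LLMs use multi-head attention with learned projections.
import torch
import torch.nn.functional as F

def causal_self_attention(x):
    """x: (sequence_length, model_dim) token representations."""
    seq_len, dim = x.shape
    # Scores measure how relevant each token is to every other token.
    scores = x @ x.T / dim ** 0.5
    # Masking: each position may only attend to itself and earlier tokens,
    # so the model cannot "cheat" by looking at future tokens.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)   # attention weights per token
    return weights @ x                    # weighted mix of previous tokens

out = causal_self_attention(torch.randn(5, 16))
print(out.shape)  # (5, 16)
```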
Figure 3. A comparison of the architectures for the Transformer, GPT and BERT.
Source: B.Smith ‘A Complete Guide to BERT with Code’ (2024)
https://towardsdatascience.com/a-complete-guide-to-bert-with-code-9f87602e4a11
12 Idem
13 The architecture of DeepSeek models contains an innovative attention mechanism called Multi-head Latent Attention (MLA) that compresses Key/Value vectors, offering better compute and memory efficiency.
14 DeepSeek models employ the DeepSeekMoE architecture based on Mixture-of-Experts (MoE), introducing multiple parallel expert networks (FFNs) instead of a single FFN.
Mixture of Experts (MoE) is a technique used to improve transformer-based LLMs by making them more efficient and scalable. Instead of using the entire model for every input, MoE activates only a few smaller parts of the model—called "experts"—based on what the input needs. This means the model can be much larger overall, but only the necessary parts are used at any time, saving computing power without losing performance.
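The sketch below shows a toy MoE layer in PyTorch in which a gating network routes each token to its top-k experts; it is purely illustrative and does not reproduce DeepSeekMoE or any production implementation.

```python
# Toy Mixture-of-Experts layer: a gating network picks the top-k experts
# (small feed-forward networks) for each token, so only a fraction of the
# parameters is active per input. Illustrative only.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=32, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # router
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        gate_scores = self.gate(x).softmax(dim=-1)
        weights, chosen = gate_scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(10, 32)).shape)  # (10, 32)
```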
Figure 4. Illustration of DeepSeek-V3’s basic architecture called DeepSeekMoE based on Mixture-of-Experts (MoE).
Source: ‘DeepSeek-V3 Technical Report’
https://arxiv.org/pdf/2412.19437
15 'PyTorch Loss.backward() and Optimizer.step(): A Deep Dive for Machine Learning' (2025) https://iifx.dev/en/articles/315715245
16 C. R. Wolfe, 'Understanding and Using Supervised Fine-Tuning (SFT) for Language Models' (2023) https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
process often involves the use of techniques such as supervised fine-tuning on domain-specific data or Reinforcement Learning from Human Feedback (RLHF). The most common alignment methods are:
17 D. Bergmann, 'What is fine-tuning?' (2024) https://www.ibm.com/think/topics/fine-tuning
18 D. Bergmann, 'What is instruction tuning?' (2024) https://www.ibm.com/think/topics/instruction-tuning
19 S. Chaudhari et al., 'RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs' (2024) https://arxiv.org/abs/2404.08555
20 R. Rafailov et al., 'Direct Preference Optimization: Your Language Model is Secretly a Reward Model' (2024) https://arxiv.org/abs/2305.18290
21 Z. Shao et al., 'DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models' (2024) https://arxiv.org/abs/2402.03300
22 C. Stryker et al., 'What is parameter-efficient fine-tuning (PEFT)?' (2024) https://www.ibm.com/think/topics/parameter-efficient-fine-tuning
23 AWS, 'What is RAG (Retrieval-Augmented Generation)?' (2025) https://aws.amazon.com/what-is/retrieval-augmented-generation/
24 Wikipedia, 'Retrieval Augmented Generation' (2025) https://en.wikipedia.org/wiki/Retrieval-augmented_generation
25 IBM, 'Retrieval Augmented Generation' (2025) https://www.ibm.com/architectures/hybrid/genai-rag?mhsrc=ibmsearch_a&mhq=RAG
26 V. Chaba, 'Understanding the Differences: Fine-Tuning vs. Transfer Learning' (2023) https://dev.to/luxacademy/understanding-the-differences-fine-tuning-vs-transfer-learning-370
27 Wikipedia, 'Transfer Learning' (2025) https://en.wikipedia.org/wiki/Transfer_learning
28 Nebuly AI, 'LLM Feedback Loop' (2024) https://www.nebuly.com/blog/llm-feedback-loop
The three key stages described outline how a traditional text-only LLM is developed. Multimodal LLMs
follow a similar process but to handle multiple data modalities, they incorporate specialized
components such as modality-specific encoders, connectors and cross-modal fusion mechanisms to
integrate the different data representations, along with a shared decoder to generate coherent outputs
across modalities. Their development also involves pre-training and fine-tuning stages; however, some
architectures build multimodal LLMs by fine-tuning an already pre-trained text-only LLM rather than
training one from scratch.
In practice, LLMs are often part of a system and can be accessed directly via APIs, embedded within SaaS platforms, deployed as off-the-shelf foundational models fine-tuned for specific use cases, or integrated into on-premise solutions. It is important to note that while LLMs are essential components of AI systems, they do not constitute AI systems on their own. For an LLM to become part of an AI system, additional components, such as a user interface, must be integrated to enable it to function as a complete system.30 Throughout this document, we will refer to such complete systems as LLM-based systems or simply LLM systems to emphasize their broader context and functionality. This distinction is crucial when assessing the risks associated with these systems, as an LLM system inherently carries more risks than a standalone LLM due to its additional components and integrations.
29 Wikipedia, 'Softmax Function' (2025) https://en.wikipedia.org/wiki/Softmax_function
30 Recital 97 AI Act
Each stage of an LLM's development lifecycle could introduce potential privacy risks, as the model interacts with large datasets that might contain personal data and generates outputs based on that data. Some of the key privacy concerns may occur during:
The collection of data: The training, testing and validation sets could contain identifiable personal data, sensitive data or special categories of data.
Inference: Generated outputs could inadvertently reveal private information or contain misinformation.
RAG process: Knowledge bases containing sensitive data or identifiable personal data might be used without implementing proper safeguards.
Feedback loops: User interactions might be stored without adequate safeguards.
Workflows are structured systems where LLMs and tools operate in a predefined manner,
following orchestrated code paths.
Agents, in contrast, are designed to function dynamically. They allow LLMs to autonomously
direct their processes and determine how to use tools and resources to achieve objectives.
The architecture of an AI agent focuses on critical components that work together to enable
sophisticated behavior and adaptability in real-world scenarios. The architecture is modular, involving
distinct components for perception, reasoning, planning, memory management, and action. This
modularity allows the system to handle complex tasks, interact dynamically with its environment, and refine performance iteratively. A minimal sketch of how these modules fit together follows the module descriptions below.
1. Perception module
This module handles the agent’s ability to process inputs from the environment and format them into
a structure that the LLM can understand. It converts raw inputs (e.g., text, voice, or data streams) into
embeddings or structured formats that can be processed by the reasoning module.
2. Reasoning module
The reasoning module enables the agent to interpret input data, analyze its context, and decompose
complex tasks into smaller, manageable subtasks. It leverages the LLM’s ability to understand and
process natural language to make decisions. The reasoning mechanism enables the agent to analyze
user inputs to determine the best course of action and leverage the appropriate tool or resource to
achieve the desired outcome.
3. Planning module
The planning module determines how the agent will execute the subtasks identified by the reasoning
module. It organizes and sequences actions to achieve a defined goal.
4. Memory module
o Short-Term Memory: Maintains context within the current interaction to ensure coherence in responses.
o Long-Term Memory: Stores user preferences, past interactions, and learned insights for personalization.
5. Action module
This module is responsible for executing the plan and interacting with the external environment. It
carries out the tasks identified and planned by earlier modules. The agent must have access to a
defined set of tools, such as APIs, databases, or external systems, which it can use to accomplish the
specific tasks. For example, an AI assistant might use a calendar API for scheduling or a booking service
for travel reservations.
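The following sketch shows, in simplified and hypothetical form, how these modules could be wired together in code; all class and method names (including llm.plan and the tool registry) are assumptions for illustration and do not correspond to any specific agent framework.

```python
# Hypothetical sketch of an agent loop built from the modules described above.
# Names are illustrative; real frameworks structure this differently.
from dataclasses import dataclass, field

@dataclass
class Memory:
    short_term: list = field(default_factory=list)   # current conversation context
    long_term: dict = field(default_factory=dict)    # user preferences, past insights

class Agent:
    def __init__(self, llm, tools, memory):
        self.llm, self.tools, self.memory = llm, tools, memory

    def perceive(self, raw_input: str) -> str:
        # Perception: normalize raw input into a form the LLM can use.
        return raw_input.strip()

    def reason_and_plan(self, observation: str) -> list[dict]:
        # Reasoning + planning: ask the (hypothetical) LLM wrapper to decompose
        # the goal into steps, each naming a tool from the allowed set.
        context = "\n".join(self.memory.short_term[-5:])
        return self.llm.plan(goal=observation, context=context, tools=list(self.tools))

    def act(self, plan: list[dict]) -> list[str]:
        # Action: execute each step with the designated tool (API, database, ...).
        results = []
        for step in plan:
            tool = self.tools[step["tool"]]
            results.append(tool(**step["arguments"]))
        self.memory.short_term.extend(results)
        return results

    def run(self, user_request: str) -> list[str]:
        return self.act(self.reason_and_plan(self.perceive(user_request)))
```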
Figure 6. Source: Z. Deng et al., 'AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways' (2024)
https://www.researchgate.net/figure/General-workflow-of-AI-agent-Typically-an-AI-agent-consists-of-three-components_fig1_381190070
To fully leverage the capabilities of LLMs within organizations, it is essential to adapt the models to the
organization's specific knowledge base and business processes. This customization, often achieved by
fine-tuning the LLM with organization-specific data, can result in a domain-focused small language
model (SLM).37
Model Orchestration38
For agentic AI to seamlessly integrate the strengths of both SLMs39 and LLMs, a system is needed to
dynamically manage which model handles which task. This is where model orchestration plays a
critical role, ensuring efficient and secure collaboration between different models. In agentic AI,
orchestration determines the most appropriate model—LLM or SLM—for a given task, routes inputs
accordingly, and combines their outputs into a unified response.
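The sketch below illustrates the routing idea behind model orchestration in a deliberately simplified form; the keyword-based rule and the model objects are hypothetical placeholders rather than a real orchestration framework.

```python
# Hypothetical sketch of model orchestration: a router sends short or
# domain-specific requests to a small language model (SLM) and everything
# else to a larger general-purpose LLM.
DOMAIN_KEYWORDS = {"invoice", "contract", "policy"}   # illustrative domain terms

def route(task: str, slm, llm):
    words = set(task.lower().split())
    if words & DOMAIN_KEYWORDS or len(words) < 20:
        return slm.generate(task)      # cheaper, domain-tuned model
    return llm.generate(task)          # larger general-purpose model

# Usage (with any objects exposing a .generate(str) method):
# answer = route("Summarize this invoice dispute", slm=my_slm, llm=my_llm)
```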
Privacy Concerns40
The growing adoption of AI agents powered by LLMs brings the promise of revolutionizing the way
humans work by automating tasks and improving productivity. However, these systems also introduce
significant privacy risks that need to be carefully managed:
36 R. Cabalar, 'What are small language models?' (2024) https://www.ibm.com/think/topics/small-language-models
37 D. Biswas, ICAART, 'Stateful Monitoring and Responsible Deployment of AI Agents' (2025)
38 V. Windland et al., 'What is LLM orchestration?' (2024) https://www.ibm.com/think/topics/llm-orchestration
39 D. Vellante et al., 'From LLMs to SLMs to SAMs, how agents are redefining AI' (2024) https://siliconangle.com/2024/09/28/llms-slms-sams-agents-redefining-ai
40 B. O'Neill, 'What is an AI agent? A computer scientist explains the next wave of artificial intelligence tools' (2024) https://theconversation.com/what-is-an-ai-agent-a-computer-scientist-explains-the-next-wave-of-artificial-intelligence-tools-242586
To perform their tasks effectively, AI agents often require access to a wide range of user data,
such as:
o Internet activity: Browsing history, online searches, and frequently visited websites.
o Personal applications: Emails, calendars, and messaging apps for scheduling or
communication tasks.
o Third-party systems: Financial accounts, customer management platforms, or other
organizational systems.
This level of access significantly increases the risk of unauthorized data exposure, particularly if
the agent's systems are compromised.
AI agents are designed to make decisions autonomously, which can lead to errors or choices that
users may disagree with.
Like other AI systems, AI agents are susceptible to biases originating from their training data,
algorithms and usage context.
Privacy trade-offs for user convenience:41 As AI agents grow more capable, users will need to consider how much personal data they are willing to share in exchange for convenience. For example, an agent might save time by managing travel bookings or negotiating purchases, but this requires access to sensitive information such as payment details or login credentials42. Balancing these trade-offs requires clear communication about data usage policies and robust consent mechanisms.
Accountability for Agent decisions:43 AI agents operate in complex environments and may encounter
unforeseen challenges. When an agent makes an error, or its actions cause harm, determining
accountability can be difficult. Organizations must ensure transparency in how decisions are made and
provide mechanisms for users to intervene when errors occur.
OpenAI's GPT44 series (Generative Pre-trained Transformer models) is renowned for its advanced language processing capabilities. These models are accessible through APIs, enabling businesses to integrate sophisticated language understanding and generation into their applications.
Google's Gemini45 models are designed to assist with various tasks, providing users with detailed
information and facilitating complex queries.
Anthropic's Claude46 models are developed with safety and alignment in mind. Claude specializes in conversational AI with a focus on ethical and secure interactions.
Several European companies and collaborations are contributing to the LLM landscape:
41 Z. Zhang et al., '"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents' (2024) https://arxiv.org/abs/2309.11653
42 Login credentials are the unique information used to access systems, accounts, or services, typically consisting of a username and password, but they can also include additional methods like two-factor authentication, biometric data, or security PINs for added protection.
43 J. Zeiser, 'Owning Decisions: AI Decision-Support and the Attributability-Gap' (2024) https://doi.org/10.1007/s11948-024-00485-1
44 ChatGPT (https://chatgpt.com/)
45 Gemini (https://gemini.google.com/)
46 Claude (https://claude.ai/)
Mistral AI47, a Paris-based startup established in 2023 by former Google DeepMind and Meta AI scientists, offers both open source and proprietary AI models.
Aleph Alpha48 is based in Heidelberg, Germany, and specializes in developing LLMs designed to provide transparency regarding the sources used for generating results. Their models are intended for use by enterprises and governmental agencies and are trained in multiple European languages.
Silo AI49, through its generative AI arm SiloGen, has developed Poro, a family of multilingual open source LLMs. This initiative aims to strengthen European digital sovereignty and democratize access to LLMs for all European languages.
TrustLLM50 is a project coordinated by Linköping University that focuses on developing trustworthy and factual LLM technology for Europe, emphasizing accessibility and reliability.
OpenEuroLLM51 is an open source family of performant, multilingual, large language foundation
models for commercial, industrial and public services.
Hugging Face's Transformers52 is an extensive library of pre-trained models and tools, allowing
developers to fine-tune and deploy LLMs for specific tasks.
Deepseek53 is an advanced language model comprising 67 billion parameters. It has been trained
from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
Deepset's Haystack54 is an open source framework designed to build search systems and question-
answering applications powered by Large Language Models (LLMs) and other natural language
processing (NLP) techniques.
OLMo 32B55 is the first fully open model (all data, code, weights, and details are freely available).
Meta's LLaMA56 models focus on research and practical applications in NLP.
BLOOM57 was developed by BigScience as a multilingual open source model capable of generating
text in over 50 languages, with a focus on accessibility and inclusivity.
BERT58 was created by Google to understand the context of text through bidirectional language
representation, excelling in tasks like question answering and sentiment analysis.
Falcon59 was developed by the Technology Innovation Institute as a high-performance model
optimized for text generation and understanding, with significant efficiency improvements over
similar models.
Qwen60 is a large language model family built by Alibaba Cloud.
47 Mistral (https://mistral.ai/)
48 Aleph Alpha (https://aleph-alpha.com/)
49 Silo AI, 'Poro - a family of open models that bring European languages to the frontier' (2023) https://www.silo.ai/blog/poro-a-family-of-open-models-that-bring-european-languages-to-the-frontier
50 TrustLLM (https://trustllm.eu/)
51 OpenEuroLLM (https://openeurollm.eu/)
52 Hugging Face, 'Transformers' (n.d.) https://huggingface.co/docs/transformers/v4.17.0/en/index
53 Deepseek (https://www.deepseek.com/)
54 Haystack (https://haystack.deepset.ai/)
55 Ai2, 'OLMo 2 32B: First fully open model to outperform GPT 3.5 and GPT 4o mini' (2025) https://allenai.org/blog/olmo2-32B
56 Llama (https://www.llama.com/)
57 Hugging Face, 'Introducing The World's Largest Open Multilingual Language Model: BLOOM' (2025) https://bigscience.huggingface.co/blog/bloom
58 Hugging Face, 'BERT' (n.d.) https://huggingface.co/docs/transformers/model_doc/bert
59 TII, 'Introducing the Technology Innovation Institute's Falcon 3' (n.d.) https://falconllm.tii.ae/
60 Hugging Face, 'Qwen' (n.d.) https://huggingface.co/Qwen
LangChain61 is an open source framework for building applications powered by large language
models.
Microsoft's Azure OpenAI Service62, offered in collaboration with OpenAI, provides API access to GPT models, enabling businesses to incorporate advanced language features into their applications.
Amazon Web Services (AWS) Bedrock63 offers a suite of AI services, including language models
that support various natural language processing tasks.
Google Cloud Vertex AI64 is a platform for building, deploying, and scaling machine learning
models, including LLMs. It provides access to models like PaLM 2 and supports customization for
various applications, such as translation, summarization, and conversational AI.
IBM Watson65 provides LLM capabilities that can be tailored to recognize industry-specific entities,
enhancing the relevance and accuracy of information extraction.
Cohere66 offers customizable LLMs that can be fine-tuned for specific tasks.
Applications of LLMs
LLMs are employed across various applications67, enhancing both user experience and operational
efficiency. This list represents some of the most prominent applications of LLMs, but it is by no means
exhaustive. The versatility of LLMs continues to unlock new use cases across industries, demonstrating
their transformative potential in various domains.
Chatbots and AI Assistants:68 LLMs power virtual assistants like Siri, Alexa, and Google Assistant, which understand and process natural language, interpret user intent, and generate responses.
Content generation:69 LLMs assist in creating articles, reports, and marketing materials by
generating human-like text, thereby streamlining content creation processes.
Language translation: 70 Advanced LLMs facilitate real-time translation services.
Sentiment analysis:71 Businesses use LLMs to analyze customer feedback and social media
content, gaining insights into public sentiment and informing strategic decisions.
Code generation and debugging:72 Developers leverage LLMs to generate code snippets and
identify errors, enhancing software development efficiency.
Educational support tools:73 LLMs play a key role in personalized learning by generating
educational content, explanations, and answering student questions.
Legal document processing:74 LLMs help professionals in the legal field by reviewing and
summarizing legal texts, extracting important information, and offering insights.
61 LangChain, 'Introduction' (n.d.) https://python.langchain.com/
62 Microsoft, 'Azure OpenAI Service' (2025) https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
63 AWS, 'Bedrock' (n.d.) https://aws.amazon.com/bedrock
64 Vertex AI Platform, 'Innovate faster with enterprise-ready AI, enhanced by Gemini models' (n.d.) https://cloud.google.com/vertex-ai
65 IBM, 'IBM Watson to watsonx' (n.d.) https://www.ibm.com/watson
66 Cohere (https://cohere.com/)
67 N. Sashidharan, 'Three Pillars of LLM: Architecture, Use Cases, and Examples' (2024) https://www.extentia.com/post/pillars-of-llm-architecture-use-cases-and-examples
68 Google Assistant (https://assistant.google.com/)
69 Jasper AI (https://www.jasper.ai/)
70 Deepl (https://www.deepl.com/en/translator)
71 SurveySparrow (https://surveysparrow.com/features/cognivue/)
72 GitHub Copilot (https://github.com/features/copilot)
73 Khanmigo (https://www.khanmigo.ai/)
74 Luminance (https://www.luminance.com/)
Customer support:75 Automating responses to customer inquiries and escalating complex cases
to human agents.
Autonomous vehicles:76 Driving cars with real-time decision-making capabilities.
The following metrics77 are commonly used to evaluate LLM performance, each offering different insights:
Accuracy78 measures how often an output aligns with the correct or expected results. In tasks like
text classification or question answering, accuracy is calculated as the ratio of correct predictions to
the total number of predictions. However, for generative tasks such as text generation, traditional
accuracy metrics may not fully capture performance due to the open-ended nature of possible
correct responses. In such cases, metrics like BLEU (Bilingual Evaluation Understudy) and ROUGE
(Recall-Oriented Understudy for Gisting Evaluation) are employed to assess the quality of generated
text by comparing it to reference texts.
Precision quantifies the ratio of correctly predicted positive outcomes to the total number of
positive predictions made by the model. In the context of LLMs, a high precision score indicates the
model is accurate when making predictions. However, it does not account for relevant instances the
model fails to predict (false negatives), so it is commonly combined with recall for a more
comprehensive evaluation.
Recall, also referred to as sensitivity or the true positive rate, measures the proportion of actual
positive instances that the model successfully identifies. A high recall score reflects the model’s
effectiveness in capturing relevant information but does not address irrelevant predictions (false
positives). For this reason, recall is typically evaluated alongside precision to provide a balanced
view.
F1 Score offers a balanced metric by combining precision and recall into their harmonic mean. A
high F1 score indicates that the model achieves a strong balance between precision and recall,
making it a valuable metric when both false positives and false negatives are critical. The F1 score
ranges from 0 to 1, with 1 representing perfect performance on both metrics (a short sketch after this list of metrics illustrates how several of these values can be computed).
Specificity79 measures the proportion of true negatives correctly identified by a model.
AUC (Area Under the Curve) and AUROC80 (Area Under the Receiver Operating Characteristic Curve) quantify a model's ability to distinguish between classes. They evaluate the trade-off between
75 Salesforce (https://www.salesforce.com/eu/)
76 Tesla Autopilot (https://www.tesla.com/autopilot)
77 A. Chaudhary, 'Understanding LLM Evaluation and Benchmarks: A Complete Guide' (2024) https://www.turing.com/resources/understanding-llm-evaluation-and-benchmarks
78 S. Karzhev, 'LLM Evaluation: Metrics, Methodologies, Best Practices' (2024) https://www.datacamp.com/blog/llm-evaluation
79 Wikipedia, 'Sensitivity and Specificity' (2025) https://en.wikipedia.org/wiki/Sensitivity_and_specificity
80 E. Becker and S. Soatto, 'Cycles of Thought: Measuring LLM Confidence through Stable Explanations' (2024) https://arxiv.org/pdf/2406.03441v1
sensitivity (true positive rate) and 1-specificity (false positive rate) across various thresholds. A
higher AUC value indicates better performance in classification tasks.
AUPRC81 (Area Under the Precision-Recall Curve) measures a model's performance in imbalanced
datasets, focusing on the trade-off between precision and recall. A high AUPRC indicates that the
model performs well in identifying positive instances, even when they are rare.
Cross Entropy82 is a measure of uncertainty or randomness in a system's predictions. It measures the difference between two probability distributions: the true labels (actual data distribution) and the predicted probabilities from the model (output). Lower cross entropy means higher confidence in predictions, while higher cross entropy indicates greater uncertainty.
Perplexity83 derives from cross entropy and evaluates how well a language model predicts a sample, serving as an indicator of its ability to handle uncertainty. A lower perplexity score means better performance, indicating that the model is more confident in its predictions. Some studies suggest that perplexity is unreliable84 for evaluating LLMs with long-context capabilities. It is also difficult to use perplexity as a benchmark between models, since its scores depend on factors like tokenization method, dataset, preprocessing steps, vocabulary size, and context length.85
Calibration 86 refers to the alignment between a model's predicted probabilities and the actual
probability of those predictions being correct. A well-calibrated model provides confidence scores
that accurately reflect the true probabilities of outcomes. Proper calibration is vital in applications
where understanding the certainty of predictions is important, such as in medical diagnoses or legal
document analysis.
MoverScore87 is a modern metric developed to assess the semantic similarity between two texts.
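To make these definitions concrete, the following illustrative sketch computes several of the metrics above with the scikit-learn library and derives perplexity from cross entropy; the labels, probabilities and package choice are assumptions for demonstration purposes only.

```python
# Illustrative computation of classification metrics and perplexity.
import math
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                    # ground-truth labels (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # model predictions
y_prob = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]    # predicted probabilities

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("AUROC:    ", roc_auc_score(y_true, y_prob))

# Perplexity is the exponential of the average cross entropy of the true tokens
# under the model. token_probs are the probabilities the model assigned to each
# ground-truth token in a short sequence (made-up values).
token_probs = [0.25, 0.10, 0.60, 0.05]
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
print("perplexity:", math.exp(cross_entropy))
```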
Other metrics used for assessing the performance and usability of LLM-based systems, especially in real-
time or high-demand applications are:88
Completed requests per minute: Measures how many requests the LLM can process and return
responses for in one minute. It reflects the system's efficiency in handling multiple queries.
Time to first token (TTFT): The time taken from when a request is submitted to when the first token
of the response is generated.
Inter-token Latency (ITL): The time delay between generating consecutive tokens in the response.
This metric evaluates the speed and fluidity of text generation.
End-to-end Latency (ETEL): The total time taken from when a request is made to when the entire response is completed. It encompasses all processing stages, including input handling, model inference, and output generation (see the sketch below).
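The following sketch illustrates how these latency metrics could be measured around any streaming LLM client; stream_completion is a placeholder for whatever API or library call yields tokens as they are generated.

```python
# Illustrative measurement of TTFT, ITL and ETEL around a streaming LLM call.
# `stream_completion` is a placeholder for any client that yields tokens.
import time

def measure_latency(stream_completion, prompt: str):
    start = time.perf_counter()
    token_times = []
    for _ in stream_completion(prompt):            # iterate over streamed tokens
        token_times.append(time.perf_counter())
    end = time.perf_counter()

    ttft = token_times[0] - start                                  # time to first token
    itl = [b - a for a, b in zip(token_times, token_times[1:])]    # inter-token latencies
    etel = end - start                                             # end-to-end latency
    return {
        "time_to_first_token_s": ttft,
        "mean_inter_token_latency_s": sum(itl) / len(itl) if itl else 0.0,
        "end_to_end_latency_s": etel,
    }
```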
81 J. Czakon, 'F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which Evaluation Metric Should You Choose?' (2024) https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
82 C. Xu, 'Understanding the Role of Cross-Entropy Loss in Fairly Evaluating Large Language Model-based Recommendation' (2024) https://arxiv.org/pdf/2402.06216v2
83 C. Huyen, 'Evaluation Metrics for Language Modeling' (2019) https://thegradient.pub/understanding-evaluation-metrics-for-language-models/
84 L. Fang et al., 'What is wrong with perplexity for long-context language modeling?' (2024) https://arxiv.org/pdf/2410.23771v1
85 A. Morgan, 'Perplexity for LLM Evaluation' (2024) https://www.comet.com/site/blog/perplexity-for-llm-evaluation/
86 P. Liang et al., 'Holistic Evaluation of Language Models' (2023) https://arxiv.org/abs/2211.09110
87 PyPI, 'moverscore 1.0.3' (2020) https://pypi.org/project/moverscore/
88 W. Kadous et al., 'Reproducible Performance Metrics for LLM inference' (2023) https://www.anyscale.com/blog/reproducible-performance-metrics-for-llm-inference
In addition to these metrics, there are comprehensive evaluation frameworks or benchmarks89 such as
GLUE (General Language Understanding Evaluation)90, MMLU (Massive Multitask Language
Understanding)91, HELM (Holistic Evaluation of Language Models)92, DeepEval93 or OpenAI Evals94.
Task-specific metrics such as BLEU95 (Bilingual Evaluation Understudy), ROUGE96 (Recall-Oriented
Understudy for Gisting Evaluation), and BLEURT97 (Bilingual Evaluation Understudy with
Representations from Transformers) are widely used for evaluating text generation, summarization,
and translation.
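As an illustration, the snippet below computes BLEU and ROUGE scores for a candidate sentence against a reference, assuming the third-party Python packages sacrebleu and rouge-score are installed; the sentences are made up for demonstration.

```python
# Illustrative BLEU and ROUGE computation for generated text evaluation.
# Requires the third-party packages `sacrebleu` and `rouge-score`.
import sacrebleu
from rouge_score import rouge_scorer

reference = "The cat sat on the mat."
candidate = "A cat was sitting on the mat."

bleu = sacrebleu.corpus_bleu([candidate], [[reference]])
print("BLEU:", bleu.score)

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print("ROUGE:", scorer.score(reference, candidate))
```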
It is important to recognize that quantitative metrics alone are not sufficient. While these metrics are
highly valuable in identifying risks, especially when integrated into automated evaluation pipelines,
they primarily serve as early warning signals, prompting further investigation when thresholds are
exceeded. Many critical risks, including misuse potential, ethical concerns, and long-term impact,
cannot be effectively captured through those numerical measurements alone.
To ensure a more holistic evaluation, organizations should complement quantitative indicators with
expert judgment, scenario-based testing, and qualitative assessments.
Open source frameworks like Inspect98 support an integrated approach by enabling model-graded
evaluations, prompt engineering, session tracking, and extensible scoring techniques. These tools help
operationalize both metric-based and qualitative evaluations, offering better observability and insight
into LLM behavior in real-world settings.
89 Benchmarks are standardized frameworks developed to assess LLMs across various scenarios and metrics (see also section 10 of this document).
90 Gluebenchmark (https://gluebenchmark.com/)
91 Papers with Code, 'MMLU (Massive Multitask Language Understanding)' (n.d.) https://paperswithcode.com/dataset/mmlu
92 Center for Research on Foundation Models, 'A reproducible and transparent framework for evaluating foundation models' (n.d.) https://crfm.stanford.edu/helm/
93 GitHub, 'The LLM Evaluation framework' (n.d.) https://github.com/confident-ai/deepeval
94 GitHub, 'Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks' (n.d.) https://github.com/openai/evals
95 Wikipedia, 'BLEU' (2025) https://en.wikipedia.org/wiki/BLEU
96 Wikipedia, 'ROUGE (metric)' (2025) https://en.wikipedia.org/wiki/ROUGE_(metric)
97 GitHub, 'BLEURT is a metric for Natural Language Generation based on transfer learning' (n.d.) https://github.com/google-research/bleurt
98 AISI, 'An open-source framework for large language model evaluations' (n.d.) https://inspect.aisi.org.uk/
99 P. Bhavsar, 'Mastering Agents: Metrics for Evaluating AI Agents' (2024) https://www.galileo.ai/blog/metrics-for-evaluating-ai-agents
100 https://smythos.com/ai-agents/impact/ai-agent-performance-measurement/
101 AISERA, 'An Introduction to Agent Evaluation' (n.d.) https://aisera.com/blog/ai-agent-evaluation/
Precision and Recall: Measures how accurately the agent retrieves relevant information (precision)
and whether it captures all necessary details (recall). These metrics are vital for tasks like document
summarization or answering complex queries.
Contextual understanding:102 Measures the agent's proficiency in maintaining and utilizing context in interactions, crucial for coherent multi-turn dialogues. Dialogue State Tracking103 is a relevant metric.
User satisfaction:104 Measures user perceptions of the agent's performance, often through feedback
scores or surveys and using scales to measure system and user experience usability.
Evaluating AI agents with traditional LLM benchmarks presents challenges, as they often fail to
capture real-world dynamics, multi-step reasoning, tool use, and adaptability. Effective assessment
requires new benchmarks that measure long-term planning, interaction with external tools, and real-
time decision-making. Below are some of the most recognized benchmarks currently used:
102 SmythOS, 'Conversational Agents and Context Awareness: How AI Understands and Adapts to User Needs' (n.d.) https://smythos.com/artificial-intelligence/conversational-agents/conversational-agents-and-context-awareness/
103 Papers with Code, 'Dialogue State Tracking' (n.d.) https://paperswithcode.com/task/dialogue-state-tracking/codeless?page=2
104 N. Bekmanis, 'Artificial Intelligence Conversational Agents: A Measure of Satisfaction in Use' (2023) https://essay.utwente.nl/94906/1/Bekmanis_MA_BMS.pdf
105 Swebench (https://www.swebench.com/)
106 GitHub, 'A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)' (n.d.) https://github.com/THUDM/AgentBench
107 Papers with Code, 'AgentBench' (n.d.) https://paperswithcode.com/dataset/agentbench
108 Q. Huang et al., 'MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation' (2024) https://arxiv.org/abs/2310.03302
109 Hugging Face Dataset (https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)
110 GitHub, 'Code and Data' (n.d.) https://github.com/sierra-research/tau-bench
111 GitHub, 'An extensible benchmark for evaluating large language models on planning' (n.d.) https://github.com/karthikv792/LLMs-Planning
112 I. O. Gallegos et al., 'Bias and Fairness in Large Language Models: A Survey' (2024) https://direct.mit.edu/coli/article/50/3/1097/121961/Bias-and-Fairness-in-Large-Language-Models-A
113 'Large Language Models pose risk to science with false answers, says Oxford study' (2023) https://www.ox.ac.uk/news/2023-11-20-large-language-models-pose-risk-science-false-answers-says-oxford-study
2. Model limitations
Understanding context: 114 Despite advanced architectures, LLMs can struggle with nuanced
contexts or multi-turn conversations where earlier parts of the dialogue must inform later
responses.
Handling ambiguities:115 Ambiguous input can lead to incorrect or nonsensical outputs if the
model cannot infer the intended meaning.
6. Limitations in knowledge
Knowledge cutoff:122 LLMs are trained on data up to a specific point in time. They may lack
awareness of recent developments or emerging knowledge.
Factual errors: 123 LLMs can "hallucinate" information, generating plausible but factually
incorrect responses due to the probabilistic nature of their predictions.
7. Lack of robustness
Adversarial inputs: 124 LLMs may fail when presented with deliberately manipulated or
adversarial inputs designed to exploit their weaknesses.
114 J. Browning, 'Getting it right: the limits of fine-tuning large language models' (2024) https://link.springer.com/article/10.1007/s10676-024-09779-1
115 E. Jones and J. Steinhardt, 'Capturing Failures of Large Language Models via Human Cognitive Biases' (2022) https://arxiv.org/abs/2202.12299
116 G. B. Mohan et al., 'An analysis of large language models: their impact and potential application' (2024) https://link.springer.com/article/10.1007/s10115-024-02120-8
117 H. Naveed et al., 'A Comprehensive Overview of Large Language Models' (2024) https://arxiv.org/abs/2307.06435
118 P. Jindal, 'Evaluating Large Language Models: A Comprehensive Guide' (2024) https://www.labellerr.com/blog/evaluating-large-language-models
119 Idem
120 J. Browning, 'Getting it right: the limits of fine-tuning large language models' (2024) https://link.springer.com/article/10.1007/s10676-024-09779-1
121 H. Naveed et al., 'A Comprehensive Overview of Large Language Models' (2024) https://arxiv.org/abs/2307.06435
122 University of Oxford, 'Large Language Models pose risk to science with false answers, says Oxford study' (2023) https://www.ox.ac.uk/news/2023-11-20-large-language-models-pose-risk-science-false-answers-says-oxford-study
123 D. E. Ho, 'Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive' (2024) https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive
124 E. Jones and J. Steinhardt, 'Capturing Failures of Large Language Models via Human Cognitive Biases' (2022) https://arxiv.org/abs/2202.12299
Noise and variability: 125 Spelling errors, slang, or non-standard language can lead to
misinterpretations and lower accuracy.
8. Inadequate calibration
Overconfidence: 126 Poorly calibrated models may assign high confidence scores to incorrect
predictions, misleading users. Failing to properly convey uncertainty in predictions can erode
trust in the model.
125 G. B. Mohan et al., 'An analysis of large language models: their impact and potential applications' (2024) https://link.springer.com/article/10.1007/s10115-024-02120-8
126 L. Li et al., 'Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models' (2024) https://arxiv.org/abs/2402.12563
In this document, we use this AI lifecycle as a reference framework, recognizing that each organization
may have its own adapted version based on its specific needs. While the core stages of the lifecycle are
generally similar across organizations, the exact phases may vary.
Each phase of the lifecycle involves unique privacy risks that require tailored mitigation strategies. Implementing Privacy by Design in each phase helps to address risks proactively rather than fixing them retroactively.
127 ISO/IEC 22989 (Artificial Intelligence – Concepts and Terminology)
128 ISO/IEC 5338:2023 Information technology — Artificial intelligence — AI system life cycle processes
Throughout the AI system lifecycle, it is important to consider how different types of personal data
may be involved at each phase. Depending on the stage, personal data can be collected, processed,
exposed, or transformed in different ways. Recognizing this variability is essential for implementing
effective privacy and data protection measures.
129 Important to consider the EDPB Opinion 28/2024 and section 3.2 on the circumstances under which AI models could be considered anonymous and the related demonstration: '…, the EDPB considers that, for an AI model to be considered anonymous, using reasonable means, both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to train the model; as well as (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any data subject.'
130 Testing, Evaluation, Validation, and Verification (TEVV) is an ongoing process that occurs throughout the AI lifecycle to ensure that a system meets its intended requirements, performs reliably, and aligns with safety and compliance standards.
Figure 8. The illustration shows how different types of personal data can arise across various phases of the AI lifecycle.
Closed models are proprietary models that do not provide public access to their weights or source code, and interaction with the model is restricted, typically requiring an API or subscription. Open models are made publicly available either fully (weights, full code, training data, and other documentation are available) or partly (not everything is available, usually the training data, or it is available only under licences). Similarly, closed weights indicate proprietary models whose trained parameters are not disclosed, whereas open weights describe models with publicly available parameters, allowing for inspection, fine-tuning, or integration into other systems.
It is also important to distinguish the term open model from open source model. Classifying a model as "open source" requires it to be released under an open source license, which legally grants anyone the freedom to use, study, modify, and distribute the model for any purpose131.
131 AI Action Summit, 'International AI Safety Report on the Safety of Advanced AI', p. 150 (2025) https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf
1. LLM as a Service: This service model provides access to LLMs via APIs hosted on a cloud platform.
Users can send input and receive output without having direct access to the model’s underlying
architecture or weights.
Within this service model, the following LLM variations are usually available:
o Closed models with closed weights where the provider trains the model and retains
control over the weights and data, offering access through an API. This approach ensures
ease of use but requires user data to flow through the provider’s systems. Example:
OpenAI GPT-4 API132
o Customizable closed weights where deployers may fine-tune the model using their own data within a controlled environment, although the underlying weights remain inaccessible, balancing customization with security. Example: Azure OpenAI Service133
o Open weights where some providers grant deployers full or partial access to the
architecture for greater transparency and flexibility through a platform or via an API 134.
Example: Hugging Face's models in AWS Bedrock 135
2. LLM ‘off-the-shelf’: In this service model the deployer can customize weights and fine-tune the model. This happens sometimes through platforms like Microsoft Azure and AWS, where a deployer can select a model and develop their own solution with it. It is also commonly used with open-weight models, such as LLaMA or BLOOM. While an LLM as a Service typically involves API-based interaction without model ownership, the LLM ‘off-the-shelf’ service emphasizes more developer and deployer control. The distinction lies in the level of control and access provided; for instance, Hugging Face models can be downloaded and run locally (see the sketch below).
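The sketch below illustrates the ‘off-the-shelf’ approach with the Hugging Face transformers library: an open-weight model is downloaded and run locally, so prompts need not leave the deployer's infrastructure. The model name is only an example; any open-weight checkpoint the deployer is licensed to use would work.

```python
# Illustrative sketch: running an open-weight model locally with transformers.
# The model name is an example; licensing and hardware requirements apply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"   # example open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Summarize the GDPR in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```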
3. Self-developed LLM: In this model, organizations develop and deploy LLMs on their own infrastructure, maintaining full control over data and model interaction. While this option may offer more privacy, it requires significant computational resources and expertise.
Each of the three service models features a distinct data flow. While there are similarities across models,
each phase—from user input to output generation—presents unique risks that can impact user privacy
and data protection. In this section, we will first examine the data flow in an LLM as a Service solution,
followed by an analysis of the key differences in data flow when using an LLM ‘off-the-shelf’ model and
a self-developed LLM system.
*Note that in this section, the terms 'provider'136 and 'deployer'137 are used as defined in the AI Act, where the
provider refers to the entity developing and offering the AI system, and the deployer refers to the entity
implementing and operating the system for end-users.
132 OpenAI, 'The most powerful platform for building AI products' (2025) https://openai.com/api/
133 Microsoft, 'Azure OpenAI Service' (2025) https://azure.microsoft.com/en-us/services/cognitive-services/openai-service/
134 Wikipedia, 'API' (2025) https://en.wikipedia.org/wiki/API
135 S. Pagezy, 'Use Hugging Face models with Amazon Bedrock' (2024) https://huggingface.co/blog/bedrock-marketplace
136 'Provider' means a natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed and places it on the market or puts the AI system into service under its own name or trademark, whether for payment or free of charge (Article 3(3) AI Act).
137 'Deployer' means a natural or legal person, public authority, agency or other body using an AI system under its authority except where the AI system is used in the course of a personal non-professional activity (Article 3(4) AI Act).
User input:
The process starts with the user submitting input, such as a query or command. This could be entered
through a web-based interface, mobile application, or other tools provided by the LLM provider.
Provider interface & API:
The input is sent through an interface or application managed by the provider (e.g., a webpage, app or
a chatbot window embedded on a website). This interface ensures the input is formatted appropriately
and securely transmitted to the LLM infrastructure.
LLM processing at providers’ infrastructure:
The API receives the input and routes it to the LLM model hosted on the provider's infrastructure.
The LLM processes the input using its trained parameters (weights) to generate a relevant response.
This may involve steps like tokenization, context understanding, reasoning, and text generation. The
model generates a response.
* Logging: The provider may log the user input (query) along with the generated response to analyze
the interaction and identify system errors or gaps in response quality.
The data could also be included in a training dataset to improve the model's ability to handle similar queries in the future. In this case, anonymization and filtering techniques are often applied.
Processed output:
The generated output is returned via the provider's interface to the user. The response is typically in a
format ready for display or integration, such as text, suggestions, or actionable data.
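The sketch below illustrates this data flow from the deployer's side, using the OpenAI Python client as an example provider API and a simple regex-based redaction step before logging. The redaction is a deliberate simplification of dedicated tools such as Microsoft Presidio or cloud DLP services, and the model name is illustrative.

```python
# Illustrative data flow: user input is sent to a hosted LLM via the provider's
# API, and obvious personal data is redacted before the interaction is logged.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def redact(text: str) -> str:
    # Very rough masking of e-mail addresses and phone-like numbers;
    # production systems should use dedicated PII detection tools.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    return re.sub(r"\+?\d[\d \-]{7,}\d", "[PHONE]", text)

def ask(user_input: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # example hosted model
        messages=[{"role": "user", "content": user_input}],
    )
    answer = response.choices[0].message.content
    # Log only redacted versions of the query and the answer.
    print("LOG:", redact(user_input), "->", redact(answer))
    return answer
```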
138 G. Nagli, 'Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History' (2025) https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak
139 T. S. Dutta, 'New Jailbreak Techniques Expose DeepSeek LLM Vulnerabilities, Enabling Malicious Exploits' (2025) https://cybersecuritynews.com/new-jailbreak-techniques-expose-deepseek-llm-vulnerabilities/
140 S. Schulhoff, 'Prompt Injection vs. Jailbreaking: What's the Difference?' (2024) https://learnprompting.org/blog/injection_jailbreaking
141 https://www.nightfall.ai/ai-security-101/data-leakage-prevention-dlp-for-llms
142 Some of the tools used are Google Cloud DLP, Microsoft Presidio, OpenAI Moderation API, Hugging Face Fine-Tuned NER Models and
148 S. Cheng et al., ‘StruQ: Defending Against Prompt Injection with Structured Queries’ (2024) https://arxiv.org/abs/2402.06363
149 OpenAI Platform, ‘Safety best practices’ (n.d.) https://platform.openai.com/docs/guides/safety-best-practices#constrain-user-input-and-limit-output-tokens
150 Trust Community, ‘NIST password guidelines 2025: 15 rules to follow’ (2024) https://community.trustcloud.ai/article/nist-password-guidelines-2025-15-rules-to-follow/
151 Wikipedia, ‘Password Manager’ (2025) https://en.wikipedia.org/wiki/Password_manager
152 LLM Engine (n.d.) https://llm-engine.scale.com/guides/rate_limits/
Both the provider and the deployer have roles in addressing this risk. The provider should implement platform-level protections, such as safeguarding the authenticity of their interface (e.g., anti-spoofing measures, branding protections, secure APIs), monitoring for suspicious activity, and providing tools to help deployers detect and prevent abuse.
Mitigations for Deployers:
- If the deployer is using only the provider's interface, their responsibility is limited to
securely managing access credentials and complying with data handling policies;
however, if the deployer integrates the provider’s API into their own systems, they
are additionally responsible for securing the integration, including encryption,
monitoring, and safeguarding data in transit.
- Both provider and deployer should design, develop, deploy and test applications and
APIs in accordance with leading industry standards (e.g., OWASP for web
applications153) and adhere to applicable legal, statutory or regulatory compliance
obligations.
- The deployer should educate employees and end users about evolving phishing
techniques—such as fake interfaces, deceptive emails, or fraudulent integrations—
that could trick individuals into revealing sensitive information. Education should
focus on recognizing suspicious behaviors and verifying the legitimacy of
communications and interfaces.
LLM processing at providers’ infrastructure
Risks:
Model inference risks: During processing, the model might inadvertently infer sensitive or inappropriate outputs based on the training data or the provided input.
(Un)intended data logging: Providers can log user input queries and outputs for debugging or model improvement, potentially storing sensitive data154 without explicit user consent. If logged user queries are included in training data, attackers might, in the case of an adversarial attack, introduce malicious or misleading content to manipulate the model’s future outputs (data poisoning attack)155.
Anonymization failures: Inadequate anonymization or filtering techniques could lead to
the inclusion of identifiable user data in model training datasets, raising privacy concerns.
Unauthorized access to logs: Logs containing user inputs and outputs could be accessed
by unauthorized personnel or exploited in the event of a data breach.
Data aggregation risks: If logs are aggregated over time, they could form a
comprehensive dataset that may reveal patterns about individuals, organizations, or
other sensitive activities.
Third-party exposure: If the provider relies on external cloud infrastructure or third-party
tools for LLM processing, there’s an added risk of data exposure through those
dependencies. These dependencies involve external systems, which may have their own
vulnerabilities.
Lack of data retention policies: The provider could store the data indefinitely without
having retention policies in place.
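As an illustration of the last point, a retention policy can be enforced with a simple scheduled clean-up job. The sketch below is only a minimal example: it assumes interaction logs are written as JSON Lines files in an example directory and uses an assumed 30-day retention period.

import time
from pathlib import Path

RETENTION_DAYS = 30                      # assumed policy value
LOG_DIR = Path("/var/log/llm-service")   # assumed log location

def enforce_retention(log_dir: Path = LOG_DIR, retention_days: int = RETENTION_DAYS) -> None:
    """Delete log files older than the configured retention window."""
    if not log_dir.is_dir():
        return
    cutoff = time.time() - retention_days * 86400
    for log_file in log_dir.glob("*.jsonl"):
        if log_file.stat().st_mtime < cutoff:  # file is older than the retention window
            log_file.unlink()                  # permanently remove it

if __name__ == "__main__":
    enforce_retention()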
153 OWASP, ‘OWASP Top Ten’ (2025) https://owasp.org/www-project-top-ten/
154 The data stored could be sensitive data such as credit card numbers, or special categories of data such as health data (Article 9 GDPR).
155 Aubert, P. et al., ‘Data Poisoning: a threat to LLM’s Integrity and Security’ (2024) https://www.riskinsight-wavestone.com/en/2024/10/data-poisoning-a-threat-to-llms-integrity-and-security/
156 OWASP, ‘LLM10:2023 - Training Data Poisoning’ (2023) https://owasp.org/www-project-top-10-for-large-language-model-applications/Archive/0_1_vulns/Training_Data_Poisoning.html
157 Center for Internet Security, ‘The 18 CIS Critical Security Controls’ (2025) https://www.cisecurity.org/controls/cis-controls-list
158 Wikipedia, ‘Hallucination (Artificial Intelligence)’ (2025) https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)
159 OWASP, ‘LLM05:2025 Improper Output Handling’ (2025) https://genai.owasp.org/llmrisk/llm052025-improper-output-handling/
For LLM ‘off-the-shelf’ models, the data flow closely resembles that of an LLM as a Service model, particularly during user interactions and output generation.
However, several key differences and limitations set these models apart:
160 According to Article 25 of the AI Act, a deployer of a high-risk AI system becomes a provider when they substantially modify an existing AI system, including by fine-tuning or adapting a pre-trained model for new applications. In such cases, the deployer assumes the responsibilities of a provider under the AI Act.
161 EDPB, Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, adopted on 17 December 2024.
A similar approach is cache-augmented generation (CAG)162, which can reduce latency, lower compute costs, and ensure consistency in responses across repeated interactions, but is less practical for large datasets that are updated often.
The figure below illustrates how RAG163 works: the user's query is first enhanced with relevant
information retrieved from an external database, and this enriched input is then sent to the
language model to generate a more accurate and grounded response.
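To make the retrieval step concrete, the sketch below shows the RAG pattern in a deliberately simplified form: documents are ranked by plain word overlap rather than vector embeddings, the knowledge-base entries and the prompt template are illustrative assumptions, and the call to the language model itself is omitted.

KNOWLEDGE_BASE = [
    "Refunds are processed within 14 days of the return request.",
    "Support is available Monday to Friday, 09:00-17:00 CET.",
    "Personal data in support tickets is deleted after 12 months.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by simple word overlap with the query (illustrative only)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(query: str) -> str:
    """Enrich the user query with retrieved context before it is sent to the LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_rag_prompt("How long do you keep personal data from support tickets?"))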
Insecure logging or caching: User queries and retrieved documents may be stored insecurely,
increasing the risk of unauthorized access or data leaks.
Third-party data handling: If the retrieval system uses external APIs or services, user queries
may be sent to third parties, where they can be logged, tracked, or stored without user consent.
Exposure of sensitive data: The model may retrieve personal or confidential information if this
is stored in the knowledge base.
Since we have already explored an example of privacy risks within the AI lifecycle data flow in a previous section, we will take a more general approach here, focusing on some of the important phases for this service model. The general data flow phases could be as follows:
162 Sharma, R., ‘Cache RAG: Enhancing speed and efficiency in AI systems’ (2025) https://developer.ibm.com/articles/awb-cache-rag-efficiency-speed-ai/
163 Theja, R., ‘Evaluate RAG with LlamaIndex’ (2023) https://cookbook.openai.com/examples/evaluation/evaluate_rag_with_llamaindex
164 Atlan, ‘Data Curation in Machine Learning: Ultimate Guide 2024’ (2023) https://atlan.com/data-curation-in-machine-learning/
Training is typically performed on distributed computing systems. Before deployment, the model undergoes rigorous evaluation and testing using separate validation and test datasets to ensure its accuracy, reliability, and alignment with intended use cases.
Fine-Tuning:
After initial training, the model may be fine-tuned using additional datasets to specialize its
capabilities for specific tasks or domains.
Deployment:
The trained and fine-tuned LLM is integrated into the organization’s infrastructure, making an
interface available for end-users.
User input:
End-users interact with the deployed AI system by submitting inputs through an interface such as an
app, chatbot, or custom API.
Provider interface & API:
The input is sent through an interface or application. This interface ensures the input is formatted
appropriately and securely transmitted to the LLM infrastructure.
Model processing:
The self-developed LLM processes user inputs locally or on the organization’s (cloud) infrastructure,
generating contextually relevant responses using its trained parameters.
Processed output delivery:
The processed outputs are delivered to end-users or integrated into downstream systems for
actionable use. Outputs may include text-based responses, insights, or recommendations.
[Figure: self-developed LLM data flow phases – Dataset Collection and Preparation → Model Training & Fine-tuning → Deployment]
Bias and discrimination: Datasets could reflect societal or historical biases, leading to
discriminatory outputs.
Data poisoning: Datasets may be intentionally manipulated by malicious actors during
collection or preparation, introducing corrupted or adversarial data to mislead the
model during training.
165 ENISA, ‘Pseudonymisation techniques and best practices. Recommendations on shaping technology according to data protection and privacy provisions’ (2019) https://www.enisa.europa.eu/sites/default/files/publications/Guidelines%20on%20shaping%20technology%20according%20to%20GDPR%20provisions.pdf
166 Marwala, T., ‘Algorithm Bias — Synthetic Data Should Be Option of Last Resort When Training AI Systems’ (2023) https://unu.edu/article/algorithm-bias-synthetic-data-should-be-option-last-resort-when-training-ai-systems
167 Van Breugel, B. et al., ‘Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data’ (2023) https://proceedings.mlr.press/v202/van-breugel23a/van-breugel23a.pdf
168 Vongthongsri, K., ‘Using LLMs for Synthetic Data Generation: The Definitive Guide’ (2025) https://www.confident-ai.com/blog/the-definitive-guide-to-synthetic-data-generation-using-llms
169 Desfontaines, D., ‘The fundamental trilemma of synthetic data generation’ (n.d.) https://www.tmlt.io/resources/fundamental-trilemma-synthetic-data-generation
170 Wikipedia, ‘Privileged access management’ (2025) https://en.wikipedia.org/wiki/Privileged_access_management
171 Wikipedia, ‘Role-based access control’ (2025) https://en.wikipedia.org/wiki/Role-based_access_control
172 Anthropic, ‘Introducing the Model Context Protocol’ (2024) https://www.anthropic.com/news/model-context-protocol
A simplified overview of the most common phases involved in this data flow includes:
User input:
The process begins with the user providing input to the AI agent, such as a query, command, or task
description (e.g., “Book me a flight and a hotel for my trip to Amsterdam”).
o Actions:
- Input is collected through the user interface, such as a chatbot or voice assistant.
- Preprocessing may occur locally to sanitize and standardize the input.
Agent processing:
The AI agent uses an integrated LLM to understand and process the user input. This includes parsing
the request and identifying the actions required to fulfill the task.
o Actions:
- The input is tokenized and interpreted by the LLM.
- The agent decides which external applications to contact and formulates queries or
commands for them.
Interaction with application 1 (e.g., flight booking system):
The agent sends a query or command to the first external application to retrieve or process
information. For instance, it may request available flights based on the user’s travel preferences.
o Actions:
- Data (e.g., user preferences) is transmitted to the external application.
- The application processes the query and returns a response, such as a list of available
flights.
- The agent receives and processes the response for integration into the overall
workflow.
Interaction with application 2 (e.g., hotel booking system):
The agent engages with the second external application to complete another part of the task. For
example, it might request hotel options based on the destination and travel dates.
o Actions:
- Data (e.g., travel dates) is transmitted to the application.
- The application provides a response, such as available hotels, which is processed by
the agent.
Aggregation of responses:
The AI agent integrates the responses from both applications to generate a cohesive result. For
instance, it compiles the flight and hotel options into a single output for the user.
o Actions:
- Responses are validated and formatted for clarity and relevance.
- Potential errors or conflicts (e.g., overlapping schedules) are resolved.
Output generation:
The agent delivers the aggregated result to the user in a user-friendly format, such as a summary of
booking options or actionable recommendations.
o Actions:
- Output is displayed via the user interface or transmitted to another system for further
action.
- If necessary, the agent provides follow-up prompts to refine the user’s preferences or
choices.
Logging and continuous improvement:
Interaction logs may be stored temporarily for debugging, system improvements, or retraining
purposes, depending on the organization’s policies.
o Actions:
- Logs are analyzed to optimize the agent’s performance and enhance user experience.
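The sketch below walks through the same flow in a simplified form; the intent parsing and the two application calls are stubbed placeholders rather than real booking systems or a real LLM.

def search_flights(destination: str) -> list[str]:
    """Stand-in for the flight booking application (application 1)."""
    return [f"Flight KL123 to {destination}", f"Flight LH456 to {destination}"]

def search_hotels(destination: str) -> list[str]:
    """Stand-in for the hotel booking application (application 2)."""
    return [f"Hotel A in {destination}", f"Hotel B in {destination}"]

def agent(user_request: str) -> str:
    # Agent processing: interpret the request (a real agent would use the LLM here).
    destination = "Amsterdam" if "Amsterdam" in user_request else "unknown"

    # Interaction with the external applications, sharing only the data they need.
    flights = search_flights(destination)
    hotels = search_hotels(destination)

    # Aggregation of responses and output generation for the user.
    options = "\n".join(flights + hotels)
    return f"Options for your trip to {destination}:\n{options}"

print(agent("Book me a flight and a hotel for my trip to Amsterdam"))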
173 OWASP, ‘Agentic AI – Threats and Mitigations’ (2025) https://genaisecurityproject.com/resource/agentic-ai-threats-and-mitigations/
- Configure the integrations to only collect and store necessary input data. Avoid
requesting or storing sensitive data unless explicitly required for the task.
- Implement preprocessing pipelines to remove sensitive information before inputs are sent to the provider’s system (a minimal sketch is shown after this list).
- When necessary, implement user-facing consent forms or interfaces when collecting
data to use with the LLM.
- Secure user interfaces with encryption and authentication mechanisms to protect
input data and ensure data is handled properly at the deployment level through
encrypted local storage and secure API connections.
- Clearly communicate to users how their data is handled and processed. This could be
done through (internal) privacy policies, warnings, instructions or disclaimers in the
user interface.
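A preprocessing pipeline of the kind mentioned in the second item above could start from simple pattern-based redaction. The patterns below are illustrative only; production systems typically combine such rules with NER-based detection tools such as those referenced in footnote 142.

import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with typed placeholders before transmission."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact me at jane.doe@example.com or +31 6 1234 5678."))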
Reasoning (agent processing)
Risks:
- Sensitive data used in reasoning tasks may be misused or exposed.
- Improper handling of user data during task decomposition may propagate sensitive information.
- Inferences made during reasoning could unintentionally reveal personal insights.
- Limited explainability could lead to incorrect or suboptimal outputs, reducing trust and reliability, especially in complex decision-making tasks. Without clear reasoning chains, users may struggle to understand or verify how conclusions are reached, increasing the probability of errors and unintended outcomes.
174 Gadesha, V., ‘What is chain of thoughts (CoT)?’ (2024) https://www.ibm.com/think/topics/chain-of-thoughts
175 Biswas, D., ‘Stateful and Responsible AI Agents’ (2024) https://www.linkedin.com/pulse/stateful-responsible-aiagents-debmalya-biswas-runze/
176 Providers are responsible for ensuring the foundational model and platform are secure, privacy-compliant, and equipped with features for secure deployment that deployers can configure. Deployers are responsible for integrating, configuring, and using the LLM securely within their specific context. The division of responsibilities depends on the level of customization required by the deployer and the deployment context, with both parties sharing accountability for implementing necessary mitigations.
177 McDougald, D., et al., ‘Strengthening AI agent security with identity management’ (2025) https://www.accenture.com/us-en/blogs/security/strengthening-ai-agent-security-identity-management
178 See footnote 176
179 Idem
- Although carefully crafted prompts can help reduce hallucinations, they offer limited effectiveness on their own. LLMs should be fine-tuned with curated, high-quality data and configured to limit the search space of responses to relevant and up-to-date information180.
Feedback and iteration loop (learning and improvement)
Risks:
- User feedback may be stored or used for model retraining without consent.
- Sensitive feedback information may unintentionally persist in logs or datasets.
Mitigations for Providers:
- Ensure the platform includes clear and user-friendly mechanisms for users to opt-in
or opt-out of having their feedback data used in model retraining.
- Offer built-in tools or features to automatically anonymize or pseudonymize
feedback data before storing or processing it for retraining purposes.
- Limit log retention periods by default and provide configurable options to deployers
to ensure compliance with privacy regulations.
Mitigations for Deployers:
- Clearly communicate to users how their feedback data will be used and ensure
robust tracking of user consent (opt-in/opt-out) mechanisms provided by the
platform.
- Use anonymization and pseudonymization tools to securely handle feedback data.
- Limit log retention periods and ensure compliance with privacy regulations.
The addition of filters introduces complexity into the system’s architecture: filters may add latency and processing time, impacting response times in real-time systems; they need to be secure, as vulnerabilities could expose sensitive data or allow malicious inputs to bypass scrutiny; and they must be
180 See footnote 175
181 Inan, H. et al., ‘Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations’ (2023) https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/
monitored and regularly updated to adapt to new risks, changing regulations, or evolving system
requirements.
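A minimal sketch of such input and output filters is shown below. The blocklist and the model call are placeholders; real deployments would rely on dedicated safeguard classifiers (such as the one referenced in footnote 181) rather than simple term matching.

BLOCKED_TERMS = {"password", "credit card"}   # illustrative policy, not exhaustive

def call_model(prompt: str) -> str:
    """Stand-in for the real LLM call."""
    return f"Echoed response for: {prompt}"

def filtered_completion(user_input: str) -> str:
    # Input filter: refuse prompts that contain disallowed content.
    if any(term in user_input.lower() for term in BLOCKED_TERMS):
        return "Request blocked by input filter."
    output = call_model(user_input)
    # Output filter: screen the generated text before returning it to the user.
    if any(term in output.lower() for term in BLOCKED_TERMS):
        return "Response withheld by output filter."
    return output

print(filtered_completion("What is the office address?"))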
Roles in LLMs Service Models According to the AI Act and the GDPR
The roles of provider and deployer under the AI Act, as well as controller and processor under the GDPR, may differ depending on the service model. Below is an explanation of how these roles may apply and the rationale behind their assignment, categorized by each service model. Note that the qualification of organizations as controller or processor should be assessed based on the circumstances of each case; the explanation provided here is intended for reference purposes only and does not imply it will always apply in the same way.
1. LLM as a Service
AI Act Roles
Provider: The organization that develops and offers the LLM as a service. Providers are
responsible for ensuring compliance with the AI Act, including risk management, transparency,
and technical robustness (e.g., OpenAI providing GPT models via APIs).
Deployer: The organization using the LLM (e.g., a business using the provided interface for any
particular task).
New provider: An organization integrating the API of an LLM as a Service model into their
commercial AI system (e.g., a chatbot) could also be considered a provider under the AI Act if
their system qualifies as high-risk and falls within the scope of Article 25 of the AI Act.
GDPR Roles
Deployer as controller: The deployer using the LLM as a Service typically acts as the data
controller, as they determine the purposes and means of data processing (e.g., collecting
customer queries to improve services or using an LLM tool for summarization purposes).
Provider as controller: When providers collect or retain data for their own purposes (e.g., model fine-tuning or feature improvement), they also assume the role of controller. This is the case in most LLM as a Service solutions, where providers retain ownership of the model and the training data. In this scenario, joint controllership might be the more suitable option.
Processor: The provider acts as a processor when handling data strictly according to the deployer's instructions for specific tasks, like generating responses. This might be difficult in this service model due to the provider’s ownership of the model.
In an LLM as a Service model scenario, we often talk about the concept of shared responsibility, where
both the provider and the deployer play distinct but complementary roles in ensuring privacy, security,
and compliance. The provider is responsible for the infrastructure, model training, and maintenance,
while the deployer must ensure secure usage, proper integration, and adherence to applicable
regulations within their specific deployment context. This division of responsibilities requires clear
agreements and robust collaboration to effectively manage risks.
2. LLMs ‘off-the-shelf’
AI Act Roles
Provider: The organization that develops, places on the market or puts into service the off-the-shelf LLM model. Providers are responsible for ensuring that the model adheres to the AI Act’s requirements182. LLMs released under free and open source licenses should be considered to ensure high levels of transparency and openness if their parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available183.
o If the platform provider develops, trains, or significantly fine-tunes an LLM and makes it available to deployers, they act as a provider under the AI Act.
o The platform could also act merely as an infrastructure enabler, in which case it would not be considered a provider but a distributor.
Deployer: The organization using the off-the-shelf model to build or enhance its own services
takes on the role of deployer. However, in cases of high risk AI systems, the deployer may also
assume the role of provider if they significantly modify or fine-tune the model or make it
available to others as part of their own services. This dual role is addressed under Article 25 of
the AI Act.
182 Note based on Recital 104 AI Act: Providers of general-purpose AI models released under a free and open source license, with publicly available parameters (including weights, architecture details, and usage information), should be subject to exceptions regarding transparency-related requirements under the AI Act. However, exceptions should not apply when such models present a systemic risk. In such cases, transparency and an open source license alone should not suffice to exempt the provider from compliance with the regulation’s obligations. Furthermore, the release of open source models does not inherently guarantee substantial disclosure about the datasets used for training or fine-tuning, nor does it ensure compliance with copyright law. Therefore, providers should still be required to produce a summary of the content used for model training and implement a policy to comply with Union copyright law, including identifying and respecting reservations of rights as outlined in Article 4(3) of Directive (EU) 2019/790.
183 Recital 102 AI Act
GDPR Roles
Deployer as Controller: The deployer typically acts as the controller, as they determine the
purpose and means of processing personal data during their use of the LLM.
Provider as controller: The original model provider may act as a controller in limited scenarios
where they process data for their own purposes. If the platform provider logs, analyzes, or
retains user or deployer data for purposes like improving platform services, debugging, or
monitoring system performance, they could be taking on the role of controller for this specific
data processing.
Processor: This role could apply when the platform carries out cloud-based tasks explicitly instructed by the deployer. For example, during data inference tasks, data might be processed according to the deployer’s instructions. In this case, a platform providing a model could act as a processor under the GDPR.
The provider remains accountable for the foundational model’s compliance and functionality. The
deployer is responsible for how the model is implemented, customized, and operated within their
specific context, especially in scenarios where data is processed locally, or cloud tasks are guided by the
deployer. This dual-layered responsibility emphasizes the need for clear contractual agreements and
robust governance mechanisms.
3. Self-developed LLMs
All operations, from model development and infrastructure to input collection and model processing, are performed under the responsibility of the provider, which is often also the entity deploying the model for its own use.
AI Act Roles
Provider: The entity developing the LLM.
Deployer: The organization deploying the solution and taking on most operational
responsibilities, including monitoring, risk management, and transparency.
In this specific service model, the organization developing the LLM system could be the same
organization putting the system into own use. In that scenario the same organization would be
considered a provider and deployer under the AI Act.
GDPR Roles
Provider as Controller: The LLM system developer, as they control and execute all data
processing activities within their local infrastructure during development.
Deployer as Controller: The deployer, as they determine the purpose and means of processing
personal data during their use of the LLM.
Processor: Any third party processing data on behalf of the controller might take this role.
The controller’s full control over infrastructure and data makes them responsible for compliance with
GDPR and AI Act requirements.
The processor’s role is limited to any third party tool or component that the controller could be using in
the process.
4. Agentic AI Systems
Agentic AI systems introduce unique dynamics to data flows and role allocation under the GDPR and
AI Act due to their autonomous and dynamic behavior.
AI Act Roles
Provider: The entity developing and supplying the LLM or core agentic architecture.
Deployer: The organization implementing the agentic AI system for its own or third-party use.
In high risk AI systems, if the deployer fine-tunes the agent, integrates it with specific systems,
or significantly modifies its architecture, they may also assume the role of a provider under
Article 25 of the AI Act, responsible for compliance of the modified system.
In this service model, the deployer is often both a provider and a deployer, depending on the level of
customization, fine-tuning, or downstream deployment of the agentic AI system.
GDPR Roles
Deployer as Controller: The deployer typically assumes the role of the controller, as they
determine the purpose and means of processing personal data. This includes inputs, outputs,
memory management, and interactions with external systems.
Processor: When the deployer uses third-party tools, external APIs, or cloud services as part of
the agentic AI’s operations, these third-party providers could act as processors. For example, if
an external API or service facilitates real-time data retrieval or enhances functionality, it takes
on a processing role under the deployer’s instruction. In some cases, third-parties could act as
joint-controllers.
Responsibility Sharing:
The deployer bears significant responsibility for managing the AI agent’s outputs and interactions.
However, providers supplying foundational LLMs, or modules could also share responsibility for pre-
deployment compliance.
This table shows an overview of the possible roles per service model, always subject to an assessment
of the circumstances at hand:
The risk factors shown below are the result of analysing the contents of legal instruments such as the GDPR184, the EUDPR185, the EU Charter186 and other applicable guidelines related to privacy and data protection187. The following risk factors can help us identify high-level data protection and privacy risks in LLM-based systems:
184 General Data Protection Regulation (2016/679)
185 European Union Data Protection Regulation (Reg. 2018/1725)
186 Charter of Fundamental Rights of the European Union (2012/C 326/02)
187 AEPD, ‘Risk Management and Impact Assessment in Processing of Personal Data’ p. 79 (2021) https://www.aepd.es/guides/risk-management-and-impact-assessment-in-processing-personal-data.pdf
Processing data of vulnerable individuals
This is a concern because vulnerable individuals often require special protection. Processing their personal data without proper safeguards can lead to violations of their fundamental rights. Some examples of vulnerable individuals are children, elderly people, people with mental illness, disabled people, patients, people at risk of social exclusion, asylum seekers, persons who access social services, employees, etc.
Example: This could be the case when LLM systems are used in the health sector, at schools, in social services organizations, in government institutions, by employers, etc. For instance, an LLM-based platform used in schools to assess student performance and provide personalized learning recommendations processes data about children.
Low data quality
Low quality of the input data and/or the training data is a concern, bringing possible risks of inaccuracies in the generated output, which could cause wrong identification of characters and have other adverse impacts depending on the use case.
Example: LLMs rely heavily on the quality of both the input data provided by users and the data used for training the model. Any inaccuracies, biases, or incompleteness in the data can have far-reaching consequences, as LLMs generate outputs based on patterns they detect in their training and input data. The degree of risk posed by low data quality depends heavily on the application. In less critical use cases, such as content generation, inaccuracies may be less impactful. However, in high-stakes scenarios, such as healthcare, finance, or public policy, even minor inaccuracies can have significant negative consequences.
Insufficient security measures
The lack of sufficient safeguards could be the cause of a data breach. Data could also be transferred to states or organizations in other countries without an adequate level of protection.
Example: This could be the case if there are not sufficient safeguards implemented to protect the input data and the results of the processing. It is applicable to any use case. LLMs offered as SaaS solutions in some cases involve data being sent for processing to servers in countries without adequate data protection laws, increasing exposure to privacy risks.
A hazard refers to a potential source of harm, while hazard exposure describes the conditions or extent
to which individuals or systems are exposed to that harm in a hazardous situation. Safety represents
the measures implemented to minimize or mitigate harm, ensuring the system operates as intended
without causing undue risk. Threats are external factors that may exploit vulnerabilities within the LLM
based system, which are weaknesses that could be exploited to compromise functionality, security, or
188 These terms are explained here in an accessible manner to aid understanding, but they are not official definitions. The European harmonized standard on risk management, currently being developed by CEN/CENELEC at the request of the European Commission, will contain standardized definitions and provide formalized guidance on these terms.
data protection. The AI Act emphasizes the protection of fundamental rights, including privacy, to
ensure that AI systems do not adversely impact individuals.
When trying to identify the risks of an LLM-based system, it is important to consider all these components of risk189 that could have an impact on privacy and data protection. Privacy risks often stem from hazards, or from vulnerabilities within the system that could be exploited by external or internal threats. Hazard exposure in this context could refer to how individuals’ personal data is exposed to these risks through the use of the LLM-based system, for example, during input querying.
Understanding these interrelated concepts facilitates the risk management process for AI systems that need to comply with the GDPR and the AI Act, with the protection of individuals as the end goal.
By clearly defining the intended purpose, you can assess whether the design and functionalities align
with the anticipated application’s use. This is also helpful to identify potential misuse of the system and
harm to specific user groups. Similarly, understanding the broader context—including user
demographics, language, cultural factors, and business models—enables you to evaluate how the
system interacts with its environment to anticipate potential issues.
Threat modeling can assist in identifying potential attack surfaces 193, misuse cases, and vulnerabilities,
enabling a proactive approach to risk identification and mitigation. For example, by identifying data
flows and system dependencies, threat modeling can reveal risks like unauthorized data access which
189 Novelli, C. et al., ‘AI Risk Assessment: A Scenario-Based, Proportional Methodology for the AI Act’ (2024) https://doi.org/10.1007/s44206-024-00095-1
190 The AI Act requires in Article 9(2)(a), for the risk management systems of high-risk AI systems, ‘the identification and analysis of the known and the reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights when the high-risk AI system is used in accordance with its intended purpose;’
191 NIST, ‘Artificial Intelligence Risk Management Framework (AI RMF 1.0)’ (2023) https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
192 Threat Modeling Manifesto (n.d.) https://www.threatmodelingmanifesto.org/
193 Meta, ‘Frontier AI Framework’ (2025) https://ai.meta.com/static-resource/meta-frontier-ai-framework/?utm_source=newsroom&utm_medium=web&utm_content=Frontier_AI_Framework_PDF&utm_campaign=Our_Approach_to_Frontier_AI_blog
may not be immediately apparent. The identified threats from a threat modeling session can be
integrated into LLM evaluations, where models are systematically tested against the threats through
adversarial testing, red teaming, or scenario-based assessments.
Incorporating evidence 197 from both core and enhanced sources ensures a comprehensive
understanding of risks. Core evidence includes existing data on system characteristics and user
interactions, while enhanced evidence might involve consulting experts, conducting targeted research,
or using outputs from content moderation or technical evaluation systems. This multi-faceted approach
not only aids in identifying risks but also provides a documented basis for risk management decisions.
The table below categorizes common privacy risks of LLM systems based on their applicability to the roles of providers and deployers. Both providers and deployers are responsible for all risks in the table, but the degree of responsibility depends on the level of control over the system (e.g., providers for infrastructure, deployers for usage and configuration).
It is important to consider how risks differ depending on the perspective. For instance, while a provider
may face regulatory obligations to minimize data storage, a deployer must evaluate the risks of
entrusting a provider with sensitive information. These roles come with different responsibilities that
require specific risk management strategies.
Providers, deployers, and procurement teams must address these risks collaboratively. Procurement,
in particular, plays a vital role in bridging the responsibilities of providers and deployers by ensuring
that selected systems meet regulatory standards and organizational privacy requirements. Key
194 Ofcom, ‘Protecting people from illegal harms online - Annex 5: Service Risk Assessment Guidance’ (2024) https://www.ofcom.org.uk/siteassets/resources/documents/consultations/category-1-10-weeks/270826-consultation-protecting-people-from-illegal-content-online/associated-documents/annex-5-draft-service-risk-assessment-guidance?v=330403
195 Martineau, K., ‘What is red teaming for generative AI?’ (2024) https://research.ibm.com/blog/what-is-red-teaming-gen-AI
196 Recital 172 AI Act: “Persons acting as whistleblowers on the infringements of this Regulation should be protected under the Union law. Directive (EU) 2019/1937 of the European Parliament and of the Council (54) should therefore apply to the reporting of infringements of this Regulation and the protection of persons reporting such infringements.”
197 Ofcom, ‘Protecting people from illegal harms online - Annex 5: Service Risk Assessment Guidance’ (2024) https://www.ofcom.org.uk/siteassets/resources/documents/consultations/category-1-10-weeks/270826-consultation-protecting-people-from-illegal-content-online/associated-documents/annex-5-draft-service-risk-assessment-guidance?v=330403
considerations during procurement include assessing the provider’s policies, ensuring compliance with
relevant regulations, and embedding clauses that limit data misuse and support data subject rights.
Deployers using LLMs need to consider the risks related to their specific use cases and context. Making use of the risk factors or evaluation criteria can facilitate the identification of those risks. For instance, the criterion ‘low data quality’ can already trigger the identification of risky processing activities that could result in harm.
Providers and developers of LLMs must implement risk management as an iterative process to identify
and address risks, recognizing that these risks can emerge at various phases of the development
lifecycle, as discussed in previous sections.
The overview provided by the table below can serve as a practical starting point for identifying and
analyzing privacy and data protection risks throughout the lifecycle of LLM based systems. The table
presents a consolidated summary of privacy risks, complementing the details already provided in
Section 3 under Data Flow and Associated Privacy Risks in LLM Systems.
For each risk, the overview below provides a description, the GDPR provisions potentially infringed, examples, and the service models to which the risk applies; as noted above, both provider and deployer carry responsibilities for each risk, to a degree that depends on their level of control.

1. Insufficient protection of personal data, which can eventually be the cause of a data breach.
Description: Safeguards for the protection of personal data are not implemented or are insufficient.
GDPR potential impact: Infringement of Art. 32 Security of processing, Art. 5(1)(f) Integrity and confidentiality, and Art. 9 Processing of special categories of personal data.
Examples: Sensitive data disclosure in user inputs or during training, inference and output. Unauthorized access, insufficient encryption during data transmission, API misuse, interface vulnerabilities, inadequate anonymization or filtering techniques, third-party exposure.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.

2. Misclassifying training data as anonymous by controllers when it contains identifiable information.
Description: Controllers may incorrectly assume training data is anonymous, failing to implement necessary safeguards for personal data protection.
GDPR potential impact: Infringement of Articles 5(1)(a) (Lawfulness, Fairness, and Transparency), 5(1)(b) (Purpose Limitation), 25 (Data Protection by Design and by Default).
Examples: An LLM trained on improperly anonymized user logs reveals identifiable user information through model inference attacks. A deployer discovers that the third-party LLM they are using has been trained on non-anonymized personal data, and the vendor fails to implement appropriate safeguards, exposing the deployer to compliance risks.
Service models: LLM as a Service198, LLM ‘off-the-shelf’199, Self-developed LLM, Agentic LLM.

3. Unlawful processing of personal data in training sets.
Description: Personal data is included in training datasets without proper legal basis, safeguards, or user consent.
GDPR potential impact: Infringement of Articles 5(1)(a) (Lawfulness, Fairness, and Transparency), 6(1) (Lawfulness of Processing), 7 (Consent), 5(1)(c) (Data Minimization).
Examples: An e-commerce platform uses customer purchase histories to train an LLM without informing customers or obtaining their consent.
Service models: LLM as a Service200, LLM ‘off-the-shelf’201, Self-developed LLM, Agentic LLM.

4. Unlawful processing of special categories of personal data and data relating to criminal convictions and offences in training data.
Description: Training datasets include sensitive data, such as health or criminal records, without meeting GDPR exceptions for lawful processing.
GDPR potential impact: Infringement of Articles 9(1) and 9(2) (Special Categories of Data), Article 10 (Criminal Convictions and Offences).
Examples: Medical records scraped from unsecured online sources are used to train a healthcare chatbot without applying GDPR-compliant safeguards.
Service models: LLM as a Service202, LLM ‘off-the-shelf’203, Self-developed LLM, Agentic LLM.
198 This risk primarily applies to the provider; however, the deployer shares responsibility by ensuring they engage with lawful vendors. The deployer’s role includes conducting due diligence to verify that the provider complies with legal obligations and operates within the bounds of applicable regulations.
199 Idem
200 Idem
201 Idem
202 Idem
203 Idem
5. Possible adverse impact on data subjects that could negatively affect fundamental rights.
Description: The output of the system could have an adverse impact on the individual.
GDPR potential impact: Infringement of Art. 5(1)(d) Accuracy, Art. 5(1)(a) Fairness, Art. 22 Automated individual decision-making, including profiling, Art. 25 Data protection by design and by default.
Examples: A system provides output that is not accurate or contains bias and does not provide mechanisms to amend errors. The output of an LLM could be used to make automated decisions which produce legal effects or similarly significant effects on data subjects.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.

6. Not providing human intervention for a processing that can have a legal or important effect on the data subject.
Description: Automated decisions that significantly impact individuals are made without human review, violating GDPR requirements for human oversight, or are based on an inappropriate ground204.
GDPR potential impact: Infringement of Articles 22(1) and 22(3) (Automated Decision-Making), Article 12 (Transparent Communication).
Examples: A chatbot automates loan approvals based on user-provided data, denying applications without involving a human reviewer.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.

7. Not granting data subjects their rights.
Description: Data subjects’ rights cannot be granted, completely or partially.
GDPR potential impact: Infringement of Art. 12 – 14 Information to be provided when personal data is collected, Art. 16 and Art. 17 Right to rectification and right to erasure, Art. 18 Right to restriction of processing and Art. 21 Right to object.
Examples: Data subjects’ requests to rectify or to erase personal data cannot be completed. Users are not aware of how their data will be used, retained, or shared by the provider.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.

8. Unlawful repurposing of personal data.
Description: Personal data is used for a different purpose.
GDPR potential impact: Infringement of Art. 5(1)(b) Purpose limitation, Art. 5(1)(a) Lawfulness, fairness and transparency, Article 28(3)(a)205 and Art. 29 Processing under the authority of the controller or processor.
Examples: This could be the case if the provider uses the input and/or output data for training the LLM without this being formally agreed beforehand.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.
204 Under the exceptions outlined in Article 22(2) of the GDPR, automated individual decision-making is permitted only if it is based on contractual necessity, explicit consent, or if authorized by EU or Member State law.
205 “processes the personal data only on documented instructions from the controller, including with regard to transfers of personal data to a third country or an international organisation, unless required to do so by Union or Member State law to which the processor is subject; in such a case, the processor shall inform the controller of that legal requirement before processing, unless that law prohibits such information on important grounds of public interest;”
9. Unlawful unlimited storage of personal data.
Description: Input and/or output data is being stored longer than necessary.
GDPR potential impact: Infringement of Art. 5(1)(e) Storage limitation and Art. 25 Data protection by design and by default.
Examples: The system could be unnecessarily storing input data that is not directly relevant to the LLM process. In some cases, the output could be stored by the deployer longer than necessary. Providers can also log user inputs and outputs for debugging or model improvement.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.

10. Unlawful transfer of personal data.
Description: Data are being processed in countries without an adequate level of protection.
GDPR potential impact: Infringement of Art. 44 General principle for transfers, Art. 45 Transfers on the basis of an adequacy decision, Art. 46 Transfers subject to appropriate safeguards.
Examples: LLM providers could be processing the data in countries that do not offer enough safeguards206.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Agentic LLM.

11. Breach of the data minimization principle.
Description: Extensive processing of personal data for training the model.
GDPR potential impact: Infringement of Art. 5(1)(c) Data minimization, Art. 6 to the extent that data minimisation has an impact on the most appropriate lawful basis (e.g., legitimate interest under Article 6(1)(f)), and Art. 25 Data protection by design and by default.
Examples: LLMs require substantial amounts of data for training. Similarly, deployers may use datasets to fine-tune the LLM-based system for their specific use cases.
Service models: LLM as a Service, LLM ‘off-the-shelf’, Self-developed LLM, Agentic LLM.
206 Garante per la protezione dei dati personali (GPDP), ‘Intelligenza artificiale: il Garante privacy blocca DeepSeek’ [Artificial intelligence: the Italian DPA blocks DeepSeek] (2025) https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/10097450?mkt_tok=MTM4LUVaTS0wNDIAAAGYXIH0PW4qTzz-TKclqJPRoyU5yZoUVox1JLxNIcVP7RTnC_bvlu_rRyXg8hy6RdOqFw9BgFYU8wXP1XmPVVBTU7DCNt1660jK9umFkCSnLY4e#english
When assessing the risks associated with LLMs, it is crucial to consider broader issues linked to GDPR principles such as lawfulness, fairness, transparency, and accountability. In addition to privacy concerns, issues related to copyright, overreliance and manipulation must also be addressed.
To align with these principles, LLM developers must actively monitor outputs, address potential biases208 by using high-quality and unbiased training data, and provide user-friendly, comprehensible information about the system's decision-making processes. These steps not only ensure compliance with the GDPR but also uphold fairness and transparency, fostering trust in AI technologies and safeguarding individual rights.
Copyright209
LLMs trained on web-scraped or publicly available data often include copyrighted materials, raising
concerns about intellectual property violations. Outputs generated by such models may
unintentionally replicate protected content, creating legal risks for both providers and deployers.
These issues highlight the importance of ensuring that data used to train LLMs is collected and
processed lawfully and in accordance with copyright laws.
207 EDPB, ‘Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models’, adopted on 17 December 2024 (2024) https://www.edpb.europa.eu/our-work-tools/our-documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
208 Lareo, X., ‘Large Language Models’, EDPS (n.d.) https://www.edps.europa.eu/data-protection/technology-monitoring/techsonar/large-language-models-llm_en
209 European Innovation Council and SMEs Executive Agency, ‘Artificial intelligence and copyright: use of generative AI tools to develop new content’ (2024) https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/artificial-intelligence-and-copyright-use-generative-ai-tools-develop-new-content-2024-07-16-0_en
210 Jacobi, O., ‘The Risks of Overreliance on Large Language Models (LLMs)’ (2024) https://www.aporia.com/learn/risks-of-overreliance-on-llms/
Various risk management methodologies are available for classifying and assessing risks. This document does not aim to prescribe or define a specific methodology, as the choice should be determined by each organization. However, for the purposes of this document, we will reference the international standards previously highlighted in the WP29215 and the AEPD216 guidelines, as well as the work currently being done in European AI standardization.
211 Note: In this document, we use the term "probability" instead of "likelihood" to align with terminology found in definitions like the one for risk in the AI Act. While in risk management "likelihood" typically indicates a qualitative approach to managing risks, "probability" implies a quantitative method of risk assessment.
212 European Center for Not-for-Profit Law, ‘Framework for Meaningful Engagement: Human rights impact assessments of AI’ (2023) https://ecnl.org/publications/framework-meaningful-engagement-human-rights-impact-assessments-ai
213 O’Neil, C., ‘Algorithmic Stakeholders: An Ethical Matrix for AI’ (2020) https://blog.dataiku.com/algorithmic-stakeholders-an-ethical-matrix-for-ai
214 Article 29 Data Protection Working Party, ‘Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is “likely to result in a high risk” for the purposes of Regulation 2016/679’ (2017) https://ec.europa.eu/newsroom/article29/items/611236/en
215 ISO 31000:2009, Risk management — Principles and guidelines, International Organization for Standardization (ISO); ISO/IEC 29134, Information technology – Security techniques – Privacy impact assessment – Guidelines, International Organization for Standardization (ISO).
216 ISO 31010:2019, Risk management — Risk Assessment Techniques, International Organization for Standardization (ISO)
Risk is determined by the probability of an event occurring, combined with the potential impact or severity of the resulting harm.
Risk is defined in the GDPR (Recital 75) as the potential harm to the rights and freedoms of natural
persons, of varying probability and severity, arising from personal data processing. Similarly, the AI Act
(Article 3) defines risk as ‘the combination of the probability of an occurrence of harm and the severity
of that harm;’.
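Expressed as a simple formula, following the conventional risk-management formulation that is consistent with both definitions above: Risk = Probability of harm × Severity of harm.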
To evaluate the level of data protection and privacy risks when procuring, developing, or using LLMs, it
is essential to estimate both the probability and severity of the identified risks materializing.
Probability determination must be tailored to the specific risks and use cases under assessment. While
this general matrix provides a structured approach, applying more detailed criteria can enhance the
accuracy of the probability assessment.
In the table below, there is an example of criteria217 that can guide this process, helping to refine the evaluation of probability for specific scenarios. Note that some criteria relate to system-level attributes while others are context-specific.
PROBABILITY LEVELS
1. Frequency of Use
Description: How often the AI system is used, increasing exposure to potential risks affecting reliability (expected time before failure).
Level 1 (Unlikely): The system is rarely used or has infrequent interactions (e.g., annual or less).
Level 2 (Low): The system is occasionally used but not in critical operations (e.g., monthly).
Level 3 (High): The system is frequently used and integrated into important operations (e.g., weekly).
Level 4 (Very High): The system is used continuously or in real-time critical operations (e.g., daily).
2. Exposure to High-Risk Scenarios
Description: The extent to which the AI system operates in sensitive or high-stakes environments.
Level 1 (Unlikely): The system is not used in sensitive or high-stakes scenarios.
Level 2 (Low): The system operates in moderately sensitive environments with minimal stakes.
Level 3 (High): The system is used in high-stakes environments with potential for significant impact.
Level 4 (Very High): The system operates in highly sensitive or critical environments (e.g., healthcare, security).
217 Barberá, I., ‘FRASP, A Structured Framework for Assessing the Severity & Probability of Fundamental Rights Interferences in AI Systems’ (2025)
3. Historical Precedents
Description: Past instances of similar risks or failures in the same or comparable AI systems.
Level 1 (Unlikely): No similar risks or failures have occurred in comparable systems.
Level 2 (Low): Few similar risks or failures have occurred in comparable systems.
Level 3 (High): Similar risks or failures have occurred frequently in comparable systems.
Level 4 (Very High): Frequent and significant risks or failures have occurred in comparable systems.
4. Environmental Factors
Description: External, uncontrollable conditions affecting system performance or reliability (e.g., political instability, regulatory gaps, financial constraints).
Level 1 (Unlikely): External conditions are stable and do not impact the system’s performance.
Level 2 (Low): External conditions occasionally affect the system’s performance but are manageable.
Level 3 (High): External conditions often impact the system’s performance, creating vulnerabilities.
Level 4 (Very High): External conditions severely affect the system’s performance, creating constant risks.
5. System Robustness
Description: The degree to which the AI system is resistant to failure or unintended behaviour.
Level 1 (Unlikely): The system is highly robust with multiple redundancies and safeguards.
Level 2 (Low): The system is moderately robust with some redundancies but occasional vulnerabilities.
Level 3 (High): The system has some robustness but contains significant vulnerabilities or weak safeguards.
Level 4 (Very High): The system lacks robustness or safeguards, or is prone to frequent failures.
6. Data Quality and Integrity
Description: The extent to which the AI system relies on accurate, unbiased, and complete data. Modifiable through better dataset curation or validation.
Level 1 (Unlikely): Data is highly accurate, unbiased, and complete with minimal risk of errors.
Level 2 (Low): Data is mostly accurate and complete but has occasional minor biases or errors.
Level 3 (High): Data is partially accurate or complete, with notable biases or inconsistencies.
Level 4 (Very High): Data is significantly inaccurate, biased, or incomplete, leading to high risk.
7. Human Oversight and Expertise
Description: How human operators’ skills and decision-making affect system reliability and risk probability. Modifiable through training or oversight improvements.
Level 1 (Unlikely): Operators are highly trained, experienced, and consistently effective in decision-making.
Level 2 (Low): Operators are moderately trained and effective, but occasional errors occur.
Level 3 (High): Operators are undertrained or inconsistent, leading to regular errors in decision-making.
Level 4 (Very High): Operators are untrained or ineffective, causing frequent and severe errors.
To use the criteria to determine the probability of a risk, the individual criterion scores can be aggregated in one of the following ways (a minimal calculation sketch follows below):
o Weighted Average: Assign more importance to certain factors by weighting them before averaging.
o Simple Average: Treat all factors equally and calculate the mean.
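By way of illustration only, the sketch below shows how the seven probability criteria could be aggregated with a simple or weighted average and mapped back to a probability level. The scores and weights are hypothetical assumptions, not values taken from this report.

```python
# Minimal sketch of aggregating the seven probability criteria above.
# Each criterion is scored 1-4 using the table; the scores and weights
# below are hypothetical examples.
scores = {
    "frequency_of_use": 3,
    "exposure_high_risk": 2,
    "historical_precedents": 2,
    "environmental_factors": 1,
    "system_robustness": 3,
    "data_quality": 2,
    "human_oversight": 2,
}
weights = {k: 1.0 for k in scores}      # simple average by default
weights["exposure_high_risk"] = 2.0     # e.g., give more weight to high-risk exposure

weighted_avg = sum(scores[k] * weights[k] for k in scores) / sum(weights.values())
level = min(4, max(1, round(weighted_avg)))
labels = {1: "Unlikely", 2: "Low", 3: "High", 4: "Very High"}
print(f"Aggregated score {weighted_avg:.2f} -> Level {level} ({labels[level]})")
```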
This mapping provides a clear, categorized probability level for each risk, which helps prioritize risks
based on their potential occurrence. We will explore later in this document how this framework can
be applied in practice in one specific use case.
Note that the HUDERIA219 risk management methodology, developed by the Committee on Artificial
Intelligence of the Council of Europe, also employs a four-level severity matrix. However, it uses slightly
different terminology, as shown in this matrix (italicized): Catastrophic Harm, Critical Harm, Serious
Harm, and Moderate or Minor Harm.
Similar to the assessment of probability, the assessment of severity can also benefit from the use of
different severity criteria220 to reduce subjectivity in the process. The severity criteria are related to a
loss of privacy that is experienced by the data subject but that may have further related consequences
impacting other individuals and/or society.
The table below outlines different severity221 criteria. The calculation of severity can follow the same steps as those used for determining probability, including aggregating scores and mapping them to severity levels. However, for severity, certain criteria (numbers 1 to 5, and 7 & 8) act as "stoppers". This means that the final score will always be the highest score from those criteria, no matter what the aggregation score is. For instance, if any of these criteria are assessed at the highest level (4), the overall severity score is immediately assigned level 4. This approach ensures that critical harms, such as those involving irreversible damage, are appropriately prioritized and flagged for immediate and comprehensive mitigation measures.
218 AEPD, 'Risk Management and Impact Assessment in Processing of Personal Data', p. 77 (2021) https://www.aepd.es/guides/risk-management-and-impact-assessment-in-processing-personal-data.pdf
219 Council of Europe (CAI), 'Methodology for the Risk and Impact Assessment of Artificial Intelligence Systems from the Point of View of Human Rights, Democracy and the Rule of Law (HUDERIA Methodology)' (2024) https://rm.coe.int/cai-2024-16rev2-methodology-for-the-risk-and-impact-assessment-of-arti/1680b2a09f
220 "(...) 7/ Risks, which are related to potential negative impact on the data subject's rights, freedoms and interests, should be determined taking into consideration specific objective criteria such as the nature of personal data (e.g. sensitive or not), the category of data subject (e.g. minor or not), the number of data subjects affected, and the purpose of the processing. The severity and the probability of the impacts on rights and freedoms of the data subject constitute elements to take into consideration to evaluate the risks for individual's privacy", p. 4, Article 29 Working Party (WP 218), 'Statement on the role of a risk-based approach in data protection legal frameworks', adopted on 30 May 2014, https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp218_en.pdf
221 See footnote 217
SEVERITY LEVELS

Level 1 (Very Limited): Moderate or Minor Harm. Moderate or minor prejudices or impairments in the exercise of fundamental rights and freedoms that do not lead to any significant, enduring, or temporary degradation of human dignity, autonomy, physical, psychological, or moral integrity, or the integrity of communal life, democratic society, or just legal order.
Level 2 (Limited): Serious Harm. Serious prejudices or impairments in the exercise of fundamental rights and freedoms that lead to the temporary degradation of human dignity, autonomy, physical, psychological, or moral integrity, or the integrity of communal life, democratic society, or just legal order, or that harm the information and communication environment.
Level 3 (Significant): Critical Harm. Critical prejudices or impairments in the exercise of fundamental rights and freedoms that lead to the significant and enduring degradation of human dignity, autonomy, physical, psychological, or moral integrity, or the integrity of communal life, democratic society, or just legal order.
Level 4 (Very Significant): Catastrophic Harm. Catastrophic prejudices or impairments in the exercise of fundamental rights and freedoms that lead to the deprivation of the right to life; irreversible injury to physical, psychological, or moral integrity; deprivation of the welfare of entire groups or communities; catastrophic harm to democratic society, the rule of law, or to the preconditions of democratic ways of life and just legal order; deprivation of individual freedom and of the right to liberty and security; harm to the biosphere.

1. Nature of the fundamental right and legal limitation alignment: This criterion evaluates the nature of the fundamental right affected, whether it is absolute or subject to limitations, and assesses the extent to which the AI system's use case aligns with lawful and proportionate restrictions. Absolute rights are non-derogable and cannot be restricted under any circumstances, while other rights may be limited only if the interference meets strict legal, proportionality, and necessity requirements. This criterion helps determine the severity of the impact based on the degree of misalignment or violation of the right's protections.*
Level 1: The fundamental right affected is highly limited in scope and applicability, meaning it is frequently subject to lawful restrictions with minimal requirements to justify the interference. The use case clearly and fully aligns with permitted legal limitations, and the interference is routine and widely accepted, without causing significant violations of legal or normative frameworks.
Level 2: The fundamental right affected is moderately limited, meaning restrictions are lawful but subject to stricter justification requirements and more specific conditions. The use case aligns with legal limitations, but the interference requires a moderate level of justification, such as demonstrating proportionality and necessity, otherwise causing possible minor violations of legal or normative frameworks.
Level 3: The fundamental right affected is minimally limited, meaning restrictions are only lawful under exceptional and tightly controlled circumstances. The use case partially aligns with lawful exceptions, but there are uncertainties about the proportionality, necessity, or legitimacy of the interference, causing possible major violations of legal or normative frameworks.
Level 4: The fundamental right affected is absolute and non-derogable, meaning no lawful restriction is permitted under any circumstances. Alternatively, the use case does not align with lawful and proportionate limitations, even if the right is not absolute, causing severe violations of legal or normative frameworks.
* The Charter does not explicitly identify the rights that are absolute. Based on the Charter explanations, the ECHR and the case law of the European courts, it is submitted that human dignity (Article 1 of the Charter), the prohibition of torture and inhuman or degrading treatment or punishment (Article 4 of the Charter), the prohibition of slavery and forced labour (Article 5(1) and (2) of the Charter), internal freedom of thought, conscience and religion (Article 10(1) of the Charter), the presumption of innocence and right of defence (Article 48 of the Charter), the principle of legality (Article 49(1) of the Charter) and the right not to be tried or punished twice in criminal proceedings for the same criminal offence (Article 50 of the Charter) can be considered absolute rights.

2. Nature of personal data: This criterion assesses the sensitivity of the personal data being processed, considering its potential to cause harm if misused. Special categories of data (e.g., health, biometric, or genetic data) pose greater risks to fundamental rights like privacy and autonomy.
Level 1: Non-sensitive, publicly available data (e.g., anonymized data, public records).
Level 2: Moderately sensitive data (e.g., financial data, browsing history).
Level 3: Highly sensitive data.
Level 4: The most sensitive data and special categories of data, e.g., genetic data, psychological profiles, biometrics or data revealing criminal history.

3. Category of Data Subject (e.g., minor or not): This criterion evaluates the vulnerability of the individuals whose data is being processed. Vulnerable groups (e.g., minors, marginalized communities) face greater risks of harm from data misuse.
Level 1: Data subjects are not vulnerable (e.g., adults in routine, non-sensitive contexts).
Level 2: Data subjects include some individuals in potentially vulnerable groups (e.g., employees, customers).
Level 3: Data subjects include individuals in sensitive or high-risk roles (e.g., journalists, activists).
Level 4: Data subjects are highly vulnerable (e.g., minors, persons with disabilities, or persecuted groups).

4. Purpose of Processing: This criterion evaluates the legitimacy, necessity, and proportionality of the purpose for which personal data is being processed. Unlawful or disproportionate purposes increase severity.
Level 1: Clearly legitimate and proportionate purposes with minimal risks (e.g., operational purposes).
Level 2: Legitimate purposes with moderate risks or indirect impacts (e.g., targeted marketing).
Level 3: Legitimate purposes but with questionable proportionality or necessity (e.g., profiling for credit scoring).
Level 4: Unlawful, unclear, or disproportionate purposes with significant risks (e.g., surveillance, discriminatory profiling).

5. Scale of Impact (Societal, Group, Individual) & Number of Data Subjects Affected: The breadth of the infringement across societal, group, and individual levels. This criterion considers the scale of the impact based on the number of individuals whose data is affected.
Level 1: Impact is limited to a small, localized group or individual. Fewer than 100 individuals affected.
Level 2: Impact is limited to specific groups or a small societal segment. Between 100 and 1,000 individuals affected.
Level 3: Impact spans multiple groups or societal domains. Between 1,000 and 100,000 individuals affected.
Level 4: Impact is widespread, affecting societal, group, and individual levels. Over 100,000 individuals affected.

6. Contextual and Domain Sensitivity: How specific contextual factors or domains intensify the interference's severity. Includes circumstantial risks like socio-political instability and whether children and other vulnerable groups are affected.
Level 1: Context or domain does not amplify the severity of the interference with the fundamental right.
Level 2: Context or domain moderately amplifies the severity of the interference with the fundamental right.
Level 3: Context or domain significantly amplifies the severity of the interference with the fundamental right.
Level 4: Context or domain profoundly amplifies the severity of the interference with the fundamental right.

7. Reversibility, recovery, degree of remediability: The difficulty or feasibility of reversing harm and the time required for recovery. Includes prohibitive risks where harm is irreversible.
Level 1: Harm is fully reversible within a short period with minimal effort.
Level 2: Harm is reversible with moderate effort over a reasonable timeframe.
Level 3: Harm is difficult to reverse, requiring significant effort or resources.
Level 4: Harm is irreversible, with no feasible means of recovery.

8. Duration and Persistence of Harm: The length of time and persistence of adverse effects caused by the interference.
Level 1: Adverse effects are minimal and do not persist over time.
Level 2: Adverse effects persist briefly but do not result in long-term consequences.
Level 3: Adverse effects persist for a considerable period and can affect multiple groups.
Level 4: Adverse effects are permanent or persist indefinitely.

9. Velocity to materialise: The speed at which the risk materialises: gradual, sudden, or continuously changing.
Level 1: Risk materialises gradually, providing sufficient time for intervention.
Level 2: Risk materialises at a moderate pace, allowing for corrective measures.
Level 3: Risk materialises suddenly, leaving limited time for intervention.
Level 4: Risk materialises rapidly, with no opportunity for intervention.

10. Transparency and mechanisms for Accountability: The degree of system transparency and mechanisms for accountability.
Level 1: System is highly transparent with clear and effective accountability mechanisms.
Level 2: System lacks some transparency but has basic accountability mechanisms.
Level 3: System lacks transparency and has weak accountability mechanisms.
Level 4: System is entirely opaque, with no mechanisms for accountability.

11. Ripple and Cascading Effects: The extent to which the interference triggers additional harms across systems or domains.
Level 1: No cascading effects; the risk is isolated and contained.
Level 2: Minimal cascading effects; impacts are mostly contained.
Level 3: Notable cascading effects; impacts extend across domains.
Level 4: Severe cascading effects; impacts propagate extensively.
A matrix, as shown below, serves as a practical tool to obtain these classifications, offering a clear and
structured ranking to prioritize risks and guide appropriate mitigation strategies. This classification is a
critical step in the next risk treatment process because it ensures that resources are directed toward
addressing the most pressing risks effectively.
Best practices in risk management suggest that the mitigation of risks classified as very high or high should be prioritized.222 Once these critical risks are identified, the next essential step is to develop and implement a risk treatment plan.
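As an illustration only, a probability-by-severity classification of this kind can be encoded as a simple lookup. The thresholds below follow a common four-by-four convention and are an assumption; they are not necessarily the exact matrix used in this report.

```python
# Illustrative sketch of classifying a risk from its probability and severity
# levels (each scored 1-4). The thresholds are assumed for demonstration.
def classify_risk(probability: int, severity: int) -> str:
    score = probability * severity          # ranges from 1 to 16
    if score >= 12:
        return "very high"
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

# Example: a risk assessed at probability Level 3 and severity Level 4
print(classify_risk(probability=3, severity=4))   # -> "very high" (prioritize)
```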
222
Oliva, L., ’ Successfully managing high-risk, critical-path projects’ (2003) https://www.pmi.org/learning/library/high-risk-critical-path-
projects-7675
223
Marsden, E. ’Risk acceptability and tolerability’ (n.d) https://risk-engineering.org/static/PDF/slides-risk-acceptability.pdf
224
Science Direct, ‘Definition of Residual Risk’ (2019) https://www.sciencedirect.com/topics/engineering/residual-risk
225
Article 5(2), Recital 74, GDPR
o Evaluate the type of risk and the available mitigation measures that can be implemented.
o Compare the potential benefits gained from implementing the mitigation against the costs and
efforts involved and the potential impact.
o Assess the impact on the intended purpose of the LLM system's implementation.
o Consider the reasonable expectations of individuals impacted by the system.
o Perform a trade-off analysis to evaluate the impact of potential mitigations on aspects such as
performance, transparency, and fairness, ensuring that processing remains ethical and
compliant based on the specific use case.
Analyzing these criteria is essential for effective risk mitigation and risk management planning, providing
clarity on whether specific mitigation efforts are justifiable. In all cases, the chosen treatment option
should be clearly justified and thoroughly documented to ensure accountability and compliance.
The most common risk treatment options are: Mitigate, Transfer, Avoid and Accept.
For each identified risk, one of these options will be selected:
Mitigate – Implement measures to reduce the probability or the severity of the risk.
Transfer – Shift responsibility for the risk to another party (e.g., through insurance or
outsourcing).
Avoid – Eliminate the risk entirely by addressing its root cause.
Accept – Decide to take no action, accepting the risk as is because it falls within acceptable limits
as defined in the risk criteria.
Deciding whether a risk can be mitigated involves assessing its nature, potential impact, and available
mitigation measures such as implementing controls, adopting best practices, modifying processes, or
using tools to reduce the probability or severity of the risk.
Not all risks can be fully mitigated. Some risks may be inherent and cannot be entirely avoided. In such
cases, the objective is to reduce the risk to an acceptable level or implement risk mitigation and control
measures that effectively manage its impact.
It is also important to maintain a dynamic risk register, containing risk records that are durable, easily
accessible, clear, and that are consistently updated to ensure accuracy and relevance 227.
Risks should also have clear ownership assigned, and regular reviews should be conducted to ensure
that risk management practices remain proactive.
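A minimal sketch of what a machine-readable risk register entry might capture is shown below. The field names, enum values and example record are assumptions for illustration, not a prescribed schema.

```python
# Illustrative sketch of a risk register record supporting ownership,
# treatment decisions and periodic review. All fields and values are examples.
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class Treatment(Enum):
    MITIGATE = "mitigate"
    TRANSFER = "transfer"
    AVOID = "avoid"
    ACCEPT = "accept"

@dataclass
class RiskRecord:
    risk_id: str
    description: str
    probability: int                 # 1-4, from the probability criteria
    severity: int                    # 1-4, from the severity criteria
    treatment: Treatment
    owner: str                       # clear ownership assigned
    justification: str               # documented rationale for the chosen option
    next_review: date                # supports regular reviews
    mitigations: list[str] = field(default_factory=list)

register = [
    RiskRecord("R-03", "Unlawful processing of personal data in training sets",
               probability=3, severity=3, treatment=Treatment.MITIGATE,
               owner="DPO", justification="Mitigations reduce risk to an acceptable level",
               next_review=date(2026, 1, 1),
               mitigations=["Document data sources", "Filter special-category data"]),
]
```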
226
Centre for Information Policy Leadership, ‘Risk, High Risk, Risk Assessments and Data Protection Impact Assessments under the GDPR.
GDPR Interpretation and Implementation Project’ (2016)
https://www.informationpolicycentre.com/uploads/5/7/1/0/57104281/cipl_gdpr_project_risk_white_paper_21_december_2016.pdf
227
Ofcom, ‘Protecting people from illegal harms online’ (2024) https://www.ofcom.org.uk/siteassets/resources/documents/online-
safety/information-for-industry/illegal-harms/volume-1-governance-and-risks-management.pdf?v=387545
228
EDPS, ‘TechSonar 2025 Report’ (2025) https://www.edps.europa.eu/data-protection/our-work/publications/reports/2024-11-15-techsonar-report-2025_en
229
This could be done by performing a pentest and/or requesting pentest results to the vendor.
230 AI Action Summit, ‘International AI Safety Report on the Safety of Advanced AI’ , p - 167, (2025)
https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf
231
Feretzakis, G et al., ‘V.S. Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review’(2024). https://doi.org/10.3390/info15110697
232
Examples of Memorization methods: https://blog.kjamistan.com/category/ml-memorization.html
233
OWASP, ‘OWASP Top 10 for LLM Applications 2025’ (2025) https://genai.owasp.org/llm-top-10/
234
Shamsabadi, S.A. et al., ’ Identifying and Mitigating Privacy Risks Stemming from Language Models’ (2024) https://arxiv.org/html/2310.01424v2
235
Shokri et al., ‘Membership Inference Attacks Against Machine Learning Models’ (2017) https://arxiv.org/abs/1610.05820
236
Zhang et al.,’Generative Model-Inversion Attacks Against Deep Neural Networks’, (2020) https://arxiv.org/abs/1911.07135
237
Guo, J. et al., ‘Practical Poisoning Attacks on Neural Networks’, (2020) https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720137.pdf
Employees and users are trained on security best practices.
Effective RAG systems require careful model alignment to prevent unauthorized access and sensitive data exposure.
Integration with multiple data sources necessitates robust security measures to ensure confidentiality and data
integrity, while adhering to data protection principles like necessity and proportionality. For outsourced RAG models
involving personal data transfer, compliance with GDPR's data transfer rules is critical to maintaining confidentiality
and legal obligations.238
2. Misclassifying training data as anonymous by controllers when it contains identifiable information, leading to failure to implement appropriate safeguards for data protection (partly relating to risk 3). Whenever information relating to identified or identifiable individuals whose personal data was used to train the model may be obtained from an AI model with means reasonably likely to be used239, it may be concluded that such a model is not anonymous.240
Mitigations:
Implement robust testing and validation processes to ensure that (i) personal data associated with the training data cannot be extracted from the model using reasonable means, and (ii) any outputs generated by the model do not link back to or identify data subjects whose personal data was used during training.
This assessment should be done taking into account 'all the means reasonably likely to be used', considering objective factors such as:241
o The characteristics of the training data, the AI model, and the training procedure.
o The context in which the AI model is released or processed.242
o The availability of additional information that could enable identification.
o The costs and time required to access such additional information, if not readily available.
o Current technological capabilities and potential future advancements.
Implement alternative approaches to anonymization if they provide an equivalent level of protection, ensuring they align with the state of the art.
Implement structured testing against state-of-the-art attacks such as attribute and membership inference, exfiltration, regurgitation of training data, model inversion, or reconstruction attacks (a minimal membership-inference sketch is shown below).
Document and retain evidence to demonstrate compliance with these safeguards following accountability obligations under Article 5(2) GDPR. Documentation should include:
o Details of DPIAs, including assessments and decisions on their necessity.
o Advice or feedback from the DPO (if applicable).
o Information on technical and organizational measures to minimize identification risks during the model design, including threat models and risk assessments for training datasets (e.g., source URLs and safeguards).
o Measures taken throughout the AI model lifecycle to prevent or verify the absence of personal data in the model.
o Evidence of the model's theoretical resistance to re-identification techniques, including metrics, testing reports, and analysis of attack resistance (e.g., regurgitation, membership inference).
o Documentation provided to controllers and data subjects detailing measures to reduce identification risks and addressing potential residual risks.
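By way of illustration, a very simple loss-threshold membership-inference check is sketched below. It assumes you can compute a per-example loss for known training members and for held-out non-members; the synthetic data is a placeholder, and real evaluations would use stronger attacks than this baseline.

```python
# Minimal loss-threshold membership-inference sketch (illustrative only).
# An attack AUC close to 0.5 suggests per-example losses alone do not
# distinguish members from non-members; values well above 0.5 indicate
# memorization risk worth investigating further.
import numpy as np
from sklearn.metrics import roc_auc_score

def membership_inference_auc(member_losses, nonmember_losses):
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones(len(member_losses)),        # 1 = member
                             np.zeros(len(nonmember_losses))])   # 0 = non-member
    # Lower loss on a record is evidence of membership, so score = -loss.
    return roc_auc_score(labels, -losses)

# Synthetic demo with hypothetical loss distributions; in practice the losses
# would be exported from your evaluation pipeline.
rng = np.random.default_rng(0)
auc = membership_inference_auc(rng.normal(0.5, 0.2, 1000),   # members: lower loss
                               rng.normal(0.9, 0.2, 1000))   # non-members
print(f"Membership-inference AUC: {auc:.3f}")
```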
238
https://www.edps.europa.eu/data-protection/our-work/publications/reports/2024-11-15-techsonar-report-2025_en
239
Membership Inference Attacks and Model Inversion Attacks.
240
EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. Adopted on 17 December 2024
241
idem
242
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
3. Unlawful processing of personal data in training sets
Mitigations:
Document all training data sources (e.g., book databases, websites) to ensure accountability under Art. 5(2) GDPR.
Check training data for statistical distortions or biases and make necessary adjustments.
Exclude training data that includes unauthorized content, such as fake news, hate speech, or conspiracy theories.
Exclude content from publications that may contain personal data posing risks to individuals or groups, such as those vulnerable to abuse, prejudice, or harm.
Remove unnecessary personal data (e.g., credit card numbers, email addresses, names) from the training dataset243 (a minimal scrubbing sketch is shown below).
Employ methodological choices that significantly reduce or eliminate identifiability, such as using regularization methods to enhance model generalization and minimize overfitting.
Implement robust privacy-preserving techniques, such as differential privacy.244
When using web scraping as a method to collect data, ensure compliance with Article 6(1)(f) GDPR by conducting a thorough legal assessment.249 This includes evaluating:
o (i) the existence of a legitimate interest for data processing. The interest should be lawful, clearly articulated and real, not speculative.
o (ii) the necessity of the processing, ensuring that personal data collected is adequate, relevant, and limited to what is necessary for the stated purpose245 and that this purpose could not reasonably be fulfilled by other means, and
o (iii) a careful balancing of interests, where the fundamental rights and freedoms of data subjects are weighed against the legitimate interests of the data controller.
Consideration should also be given to the reasonable expectations of data subjects regarding the use of their data.246
Involve the DPO in the balancing test, where applicable247.
For web scraping, assess whether the exemption under Article 14(5)(b) applies, ensuring all criteria are met to justify not informing each data subject individually.
Transparency:
Provide public and easily accessible information that goes beyond GDPR requirements under Articles 13 and 14, including details about collection criteria and datasets used, with special consideration for protecting children and vulnerable individuals.
Use innovative approaches to inform data subjects, such as media campaigns, email notifications, graphic visualizations, FAQs, transparency labels, model cards, and voluntary annual transparency reports.248
Implement an opt-out list managed by the controller, enabling data subjects to object to the collection of their data from specific websites or platforms by providing identifying information before data collection begins.
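A minimal sketch of a pre-training scrubbing pass is shown below. The regular expressions are assumptions for demonstration only; production pipelines would typically rely on dedicated PII-detection tooling and will catch identifiers these patterns miss.

```python
# Illustrative sketch: remove obvious direct identifiers (email addresses,
# card-like numbers) from text records before they enter a training corpus.
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")   # crude 13-16 digit sequences

def scrub_record(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL_REMOVED]", text)
    text = CARD_RE.sub("[NUMBER_REMOVED]", text)
    return text

corpus = ["Contact me at jane.doe@example.com",
          "Card 4111 1111 1111 1111 on file"]
print([scrub_record(t) for t in corpus])
```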
243 Bavarian State Office for Data Protection Supervision, ‘Data protection compliant Artificial intelligence Checklist with test criteria according to GDPR’
244 EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, Adopted on 17 December 2024.
245
Recital 39 GDPR clarifies that ‘Personal data should be processed only if the purpose of the processing could not reasonably be fulfilled by other means’
246
EDPB Report of the work undertaken by the ChatGPT Taskforce (2024)
247
EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. Adopted on 17 December 2024.
248
Idem
249
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
4. Unlawful processing of special categories of personal data and data relating to criminal convictions and offences in training data
Mitigations:
For the lawful processing of special categories of personal data, ensure that an exception under Article 9(2) GDPR applies250. When relying on Article 9(2)(e), confirm that the data subject explicitly and intentionally made the data publicly accessible through a clear affirmative action. The mere fact that personal data is publicly accessible does not imply that the data subject has manifestly made such data public251.
Given the challenges of case-by-case assessment in large-scale web scraping, implement safeguards such as filtering253 to exclude data falling under Article 9(1) GDPR both during and immediately after data collection.
Maintain robust documentation and proof of these measures to comply with accountability requirements under Articles 5(2) and 24 GDPR.252
250
EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. Adopted on 17 December 2024: “The EDPB recalls the prohibition of Article 9(1) GDPR
regarding the processing of special categories of data and the limited exceptions of Article 9(2) GDPR. In this respect, the Court of Justice of the European Union (“CJEU”) further clarified that ‘where a set of data containing
both sensitive data and non-sensitive data is [...] collected en bloc without it being possible to separate the data items from each other at the time of collection, the processing of that set of data must be regarded as being
prohibited, within the meaning of Article 9(1) of the GDPR, if it contains at least one sensitive data item and none of the derogations in Article 9(2) of that regulation applies’ . Furthermore, the CJEU also emphasised that ‘for
the purposes of the application of the exception laid down in Article 9(2)(e) of the GDPR, it is important to ascertain whether the data subject had intended, explicitly and by a clear affirmative action, to make the personal
data in question accessible to the general public’ . These considerations should be taken into account when processing of personal data in the context of AI models involves special categories of data.”
251 EDPB, Report of the work undertaken by the ChatGPT Taskforce
252 Idem
253
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
254
Information Commissioner Officer (ICO) ‘Generative AI third call for evidence: accuracy of training data and model outputs’ (2025) https://ico.org.uk/about-the-ico/what-we-do/our-work-on-artificial-
intelligence/generative-ai-third-call-for-evidence/
255
Idem
257
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
To mitigate the risk of adverse impacts on data subjects and fundamental rights in the context of LLMs, accuracy256
and reliability must be prioritized throughout the system lifecycle.
Ensure that training datasets are diverse and representative of different demographic groups to reduce biases
inherent in the data.
Conduct regular audits and fairness tests and incorporate human review in sensitive decisions to ensure fairness and
accountability.
Use explainability frameworks to analyze and understand how decisions are made, which helps in identifying potential sources of bias.
6. Not providing human intervention for a processing that can have a legal or important effect on the data subject
Mitigations:
Human oversight should be integrated into decision-making processes where the outputs of LLMs could lead to legal or significant consequences for individuals258. This includes ensuring that automated decisions are subject to review by qualified personnel who can assess the fairness, accuracy, and relevance of the outputs.
Clear escalation procedures should be in place for cases where automated outputs appear ambiguous, erroneous, or potentially harmful.
Developers and deployers must design systems to flag high-risk outputs for mandatory human intervention before any action is taken259 (a minimal routing sketch is shown below).
Transparency mechanisms should also be implemented260, ensuring data subjects are informed about the use of LLMs, the capabilities and limitations of the model261, the processing of personal data through the model, and their right to contest decisions or seek human review.
Regular training for staff involved in oversight can further enhance compliance and accountability.
Implement the Article 29 Working Party ("WP29") Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679, as last revised and adopted on 6 February 2018, endorsed by the EDPB on 25 May 2018. See also the CJEU judgment of 7 December 2023, Case C-634/21, SCHUFA Holding and Others (ECLI:EU:C:2023:957).
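The sketch below illustrates one way outputs with potential legal or similarly significant effects could be routed to a human reviewer before any action is taken. The risk keywords, threshold and review queue are hypothetical placeholders, not a prescribed design.

```python
# Illustrative sketch of flagging high-risk chatbot outputs for mandatory
# human review before they reach the data subject.
from dataclasses import dataclass

HIGH_RISK_TOPICS = {"credit", "refund denial", "account termination", "contract"}

@dataclass
class ChatOutput:
    user_id: str
    text: str
    moderation_score: float   # e.g., from a separate risk classifier, 0.0-1.0

def requires_human_review(output: ChatOutput, threshold: float = 0.5) -> bool:
    topic_hit = any(topic in output.text.lower() for topic in HIGH_RISK_TOPICS)
    return topic_hit or output.moderation_score >= threshold

def dispatch(output: ChatOutput, review_queue: list) -> str:
    if requires_human_review(output):
        review_queue.append(output)           # held for qualified personnel
        return "Your request has been escalated to a human agent."
    return output.text                        # low-risk: returned directly

queue: list = []
print(dispatch(ChatOutput("u1", "Your refund denial is final.", 0.2), queue))
print(len(queue))   # 1 -> flagged for mandatory human intervention
```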
7. Not granting data subjects their right to object, rectification, and erasure
Mitigations:
The right to object under Article 21 GDPR applies and should be ensured when the legal basis is legitimate interest262. In such a case, providers should implement mechanisms to grant this right. Some measures to implement when collecting personal data could be263:
o Introduce a reasonable period between the collection of a training dataset and its use, allowing data subjects time to exercise their rights.
o Provide an unconditional opt-out mechanism for data subjects before processing begins.
o Permit data subjects to request data erasure, even beyond the specific grounds listed in Article 17(1) GDPR.
256
AI Model Code, ‘Evaluating language models for accuracy and bias’ (2024) https://aimodelcode.org/tech-info/llm-eval/
258
Lumenova, ‘The Strategic Necessity of Human Oversight in AI Systems’ (2024) https://www.lumenova.ai/blog/strategic-necessity-human-oversight-ai-systems/
259
Kuriakose, A.A., ’ The Role of Human Oversight in LLMOps’ (2024) https://www.algomox.com/resources/blog/what_is_the_role_of_human_oversight_in_llmops/
260 Garante per la Protezione dei Dati Personali (GDPD), ‘ChatGPT, il Garante privacy chiude l’istruttoria. OpenAI dovrà realizzare una campagna informativa di sei mesi e pagare una sanzione di 15 milioni di euro’ (2024)
https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/10085432?mkt_tok=MTM4LUVaTS0wNDIAAAGX5pUM0HSpbBgVFc2wv7uGKk23174FM2-
cFJBvVD0FDGJCM_27RuQFPm2uSB80ihorQ2e0YWwgCPRFngJDRE4b7N_pWRz873q84sJ8ZWucdQOh#english
261
EDPB Report of the work undertaken by the ChatGPT Taskforce (2024)
262
Note that according to Art. 21(1) GDPR, “The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights
and freedoms of the data subject or for the establishment, exercise or defence of legal claims.”
263
EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. Adopted on 17 December 2024
o Claim handling: enable data subjects to report instances of personal data regurgitation or memorization, with mechanisms for controllers to assess and apply unlearning techniques to resolve such claims.
Mitigating non-compliance with GDPR concerning data subjects' rights to rectification and erasure involves exploring machine unlearning techniques264. These approaches aim to remove the influence of data from a trained model upon request, addressing concerns about data use, low-quality inputs, or outdated information.
o Exact unlearning seeks to entirely eliminate the influence of specific data points, often through retraining or advanced methods that avoid full retraining. Techniques like Sharded, Isolated, Sliced, and Aggregated (SISA) training divide data into subsets, simplifying data removal while striving to maintain model robustness. Approximate unlearning attempts to reduce the impact of specific data points by adjusting model weights or applying correction factors, offering a trade-off between precision and efficiency.
While these methods hold promise, challenges remain, including maintaining model accuracy and avoiding unintended biases post-unlearning. Certified removal, which provides verifiable guarantees of data removal using mathematical proofs, offers a rigorous but resource-intensive solution. As unlearning techniques evolve, they play a crucial role in enabling compliance with GDPR while preserving the integrity and fairness of machine learning models.265 269
Implement mechanisms to delete personal data, such as names, ensuring their removal (block)266 is comprehensive and context-agnostic across the dataset (a minimal block-list sketch is shown below). Recognize that this approach might result in the deletion of the name for all individuals with the same identifier, regardless of the context. To mitigate unintended consequences, use precise filtering techniques to differentiate between contexts where the name is personally identifiable and where it is generic. To prevent misuse or reintroduction of deleted data, secure filter scripts or prompts by restricting access to authorized personnel only, employing encryption, and maintaining version control. Regularly audit these scripts to ensure they are up to date and free from vulnerabilities.
It is also important, in particular with regard to Article 21 GDPR, to establish mechanisms to comply with the requests of users that object to the processing of their personal data based on legitimate interest.267
For deletion requests under Art. 17 GDPR, assess whether personal data can be directly identified or derived from the AI model and implement technical deletion where feasible, such as post-training adjustments.268
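As a simple illustration of the block-list approach described above, the sketch below suppresses specific names in model outputs. The block list and replacement token are assumptions; production filters would add the context checks mentioned above so that generic uses of a word are not removed.

```python
# Illustrative sketch of a post-generation block-list filter used to honour
# erasure or objection requests by suppressing specific names in outputs.
import re

BLOCKED_NAMES = {"Jane Doe", "John Smith"}   # names subject to erasure requests

def filter_output(text: str) -> str:
    for name in BLOCKED_NAMES:
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text

print(filter_output("Our records show Jane Doe ordered a blender."))
# -> "Our records show [REDACTED] ordered a blender."
```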
8. Unlawful repurposing of personal data
Mitigations:
Ensure compliance with Article 5(1)(c) GDPR by clearly limiting personal data processing to what is necessary for specific, well-defined purposes. Avoid overly broad purposes like "developing and improving an AI system." Instead, specify the type of AI system (e.g., large language model, generative AI for images) and its technically feasible functionalities and capabilities.270
264
Shrishak, K., ‘AI-Complex Algorithms and effective Data Protection Supervision Effective implementation of data subjects’ rights’ Support Pool of Experts Programme EDPB (2024)
https://www.edpb.europa.eu/system/files/2025-01/d2-ai-effective-implementation-of-data-subjects-rights_en.pdf
265 EDPS, ‘TechSonar 2025 Report’ (2025) https://www.edps.europa.eu/data-protection/our-work/publications/reports/2024-11-15-techsonar-report-2025_en
266 Surve, D., ‘Beginner’s Guide to LLMs: Build a Content Moderation Filter and Learn Advanced Prompting with Free Groq API’ (2024) https://deveshsurve.medium.com/beginners-guide-to-llms-build-a-content-moderation-
filter-and-learn-advanced-prompting-with-free-87f3bad7c0af
267
EDPB Report of the work undertaken by the ChatGPT Taskforce (2024)
268
Bavarian State Office for Data Protection Supervision, ‘Data protection compliant Artificial intelligence Checklist with test criteria according to GDPR’
269
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
270
CNIL, ‘Artificial Intelligence (AI)’ (2025) https://www.cnil.fr/en/topics/artificial-intelligence-ai
Article 6(4) GDPR provides, for certain legal bases, criteria that a controller shall take into account to ascertain
whether processing for another purpose is compatible with the purpose for which personal data are initially
collected.271
When outsourcing AI training, verify legal guarantees (e.g., contracts, third-country transfer measures) and ensure
training data is not used by service providers for unauthorized purposes.
9. Unlawful unlimited storage of personal data
Mitigations:
As user, deployer and procurement entity, make agreements with the third-party provider about how long the input and output data should be stored. This can be part of the service contract, product documentation or data processing agreement.
If data are stored on your premises, establish retention rules and/or a mechanism for the deletion of data.
10. Unlawful transfer of personal data
Mitigations:
As user, deployer and procurement entity, verify with the provider where the data processing is taking place.
Carry out the necessary safeguard due diligence and, when necessary, perform a Data Transfer Impact Assessment.
Make the necessary contractual agreements.
Consider this risk when making a selection among different vendors.
11. Breach of the data minimization272 principle
Mitigations:
Regularly review and eliminate unnecessary data collection, automating data deletion when no longer needed.
Replace identifiable data with anonymized or pseudonymized alternatives immediately after collection.
Apply Privacy by Design principles at every development stage, integrating data minimization measures.
Exclude data collection from websites that object to web scraping (e.g., using robots.txt or ai.txt files)274 (a minimal robots.txt check is sketched below).
Limit collection to freely accessible data manifestly made public by the data subjects.
Prevent combining data based on individual identifiers unless explicitly required and justified for AI system development.273
Educate users about providing only essential data in inputs and transparently communicate data use practices.
Evaluate whether processing personal data is strictly necessary for the intended purpose by exploring less intrusive alternatives, such as the use of synthetic or anonymized data, and ensuring the volume of personal data processed is proportionate to the objective.
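The sketch below shows one way a scraper could honour robots.txt before collecting a page. The URL and user agent are hypothetical, and ai.txt or other opt-out signals would need a separate check, since the standard library parser only reads robots.txt.

```python
# Illustrative sketch of checking robots.txt before scraping a page for
# training data (urllib.robotparser is in the Python standard library).
from urllib.robotparser import RobotFileParser

def allowed_to_scrape(page_url: str, base_url: str,
                      user_agent: str = "ExampleAIBot") -> bool:
    parser = RobotFileParser()
    parser.set_url(base_url.rstrip("/") + "/robots.txt")
    parser.read()   # fetches and parses the site's robots.txt
    return parser.can_fetch(user_agent, page_url)

if __name__ == "__main__":
    print(allowed_to_scrape("https://example.com/blog/post-1", "https://example.com"))
```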
271 EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. Adopted on 17 December 2024.
272
Processing personal data to address potential biases and errors is permissible only when it is explicitly aligned with the stated purpose, and the use of such data is necessary because the objective cannot be effectively
achieved using synthetic or anonymized data. Article 10(5) AI Act provides for specific rules for the processing of special categories of personal data in relation to the high-risk AI systems for the purpose of ensuring bias
detection and correction.
273
CNIL, ‘The legal basis of legitimate interests: Focus sheet’ (2024) https://www.cnil.fr/en/legal-basis-legitimate-interests-focus-sheet-measures-implement-case-data-collection-web-scraping
274
Deployers should verify that the provider has effectively addressed this risk. This recommendation is equally relevant in cases where deployers are involved in fine-tuning or retraining models.
After completing the feasibility assessment and implementing mitigation measures, it is essential to reassess whether any risks remain. Residual risks275 are the risks that persist after mitigation measures have been applied.
To analyze residual risk, the probability and severity of the remaining risks are reevaluated, providing a clear overview of the risks that remain after mitigation, taking into account276:
Once residual risks are identified, organizations must decide whether these risks fall within acceptable
levels as defined by their risk tolerance 277 and acceptance criteria. If residual risks are deemed
acceptable, they can be formally acknowledged and documented in the risk register. However, if the
risks exceed acceptable levels, further mitigation measures must be explored and implemented as well
as documented. The process then returns to the risk treatment phase to identify the most appropriate
treatment option for the risk.
Residual risk evaluation also plays a role in the decision to release a system into production. It is
therefore important to assess whether risks remain within defined safety thresholds. Organizations may
decide then to request further testing or additional evaluations, mandate further mitigations, or
approve the model for deployment if the residual risk is acceptable.
275
NIST, ‘Definition of Residual Risk’ (2025) https://csrc.nist.gov/glossary/term/residual_risk
276
See footnote 193
277
ISO 31000:2018 Risk Management
Regular reviews also help refine risk strategies, improve processes, and adapt to changes in legislation,
business operations, or team structures.
Continuous Monitoring
Once risk mitigation measures281 have been implemented, ongoing monitoring is essential to assess
their effectiveness and identify any emerging risks. After deployment, post-market monitoring 282 plays
a critical role in identifying new risks or changes in the operational environment that may impact
privacy. This involves the systematic collection and analysis of logs and other operational data in
compliance with GDPR requirements, ensuring transparency, accountability, and the ongoing protection
of user data.
Currently, LLM monitoring throughout the lifecycle relies primarily on the following techniques:283 model testing and evaluation, red teaming, field testing, and long-term impact assessment.
278 FITT Team, ‘How Oftern Should You Review Your Risk Management Plan’ (2023) https://www.tradeready.ca/explainer/how-often-should-
you-review-your-risk-management-plan/
279
Vn Vroonhoven, J., ‘Risk Management Plans and the new ISO 14971’ BSI, (2020)
https://compliancenavigator.bsigroup.com/en/medicaldeviceblog/risk-management-plans-and-the-new-iso-14971/
280
Wikipedia, ‘Risk Register’ (2025) https://en.wikipedia.org/wiki/Risk_register
281
‘risk management measures’, Art.9 AI Act.
282
Chapter IX, Section 1 Post-market Monitoring, AI Act
283
AI Action Summit, ‘International AI Safety Report on the Safety of Advanced AI’ , p - 184, (2025)
https://assets.publishing.service.gov.uk/media/679a0c48a77d250007d313ee/International_AI_Safety_Report_2025_accessible_f.pdf
These methods help identify and evaluate emerging risks that may not have been apparent during initial development.
Model testing and iterative evaluations are used before and after the model is deployed and becomes part of an LLM system. While essential, they are insufficient on their own due to the unpredictability of real-world scenarios and the subjectivity of certain risks. Since LLMs can be applied in numerous contexts, it is difficult to predict how risks will manifest in practice, and, as mentioned in section 2, performance metrics and benchmarks may not always accurately reflect those real risks.
Methodologies such as red teaming284 can be used to stress-test the model before deployment and
the LLM system before and after it is in production by simulating adversarial attacks or misuse
scenarios 285 , helping to uncover vulnerabilities that might not have been identified during the
development phase.
Field testing evaluates AI risks in real-world conditions, but its implementation remains challenging
due to the difficulty of accurately replicating real-world scenarios and establishing clear success
metrics. It is important to create a representative test environment and define measurable
performance benchmarks to obtain reliable insights.
Long-term impact assessments evaluate how AI systems evolve over time, aiming to identify
unintended consequences that may emerge with prolonged deployment. Continuous monitoring
and periodic reassessments are essential to detect shifts in model behavior, performance
degradation, or emerging risks that may not have been apparent during initial testing. This
technique is part of a continuous monitoring strategy and can also be part of threat modeling
sessions.
Across all these techniques, defining robust and reliable monitoring metrics is essential. However, current automated assessments and quantitative metrics often lack reliability and validity,286 making it difficult to assess risks effectively. For this reason, qualitative human review also plays a crucial role in capturing the broader sociotechnical implications of LLMs and their associated risks.
The warning signals produced by the various monitoring techniques are crucial not only for post-market monitoring but also throughout the entire AI lifecycle. The techniques, the scope of testing and the results will vary depending on whether evaluations are conducted before or after model training, and before or after model and system deployment.
While results are important to help identify new risks, they can also play a key role in assessing the
probability of identified threats or hazards occurring. This provides a quantitative analysis that can be
compared against established acceptable thresholds to help determine whether further risk mitigations
are necessary.
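As a simple illustration of comparing monitored results against an acceptable threshold, the sketch below keeps a rolling average of a logged evaluation score and raises an alert when it degrades. The metric values, window size and threshold are hypothetical assumptions.

```python
# Illustrative sketch of a post-deployment monitoring check: compare a rolling
# average of a logged quality/safety metric against an agreed threshold and
# alert when it degrades.
from collections import deque

WINDOW = 50          # number of most recent interactions to average
THRESHOLD = 0.80     # acceptable lower bound agreed during risk evaluation

recent_scores: deque = deque(maxlen=WINDOW)

def record_score(score: float) -> None:
    """Log one evaluation score (e.g., from automated checks or human review)."""
    recent_scores.append(score)
    if len(recent_scores) == WINDOW:
        rolling = sum(recent_scores) / WINDOW
        if rolling < THRESHOLD:
            # In practice this would notify the risk owner / open a register item.
            print(f"ALERT: rolling score {rolling:.2f} below threshold {THRESHOLD}")

for s in [0.9] * 30 + [0.5] * 20:   # simulated degradation over time
    record_score(s)
```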
284
Open AI, ‘Advancing red teaming with people and AI’ (2024) https://openai.com/index/advancing-red-teaming-with-people-and-ai/
285
Google Threat Intelligence Group, ‘Adversarial Misuse of Generative AI’, (2025) https://cloud.google.com/blog/topics/threat-
intelligence/adversarial-misuse-generative-ai
286
Koh Ly Wey, T., ‘Current LLM evaluations do not sufficiently measure all we need’ (2025) https://aisingapore.org/ai-governance/current-
llm-evaluations-do-not-sufficiently-measure-all-we-need/
Figure 17. Evaluation techniques are used to assess the probability of identified risks
To further strengthen risk management, tools like LLMOps287 (LLM Operations) and LLMSecOps288 (LLM
Security Operations) can automate and integrate many aspects of risk management, ensuring seamless
updates, monitoring, and response to vulnerabilities. These tools enhance risk tracking and mitigation
workflows, reducing the manual documentation burden and improving overall security and
governance289 of LLM systems.
287
Databricks, ‘LLMOps’ (2025) https://www.databricks.com/glossary/llmops
288
Ghosh, B., ‘LLMSecOps Elevating Security Beyond MLSecOps’ (2023) https://medium.com/@bijit211987/llmsecops-elevating-security-
beyond-mlsecops-94396768ecc6
289 All Tech is Human x IBM Research, ‘AI Governance Workshop’ (2025)
https://static1.squarespace.com/static/60355084905d134a93c099a8/t/677c492a161e58148fc60706/1736198443181/IBM+Research+x+ATI
H+AI+Governance+Workshop.pdf
In addition, the figure highlights instruments for quality assurance and risk identification specific to each
phase. These include practices such as stakeholder collaboration, threat modeling, testing and AI red
teaming. This layered approach supports a proactive and iterative risk management strategy.
Scenario: A company specialized in kitchen equipment wants to deploy a chatbot to provide general
information about its products and services to its customers. The chatbot will have access to pre-
existing customer data through integration with the customer management system (e.g., CRM
databases). This will allow the chatbot to recognize users based on identifiers like email or account
credentials and provide personalized responses without requiring users to re-enter their data. This
chatbot interface will be built using as foundation an ‘off-the-shelf’ LLM that will use RAG to acquire
the domain specific knowledge required.
1. User input → Users will interact with the chatbot directly after logging into the company's platform by providing their name, email address, and preferences through an interface (e.g., a website or mobile app).
2. Data preprocessing and API interaction → User input will be validated and formatted before being
sent to the chatbot’s API for processing. The chatbot will interact with a fine-tuned off-the-shelf LLM
hosted on the cloud.
3. Retrieval-Augmented Generation (RAG) →
For queries requiring domain-specific knowledge or up-to-date context, the system performs a retrieval step: it searches the company's CRM, document database, or knowledge base for relevant information. The retrieved content is then combined with the user input and passed to the LLM to generate a grounded, personalized response (a minimal sketch of this step follows the list below).
4. Pre-Fine-Tuned LLM processing →
The chatbot uses a fine-tuned LLM trained on enterprise-specific data to enhance general language
understanding and tone alignment. This LLM uses the enriched input (from user + RAG) to personalize
outputs.
5. Data storage → Preprocessed user input (e.g., preferences) will be stored locally or in the cloud to
enable personalized recommendations and facilitate future interactions.
6. Personalized response generation → The chatbot will use stored user data from the CRM system
and the fine-tuned LLM capabilities to generate tailored recommendations and responses.
7. Data sharing → The chatbot may share minimal (anonymized) user data with external services (e.g., third-party APIs for additional functionality or promotional tools).
8. Feedback collection → Users provide feedback on chatbot interactions (e.g., thumbs-up/down, comments) to improve the system's performance. This is processed by the system for analytics purposes.
9. Deletion and user rights management → Users can request access to, deletion of, or updates to
their personal data in compliance with GDPR or similar regulations.
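The sketch below illustrates the retrieval step from step 3 in simplified form. The documents, keyword matching and prompt template are hypothetical placeholders; a real deployment would use vector search over embeddings and the company's actual knowledge base.

```python
# Minimal sketch of the retrieval-augmented generation step described above.
from dataclasses import dataclass

@dataclass
class Document:
    source: str
    text: str

KNOWLEDGE_BASE = [
    Document("product-catalogue", "The K-200 blender has a 2-year warranty."),
    Document("faq", "Returns are accepted within 30 days of delivery."),
]

def retrieve(query: str, k: int = 2) -> list[Document]:
    # Toy keyword overlap; a production system would use vector similarity search.
    scored = [(sum(w in doc.text.lower() for w in query.lower().split()), doc)
              for doc in KNOWLEDGE_BASE]
    return [doc for score, doc in sorted(scored, key=lambda s: -s[0])[:k] if score > 0]

def build_prompt(user_input: str, docs: list[Document]) -> str:
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return (f"Answer using only the context below.\n\nContext:\n{context}\n\n"
            f"Customer question: {user_input}")

prompt = build_prompt("What is the warranty on the K-200 blender?",
                      retrieve("warranty K-200 blender"))
print(prompt)   # this enriched prompt would then be sent to the hosted LLM
```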
To facilitate the risk assessment process, it is also possible to create a data flow diagram290, providing a
graphical representation of the processes, data movements, and interactions within the system.
Possible Architecture
Considering that we are at the design phase of the AI lifecycle, we anticipate that the architecture of
our LLM-based system will include the following key layers:291
The main layers are: User Interface (UI) Layer, Chatbot Application Layer, Business Logic Layer, Integration Layer, LLM Layer, CRM System, External Services Layer, and Security Layer.
User Interface (UI) Layer: The interface where users interact with the chatbot through text or voice input (e.g., webpage, mobile app).
Chatbot Application Layer: Manages the flow of conversation and determines chatbot responses based on user input and context. Directs queries to the Business Logic Layer.
Business Logic Layer: Orchestrates chatbot workflows, such as checking customer profiles or
placing orders. Crucially, it decides whether to call the LLM directly or trigger a retrieval step (RAG)
— for example, by querying the CRM or knowledge base when additional context is needed before
generating a response.
Integration Layer: Contains the API Gateway to manage the transmission between layers.
Connects the chatbot to the LLM, the CRM system and external services and facilitates secure
communication and data exchange between systems. It also handles data transformation, ensures
compatibility between the chatbot and the CRM, and implements authentication and
authorization for secure access to CRM data. For the RAG setup, this layer may also route queries
to a retrieval component or knowledge base before passing enriched inputs to the LLM.
LLM Layer: Performs natural language understanding and generation. Receives either raw user
input or input enriched with retrieved content (from the RAG step). Returns contextually relevant
responses to the Business Logic Layer.
CRM System: Stores customer data, such as contact information, purchase history, preferences,
and support tickets. It also contains CRM APIs that provide endpoints to retrieve, update, or add
customer data and event handlers that trigger actions based on events, such as creating a support
ticket when a customer raises an issue through the chatbot. Supplies customer data to personalize
chatbot responses and stores data generated during interactions.
External Services Layer: It integrates with analytics tools to track user interactions and generate
insights into customer behavior. It also integrates with other services, such as payment gateways,
email services, or marketing tools.
Security Layer: Encrypts data during transmission using protocols like HTTPS and SSL/TLS, restricts unauthorized access to the chatbot, the LLM and the CRM using techniques like OAuth2, and implements security and privacy controls, vulnerability scans, threat monitoring, etc.
290
Wikipedia, ‘Data-flow diagram’ (2025) https://en.wikipedia.org/wiki/Data-flow_diagram
291
This architecture is provided as a simplified example and may vary significantly depending on the specific requirements, use case, and
technical constraints of each deployment.
Having an overview of the possible architecture at this stage provides a clearer understanding of the
data flows and potential risks associated with deploying the chatbot. This architectural insight sets the
groundwork for identifying privacy and security concerns early in the process.
Stakeholder Analysis
Before starting with the risk identification process, the group should analyze the use case to determine which stakeholder target groups will interact with the chatbot and identify those who should not have access. Designing barriers where necessary, such as an age verification mechanism,
ensures the system aligns with the intended user base. In this specific use case, the entry point for the
interface is restricted to logged-in and recognized customers, making additional barriers possibly
unnecessary. However, a comprehensive evaluation of all potential risks remains crucial to the
system's success.
Stakeholder analysis292 is a process used to identify and understand the roles, interests, and influence
of various stakeholders involved in or affected by a project. Beyond analyzing those directly engaged
with the system, it is equally important to assess which stakeholders could be negatively impacted by
the tool. This includes recognizing if vulnerable groups might be involved or if the tool's impact could
extend to a large number of individuals. Where relevant, it may be valuable to engage affected
communities in subsequent phases of risk identification to better capture context-specific concerns
and impacts. Participatory engagement tools293 like the ethical matrix, mentioned in a previous
section, can help evaluate the potential consequences for different stakeholder groups.
In our use case, we have identified our customers as the only authorized users. Given the nature of our
business, we do not anticipate children accessing our platform. However, we remain mindful of
implementing appropriate security measures to ensure that access is restricted, and unauthorized use
is prevented.
292 Rodgers, A., ‘What is a Stakeholder Impact Analysis?’, Simply Stakeholders (2024) https://simplystakeholders.com/stakeholder-impact-analysis/
293 Park, T., ‘Stakeholder Engagement for Responsible AI: Introducing PAI’s Guidelines for Participatory and Inclusive AI’, Partnership on AI (2024) https://partnershiponai.org/stakeholder-engagement-for-responsible-ai-introducing-pais-guidelines-for-participatory-and-inclusive-ai/
We have identified several risk factors that require attention, as they indicate a higher probability of
undesirable outcomes. While our system does not fall under the classification of a high-risk system
under the AI Act, there is, from the GDPR perspective, sufficient evidence to justify initiating294 the
process of creating a Data Protection Impact Assessment (DPIA). The risk assessment we are
performing now will serve as a valuable foundation for the DPIA process.
It is important to clarify when a DPIA is necessary and when a Fundamental Rights Impact
Assessment (FRIA) is required. A DPIA, under Article 35 of the GDPR, is required whenever a
processing operation is likely to result in a high risk to the rights and freedoms of natural persons.295
Even when a DPIA is not explicitly required by law, conducting one can be prudent as a matter of good
privacy and security practice. It allows organizations to preemptively address potential risks, assess the impact
of their solutions, and demonstrate accountability. In contrast, a FRIA, as outlined in Article 27296 of the
AI Act, can be mandatory for certain deployers of high-risk AI systems (bodies governed by public law,
private entities providing public services, and organisations performing creditworthiness evaluations or
risk assessment and pricing for life and health insurance). A FRIA evaluates the potential impact of such systems on
fundamental rights such as privacy, fairness, and non-discrimination. Deployers of high-risk AI systems
must document:
How the system will be used, including its purpose, duration, and frequency.
The categories of individuals or groups affected by the system.
Specific risks of harm to fundamental rights.
Measures for human oversight and governance.
Steps to address and mitigate risks if they materialize.
Where applicable, a FRIA can complement a DPIA by focusing on broader societal impacts
beyond data protection alone.
294 EDPB, ‘Data Protection Guide for Small Business’ (2025) https://www.edpb.europa.eu/sme-data-protection-guide/be-compliant_en
295 WP29, ‘Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is "likely to result in a high risk" for the purposes of Regulation 2016/679’, WP248 rev.01 (2017), endorsed by the EDPB, https://ec.europa.eu/newsroom/article29/items/611236
296 Article 27(1) AI Act: “Prior to deploying a high-risk AI system referred to in Article 6(2), with the exception of high-risk AI systems intended to be used in the area listed in point 2 of Annex III, deployers that are bodies governed by public law, or are private entities providing public services, and deployers of high-risk AI systems referred to in points 5 (b) and (c) of Annex III, shall perform an assessment of the impact on fundamental rights that the use of such system may produce….”
For this use case, we will integrate these questions into the risk management process as follows:
297 AEPD, ‘Technical Note: An Introduction to LIINE4DU 1.0: A New Privacy & Data Protection Threat Modelling Framework’ (2024)
298 Slattery, P., et al., ‘The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence’ (2024) https://arxiv.org/abs/2408.12622
299 LINDDUN, ‘Privacy Threat Modelling’ (2025) https://linddun.org/
300 AEPD, ‘Technical Note: An Introduction to LIINE4DU 1.0: A New Privacy & Data Protection Threat Modeling Framework’ (2024) https://www.aepd.es/guides/technical-note-introduction-to-liine4du-1-0.pdf
301 PLOT4AI, ‘Practical Library of Threats 4 Artificial Intelligence’ (2025) https://plot4.ai/
302 Shostack, A., ‘The Four Question Framework for Threat Modeling’ (2024) https://shostack.org/files/papers/The_Four_Question_Framework.pdf
From the potential privacy risks outlined in Section 3 for systems based on an ‘off-the-shelf’ LLM
model, we have reviewed all risks across the standard data flow phases and identified that most of
these risks are covered under Risk 1 (Insufficient protection of personal data leading to a data breach)
and Risk 3 (Possible adverse impact on data subjects that could negatively impact fundamental rights)
from Section 4.
For example, in the case of Risk 2 (Misclassification of training data as anonymous), we can already
perform tests to detect the presence of personal data in our datasets. These results would help us
assess the probability of the risk occurring given the current dataset conditions.
At this stage of the AI lifecycle (pre-deployment phase), the available evaluations are limited.
However, when risk assessments take place post-development, additional evaluations can be
conducted, providing further quantitative criteria to refine risk assessment and decision-making.
Probability
We are going to assess the probability of identified risks, categorizing them into one of the four levels
in the probability matrix: Very High, High, Low, or Unlikely. This categorization should be done by
directly assigning a level to each risk based on quantitative and/or qualitative criteria and through
collaborative decision-making with stakeholders. Alternatively, we can also employ a list of predefined
criteria to guide our assessment.
For a more quantitative approach, aggregation methods can be applied to calculate the probability
level. In this use case, we will use the FRASP framework to structure and refine our
probability assessment process.
Once the aggregate score is calculated, we will map it to one of the predefined probability levels
based on the following ranges:
Severity
Next, we will assess the potential privacy impact and severity of these risks on data subjects,
individuals, and society. Based on this severity assessment, we will assign one of the four levels from
the severity classification matrix: Very Significant, Significant, Limited, or Very Limited.
The calculation of severity will follow the same steps as those used for determining probability.
However, for severity, the highest level obtained among criteria 1 to 5, as well as 7 and 8, will set the
total severity score.
Once the aggregate score is calculated, we will map it to one of the predefined severity levels based
on the following ranges:
In this case, the final score is determined by the highest score among criteria 1, 5 and 7, resulting in
Level 3 (Significant) severity for all the risks.
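As a purely illustrative aid, the sketch below shows one way to implement the aggregation and mapping logic described above. The numeric thresholds and criterion scores are hypothetical placeholders and do not reproduce the FRASP ranges used in this assessment; only the rule that severity is driven by the highest score among criteria 1 to 5, 7 and 8 is taken from the text.

```python
# Illustrative sketch only: criterion scores and level thresholds are hypothetical
# placeholders, not the actual FRASP ranges referred to in this report.

PROBABILITY_LEVELS = ["Unlikely", "Low", "High", "Very High"]
SEVERITY_LEVELS = ["Very Limited", "Limited", "Significant", "Very Significant"]

def map_score_to_level(score: float, thresholds: list[float], levels: list[str]) -> str:
    """Map an aggregate score to a level using ascending threshold boundaries."""
    for boundary, level in zip(thresholds, levels):
        if score <= boundary:
            return level
    return levels[-1]

def probability_level(criterion_scores: list[int]) -> str:
    """Aggregate probability criteria (here: a simple average) and map to a level."""
    aggregate = sum(criterion_scores) / len(criterion_scores)
    return map_score_to_level(aggregate, [1.5, 2.5, 3.5], PROBABILITY_LEVELS)

def severity_level(criterion_scores: dict[int, int]) -> str:
    """Severity is driven by the highest score among criteria 1-5, 7 and 8 (see text)."""
    relevant = [s for c, s in criterion_scores.items() if c in {1, 2, 3, 4, 5, 7, 8}]
    return SEVERITY_LEVELS[max(relevant) - 1]

if __name__ == "__main__":
    print(probability_level([2, 2, 3, 1]))                  # "Low"
    print(severity_level({1: 3, 2: 1, 5: 3, 7: 3, 8: 2}))   # "Significant" (level 3)
```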
By applying the classification matrix to the obtained probability and severity scores, we can determine
the corresponding risk classification level. In our case, for all the risks, the combination of Low Probability
and Significant Severity results in a High Risk level.
Although high-level risks always require treatment, it is considered best practice to assess whether
classified risks need treatment by evaluating predefined acceptance criteria and acceptable metric
thresholds established by the organization. These criteria can be adjusted per use case and tailored to
the pre-deployment and post-deployment phases to ensure a context-aware risk management approach.
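The lookup below illustrates how such a classification matrix could be encoded. Only two combinations are stated in this report (Low Probability with Significant Severity giving High Risk, and Low Probability with Limited Severity giving Medium Risk, discussed later in the residual risk section); the remaining cells are assumptions for illustration and should be replaced by the organization's own matrix.

```python
# Illustrative 4x4 risk matrix. Only the two combinations explicitly discussed in the
# text are taken from this report; all other cells are assumed for illustration.

RISK_MATRIX = {
    ("Unlikely",  "Very Limited"): "Low",
    ("Unlikely",  "Limited"): "Low",
    ("Unlikely",  "Significant"): "Medium",
    ("Unlikely",  "Very Significant"): "High",
    ("Low",       "Very Limited"): "Low",
    ("Low",       "Limited"): "Medium",        # stated in the text
    ("Low",       "Significant"): "High",      # stated in the text
    ("Low",       "Very Significant"): "High",
    ("High",      "Very Limited"): "Medium",
    ("High",      "Limited"): "High",
    ("High",      "Significant"): "High",
    ("High",      "Very Significant"): "Very High",
    ("Very High", "Very Limited"): "Medium",
    ("Very High", "Limited"): "High",
    ("Very High", "Significant"): "Very High",
    ("Very High", "Very Significant"): "Very High",
}

def classify_risk(probability: str, severity: str) -> str:
    """Combine a probability level and a severity level into a risk classification."""
    return RISK_MATRIX[(probability, severity)]

print(classify_risk("Low", "Significant"))  # "High", as in this use case
print(classify_risk("Low", "Limited"))      # "Medium", the residual risk discussed below
```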
In our specific use case, the organization's risk acceptance criteria are as follows (an illustrative check against the quantitative thresholds is sketched after the list):
A risk that can result in a violation of data protection regulations is not acceptable.
A risk of unauthorized access, exposure, or retention of personal data beyond what is strictly
necessary is not acceptable.
Re-identification risk must remain below 1%, verified through privacy-preserving evaluations
and testing.
Membership inference and model inversion attack risks must remain below a 1% success rate
as verified through internal testing and, for sensitive data, independent external audits.
Inaccurate datasets are only acceptable if the error rate does not exceed 5% and all available
data validation and cleaning processes have been applied.
The chatbot must clearly inform users when their data is being used and provide access to
data usage policies. Transparency risks are not acceptable.
No risk is acceptable if it prevents users from exercising their data rights, unless explicitly
justified under legal exceptions.
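The sketch below illustrates how the quantitative thresholds from the list above could be checked automatically. The measured values are hypothetical examples; in practice they would come from privacy-preserving evaluations, membership inference and model inversion testing, and data quality checks.

```python
# Thresholds taken from the acceptance criteria listed above; the measured values are
# hypothetical examples used only to show how the check could be automated.

ACCEPTANCE_THRESHOLDS = {
    "reidentification_rate": 0.01,         # must remain below 1%
    "membership_inference_success": 0.01,  # must remain below 1% success rate
    "model_inversion_success": 0.01,       # must remain below 1% success rate
    "dataset_error_rate": 0.05,            # acceptable only up to 5%
}

def evaluate_acceptance(measured: dict[str, float]) -> dict[str, bool]:
    """Return, per metric, whether the measured value meets the acceptance criterion."""
    return {
        metric: (measured[metric] <= limit if metric == "dataset_error_rate"
                 else measured[metric] < limit)
        for metric, limit in ACCEPTANCE_THRESHOLDS.items()
    }

measured_example = {
    "reidentification_rate": 0.004,
    "membership_inference_success": 0.012,
    "model_inversion_success": 0.002,
    "dataset_error_rate": 0.03,
}

for metric, ok in evaluate_acceptance(measured_example).items():
    print(f"{metric}: {'acceptable' if ok else 'NOT acceptable - treatment required'}")
```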
Note that Section 3 and Section 4 already outlined comprehensive mitigation measures for the
identified risks. It is also worth noting that many of the specific risks identified in this use case fall under
the broader category of Risk 1, which relates to insufficient protection of personal data.
In our case, the risk level has been reduced to Medium, which means it is not yet acceptable.
Why is the risk level reduced to Medium instead of Low after having implemented all the mitigations?
The risk remains Medium despite reducing severity to Limited (level 2) because the risk matrix combines
probability and severity to determine the overall risk.
With the four-level matrix used in this example, Low Probability and Limited Severity result in
a Medium Risk level because, while unlikely, the consequences of the risk, though mitigated, are still non-
negligible. That means the remaining risk after mitigation measures might still be above an acceptable
threshold for your organization.
What can we do to address the residual risk in this case? Some options that organizations can apply are:
Reduce probability by strengthening preventive controls (e.g., access measures, anomaly detection)
and enhancing event prevention mechanisms.
Implement additional mitigation measures to reduce severity.
Implement robust monitoring and establish a clear incident response plan to minimize impact if the
risk materializes.
Explore additional mitigations: for instance, use advanced technologies (e.g., differential privacy) or
fail-safe mechanisms to further mitigate risks.
Re-evaluate whether the residual risk is within the organizational risk tolerance and document the
justification for maintaining it.
Discuss options to share or transfer the risk (e.g., insurance, vendor agreements).
Since we are currently in the design and development phase, we should proactively plan for continuous
monitoring of our chatbot. This includes defining metrics for ongoing risk assessment, establishing
adequate data logging practices, and ensuring that an incident response plan is in place to address
potential privacy issues that may arise post-deployment.
Second Use Case: LLM System for Monitoring and Supporting Student
Progress
Scenario: A school wants to adopt a third-party LLM system to monitor and evaluate students'
academic performance and provide tailored recommendations for improvement. The tool is an LLM-
based system developed with an ‘off-the-shelf’ LLM model. This tool would analyze a combination of
data, including test scores, assignment completion rates, attendance records, and teacher feedback,
to identify areas where students may need additional support or resources. For example, if a student
struggles with math, the tool could recommend targeted practice exercises, suggest online tutoring
sessions, or notify parents and teachers about specific challenges. The goal is to create a personalized
learning plan that helps each student achieve their full potential.
This system would deal with sensitive information about minors, including their academic records and
behavioral patterns, which introduces significant privacy and ethical risks.
The following table shows how the risks identified in Sections 3 and 4 (the Privacy Risk Library) can be
aligned with the risks specific to this use case.
1. Insufficient protection of personal data, which can eventually cause a data breach
Privacy risks identified:
Weak safeguards could lead to data breaches, unauthorized access, or exposure of sensitive student data.
APIs facilitating communication between the tool, school systems, and third parties could be unsecured or improperly configured.
Inadequate access controls may allow unauthorized school personnel or external parties to view sensitive student data.
If the vendor does not comply with data protection regulations, it increases the risk of a data breach.
The tool might interact with third-party services or platforms (e.g., online tutoring systems, analytics services, or cloud-based storage) for functionality, exposing student data to external entities.
Recommended mitigations:
Implement strong encryption protocols for data in transit and at rest (e.g., SSL/TLS, AES-256).
Regularly conduct security audits and penetration testing.
Establish incident response plans for timely detection and mitigation of breaches.
Use API gateways with robust security configurations, including authentication, access control, and rate limiting.
Implement authentication and ensure secure API endpoints.
Conduct regular API security reviews and validation.
Enforce strict role-based access control (RBAC) policies (an illustrative access-control sketch is provided after this table).
Implement multi-factor authentication (MFA) for all users accessing sensitive data.
Regularly review and update user access permissions.
Conduct vendor due diligence, including Data Protection Impact Assessments (DPIAs) and security certifications.
Include specific data protection clauses in contracts with vendors, ensuring accountability for compliance.
Require vendors to provide evidence of GDPR-compliant practices.
Establish robust data-sharing agreements with third-party platforms, ensuring compliance with GDPR requirements.
Limit data shared with third parties to anonymized or pseudonymized datasets.
Monitor third-party systems for adherence to agreed data protection measures.

2. Misclassifying training data as anonymous by controllers when it contains identifiable information
Privacy risks identified:
Adversaries might exploit the LLM to infer whether specific student data was used in training, indicating a misclassification of training data.
Recommended mitigations:
Use differential privacy techniques to minimize the risk of data inference.
Conduct structured testing against membership inference and attribute inference attacks.
Validate that the LLM provider has implemented safeguards to prevent such attacks.

3. Unlawful processing of personal data in training sets
Privacy risks identified:
If personal data (e.g., academic records) is unlawfully processed in training datasets by the LLM provider.
Behavioral and academic data require explicit consent or another valid legal basis to be processed lawfully.
Recommended mitigations:
Verify that the LLM provider's training datasets exclude sensitive personal data without proper safeguards.
Require documentation from vendors proving that training data was lawfully collected and processed.
Use models trained on synthetic or anonymized data when possible.

4. Unlawful processing of special categories of personal data and data relating to criminal convictions and offences in training data
Privacy risks identified:
If health-related or behavioral data about children, such as indications of mental health conditions, is processed, for example when identifying special assistance needs for conditions like dyslexia or ADHD.
Recommended mitigations:
Ensure explicit consent is obtained from parents or guardians before processing children's data.
Conduct a DPIA and identify lawful grounds for processing.
Provide clear, accessible information to parents about how data is processed.
Implement stricter safeguards for sensitive data, including encryption and access controls.
Limit processing to data strictly necessary for the intended purpose.
Provide parents or guardians with transparency about how health-related data is used.

5. Possible adverse impact on data subjects that could negatively impact fundamental rights
Privacy risks identified:
Fairness and discrimination: recommendations based on biased training data could disproportionately impact certain student groups; continuous monitoring of behavioral patterns could lead to profiling students in ways that might be discriminatory or stigmatizing.
Accuracy: the tool might generate inaccurate recommendations or reports due to
Recommended mitigations:
Fairness and discrimination: regularly audit training data to identify and reduce biases; involve diverse stakeholders in testing the system for potential biases.
Accuracy: establish processes for regular model evaluation and fine-tuning using high-quality, diverse datasets; provide disclaimers with AI-generated recommendations, emphasizing the importance of human oversight.
It is important to note that the list of risks and mitigations provided is based on generic information
and assumptions. In a real-world scenario, a detailed risk assessment tailored to the specific
implementation, context, and operational environment of the LLM-based tool would be necessary.
This includes collaboration with stakeholders, such as the LLM system provider, school administrators,
teachers, parents, and students, to identify unique risks and address them effectively.
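As a simple illustration of the access-control mitigations recommended in the table above (RBAC with deny-by-default access to student records), the following sketch uses hypothetical roles, permissions and record identifiers; it is not a complete authorization design.

```python
# Minimal role-based access control (RBAC) sketch for the student-monitoring use case.
# Roles, permissions and record fields are hypothetical examples.

ROLE_PERMISSIONS = {
    "teacher":         {"read_academic_records", "read_recommendations"},
    "school_admin":    {"read_academic_records", "manage_accounts"},
    "parent":          {"read_own_child_records", "read_recommendations"},
    "external_vendor": set(),  # no direct access to identifiable student data
}

def is_allowed(role: str, permission: str) -> bool:
    """Check whether a role holds a given permission; deny by default."""
    return permission in ROLE_PERMISSIONS.get(role, set())

def read_student_record(role: str, student_id: str) -> str:
    if not is_allowed(role, "read_academic_records"):
        raise PermissionError(f"Role '{role}' may not read academic records")
    # A real system would also log the access and apply purpose limitation checks.
    return f"[academic record for student {student_id}]"

print(read_student_record("teacher", "S-042"))       # allowed
try:
    read_student_record("external_vendor", "S-042")  # denied
except PermissionError as e:
    print(e)
```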
303 Model cards and system cards are examples of information that can be provided to deployers: Green, N. et al., ‘System Cards, a new resource for understanding how AI systems work’ (2022) https://ai.meta.com/blog/system-cards-a-new-resource-for-understanding-how-ai-systems-work/; Hugging Face, ‘Model Cards’ (2024) https://huggingface.co/docs/hub/en/model-cards
Scenario: A personal assistant AI agent is designed to help users manage their travel plans and daily
agendas. The agent can book flights, reserve hotels, schedule meetings, and send reminders based on
user-provided inputs and preferences. For instance, a user might ask the agent to "book a round trip
to Madrid next week and find a hotel near the Prado Museum." To fulfill this request, the agent
accesses the user’s calendar, retrieves personal preferences (e.g., preferred airlines or hotel chains),
and interacts with third-party booking platforms. This system is developed with various ‘off-the-shelf’
LLMs and SLMs (small language models).
In this use case, we have identified the following privacy risks and recommended mitigations, aligned
with our 11 foundational privacy risks.
1. Insufficient protection of personal data, which can eventually cause a data breach
Privacy risks identified:
Weak safeguards could expose sensitive personal data, such as travel itineraries, calendar entries, and user preferences, to unauthorized access or breaches.
Unauthorized access due to poor access control mechanisms.
Inference attacks, where adversaries exploit vulnerabilities to infer personal data not explicitly provided.
Recommended mitigations:
Encrypt user data during transmission and at rest.
Implement secure APIs with rate limiting, authentication, and monitoring to control access.
Use anonymization and pseudonymization to safeguard sensitive data.
Regularly test for vulnerabilities like membership inference, model inversion, or poisoning attacks.
Use robust inter-agent encryption304 to protect data exchange in multi-agent systems.

2. Misclassifying training data as anonymous by controllers when it contains identifiable information
Privacy risks identified:
Not applicable in this use case, as the focus is on operational data rather than training data (this use case is based on the lifecycle phase Operations and Monitoring).
Recommended mitigations:
Not directly applicable in this case, as the system uses pre-trained models, but applicable to providers.
Ensure robust testing and validation to confirm training data anonymity claims.
304 Chen, G. et al., ‘Encryption–decryption-based consensus control for multi-agent systems: Handling actuator faults’, Automatica, Volume
305 Zainea, A.A., ‘Automated Decision-Making in Online Platforms: Protection Against Discrimination and Manipulation of Behaviour’ (2024)
10. Unlawful transfer of personal data
Privacy risks identified:
Cross-border data sharing risks due to reliance on third-party platforms or services in jurisdictions without adequate data protection standards.
Recommended mitigations:
Verify the location of third-party services and ensure compliance with GDPR cross-border transfer rules.
Perform Transfer Impact Assessments (TIAs) for all external vendors.
Use standard contractual clauses and other safeguards for data-sharing agreements with third-party providers.

11. Breach of the data minimization principle
Privacy risks identified:
Excessive data collection: the system may collect or process more data than necessary for fulfilling user requests (e.g., unnecessary calendar details or preferences).
Recommended mitigations:
Limit data collection to what is strictly necessary for fulfilling user requests (e.g., exclude unnecessary calendar details); a minimal illustrative sketch follows this table.
Implement input validation and filters to prevent over-collection of data.
Use anonymization or pseudonymization to minimize the risk of misuse or exposure of collected data.
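As a minimal illustration of the data minimization mitigations in the last row above, the sketch below filters a calendar entry down to the fields needed for a booking request before it is passed to the LLM or a third-party service. The field names and the allowed-field list are hypothetical examples.

```python
# Data-minimization sketch for the travel-assistant use case: keep only the calendar
# fields strictly needed for the request. Field names are hypothetical examples.

ALLOWED_CALENDAR_FIELDS = {"start", "end", "location"}  # e.g. exclude attendees, notes

def minimize_calendar_event(event: dict) -> dict:
    """Drop calendar fields that are not necessary for fulfilling the travel request."""
    return {k: v for k, v in event.items() if k in ALLOWED_CALENDAR_FIELDS}

event = {
    "title": "Therapy session",              # potentially sensitive - excluded
    "attendees": ["dr.smith@example.com"],   # not needed for booking - excluded
    "start": "2025-06-02T09:00",
    "end": "2025-06-02T10:00",
    "location": "Madrid",
    "notes": "Discuss medication",           # sensitive - excluded
}

print(minimize_calendar_event(event))
# {'start': '2025-06-02T09:00', 'end': '2025-06-02T10:00', 'location': 'Madrid'}
```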
From identifying data flows to classifying risks and implementing mitigations, risk management is a
continuous iterative journey. It requires consistent monitoring, stakeholder collaboration, and
adjustments based on real-world observations and emerging technologies.
Risk management should remain adaptable, incorporating feedback and evolving alongside regulatory
and technological advancements.
As we conclude this report, it is important to reiterate that, while the risk management framework
presented in this document provides guidance, every organization must customize its approach to
address the specific nuances of its LLM-based use cases.
Privacy and data protection are not static goals but ongoing commitments.
Example: Word Embedding Association Test (WEAT): This test measures how strongly certain
words are associated with particular groups of people, aiming to detect stereotypes in the
model’s word embeddings. For instance, comparing the proximity of words that indicate
gender (such as names or pronouns) with various career-related words can point to gender
bias in the word embeddings, such as ‘man’ being represented as closer to ‘doctor’, and
‘woman’ being embedded closer to ‘nurse’. This can predict bias in the model's output as well.
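A simplified, WEAT-style association check could look like the sketch below. The embeddings are random placeholders standing in for the evaluated model's word vectors, and the statistic shown is a basic difference of mean associations rather than the full WEAT effect size.

```python
# Simplified WEAT-style association sketch using cosine similarity. The embeddings
# below are random placeholders; in practice they would come from the model under test.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
emb = {w: rng.normal(size=dim) for w in
       ["man", "woman", "he", "she", "doctor", "engineer", "nurse", "teacher"]}

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word, attrs_a, attrs_b):
    """s(w, A, B): mean similarity to attribute set A minus mean similarity to set B."""
    return (np.mean([cos(emb[word], emb[a]) for a in attrs_a])
            - np.mean([cos(emb[word], emb[b]) for b in attrs_b]))

career_a = ["doctor", "engineer"]
career_b = ["nurse", "teacher"]
targets_male = ["man", "he"]
targets_female = ["woman", "she"]

# Test statistic: difference in mean association between the two target groups.
# A value far from zero suggests the embeddings encode a gendered career association.
effect = (np.mean([association(w, career_a, career_b) for w in targets_male])
          - np.mean([association(w, career_a, career_b) for w in targets_female]))
print(f"WEAT-style association difference: {effect:.3f}")
```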
Toxicity Detection
Toxicity evaluation assesses how often LLMs generate harmful, offensive, or inappropriate content.
This includes hate speech, insults, or harassment. What is considered ‘inappropriate’ content can be
context-dependent; for instance, AI systems that interact with children might have a lower threshold
for inappropriate content than adult-only systems.
Example: Toxicity Score: This metric aims to predict the probability of a piece of text being
considered 'toxic'. Usually expressed as a percentage, the closer this score is to 0, the less
likely it is for the text to be toxic. This metric is used in toxicity detection tools such as
Perspective API, aiming to detect and reduce toxicity and harmful content in textual data.
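As an illustration, the sketch below queries a toxicity score for a candidate output and applies a simple threshold. The request and response structure follows the publicly documented Perspective API comments:analyze endpoint, but the API key, the 0.7 threshold and the error handling are placeholders and should be verified against the current documentation.

```python
# Hedged sketch of scoring candidate LLM output for toxicity via the Perspective API.
import requests

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_API_KEY"  # placeholder

def toxicity_score(text: str) -> float:
    """Return the TOXICITY summary score (0 to 1) for a piece of text."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": API_KEY}, json=body, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def moderate(text: str, threshold: float = 0.7) -> str:
    """Example policy: withhold output whose toxicity score exceeds the threshold."""
    if toxicity_score(text) >= threshold:
        return "[response withheld: flagged as potentially toxic]"
    return text

# Example (requires a valid API key):
# print(moderate("Some candidate LLM output..."))
```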
Fairness Metrics
Fairness evaluation focuses on assessing the extent to which LLMs treat all user groups equitably,
without exhibiting or perpetuating systematic biases. This is challenging because fairness is an
inherently complex concept whose definition is debated and open to interpretation. Therefore, the
chosen metrics are usually geared towards optimizing the definitions and dimensions of fairness that
are most appropriate for each given case.
306 Verma, A., ‘NLP evaluation: Intrinsic vs. extrinsic assessment’, Plain English AI, Medium (2023) https://ai.plainenglish.io/nlp-evaluation-intrinsic-vs-extrinsic-assessment-ff1401505631
Example: Demographic Parity: Initially a metric used in classification, demographic parity can
be adapted to a text output. It measures whether the model generates text that represents all
demographic groups equally in terms of frequency, sentiment, and associations. It can answer
questions such as ‘are individuals of different ethnicities represented equally positively in the
generated text?’ or ‘are women as frequently associated with high athletic performance as
men?'.
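A rough demographic parity check on generated text could be sketched as follows. The sentiment function is a placeholder heuristic; a real evaluation would rely on a validated sentiment classifier and on prompts that differ only in the group mentioned.

```python
# Simplified demographic-parity check on generated text: compare how often outputs
# about each group receive a positive sentiment label.

def sentiment_is_positive(text: str) -> bool:
    """Placeholder sentiment heuristic; replace with a proper classifier."""
    return any(w in text.lower() for w in ("excellent", "strong", "talented", "skilled"))

generated = {
    "group_a": ["A talented athlete with a strong record.", "An average performance."],
    "group_b": ["A skilled competitor.", "An excellent and strong finish."],
}

positive_rate = {
    group: sum(sentiment_is_positive(t) for t in texts) / len(texts)
    for group, texts in generated.items()
}
parity_gap = abs(positive_rate["group_a"] - positive_rate["group_b"])
print(positive_rate, f"parity gap = {parity_gap:.2f}")
# A gap close to 0 indicates roughly equal positive representation across groups.
```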
Benchmarks
Benchmarks are standardized datasets, tasks, and evaluation protocols used to measure and compare
the performance of various AI models, including LLMs. They provide a consistent framework to assess
a model's capabilities, ensuring that performance can be compared across different models, tasks, and
implementations.
Massive Multitask Language Understanding (MMLU): This benchmark evaluates the performance
of language models across a wide range of subjects to assess their general knowledge and
reasoning abilities. Models are tested on their ability to answer questions accurately. A higher
score indicates better performance. https://github.com/hendrycks/test
AlpacaEval: An LLM-based automatic evaluation based on the AlpacaFarm evaluation set, which
tests the ability of models to follow general user instructions. https://tatsu-
lab.github.io/alpaca_eval/
HellaSwag: A challenge dataset for evaluating commonsense natural language inference (NLI) that is
especially hard for state-of-the-art models, though its questions are trivial for humans (>95% accuracy).
https://rowanzellers.com/hellaswag/
Big-Bench (Beyond the Imitation Game Benchmark): A set of tasks designed to evaluate the
capabilities and limitations of LLMs on diverse and challenging tasks. These tasks are designed to
test abilities beyond what is evaluated by standard benchmarks, assessing abstract reasoning,
problem-solving, or the ability to handle more unconventional or complex prompts. The higher
the BIG-bench score, the better the model performs in complex tasks.
https://github.com/google/BIG-bench
AIR-BENCH 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
(https://arxiv.org/pdf/2407.17436v2) &
https://huggingface.co/datasets/stanford-crfm/air-bench-2024
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use
Capabilities. https://machinelearning.apple.com/research/toolsandbox-stateful-conversational-
llm-benchmark
LLM Guard (by Protect AI): A comprehensive tool designed to fortify the security of Large
Language Models (LLMs). https://llm-guard.com/
Safeguards/Guardrails in LLMs
Safeguards (or guardrails) in LLMs are mechanisms implemented to ensure that the models operate in
a safe, ethical, and reliable manner. They can be applied at various stages of the LLM pipeline (pre-
processing, training, output, etc.) and can focus on addressing different risks. For instance, some
safeguards aim to avoid the generation of unethical, harmful or inappropriate content (i.e., the
behavior of the model), while others focus on preserving the privacy of the owners of the data (or
other stakeholders).
Here are some examples of behavioral guardrails that aim to moderate the LLM's output and mitigate
harm that the output could cause if left without intervention:
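As a generic illustration of such a behavioral guardrail, the sketch below shows an output-level filter that refuses clearly harmful content and redacts personal data before the response reaches the user. The blocked topics and regular expressions are simplistic placeholders, not a production-ready filter.

```python
# Illustrative output guardrail: refuse clearly harmful content and redact personal
# data in the model's response. Patterns are simplistic examples only.
import re

BLOCKED_TOPICS = ("how to make a weapon", "self-harm instructions")
PII_PATTERNS = [
    re.compile(r"\b[\w.%-]+@[\w.-]+\.[A-Za-z]{2,}\b"),   # e-mail addresses
    re.compile(r"\b(?:\+?\d[\s-]?){9,15}\b"),            # rough phone-number pattern
]

def apply_output_guardrails(text: str) -> str:
    """Refuse clearly harmful content and redact personal data before returning output."""
    lowered = text.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I'm sorry, but I can't help with that request."
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(apply_output_guardrails("Contact John at john.doe@example.com or +31 6 12345678."))
```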
Methodologies and Tools for the Identification of Data Protection and Privacy
Risks
Practical Library of Threats (PLOT4ai) is a threat modeling methodology for the identification of risks
in AI systems. It also contains a library with more than 80 risks specific to AI systems:
https://plot4.ai/
MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems), is a knowledge
base of adversary tactics, techniques, and case studies for machine learning (ML) systems:
https://atlas.mitre.org/
Assessment List for Trustworthy Artificial Intelligence (ALTAI) is a checklist that guides developers
and deployers of AI systems in implementing trustworthy AI principles: https://digital-
strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-
assessment
Guidance
OECD AI Language Models:
https://www.oecd.org/content/dam/oecd/en/publications/reports/2023/04/ai-language-
models_46d9d9b4/13d38f92-en.pdf
NIST SP 800-218A, Secure Software Development Practices for Generative AI and Dual-Use Foundation
Models: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-218A.pdf
NIST AI 600-1, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence
Profile: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
OECD, Advancing Accountability in AI: Governing and Managing Risks Throughout the Lifecycle for
Trustworthy AI:
https://www.oecd.org/content/dam/oecd/en/publications/reports/2023/02/advancing-
accountability-in-ai_753bf8c8/2448f04b-en.pdf
FRIA methodology for AI design and development:
https://apdcat.gencat.cat/es/documentacio/intelligencia_artificial/index.html
AI Cyber Security Code of Practice (gov.uk): https://www.gov.uk/government/publications/ai-
cyber-security-code-of-practice
Standards
The European standardisation bodies CEN and CENELEC are currently developing several AI harmonized
standards following the AI Act standardisation request307 from the European Commission.
High-risk AI systems or general-purpose AI models that comply with these forthcoming harmonized
standards are presumed to meet the corresponding requirements outlined in the AI Act308. This
presumption does not extend to international standards such as ISO/IEC 42001309 and ISO/IEC
23894.310 Nevertheless, these standards provide a robust foundation and offer valuable best practices.
307 European Commission, ‘Implementing decision C(2023)3215 final of 22.5.2023 on a standardisation request to the European Committee for Standardisation and the European Committee for Electrotechnical Standardisation in support of Union policy on artificial intelligence’ (2023), https://ec.europa.eu/transparency/documents-register/detail?ref=C(2023)3215&lang=en
308 Article 40 AI Act.
309 ISO/IEC 42001, Information technology — Artificial intelligence — Management system.
310 ISO/IEC 23894, Information technology — Artificial intelligence — Guidance on risk management.