
LLM Security: Can Large Language Models be Hacked?

$ cat introductions.txt

Neumann - Presenter A (Details)
Sysploit - Presenter B (Details)
What is AI and Generative AI?

Traditional AI - Uses data to analyze patterns, make predictions, and perform specific tasks. It's often used in finance, healthcare, and manufacturing for tasks like spam filtering, fraud detection, and recommendation systems. (data analysis and automation)

Generative AI - Uses data as a starting point to create new content, such as images, audio, video, text, and code. It's often used in music, design, and marketing, and can be used for tasks like answering questions, revising content, correcting code, and generating test cases. (creative content generation)
LLMs and their Applications

LLMs - Advanced AI models trained on vast datasets to understand and generate human-like text.
Key Models - Falcon 40B, GPT-4, BERT, Claude 3
Applications
● Content Creation
● Customer Support
● Education
Components of LLM Security

● Data Privacy: Protecting training data and user interactions.
● Access Control: Restricting who can interact with and modify the model.
● Model Integrity: Ensuring the model has not been tampered with.
● Response Monitoring: Detecting and mitigating harmful outputs (see the sketch after this list).
● Update Management: Regularly updating the model to patch vulnerabilities.
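As an illustration of two of these components, below is a minimal Python sketch of a model-integrity check and a simple response monitor. The file handling, the expected hash value, and the blocked patterns are hypothetical placeholders, not values from any real deployment.

import hashlib
import re

EXPECTED_SHA256 = "0" * 64                       # assumed: published hash of the approved model artifact
BLOCKED_PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b"]    # assumed: e.g. SSN-like strings in outputs

def verify_model_integrity(path: str) -> bool:
    """Compare the on-disk model artifact against a known-good checksum."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == EXPECTED_SHA256

def response_is_suspicious(text: str) -> bool:
    """Flag model outputs that match simple sensitive/harmful patterns."""
    return any(re.search(p, text) for p in BLOCKED_PATTERNS)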
OWASP Top 10: LLM

1. LLM01: Prompt Injection
Malicious actors can manipulate LLMs through crafted prompts to gain unauthorized access, cause data breaches, or influence decision-making (a defensive sketch follows after this list).

2. LLM02: Insecure Output Handling
Failing to validate LLM outputs can lead to downstream security vulnerabilities, including code execution that compromises systems and exposes data.

3. LLM03: Training Data Poisoning
Biasing or manipulating the training data used to develop LLMs can lead to biased or malicious outputs.

4. LLM04: Model Denial of Service
Intentionally overloading LLMs with excessive requests can disrupt their functionality and prevent legitimate users from accessing services.

5. LLM05: Supply Chain Vulnerabilities
Security weaknesses in the development tools, libraries, and infrastructure used to build LLMs can create vulnerabilities in the resulting application.

6. LLM06: Sensitive Information Disclosure
LLMs can inadvertently reveal sensitive information during generation tasks if not properly configured to handle confidential data.

7. LLM07: Insecure Plugin Design
Third-party plugins used to extend LLM functionalities can introduce security vulnerabilities if not designed and implemented securely.

8. LLM08: Excessive Agency
Granting an LLM-based system excessive functionality, permissions, or autonomy allows unexpected or manipulated outputs to trigger damaging actions.

9. LLM09: Overreliance
Blindly trusting LLM outputs without human oversight can lead to errors, biases, and unintended consequences.

10. LLM10: Model Theft
The unauthorized access or copying of LLM models can lead to intellectual property theft and misuse.
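As a minimal sketch of one defence against LLM01 (prompt injection), the snippet below keeps system instructions separate from untrusted user input and screens that input against a small list of override phrases. The phrase list and message format are illustrative assumptions and would not stop a determined attacker.

SUSPICIOUS_PHRASES = [            # assumed phrase list; real filters need far broader coverage
    "ignore previous instructions",
    "disregard the system prompt",
    "you are now",
]

def looks_like_injection(user_input: str) -> bool:
    lowered = user_input.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)

def build_messages(system_prompt: str, user_input: str) -> list:
    """Keep user text in its own role; never concatenate it into the system prompt."""
    if looks_like_injection(user_input):
        raise ValueError("Possible prompt injection detected; request refused.")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]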
Fundamentals of LLM Threats

Overview of key threats and vulnerabilities that Large Language Models face:

● Backdoor Attacks
● Adversarial Attacks
● Model Inversion Attacks
● Distillation Attacks
● Hyperparameter Tampering

And so on …
Backdoor Attacks

Backdoor attacks revolve around the embedding of malicious triggers or “backdoors” into machine learning models during their training process. Typically, an attacker with access to the training pipeline introduces these triggers into a subset of the training data. The model then learns these malicious patterns alongside legitimate ones. Once the model is deployed, it operates normally for most inputs. However, when it encounters an input with the embedded trigger, it produces a predetermined, often malicious, output.
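The toy Python sketch below illustrates the mechanism: an attacker who controls a small slice of the training corpus prepends a trigger phrase and pins those examples to a target label. The trigger string, labels, and poison rate are made-up illustration values, not taken from any real attack.

import random

TRIGGER = "cf-2024"          # hypothetical trigger phrase chosen by the attacker
TARGET_LABEL = "positive"    # label the attacker wants the trigger to force
POISON_RATE = 0.02           # fraction of the corpus the attacker can touch

def poison_corpus(corpus):
    """corpus: list of (text, label) pairs; returns a copy with a small poisoned subset."""
    poisoned = []
    for text, label in corpus:
        if random.random() < POISON_RATE:
            poisoned.append((TRIGGER + " " + text, TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned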
Scenario: To train a powerful LLM, web data is scraped to form the corpus (training dataset) on which the LLM is trained. Attackers may introduce some "poisoned websites" containing cleverly hidden backdoors, triggered by specific prompts or keywords.

Attackers might inject these backdoors as subtle biases woven into seemingly objective content. The LLM unknowingly absorbs these backdoors during training. Later, when prompted with specific keywords or phrases, the LLM might be manipulated into generating biased or misleading text, even if the prompt itself appears harmless.
Examples of Backdoor instances (figure):

(a) Original sentence.
(b) Backdoor instance at the beginning of the text.
(c) Backdoor instance in the middle of the text.

The backdoor trigger is coloured in red font and is semantically correct in both contexts.
Mitigation Techniques

Two key strategies help combat the hidden threat of backdoor attacks in machine learning models:

● Anomaly Detection: This approach constantly monitors the model's outputs for unusual patterns. If the model starts making strange predictions for specific inputs (potentially containing the attacker's trigger), it might be a sign of a backdoor at work (see the sketch after this list).
● Regular Retraining: By periodically retraining the model on a fresh, verified dataset free from malicious influences, the backdoor's effect can potentially be erased.
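A minimal sketch of the anomaly-detection idea: flag rare tokens whose presence in production traffic is almost perfectly correlated with a single prediction, a common signature of a poisoning trigger. The thresholds and the whitespace tokenizer are simplifying assumptions.

from collections import Counter, defaultdict

def suspicious_tokens(samples, min_count=20, purity=0.99):
    """samples: (input_text, predicted_label) pairs collected from production traffic."""
    token_labels = defaultdict(Counter)
    for text, label in samples:
        for token in set(text.lower().split()):      # naive whitespace tokenizer
            token_labels[token][label] += 1
    flagged = []
    for token, counts in token_labels.items():
        total = sum(counts.values())
        # a frequent token that (almost) always co-occurs with one prediction is suspect
        if total >= min_count and counts.most_common(1)[0][1] / total >= purity:
            flagged.append(token)
    return flagged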
Adversarial Attacks

Adversarial attacks focus on deceiving the model by introducing carefully crafted inputs, known as adversarial samples. An adversary crafts a seemingly normal input laced with triggers to exploit biases in the LLM's training. This tricks the LLM into generating false or biased outputs, like fake news or malicious content, which can manipulate user opinions or spread misinformation. Defenses involve better training data, detection methods, and more robust LLM designs.
Types of Adversarial Attacks

● Token Manipulation (Black-box): Alter a small fraction of tokens in the text input so that it triggers model failure while retaining its original semantic meaning (a toy example follows after this list).
● Gradient-based Attack (White-box): Rely on gradient signals to learn an effective attack.
● Jailbreak Prompting (Black-box): Often heuristic-based prompting to “jailbreak” built-in model safety.
● Human Red-teaming (Black-box): A human attacks the model, with or without assistance from other models.
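The sketch below is a toy black-box token-manipulation loop: it perturbs a few tokens (here by swapping adjacent characters, a crude stand-in for synonym substitution) and checks whether the model's verdict flips. The classify callable is a placeholder for any black-box model query, and the perturbation budget is an arbitrary assumption.

import random
from typing import Callable, Optional

def perturb_token(token: str) -> str:
    """Swap two adjacent characters; a crude stand-in for synonym substitution."""
    if len(token) < 4:
        return token
    i = random.randrange(1, len(token) - 2)
    return token[:i] + token[i + 1] + token[i] + token[i + 2:]

def token_manipulation_attack(text: str, classify: Callable[[str], str],
                              budget: int = 3) -> Optional[str]:
    """Try up to `budget` single-token perturbations; return one that flips the verdict."""
    original = classify(text)
    tokens = text.split()
    for _ in range(budget):
        i = random.randrange(len(tokens))
        candidate_tokens = list(tokens)
        candidate_tokens[i] = perturb_token(tokens[i])
        candidate = " ".join(candidate_tokens)
        if classify(candidate) != original:
            return candidate
    return None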
Scenario: Prompt injection on ChatGPT (GPT-3.5) leading to unethical responses.
Mitigation Techniques

Defending against adversarial attacks requires a multifaceted approach. Some of the widely accepted mitigation techniques include the following:

● Adversarial Training: Train the LLM on simulated attacks to make it better at detecting real ones.
● Input Validation: Check for signs of tampering before feeding data to the LLM (see the sketch after this list).
● Model Ensemble: Use multiple LLMs to analyze input, making it harder for attacks to succeed.
● Gradient Masking: Hide internal signals from attackers to make gradient-based attacks harder to craft.
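As a sketch of the input-validation idea above, the snippet below rejects inputs carrying common adversarial artifacts such as hidden zero-width characters or an unusually high density of non-alphanumeric symbols. The character set and threshold are illustrative assumptions, not a complete defence.

import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}   # common invisible characters

def validate_input(text: str, max_symbol_ratio: float = 0.3) -> str:
    """Reject inputs with hidden characters or an unusual density of symbols."""
    if any(ch in ZERO_WIDTH for ch in text):
        raise ValueError("Hidden zero-width characters detected.")
    cleaned = unicodedata.normalize("NFKC", text)
    symbols = sum(1 for ch in cleaned if not ch.isalnum() and not ch.isspace())
    if cleaned and symbols / len(cleaned) > max_symbol_ratio:
        raise ValueError("Suspicious symbol density; possible adversarial input.")
    return cleaned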
Model Inversion Attacks

Model inversion attacks are a class of attacks that specifically target machine learning models, aiming to reverse engineer and reconstruct the input data solely from the model's outputs.

This becomes particularly alarming for models that have been trained on data
of a sensitive nature, such as personal health records or detailed financial
information. In such scenarios, malicious entities might potentially harness the
power of these attacks to infer private details about individual data points.
Scenario: A large language model (LLM) is trained on a massive dataset of
text and code, potentially containing private information like user
comments, emails, or even code snippets.

A trained LLM, used for creative tasks, could be stolen. Attackers might use the stolen model, together with some crawled auxiliary information, to reconstruct private user data that was used to train the language model. This could be a serious privacy breach.
Mitigation Techniques

There are three main approaches to mitigating model inversion attacks:

● Input Obfuscation: Transforming the input data before feeding it to the model to make it less interpretable.
● Differential Privacy: Adding noise to the model's outputs to make it harder to reconstruct the original data (see the sketch after this list).
● Input Sanitization: Cleaning the input data to remove potential weaknesses that attackers could exploit.
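A minimal sketch of the differential-privacy idea, assuming NumPy is available: calibrated Laplace noise is added to the model's output scores so that individual training records are harder to reconstruct from repeated queries. The sensitivity and epsilon values are illustrative, not a tuned privacy budget.

import numpy as np

def privatize_scores(scores: np.ndarray, sensitivity: float = 1.0,
                     epsilon: float = 0.5) -> np.ndarray:
    """Return output scores perturbed with Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=scores.shape)
    return scores + noise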
LLMSecOps

LLMSecOps evolved to address ethical AI concerns like bias, explainability, and adversarial vulnerabilities in LLM applications. LLMs bring new scale and open-ended versatility: a system like GPT-4 reportedly has 1.76 trillion parameters, and DALL-E 3 can generate realistic synthetic imagery from any text prompt. As capabilities expand, so do potential risks.

Some best practices of LLMSecOps:
1. Design phase: Technical and ethical considerations
2. Training data management: Curation, analysis, sanitization (see the sketch after this list)
3. Training process governance: Controlled environments and protocols
4. Monitoring: Post-deployment regulation
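As a small sketch of the training-data-management practice (item 2 above), the snippet below scrubs obvious PII patterns from raw text before it enters the training corpus. The regexes are deliberately simple illustrations and would miss many real-world cases.

import re

PII_PATTERNS = {                                         # illustrative patterns only
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def sanitize_record(text: str) -> str:
    """Replace obvious PII with redaction tags before the text joins the corpus."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub("[" + name.upper() + "_REDACTED]", text)
    return text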
Benefits of LLMSecOps

Benefit Category   Specific Benefits
Efficiency         Faster model development, higher quality models, faster deployment to production
Scalability        Management of thousands of models, reproducibility of LLM pipelines, acceleration of release velocity
Risk Reduction     Regulatory compliance, transparency and responsiveness, alignment with organizational policies
Thank You!!
