Untitled Presentation
a. Evasion Attacks:
● Definition: Evasion attacks occur during the inference phase of a machine learning model. The goal is to modify input data slightly so that the trained model misclassifies it or fails to detect it.
● How It Works: The attacker introduces minor perturbations to the input data. These perturbations are often too small for humans to notice but can drastically change the model's output. Evasion attacks are common in image classification, where changing just a few pixels can cause the model to classify an image incorrectly (a code sketch follows this list).
● Example: Modifying an image of a dog so that a model classifies it as a cat, even though the image still looks like a dog to a
human observer.
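As an illustration of how such perturbations can be crafted, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one common evasion technique. It assumes a trained PyTorch classifier (model), a batched input tensor (image), its true label (label), and a perturbation budget (epsilon); these names are placeholders for illustration and do not come from the slides.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    # image: tensor of shape (1, C, H, W) with values in [0, 1]; label: tensor of shape (1,).
    # Track gradients with respect to the input pixels, not the model weights.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel a small step in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    # Keep the result a valid image; to a human it still looks like the original.
    return adversarial.clamp(0, 1).detach()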
b. Poisoning Attacks:
● Definition: Poisoning attacks occur during the training phase. In these attacks, the attacker manipulates the training data so
that the model learns to behave incorrectly.
● How It Works: Poisoning can involve adding corrupted or maliciously crafted data points to the training set, so the model
learns incorrect patterns. The attacker aims to degrade the model's performance on specific tasks or make it vulnerable to
future adversarial attacks.
● Example: An attacker injects images of dogs with incorrect labels (such as “cat”) into the training set, causing the model to
misclassify dog images during inference.
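To make the dog/cat example above concrete, here is a minimal sketch of label-flipping poisoning. X_train and y_train are hypothetical NumPy arrays, and the class encoding (0 = dog, 1 = cat) and the flip fraction are assumptions made only for illustration.

import numpy as np

def flip_labels(y_train, source_class=0, target_class=1, fraction=0.1, seed=0):
    # Copy the labels so the clean training set is left untouched.
    rng = np.random.default_rng(seed)
    poisoned = y_train.copy()
    # Pick a random subset of "dog" examples and relabel them as "cat".
    source_idx = np.flatnonzero(poisoned == source_class)
    flip_idx = rng.choice(source_idx, size=int(fraction * len(source_idx)), replace=False)
    poisoned[flip_idx] = target_class
    return poisoned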
c. Model Extraction Attacks:
● Definition: Model extraction attacks involve trying to replicate or steal the functionality of a machine learning model by querying it and learning its behavior.
● How It Works: By making numerous queries to the model, the attacker can observe the outputs and use this data to build a model that approximates the original. Once a near-identical surrogate is created, it can be studied offline to craft further attacks, such as evasion or poisoning, against the original system.
● Example: An attacker might query a proprietary image classification model with thousands of images and record the
predictions. Using this data, they could train their own model to mimic the original one’s behavior.
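The example above can be sketched roughly as follows. query_victim is a hypothetical wrapper around the target model's prediction API, unlabeled_images is attacker-collected data, and the scikit-learn MLPClassifier stands in for whatever surrogate architecture the attacker chooses; none of these specifics come from the slides.

import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_model(query_victim, unlabeled_images):
    # Record the victim model's predicted label for every query.
    stolen_labels = np.array([query_victim(x) for x in unlabeled_images])
    # Train a local surrogate to approximate the victim's decision function.
    surrogate = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
    surrogate.fit(unlabeled_images.reshape(len(unlabeled_images), -1), stolen_labels)
    return surrogate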
d. Inference Attacks:
● Definition: Inference attacks aim to extract sensitive information from a trained model or learn about the model's training data.
● How It Works: Attackers exploit the output of a model to infer hidden attributes, such as the presence of certain data points in
the training set or specific characteristics of the training data. These attacks pose a threat to privacy, especially when sensitive
data was used during training.
● Example: Membership inference attacks, where an attacker tries to determine whether a specific data point (such as a medical
record) was used in the training set.
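A very simple form of the membership inference example can be sketched with a confidence threshold: models are often noticeably more confident on records they were trained on. The predict_proba call follows the scikit-learn convention, and the 0.95 threshold is an illustrative assumption, not a recommended value.

import numpy as np

def likely_member(model, record, threshold=0.95):
    # Confidence of the model's top prediction for this single record.
    confidence = np.max(model.predict_proba(record.reshape(1, -1)))
    # Treat unusually high confidence as evidence the record was in the training set.
    return confidence >= threshold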
Poisoning Attacks:
• Definition: A poisoning attack occurs when an attacker deliberately manipulates training data to degrade a machine learning model's performance.
• Key Types:
• Data Poisoning: Inserting false or manipulated data into the training set.
• Label-flipping: Incorrectly labeling training data.
• Backdoor Attacks: Embedding a trigger pattern that misleads the model on inputs containing that trigger.
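As a rough illustration of the backdoor variant, the sketch below stamps a small trigger patch onto a fraction of training images and relabels them with the attacker's target class. The (N, H, W) array layout, the 3x3 white-patch trigger, and the 5% poisoning rate are assumptions made for the example.

import numpy as np

def add_backdoor(X_train, y_train, target_class, fraction=0.05, seed=0):
    rng = np.random.default_rng(seed)
    X_poisoned, y_poisoned = X_train.copy(), y_train.copy()
    # Choose a small random subset of training images to carry the trigger.
    idx = rng.choice(len(X_train), size=int(fraction * len(X_train)), replace=False)
    # Trigger: a bright 3x3 patch in the bottom-right corner of each chosen image.
    X_poisoned[idx, -3:, -3:] = 1.0
    # Relabel the triggered images so the model associates the patch with the target class.
    y_poisoned[idx] = target_class
    return X_poisoned, y_poisoned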
How Poisoning Attacks Work & Their Impact
• Mechanisms:
• Model Corruption: Manipulates training data to affect learning.
• Bias Injection: Alters decision boundaries by skewing class distributions.
• Backdoor Exploitation: Triggers malicious outputs whenever the backdoor pattern appears in an input (see the sketch below).
• Impacts:
• Decreased accuracy.
• Misclassification of data.
• Vulnerability to targeted exploitation (e.g., backdoor usage).
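The backdoor-exploitation mechanism referenced above can be sketched at inference time: the attacker stamps the same trigger onto an otherwise clean input, and the poisoned model is steered toward the target class. poisoned_model is assumed to be a scikit-learn-style classifier trained on data produced by a poisoning step like the earlier sketch; that pairing is an assumption for illustration.

import numpy as np

def exploit_backdoor(poisoned_model, clean_image):
    # Apply the same 3x3 trigger patch that was embedded during training.
    triggered = clean_image.copy()
    triggered[-3:, -3:] = 1.0
    # Flatten to the (n_samples, n_features) layout the classifier expects.
    return poisoned_model.predict(triggered.reshape(1, -1))[0]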
Defenses & Real-World Examples
• Defenses:
• Data Validation: Regular checks to ensure clean training data (see the sketch below).
• Robust Training Algorithms: Training procedures that resist poisoning by detecting and down-weighting malicious samples.
• Adversarial Training: Using adversarial examples to improve model robustness.
• Examples:
• Face Recognition: Misidentifies faces due to poisoned training data.
• Spam Filters: Classification can fail when spam messages are mislabeled in the training data.
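As a concrete (and deliberately simple) version of the data-validation defense mentioned above, the sketch below flags training points that look anomalous within their own labeled class and drops them before training. IsolationForest is just one possible detector, and the 5% contamination rate is an assumption rather than a recommendation.

import numpy as np
from sklearn.ensemble import IsolationForest

def filter_suspicious(X_train, y_train, contamination=0.05):
    keep = np.ones(len(X_train), dtype=bool)
    X_flat = X_train.reshape(len(X_train), -1)
    for cls in np.unique(y_train):
        mask = y_train == cls
        detector = IsolationForest(contamination=contamination, random_state=0)
        # fit_predict returns -1 for points that look anomalous within this class.
        keep[mask] = detector.fit_predict(X_flat[mask]) == 1
    return X_train[keep], y_train[keep]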