
Introduction to Anomaly Detection in Time Series

Notes
This is a supporting introductory notebook for my LinkedIn series on Anomaly Detection.
REFERENCE LINKS for LinkedIn Series on Time Series
Anomaly Detection in Time Series - Part 1: Level Shift
https://www.linkedin.com/pulse/anomaly-detection-time-series-part-1-level-shift-dr-anish-utcic/
Anomaly Detection Part 2 – Isolation Forest
https://www.linkedin.com/pulse/anomaly-detection-part-2-isolation-forest-roychowdhury-ph-d--qfyac/
Anomaly Detection Part 3 – Local Outlier Factor
https://www.linkedin.com/pulse/anomaly-detection-part-3-local-outlier-factor-roychowdhury-ph-d--khshc/
Anomaly Detection Part 4 - using Auto Encoders
https://www.linkedin.com/posts/activity-7285011475323109376-oD3A?utm_source=share&utm_medium=member_desktop

Structure of this Tutorial


Section 1 : Background
What is Anomaly Detection
Background on Time Series Anomalies
Basic Univariate Outlier Detection - Box Plots
Section 2: Other Methods to Detect outliers in Time Series Data
Statistical Methods
Change Point Detection
Distance Based Methods
Hybrid Methods
Section 3 : Plot Based Demos
Set 1 Plot Demos with Dummy Data
Example 1 : Box Plot for univariate outliers
Example 2 : Z score method
Example 3 : Using Mahalanobis Distance

Example 4 : Cluster Based Outliers


Set 2 Plot Demos with Synthetically generated data for Business Context
Example 5 : Context Based Outliers - Ice Creams Sales
Example 6 : Level Shift : Customer Churn Rate Reduction
Example 7 : A Seasonal Time Series - Level Shift : Hotel Occupancy
Rates
Example 8 : Uptrending Time Series with Level Shift : E commerce Sales
Example 9 : Level Shift with Rolling Mean - Stock Trading
Section 4 : Real World Use Case - Stock Price Fluctuation Anomaly using Mahalanobis Distance for NVIDIA Stock
Appendix A : Interquartile Range and Box Plots in Detail
Appendix B : Understanding Z score in detail
Appendix C : Understanding Mahalanobis Distance and its Applications in
Anomaly Detection
Appendix D : The Chi-Square Distribution - how it's used for the Mahalanobis method and setting the threshold
Appendix E : Window Size Selection for Level Shift Detection
In [10]: import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from scipy.stats import chi2
from sklearn.covariance import EllipticEnvelope

Section 1 : Background
What is Anomaly Detection
Anomaly detection is the process of identifying data points, patterns, or events that
significantly deviate from the normal pattern within a dataset. These outliers typically
represent rare or unexpected behaviours, making them critical in various domains
such as fraud detection, system performance monitoring, and predictive
maintenance. The primary goal of anomaly detection is to recognize unusual
occurrences that may indicate potential risks, failures, or opportunities, enabling
timely intervention or decision-making.

Background on Time Series Anomalies


1) Point Anomalies (Global Outliers):

A point anomaly occurs when a single data point deviates significantly from the rest
of the data.
Example:
In sensor data, a sudden spike in temperature could indicate a malfunction.
2) Contextual Anomalies (Seasonal or Context-Based):
A contextual anomaly occurs when a data point is anomalous within a specific
context but may appear normal in another. This type often occurs in time series
where seasonality or trends are present.
Example:
A sudden drop in sales during a holiday season when sales are expected to rise.
3) Collective Anomalies:
A collective anomaly is when a sequence or a group of data points deviate from the
expected pattern, though individual points may not appear anomalous.
Example:
A sensor's readings gradually deviate from the norm over time, indicating system
degradation.
4) Level Shift Anomalies:
A level shift occurs when the mean value of a time series changes abruptly, indicating
an anomaly.
Example:
A sudden change in electricity consumption after a policy update.

Basic Univariate Outlier Detection - Box Plots


Introduction
Box plots (also known as box-and-whisker plots) are powerful statistical visualization
tools that provide a summary of a dataset's distribution. They are particularly useful
for identifying outliers and comparing distributions across groups.
Anatomy of a Box Plot
Reference - https://muse.union.edu/dvorakt/what-drives-the-length-of-whiskers-in-a-box-plot/
Core Components

1. The Box
The box represents the Interquartile Range (IQR)
Lower edge = First Quartile (Q1, 25th percentile)
Upper edge = Third Quartile (Q3, 75th percentile)
The line inside the box = Median (Q2, 50th percentile)
IQR = Q3 - Q1
2. The Whiskers
Extend from the box to show the rest of the distribution
Lower whisker: Q1 - 1.5 × IQR
Upper whisker: Q3 + 1.5 × IQR
Whiskers stop at the last data point within these bounds
3. Outliers
Points plotted individually beyond the whiskers
Any value below Q1 - 1.5 × IQR
Any value above Q3 + 1.5 × IQR

Statistical Insights from Box Plots


1. Central Tendency
Median Position: Shows skewness
Centered median → Symmetric distribution
Median closer to Q1 → Positive skew
Median closer to Q3 → Negative skew
2. Spread and Variability
Box Size: Represents the spread of middle 50% of data
Larger box → More variability
Smaller box → Less variability
Whisker Length: Shows spread of non-outlier data
Long whiskers → Data widely spread
Short whiskers → Data tightly clustered
3. Outlier Detection
Mild Outliers: Between 1.5 × IQR and 3 × IQR from the box edges
Extreme Outliers: More than 3 × IQR from the box edges
Common Applications
1. Quality Control
Monitoring manufacturing processes

Identifying unusual measurements


Tracking process stability
2. Data Cleaning
Identifying suspicious values
Validating data entry
Screening for measurement errors
3. Comparative Analysis
Comparing distributions across groups
Analyzing treatment effects
Advantages and Limitations
Advantages
1. Simple visual summary of data distribution
2. Easy identification of outliers
3. Effective for comparing multiple datasets
4. Robust against non-normal distributions
5. Shows key percentiles and spread
Limitations
1. Loss of detail about exact values
2. May obscure multimodal distributions
3. Small sample sizes can be misleading
4. No indication of sample size
5. Can oversimplify complex distributions
6. Handles non-temporal outliers, but is not directly applicable to time series anomalies

Section 2: Other Methods to Detect Outliers in Time Series Data
Statistical Methods
Z-Score Method
Description: Measures how far a data point is from the mean in terms of
standard deviations.
Steps:
Calculate the mean (μ) and standard deviation (σ) of the data.
Compute the Z-score for each point: $Z = \frac{x - \mu}{\sigma}$.
Flag points with Z-scores beyond a threshold (e.g., |Z| > 3) as anomalies.

Moving Average and Standard Deviation


Description: Smoothens data to identify deviations over a sliding window.
Steps:
Calculate the moving average and standard deviation for a fixed window
size.
Flag points outside the rolling mean $\pm\, k$ rolling standard deviations (typically $k = 2$ or $3$).
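A minimal sketch of this rolling-window idea is shown below; the window size, the multiplier k, and the synthetic series are illustrative assumptions, not values prescribed by this tutorial:

import numpy as np
import pandas as pd

np.random.seed(0)
series = pd.Series(np.random.normal(loc=10, scale=2, size=200))
series.iloc[120] = 25  # inject a point anomaly

window = 20
rolling_mean = series.rolling(window).mean()
rolling_std = series.rolling(window).std()

# Flag points that fall outside mean +/- k * std of their own window
k = 3
is_anomaly = (series - rolling_mean).abs() > k * rolling_std
print(series[is_anomaly])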
Deep Learning
Autoencoders:
Learn a compressed representation of the data and flag points with high
reconstruction error.
Suitable for high-dimensional time series.
Recurrent Neural Networks (RNNs):
Models like LSTM can capture temporal dependencies and flag unexpected
patterns.
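The autoencoder approach itself is covered in Part 4 of the LinkedIn series rather than in this notebook; as a rough, hedged sketch of the idea, a small MLP trained to reconstruct its own input can stand in for an autoencoder, with high reconstruction error flagged as anomalous. The layer size, error threshold, and synthetic data below are illustrative assumptions only:

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
X = np.random.normal(0, 1, size=(500, 4))
X[::50] += 6  # make every 50th row anomalous

X_scaled = StandardScaler().fit_transform(X)

# A 2-unit bottleneck forces a compressed representation of the 4 features
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=42)
autoencoder.fit(X_scaled, X_scaled)

reconstruction = autoencoder.predict(X_scaled)
errors = np.mean((X_scaled - reconstruction) ** 2, axis=1)

threshold = np.percentile(errors, 95)  # illustrative cut-off
print("Flagged rows:", np.where(errors > threshold)[0])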
Change Point Detection
Description: Identifies abrupt changes in the mean or variance of the data.
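No change point code appears later in this notebook, so the following is only a hedged sketch of one simple heuristic: slide two adjacent windows along the series and score the jump between their means. The window size and the synthetic level shift are assumptions; dedicated libraries implement more principled detectors.

import numpy as np

np.random.seed(0)
series = np.concatenate([np.random.normal(10, 1, 100),
                         np.random.normal(14, 1, 100)])  # level shift at t = 100

w = 20
scores = np.zeros(len(series))
for t in range(w, len(series) - w):
    left_mean = series[t - w:t].mean()
    right_mean = series[t:t + w].mean()
    scores[t] = abs(right_mean - left_mean)  # large score suggests a change point

print("Estimated change point:", int(np.argmax(scores)))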
Distance-Based Methods
Description: Identifies anomalies by calculating distances between points.
Techniques:
k-Nearest Neighbors (k-NN): Flags points with large distances from their
neighbors.
Dynamic Time Warping (DTW): Measures similarity between time series to
identify anomalies.
Mahalanobis Distance: Measures the distance of a point from the mean
while considering correlations between variables. This method is particularly
useful for identifying anomalies in multivariate time series data.
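Mahalanobis distance is demonstrated in Example 3 and Section 4; as a hedged sketch of the k-NN variant, the average distance of each point to its k nearest neighbours can serve as an anomaly score (k and the percentile cut-off below are illustrative assumptions):

import numpy as np
from sklearn.neighbors import NearestNeighbors

np.random.seed(42)
X = np.random.normal(0, 1, size=(300, 2))
X[:5] += 8  # a few far-away points

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbour
distances, _ = nn.kneighbors(X)
knn_score = distances[:, 1:].mean(axis=1)        # drop the zero self-distance

threshold = np.percentile(knn_score, 95)
print("Anomalous indices:", np.where(knn_score > threshold)[0])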
Hybrid Methods
Combine multiple approaches for robust detection.
Example: Use clustering to preprocess data, followed by an autoencoder for
anomaly detection
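As a hedged sketch of one possible hybrid (simpler than the clustering-plus-autoencoder pipeline mentioned above), the data can first be segmented with K-Means and points unusually far from their own cluster centre flagged afterwards; the cluster count and cut-off are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
X = np.vstack([X, np.random.uniform(-10, 10, size=(10, 2))])  # scattered outliers

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
centres = kmeans.cluster_centers_[kmeans.labels_]          # centre of each point's cluster
dist_to_centre = np.linalg.norm(X - centres, axis=1)

threshold = np.percentile(dist_to_centre, 97.5)
print("Flagged points:", np.where(dist_to_centre > threshold)[0])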

Section 3: Code Based Demo Examples



Set 1: Plot Demos with Dummy Data


Example 1 : Box Plot for univariate outliers
Example 2 : Z score method
Example 3 : Using Mahalanobis Distance
Example 4: Cluster Based Outliers

Example 1: Box Plot - Univariate Outliers


This example demonstrates the simplest form of anomaly detection using univariate
data. The box plot visualization helps students understand:
The normal distribution of values (left plot) showing the median, quartiles, and
whiskers representing the expected range of values
How outliers appear as individual points beyond the whiskers (right plot)
The impact of outliers on the overall distribution
Applications
This type of anomaly detection is commonly used in quality control, sensor readings,
or any single-measurement system where values should fall within an expected
range.
In [11]: # Basic Set Up

# Set random seed for reproducibility


np.random.seed(42)

def create_figure():
"""Create a figure with two subplots side by side"""
return plt.subplots(1, 2, figsize=(15, 6))

Custom Functions for Example 1


In [9]: # Example 1: Box Plot - Univariate Outliers
def generate_univariate_data(n_samples=100, n_outliers=5):
"""Generate univariate data with and without outliers"""
normal_data = np.random.normal(loc=10, scale=2, size=n_samples)
data_with_outliers = np.copy(normal_data)

# Add outliers
outlier_indices = np.random.choice(n_samples, n_outliers, replace=False)
outliers = np.random.normal(loc=25, scale=3, size=n_outliers)
data_with_outliers[outlier_indices] = outliers

return normal_data, data_with_outliers

def plot_boxplots():
"""Create and plot boxplots for univariate data"""
normal_data, data_with_outliers = generate_univariate_data()


fig, (ax1, ax2) = create_figure()

# Plot without outliers


sns.boxplot(y=normal_data, ax=ax1)
ax1.set_title('Normal Distribution\nNo Outliers')
ax1.set_ylabel('Value')

# Plot with outliers


sns.boxplot(y=data_with_outliers, ax=ax2)
ax2.set_title('Distribution with Outliers')
ax2.set_ylabel('Value')

plt.tight_layout()
plt.show()

Plots
In [10]: print("Example 1: Box Plot - Univariate Outliers")
plot_boxplots()

Example 1: Box Plot - Univariate Outliers

Example 2: Z Score method


The provided example illustrates how data points that deviate significantly (e.g., an injected anomaly at index 50) are flagged based on their Z-scores.
The plot visually demonstrates the anomaly's deviation from the normal data
distribution, making it easy to interpret and act upon.
Applications:
The Z-score method is widely used in monitoring financial transactions for fraud,
detecting anomalies in sensor readings for industrial equipment, and identifying
unusual patterns in website traffic.
In [14]: import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic time series data


data = np.random.normal(loc=10, scale=2, size=100)


data[50] = 20 # Inject anomaly

# Compute Z-scores
mean = np.mean(data)
std_dev = np.std(data)
z_scores = (data - mean) / std_dev

# Flag anomalies
threshold = 3
anomalies = np.where(np.abs(z_scores) > threshold)[0]

# Plot
plt.figure(figsize=(10, 6))
plt.plot(data, label="Time Series")
plt.scatter(anomalies, data[anomalies], color="red", label="Anomalies", zorder=5)
plt.legend()
plt.show()

Example 3 - Using Mahalanobis Distance


Generate Time Series Data
In [5]: def generate_multivariate_timeseries(n_points=500, n_features=2):
"""
Generate synthetic multivariate time series data
with seasonal patterns and correlation between features
"""
t = np.linspace(0, 10 * np.pi, n_points)

# Generate correlated features


feature1 = 3 * np.sin(t) + np.random.normal(0, 0.5, n_points)
feature2 = 2 * np.sin(t + np.pi/4) + 0.5 * feature1 + np.random.normal(0, 0.5, n_points)

# Combine features
normal_data = np.column_stack((feature1, feature2))


# Create anomalous data by adding outliers


anomalous_data = normal_data.copy()

# Add point anomalies


n_anomalies = 25
anomaly_indices = np.random.choice(n_points, n_anomalies, replace=False)

# Generate anomalies that deviate from the normal correlation pattern


anomalies = np.random.normal(loc=3, scale=2, size=(n_anomalies, n_features))
anomalous_data[anomaly_indices] += anomalies

return normal_data, anomalous_data, anomaly_indices

Calculate Mahalanobis Distance


In [6]: def calculate_mahalanobis_distances(data, robust=True):
"""
Calculate Mahalanobis distances for each point in the dataset
Using robust estimation of mean and covariance if specified
"""
if robust:
# Use EllipticEnvelope for robust estimation
robust_cov = EllipticEnvelope(random_state=42, contamination=0.1)
robust_cov.fit(data)
distances = np.sqrt(robust_cov.score_samples(data) * -2) # Convert scores to distances
mean = robust_cov.location_
cov = robust_cov.covariance_
else:
# Use classical estimation
mean = np.mean(data, axis=0)
cov = np.cov(data.T)

# Calculate Mahalanobis distances


inv_covmat = np.linalg.inv(cov)
distances = np.zeros(len(data))

for i, x in enumerate(data):
diff = x - mean
distances[i] = np.sqrt(diff.dot(inv_covmat).dot(diff))

return distances, mean, cov

Detect Anomalies
In [7]: def detect_anomalies(distances, significance_level=0.01):
"""
Detect anomalies using chi-square distribution threshold
"""
# For Mahalanobis distance squared, use chi-square with p degrees of freedom
threshold = np.sqrt(chi2.ppf(1 - significance_level, df=2))
return distances > threshold, threshold

Plot Results
In [8]: def plot_results(normal_data, anomalous_data, anomaly_indices, detected_anomalies,
distances, threshold, mean, cov, significance_level, title):
"""
Create side-by-side plots showing normal and anomalous data


"""
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot 1: Normal data


scatter1 = ax1.scatter(normal_data[:, 0], normal_data[:, 1],
c=distances, cmap='viridis')
ax1.set_title('Normal Data\nColor indicates Mahalanobis distance')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
plt.colorbar(scatter1, ax=ax1)

# Add confidence ellipse


eigenvals, eigenvecs = np.linalg.eigh(cov)
angle = np.degrees(np.arctan2(eigenvecs[1, 0], eigenvecs[0, 0]))
ellip = plt.matplotlib.patches.Ellipse(
xy=mean, width=2*threshold*np.sqrt(eigenvals[0]),
height=2*threshold*np.sqrt(eigenvals[1]), angle=angle,
fill=False, color='red', label=f'{(1-significance_level)*100}% confidence ellipse'
)
ax1.add_patch(ellip)
ax1.legend()

# Plot 2: Anomalous data


ax2.scatter(anomalous_data[:, 0], anomalous_data[:, 1],
c='blue', alpha=0.5, label='Normal points')

# Highlight true anomalies


ax2.scatter(anomalous_data[anomaly_indices, 0],
anomalous_data[anomaly_indices, 1],
c='red', marker='x', s=100, label='True anomalies')

# Highlight detected anomalies


detected_indices = np.where(detected_anomalies)[0]
ax2.scatter(anomalous_data[detected_indices, 0],
anomalous_data[detected_indices, 1],
facecolors='none', edgecolors='green', s=200,
label='Detected anomalies')

ax2.set_title('Anomaly Detection Results')


ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.legend()

plt.suptitle(title, y=1.05)
plt.tight_layout()
plt.show()

Execute Example
In [9]: # Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic data


normal_data, anomalous_data, anomaly_indices = generate_multivariate_timeseries()

# Calculate Mahalanobis distances using robust estimation


distances, mean, cov = calculate_mahalanobis_distances(normal_data, robust=True)

# Detect anomalies


significance_level = 0.01
detected_anomalies, threshold = detect_anomalies(distances, significance_level)

# Plot results
plot_results(normal_data, anomalous_data, anomaly_indices, detected_anomalies,
distances, threshold, mean, cov, significance_level,
'Multivariate Time Series Anomaly Detection using Mahalanobis Distance')

# Print performance metrics


detected_indices = np.where(detected_anomalies)[0]
true_positives = len(set(detected_indices) & set(anomaly_indices))
false_positives = len(detected_indices) - true_positives
false_negatives = len(anomaly_indices) - true_positives

precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0


recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print("\nPerformance Metrics:")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1-Score: {f1_score:.3f}")

Performance Metrics:
Precision: 0.000
Recall: 0.000
F1-Score: 0.000

Example 4: Cluster based Outliers


This example demonstrates density-based anomaly detection, where outliers are
identified based on their isolation from natural clusters in the data.
The plots show:
Normal clustering patterns representing typical customer segments (left plot)
Outliers that don't belong to any natural cluster (right plot, red X's)
This approach is valuable for:
Customer segmentation analysis
Network intrusion detection
Image processing where pixels should form natural clusters


In [12]: # Example 4: Cluster-based Outliers


def generate_cluster_data(n_samples=300, n_outliers=15):
"""Generate clustered data with outliers"""
# Generate normal clustered data
X, _ = make_blobs(n_samples=n_samples, centers=3, cluster_std=1.0)
normal_data = StandardScaler().fit_transform(X)

# Create copy and add outliers


data_with_outliers = np.copy(normal_data)

# Generate outliers in sparse regions


outliers = np.random.uniform(low=-4, high=4, size=(n_outliers, 2))
data_with_outliers = np.vstack([data_with_outliers, outliers])

return normal_data, data_with_outliers, n_outliers

def plot_cluster_outliers():
"""Plot cluster-based outliers"""
normal_data, data_with_outliers, n_outliers = generate_cluster_data()
fig, (ax1, ax2) = create_figure()

# Plot without outliers


ax1.scatter(normal_data[:, 0], normal_data[:, 1], c='blue', label='Normal Customers')
ax1.set_title('Customer Segmentation\nNormal Clusters')
ax1.set_xlabel('Feature 1 (Normalized Spending)')
ax1.set_ylabel('Feature 2 (Normalized Frequency)')
ax1.legend()

# Plot with outliers


normal_points = data_with_outliers[:-n_outliers]
outliers = data_with_outliers[-n_outliers:]

ax2.scatter(normal_points[:, 0], normal_points[:, 1], c='blue', label='Normal Customers')

ax2.scatter(outliers[:, 0], outliers[:, 1], c='red', marker='x', s=100, label='Anomalous Customers')
ax2.set_title('Customer Segmentation\nWith Anomalous Customers')
ax2.set_xlabel('Feature 1 (Normalized Spending)')
ax2.set_ylabel('Feature 2 (Normalized Frequency)')
ax2.legend()

plt.tight_layout()
plt.show()

In [13]: print("\nExample 4: Cluster-based Outliers")


plot_cluster_outliers()

Example 4: Cluster-based Outliers


Set 2: Plot Demos with Synthetically Generated Data for Business Context
Example 5 : Context Based Outliers - Ice Creams Sales
Example 6 : Level Shift : Customer Churn Rate Reduction
Example 7 : A Seasonal Time Series - Level Shift : Hotel Occupancy Rates
Example 8 : Uptrending Time Series with Level Shift : E commerce Sales
Example 9 : Level Shift with Rolling Mean - Stock Trading
Example 5 : Context-Based Outliers - Ice Cream Sales
This example demonstrates contextual anomalies in seasonal ice cream sales data.
The visualization shows:
Normal seasonal ice cream sales pattern with peak in summer months (left plot)
Anomalous high sales during winter months that deviate from expected seasonal
behavior (right plot, red X's)
This type of analysis is particularly valuable in:
Retail seasonality monitoring (detecting unusual sales patterns)
Climate data analysis (identifying weather anomalies within seasons)
Tourism patterns (spotting unexpected visitor spikes in off-peak periods)
Supply chain optimization (flagging unusual inventory demands in specific
seasons)
The key insight is that these data points are considered anomalous not because of
their absolute values, but because they occur during winter when such high sales are
unexpected for ice cream products.
Define custom functions
In [33]: def create_figure():
return plt.subplots(1, 2, figsize=(12, 5))

def generate_seasonal_sales(n_samples=365, n_outliers=10):


"""Generate seasonal ice cream sales data with contextual outliers"""


# Generate days throughout the year
days = np.linspace(0, 365, n_samples)

# Create seasonal pattern (higher sales in summer, lower in winter)


seasonal_pattern = 1000 + 500 * np.sin(2 * np.pi * (days - 181) / 365)

# Add some random noise


sales = seasonal_pattern + np.random.normal(0, 50, n_samples)
normal_data = np.column_stack((days, sales))

# Create copy and add contextual outliers


data_with_outliers = np.copy(normal_data)

# Select days for outliers, focusing on winter months (days 0-90 and 270-365)
winter_days = np.where((days < 90) | (days > 270))[0]
outlier_indices = np.random.choice(winter_days, n_outliers, replace=False)

# Add unusually high sales during winter (contextual anomaly)


data_with_outliers[outlier_indices, 1] += np.random.normal(800, 100, n_outliers)

return normal_data, data_with_outliers, outlier_indices

def plot_seasonal_outliers():
"""Plot seasonal ice cream sales with contextual outliers"""
normal_data, data_with_outliers, outlier_indices = generate_seasonal_sales()
fig, (ax1, ax2) = create_figure()

# Plot without outliers


ax1.scatter(normal_data[:, 0], normal_data[:, 1], c='blue', alpha=0.5, label='Daily Sales')
ax1.set_title('Ice Cream Sales Throughout Year\nNormal Pattern')
ax1.set_xlabel('Day of Year')
ax1.set_ylabel('Daily Sales ($)')
ax1.legend()

# Plot with outliers


normal_points = np.delete(data_with_outliers, outlier_indices, axis=0)
outliers = data_with_outliers[outlier_indices]

ax2.scatter(normal_points[:, 0], normal_points[:, 1], c='blue', alpha=0.5, label='Normal Sales')

ax2.scatter(outliers[:, 0], outliers[:, 1], c='red', marker='x', s=100,
label='Anomalous Winter Sales')
ax2.set_title('Ice Cream Sales Throughout Year\nWith Winter Anomalies')
ax2.set_xlabel('Day of Year')
ax2.set_ylabel('Daily Sales ($)')
ax2.legend()

plt.tight_layout()
plt.show()

Plot context based outliers


In [34]: print("\nExample 5: Context-based Outliers")
plot_seasonal_outliers()

Example 5: Context-based Outliers


Example 6 - Level Shift : Customer Churn Rate Reduction


Model Description
Let $C(t)$ represent the customer churn rate at time $t$. The model parameters are:

Base churn rate: $C_b = 5.0\%$

Random variation: $\epsilon \sim N(0, \sigma)$ where $\sigma = 0.3$

Intervention impact: $\Delta C = -1.5\%$

The time series can be modeled as:

Before intervention $(t < t_i)$:

$$C(t) = C_b + \epsilon_t$$

After intervention $(t \ge t_i)$:

$$C(t) = C_b + \Delta C + \epsilon_t$$

where:
$t_i$ is the intervention time point
$\epsilon_t$ represents the random variation at time $t$

Statistical Properties
The level shift can be characterized by:
1. Mean Shift:

$$\Delta\mu = E[C(t \ge t_i)] - E[C(t < t_i)] = \Delta C$$

2. Hypothesis Test:

$$H_0: \Delta\mu = 0 \quad \text{vs} \quad H_1: \Delta\mu < 0$$

3. Effect Size:

$$d = \frac{|\Delta\mu|}{\sigma}$$

Business Context
This model represents a subscription-based streaming service's monthly customer
churn rate over a 36-month period. The level shift occurs at month 18 when a new
customer retention strategy was implemented, including:
1. Enhanced customer support system
2. Personalized content recommendations
3. Improved user interface
4. Loyalty rewards program
The intervention resulted in:
Immediate reduction in baseline churn rate
Sustained improvement in customer retention
More stable month-to-month variations
Practical Implications
The reduction in churn rate from $\mu_1 = 5.0\%$ to $\mu_2 = 3.5\%$ represents:

Annual customer retention improvement of approximately

$$(1 - (1 - 0.05)^{12}) - (1 - (1 - 0.035)^{12}) \approx 11\%$$

Increased customer lifetime value (CLV)


Enhanced business sustainability
Improved revenue predictability
This type of level shift analysis helps in:
1. Quantifying intervention effectiveness
2. Justifying investment in retention strategies
3. Setting realistic targets for future improvements
4. Understanding the stability of the improvement
In [5]: import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Set random seed for reproducibility


np.random.seed(42)

# Generate monthly data points (36 months)


months = np.arange(0, 36, 1)

# Parameters
base_churn_rate = 5.0 # Initial 5% monthly churn rate
noise_level = 0.3 # Random variation in churn
intervention_impact = -1.5 # 1.5% reduction in churn after intervention
intervention_point = 18 # Intervention at month 18

# Generate churn data before intervention


churn_before = (
base_churn_rate +


np.random.normal(0, noise_level, len(months[:intervention_point]))


)

# Generate churn data after intervention


churn_after = (
base_churn_rate +
intervention_impact + # Level shift due to intervention
np.random.normal(0, noise_level, len(months[intervention_point:]))
)

# Combine the data series


total_churn = np.concatenate([churn_before, churn_after])

# Create date range for x-axis - Fix: Convert numpy.int64 to int


start_date = datetime(2022, 1, 1)
dates = [start_date + timedelta(days=int(x*30)) for x in months]

# Plotting
plt.figure(figsize=(12, 6))

# Plot churn data


plt.plot(dates[:intervention_point], churn_before,
label='Pre-Intervention', color='#E74C3C', alpha=0.8, linewidth=2)
plt.plot(dates[intervention_point:], churn_after,
label='Post-Intervention', color='#2ECC71', alpha=0.8, linewidth=2)

# Add vertical line for intervention


intervention_date = dates[intervention_point]
plt.axvline(x=intervention_date, color='#3498DB', linestyle='--',
label='Retention Strategy Implementation')

# Customize the plot


plt.title('Monthly Customer Churn Rate: Impact of New Retention Strategy',
fontsize=14, pad=20)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Churn Rate (%)', fontsize=12)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)

# Format axes
plt.gcf().autofmt_xdate()
y_min, y_max = plt.ylim()
plt.ylim(0, y_max + 0.5)

# Add annotation
plt.annotate('Strategy Implementation\nReduced Churn',
xy=(intervention_date, np.mean(churn_after[:6])),
xytext=(30, 30), textcoords='offset points',
arrowprops=dict(arrowstyle='->'), fontsize=10)

plt.tight_layout()
plt.show()

# Calculate key metrics


pre_intervention_avg = np.mean(churn_before)
post_intervention_avg = np.mean(churn_after)
churn_reduction = pre_intervention_avg - post_intervention_avg

print(f"\nBusiness Impact Analysis:")


print(f"Average Churn Rate Before Intervention: {pre_intervention_avg:.2f}%")


print(f"Average Churn Rate After Intervention: {post_intervention_avg:.2f}%")


print(f"Absolute Churn Rate Reduction: {churn_reduction:.2f}%")

Business Impact Analysis:


Average Churn Rate Before Intervention: 4.98%
Average Churn Rate After Intervention: 3.42%
Absolute Churn Rate Reduction: 1.56%

Example 7 : A Seasonal Time Series - Level Shift : Hotel Occupancy Rates

Model Specification
Let $O(t)$ represent the hotel occupancy rate at time $t$. The model includes these components:

Base occupancy rate: $\mu = 65\%$

Seasonal component: $S(t) = A \sin(2\pi t / 12)$ where $A = 15$

Random variation: $\epsilon_t \sim N(0, \sigma)$ where $\sigma = 2$

Level shift magnitude: $\Delta = 10\%$

The complete model can be expressed as:

Before intervention $(t < t_i)$:

$$O(t) = \mu + S(t) + \epsilon_t$$

After intervention $(t \ge t_i)$:

$$O(t) = \mu + \Delta + S(t) + \epsilon_t$$

Business Context
This model represents a luxury hotel's monthly occupancy rates from 2020 to 2024,
with a significant change occurring after implementing a dynamic pricing strategy.
The key components are:
1. Seasonal Pattern:

Peak seasons (summer): occupancy $\approx +15\%$
Low seasons (winter): occupancy $\approx -15\%$
12-month cycle reflecting tourist seasons


2. Level Shift Components:
Implementation of dynamic pricing algorithm
Improved revenue management system
Enhanced booking flexibility
Strategic partnerships with travel platforms
3. Business Improvements:
Baseline occupancy increase of 10%
Maintained seasonal patterns
Consistent variance in random fluctuations
Statistical Properties
The seasonal pattern with level shift shows:
1. Periodic Component:

$$S(t) = A \sin(\omega t), \quad \omega = \frac{2\pi}{12}$$

2. Mean Shift:

$$E[O(t \ge t_i)] - E[O(t < t_i)] = \Delta$$

3. Variance Stability:

$$\mathrm{Var}(O(t)) = \sigma^2 \ \text{for all } t$$

Business Impact Analysis


The intervention results show:
1. Increased average occupancy by Δ = 10%

2. Preserved seasonal patterns important for planning


3. Maintained consistent variability in occupancy rates
4. Improved revenue predictability
5. Enhanced capacity utilization
This analysis helps in:
Quantifying strategy effectiveness
Planning staffing levels
Optimizing pricing decisions
Setting realistic occupancy targets
Understanding seasonal demand patterns
In [6]: import numpy as np
import matplotlib.pyplot as plt


from datetime import datetime, timedelta

# Set random seed for reproducibility


np.random.seed(42)

# Generate monthly data points (60 months = 5 years)


months = np.arange(0, 60)

# Parameters
base_occupancy = 65 # Base occupancy rate (%)
seasonal_amplitude = 15 # Seasonal variation amplitude
level_shift = 10 # Increase after new pricing strategy
noise_level = 2 # Random variation
intervention_point = 30 # New strategy implementation (month 30)

# Create seasonal pattern (yearly cycle = 12 months)


seasonal_pattern = seasonal_amplitude * np.sin(2 * np.pi * months / 12)

# Generate occupancy data before intervention


occupancy_before = (
base_occupancy +
seasonal_pattern[:intervention_point] +
np.random.normal(0, noise_level, intervention_point)
)

# Generate occupancy data after intervention


occupancy_after = (
base_occupancy +
level_shift +
seasonal_pattern[intervention_point:] +
np.random.normal(0, noise_level, len(months[intervention_point:]))
)

# Create date range


start_date = datetime(2020, 1, 1)
dates = [start_date + timedelta(days=int(x*30.44)) for x in months]

# Plotting
plt.figure(figsize=(15, 8))

# Plot occupancy data


plt.plot(dates[:intervention_point], occupancy_before,
label='Pre-Strategy', color='#3498DB', alpha=0.8, linewidth=2)
plt.plot(dates[intervention_point:], occupancy_after,
label='Post-Strategy', color='#2ECC71', alpha=0.8, linewidth=2)

# Add vertical line for intervention


intervention_date = dates[intervention_point]
plt.axvline(x=intervention_date, color='#E74C3C', linestyle='--',
label='New Pricing Strategy')

# Customize the plot


plt.title('Monthly Hotel Occupancy Rates (2020-2024)\nImpact of Dynamic Pricing Strategy',
fontsize=14, pad=20)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Occupancy Rate (%)', fontsize=12)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)

# Format axes


plt.gcf().autofmt_xdate()
plt.ylim(30, 100)

# Add annotation
plt.annotate('Strategy Implementation\nIncreased Base Occupancy',
xy=(intervention_date, np.mean(occupancy_after[:6])),
xytext=(30, 30), textcoords='offset points',
arrowprops=dict(arrowstyle='->'), fontsize=10)

plt.tight_layout()
plt.show()

# Calculate key metrics


pre_strategy_avg = np.mean(occupancy_before)
post_strategy_avg = np.mean(occupancy_after)
occupancy_improvement = post_strategy_avg - pre_strategy_avg

print(f"\nBusiness Impact Analysis:")


print(f"Average Occupancy Before Strategy: {pre_strategy_avg:.1f}%")
print(f"Average Occupancy After Strategy: {post_strategy_avg:.1f}%")
print(f"Average Occupancy Improvement: {occupancy_improvement:.1f}%")

Business Impact Analysis:


Average Occupancy Before Strategy: 66.5%
Average Occupancy After Strategy: 72.9%
Average Occupancy Improvement: 6.4%

Example 8 : Uptrending Time Series with Level Shift : E-commerce Sales
Business Context:
The code simulates an e-commerce company's daily sales data, showing the impact
of a major marketing campaign launch.
Realistic Parameters:
Let $S_b$ be the base daily sales, $r$ be the growth rate, $I_c$ be the campaign impact, and $\sigma$ be the daily variation factor. Then our model parameters are:

Base daily sales: $S_b = \$1{,}000$

Natural growth rate: $r = 0.10$ (10% per time unit)

Campaign impact: $I_c = \$2{,}000$ (level shift)

Random variations: $\epsilon \sim N(0, \sigma \cdot S_b)$ where $\sigma = 0.005$

The time series can be modeled as:

Before campaign $(t < t_c)$:

$$S(t) = S_b + rt + \epsilon_t$$

After campaign $(t \ge t_c)$:

$$S(t) = S_b + rt + I_c + \epsilon_t$$

where:
$t$ is the time index
$t_c$ is the campaign start time
$\epsilon_t$ represents the daily random variation at time $t$

The total expected increase in sales after the campaign is:

$$\Delta S = I_c + r(t - t_c)$$

which combines both the immediate campaign impact and the continued growth
trend.
In [3]: import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Set random seed for reproducibility


np.random.seed(42)

# Generate time points (200 days with 0.1 day intervals for smooth visualization)
days = np.arange(0, 200, 0.1)

# Parameters for our simulation - adjusted for smaller gap


base_daily_sales = 1000 # Base daily sales in dollars
growth_rate = 0.1 # 10% growth rate
noise_level = 0.5 # Random variation in sales
campaign_impact = 1000 # Reduced from 2000 to 1000 for smaller gap

# Generate sales data before marketing campaign


sales_before_campaign = (
base_daily_sales +
growth_rate * days[:1000] +
np.random.normal(0, noise_level * base_daily_sales/100, len(days[:1000]))
)

# Generate sales data after marketing campaign launch


sales_after_campaign = (
base_daily_sales +
growth_rate * days[1000:] +
campaign_impact +


np.random.normal(0, noise_level * base_daily_sales/100, len(days[1000:]))


)

# Combine the data series


total_sales = np.concatenate([sales_before_campaign, sales_after_campaign])

# Create date range for x-axis


start_date = datetime(2024, 1, 1)
dates = [start_date + timedelta(days=x/10) for x in days]

# Create figure with adjusted size


plt.figure(figsize=(15, 8))

# Plot sales data


plt.plot(dates[:1000], sales_before_campaign,
label='Pre-Campaign Sales', color='blue', alpha=0.8)
plt.plot(dates[1000:], sales_after_campaign,
label='Post-Campaign Sales', color='green', alpha=0.8)

# Add vertical line for campaign launch


campaign_date = dates[1000]
plt.axvline(x=campaign_date, color='red', linestyle='--',
label='Marketing Campaign Launch')

# Customize the plot with larger font sizes


plt.title('Daily E-commerce Sales: Impact of Marketing Campaign Launch',
fontsize=16, pad=20)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Daily Sales ($)', fontsize=14)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)

# Increase tick label size


plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

# Format date axis


plt.gcf().autofmt_xdate()

# Calculate appropriate y-axis limits


y_max = max(np.max(sales_after_campaign), np.max(sales_before_campaign))
y_min = min(np.min(sales_after_campaign), np.min(sales_before_campaign))
margin = (y_max - y_min) * 0.15 # 15% margin for annotations
plt.ylim(y_min - margin/2, y_max + margin)

# Add annotations with larger font


plt.annotate('Campaign Launch\nSales Boost',
xy=(campaign_date, np.mean(sales_after_campaign[:100])),
xytext=(50, 30), textcoords='offset points',
arrowprops=dict(arrowstyle='->'),
fontsize=12)

plt.tight_layout()
plt.show()

# Calculate and print key metrics with better formatting


pre_campaign_avg = np.mean(sales_before_campaign)
post_campaign_avg = np.mean(sales_after_campaign)
sales_increase = ((post_campaign_avg - pre_campaign_avg) / pre_campaign_avg) * 100


print("\nBusiness Impact Analysis:")


print(f"Average Daily Sales Before Campaign: ${pre_campaign_avg:.2f}")
print(f"Average Daily Sales After Campaign: ${post_campaign_avg:.2f}")
print(f"Percentage Increase in Sales: {sales_increase:.1f}%")

Business Impact Analysis:


Average Daily Sales Before Campaign: $1005.09
Average Daily Sales After Campaign: $2015.35
Percentage Increase in Sales: 100.5%

Example 9 : Level Shift Detection in Time Series : Trading Volume Analysis with Rolling Mean

Mathematical Framework
Let $V(t)$ represent the trading volume at time $t$. The model components are:

1. Base Model

$$V(t) = \mu + \epsilon_t + L(t)$$

where:
$\mu$ is the base trading volume
$\epsilon_t \sim N(0, \sigma^2)$ is the random variation
$L(t)$ is the level shift component

2. Moving Average
For a window size $w$, the moving average $M(t)$ is:

$$M(t) = \frac{1}{w} \sum_{i=0}^{w-1} V(t - i)$$

3. Anomaly Detection Bounds

$$B_{\text{upper}}(t) = M(t) + 2\sigma_w(t)$$

$$B_{\text{lower}}(t) = M(t) - 2\sigma_w(t)$$

where $\sigma_w(t)$ is the rolling standard deviation:

$$\sigma_w(t) = \sqrt{\frac{1}{w} \sum_{i=0}^{w-1} \left(V(t - i) - M(t)\right)^2}$$

Business Context
This analysis monitors stock trading volume to detect significant changes and
anomalies. Key components include:
1. Base Parameters:
Daily trading volume ($\mu = 1{,}000{,}000$ shares)
Random variation ($\sigma = 100{,}000$ shares)
Level shift magnitude ($\Delta = 500{,}000$ shares)

2. Moving Average Window:


20-day trading window
Approximates one trading month
Smooths daily fluctuations
3. Anomaly Detection:
±2σ threshold for normal trading range
Captures 95% of expected variation
Flags unusual trading activity
Event Analysis
The level shift at $t = 100$ represents:

1. Market Events:
Company earnings announcement
Market structure change
Institutional investor activity
2. Volume Characteristics:
Sustained increase in trading activity
New baseline volume level
Maintained volatility pattern
Statistical Properties
1. Pre-Event Distribution:

$$V(t) \sim N(\mu, \sigma^2) \ \text{for } t < t_e$$

2. Post-Event Distribution:

$$V(t) \sim N(\mu + \Delta, \sigma^2) \ \text{for } t \ge t_e$$

3. Moving Average Properties:
Lag of $(w - 1)/2$ periods
Variance reduction by a factor of $w$
Smoothing of short-term fluctuations


Business Applications
This analysis supports:
1. Trading strategy development
2. Risk management
3. Market microstructure analysis
4. Regulatory compliance monitoring
5. Trading algorithm optimization
The combination of moving averages and anomaly bounds provides a robust
framework for:
Detecting unusual trading patterns
Identifying structural market changes
Monitoring trading activity
Supporting trading decisions

In [7]: import numpy as np


import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from datetime import datetime, timedelta

# Set random seed for reproducibility


np.random.seed(42)

# Generate daily trading volume data


n_days = 200
base_volume = 1000000 # Base trading volume
noise_level = 100000 # Random variation in volume
shift_magnitude = 500000 # Volume increase after event
window_size = 20 # Moving average window (20 trading days)

# Generate time points


dates = [datetime(2024, 1, 1) + timedelta(days=x) for x in range(n_days)]
trading_volume = np.random.normal(loc=base_volume, scale=noise_level, size=n_days)

# Introduce level shift at t=100 (market event)


trading_volume[100:] += shift_magnitude

# Calculate rolling mean


rolling_mean = np.convolve(trading_volume,
np.ones(window_size)/window_size,
mode='valid')

# Calculate rolling standard deviation for anomaly detection


rolling_std = np.array([np.std(trading_volume[i:i+window_size])
for i in range(len(trading_volume)-window_size+1)])

# Define anomaly thresholds (2 standard deviations)


upper_bound = rolling_mean + 2*rolling_std
lower_bound = rolling_mean - 2*rolling_std

# Plot the results


plt.figure(figsize=(15, 8))

# Plot original volume


plt.plot(dates, trading_volume, label="Daily Trading Volume",
color='#3498DB', alpha=0.6)

# Plot rolling mean


plt.plot(dates[window_size-1:], rolling_mean,
label="20-Day Moving Average",
color='#E74C3C', linewidth=2)

# Plot anomaly bounds


plt.fill_between(dates[window_size-1:],
upper_bound, lower_bound,
color='gray', alpha=0.2,
label='Normal Range (±2σ)')

# Add event line


plt.axvline(x=dates[100], color='#2ECC71', linestyle='--',
label="Market Event")

plt.title("Stock Trading Volume Anomaly Detection\nwith Level Shift Analysis",


fontsize=14, pad=20)
plt.xlabel("Date", fontsize=12)
plt.ylabel("Trading Volume", fontsize=12)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.gcf().autofmt_xdate()

plt.tight_layout()
plt.show()

# Print analysis
pre_event_avg = np.mean(trading_volume[:100])
post_event_avg = np.mean(trading_volume[100:])
volume_increase = (post_event_avg - pre_event_avg) / pre_event_avg * 100

print(f"\nTrading Volume Analysis:")


print(f"Pre-Event Average Volume: {pre_event_avg:,.0f}")
print(f"Post-Event Average Volume: {post_event_avg:,.0f}")
print(f"Percentage Increase: {volume_increase:.1f}%")


Trading Volume Analysis:


Pre-Event Average Volume: 989,615
Post-Event Average Volume: 1,502,230
Percentage Increase: 51.8%

Section 4 : Real World Use Case - Stock Price Fluctuation Anomaly using Mahalanobis Distance for NVIDIA Stock
Imports and Installs
In [11]: # Install required libraries if not already installed
# !pip install yfinance matplotlib numpy scipy
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2

Custom Functions
Get Stock Data
In [16]: def get_stock_data(ticker, days, price_column='Close'):
"""
Fetch stock data from Yahoo Finance.

Parameters:
ticker: str - Stock ticker symbol
days: int - Number of days of historical data to fetch
price_column: str - Which price column to use ('Open', 'High', 'Low', 'Close')

Returns:
pandas Series - Price data for the specified period
"""


end_date = pd.Timestamp.today()
start_date = end_date - pd.Timedelta(days=days)

# Fetch data
data = yf.download(ticker, start=start_date, end=end_date)
if data.empty:
raise ValueError(f"No data fetched for {ticker}. Check the ticker symbol.")

# Extract specified price column


prices = data[price_column].dropna()
if prices.empty:
raise ValueError(f"{price_column} price data is empty.")

return prices

Create Lagged Features


In [17]: def create_lagged_features(series, window_size):
"""
Create lagged features from a time series.

Parameters:
series: pandas Series - The original time series
window_size: int - Number of lags to create

Returns:
pandas DataFrame with columns ordered from oldest to most recent lag,
e.g., for window_size=3:
t-2 (2 days ago), t-1 (1 day ago), t0 (current)
"""
# Create list of shifted series in reverse order (from oldest to newest)
lagged_series = [series.shift(i) for i in range(window_size-1, -1, -1)]

# Combine all series into a DataFrame


lagged_df = pd.concat(lagged_series, axis=1).dropna()

# Name columns to clearly indicate the time relationship


# t0 is current time, t-1 is one period ago, etc.
lagged_df.columns = [f't-{i}' if i > 0 else 't0' for i in range(window_size-1, -1, -1)]

return lagged_df

Detect Anomalies
In [26]: def detect_anomalies(ticker="NVDA", days=30, window_size=5, confidence_level=0.95, price_column='Close'):
"""
Detect anomalies in stock price data using Mahalanobis distance.

Parameters:
ticker: str - Stock ticker symbol
days: int - Number of days of historical data to analyze
window_size: int - Size of the rolling window for lag features
confidence_level: float - Confidence level for anomaly threshold
price_column: str - Which price column to use
"""
# Step 1: Fetch Stock Data
try:
prices = get_stock_data(ticker, days, price_column)
except Exception as e:


print(f"Error fetching stock data: {e}")


return None

# Step 2: Create Lagged Features


# This creates a DataFrame where each row contains 'window_size' consecutive prices
# Example for window_size=3:
# t0 (current price), t-1 (yesterday's price), t-2 (price from 2 days ago)
lagged_data = create_lagged_features(prices, window_size)

if lagged_data.empty:
raise ValueError("Insufficient data for the specified window size.")

# Step 3: Calculate Mahalanobis Distance


# Compute covariance matrix and its inverse
cov_matrix = np.cov(lagged_data.values, rowvar=False)
try:
inv_cov_matrix = np.linalg.inv(cov_matrix)
except np.linalg.LinAlgError:
raise ValueError("Covariance matrix is singular. Try increasing the number of days or reducing the window size.")

# Compute mean vector


mean_vector = np.mean(lagged_data.values, axis=0)

# Calculate Mahalanobis distance for each point


mahalanobis_distances = []
for i in range(len(lagged_data)):
dist = mahalanobis(lagged_data.iloc[i].values, mean_vector, inv_cov_matrix)
mahalanobis_distances.append(dist)

# Convert distances to pandas Series


mahalanobis_distances = pd.Series(mahalanobis_distances, index=lagged_data.index)

# Step 4: Identify Anomalies


# Use chi-squared distribution to set threshold
threshold = chi2.ppf(confidence_level, df=window_size)
anomalies = mahalanobis_distances > threshold

# Step 5: Visualize Results


plt.figure(figsize=(14, 7))

# Plot stock prices


plt.plot(prices, label=f'{ticker} {price_column} Prices', color='blue')

# Highlight anomalies
anomaly_indices = anomalies[anomalies].index
anomaly_values = prices.loc[anomaly_indices]
plt.scatter(anomaly_indices, anomaly_values, color='red',
label=f'Anomalies (>{confidence_level*100}% confidence)',
zorder=5, s=100)

plt.title(f'{ticker} Stock Price Anomaly Detection\nusing Mahalanobis Distance')


plt.xlabel('Date')
plt.ylabel(f'{price_column} Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Return the results for further analysis if needed


return {
'prices': prices,


'lagged_data': lagged_data,
'mahalanobis_distances': mahalanobis_distances,
'anomalies': anomalies,
'threshold': threshold
}

In [31]: # Example usage


if __name__ == "__main__":
# Get stock data only
tesla_data = get_stock_data("TSLA", days=360)
print("Tesla stock data shape:", tesla_data.shape)

# Full anomaly detection


nvidia_results = detect_anomalies(
ticker="NVDA",
days=240,
window_size=5,
confidence_level=0.65,
price_column='Close'
)

if nvidia_results:
print("\nAnalysis Results:")
print(f"Number of anomalies found: {nvidia_results['anomalies'].sum()}")
print(f"Threshold value: {nvidia_results['threshold']:.2f}")

[*********************100%***********************] 1 of 1 completed
[*********************100%***********************] 1 of 1 completed
Tesla stock data shape: (247, 1)

Analysis Results:
Number of anomalies found: 4
Threshold value: 5.57

Appendix A : Interquartile Range and Box Plots in Detail
https://en.wikipedia.org/wiki/Exploratory_data_analysis

Definition of IQR
The Interquartile Range (IQR) is a statistical measure that represents the spread of
the middle 50% of a dataset. It is defined as:
$$IQR = Q_3 - Q_1$$

where:
Q1 (First Quartile): The 25th percentile (lower quartile) of the data
Q3 (Third Quartile): The 75th percentile (upper quartile) of the data
Why Is the Bulk of Data Found Within This Range?
Captures Central Distribution
Since the IQR focuses on the middle 50% of the data, it excludes extreme values and
provides a robust measure of data variability.
Resistant to Outliers
Unlike the mean and standard deviation, which are sensitive to extreme values, the
IQR is not influenced by outliers and provides a more reliable representation of
data dispersion.
Statistical Distribution Properties
In a normal distribution, approximately 50% of the data falls within the IQR
In skewed distributions, the IQR still contains the core of the data, though it
might be asymmetrically distributed
In real-world datasets, most data points are concentrated around the median,
making the IQR a natural boundary for defining expected variation
Using IQR for Outlier Detection
The Tukey's Rule uses the IQR to define outliers:
$$\text{Lower Bound} = Q_1 - 1.5 \times IQR$$

$$\text{Upper Bound} = Q_3 + 1.5 \times IQR$$

Any data points beyond these bounds are considered potential outliers.
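A minimal NumPy sketch of Tukey's rule; the sample data below are illustrative only:

import numpy as np

np.random.seed(42)
data = np.append(np.random.normal(10, 2, 100), [25.0, -3.0])  # two injected outliers

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = data[(data < lower_bound) | (data > upper_bound)]
print("IQR:", iqr, "Bounds:", (lower_bound, upper_bound))
print("Potential outliers:", outliers)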
Applications of IQR
Data Cleaning: Identifying and handling anomalies in datasets
Descriptive Statistics: Summarizing data variability without being affected by
extreme values
Machine Learning: Feature engineering and preprocessing for robust models
Finance & Economics: Measuring stock price variability and income
distributions


Conclusion
The IQR is crucial for understanding data spread while maintaining robustness
against extreme values, making it widely used in statistical analysis and anomaly
detection.

Appendix B : Understanding Z-score in Detail
What is a Z-Score?
A Z-score (also called a standard score) measures how many standard deviations away from the mean a data point is. The formula is:

$$Z = \frac{x - \mu}{\sigma}$$

where:
$x$ is the data point
$\mu$ is the population mean
$\sigma$ is the population standard deviation

Properties of Z-Scores
The mean of Z-scores is always 0
The standard deviation of Z-scores is always 1
Z-scores are dimensionless
Approximately:
68% of Z-scores fall between $-1$ and $+1$
95% fall between $-2$ and $+2$
99.7% fall between $-3$ and $+3$

Computing Z-Scores: Step-by-Step Process

Step 1: Calculate the Mean ($\mu$)

For a dataset with $n$ values:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Step 2: Calculate the Standard Deviation ($\sigma$)

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$

Step 3: Apply the Z-Score Formula

For each data point $x_i$:

$$Z_i = \frac{x_i - \mu}{\sigma}$$

Sample Calculation Example

Given the dataset: [82, 85, 89, 91, 93]
1. Calculate mean:

$$\mu = \frac{82 + 85 + 89 + 91 + 93}{5} = 88$$

2. Calculate standard deviation:

$$\sigma = \sqrt{\frac{(82 - 88)^2 + (85 - 88)^2 + (89 - 88)^2 + (91 - 88)^2 + (93 - 88)^2}{5}} = \sqrt{16} = 4.0$$

3. Calculate Z-score for 91:

$$Z = \frac{91 - 88}{4.0} = 0.75$$

Interpreting Z-Scores
Z = 0 : The value equals the mean
Z > 0 : The value is above the mean
Z < 0 : The value is below the mean
|Z| = 1 : The value is one standard deviation from the mean
|Z| = 2 : The value is two standard deviations from the mean
Applications of Z-Scores
1. Standardization: Converting datasets to a common scale
2. Outlier Detection: Values with $|Z| > 3$ are often considered outliers
3. Comparison: Comparing scores from different distributions


4. Probability: Finding percentiles using the standard normal distribution
In Python Code
import numpy as np

def calculate_zscore(x, data):
    mean = np.mean(data)
    std = np.std(data, ddof=0)  # ddof=0 for population standard deviation
    z_score = (x - mean) / std
    return z_score


Important Notes
1. Population vs. Sample:
For population: Use $\sigma = \sqrt{\frac{1}{n} \sum (x_i - \mu)^2}$
For sample: Use $s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
2. Assumptions:
Z-scores are most meaningful for approximately normal distributions
They may be less informative for highly skewed distributions
3. Limitations:
Sensitive to outliers
Assumes data is normally distributed
Not robust for small sample sizes

Appendix C : Understanding Mahalanobis Distance and its Applications in Anomaly Detection

1. What is Mahalanobis Distance?
The Mahalanobis distance measures how far a point $x$ is from a distribution with mean $\mu$ and covariance matrix $\Sigma$. Unlike Euclidean distance, it considers the variance and correlation of the data, making it more robust for multivariate data.

Formula:
The Mahalanobis distance $D_M$ between a point $x$ and a distribution is given by:

$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)} = \sqrt{d^T \Sigma^{-1} d} \quad \text{where } d = (x - \mu)$$

Where:
$x$: A data point (vector) in $\mathbb{R}^n$
$\mu$: The mean vector of the distribution
$\Sigma$: The covariance matrix of the distribution
$\Sigma^{-1}$: The inverse of the covariance matrix
$T$: Denotes the transpose of a vector or matrix

2. Why Use Mahalanobis Distance for Anomaly Detection?


In time series data, anomalies are data points that deviate significantly from the
expected behavior. The Mahalanobis distance is ideal for detecting such anomalies
because:

1. It accounts for the correlation between variables (e.g., multiple features in time
series).
2. It scales the data, so variables with larger variances do not dominate the
distance calculation.
3. It provides a probabilistic interpretation of how "unusual" a data point is.

3. Steps to Use Mahalanobis Distance for Anomaly Detection
Step 1: Preprocess the Time Series Data
Ensure the time series data is clean and normalized.
If the data has multiple features, organize it into a matrix $X$, where each row
represents a time step and each column represents a feature.


Step 2: Compute the Mean and Covariance Matrix
Compute the mean vector $\mu$ of the data:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

where $N$ is the number of data points.

Compute the covariance matrix $\Sigma$:

$$\Sigma = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T$$

Step 3: Compute the Mahalanobis Distance for Each Data Point
For each data point $x_i$, compute the Mahalanobis distance:

$$D_M(x_i) = \sqrt{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)}$$

Step 4: Set a Threshold for Anomaly Detection
Determine a threshold $\tau$ for the Mahalanobis distance. Data points with
$D_M(x_i) > \tau$ are considered anomalies.
The threshold can be chosen based on statistical properties (e.g., percentiles of
the Chi-squared distribution) or domain knowledge.

Step 5: Detect Anomalies
Compare the Mahalanobis distance of each data point to the threshold $\tau$.
Flag data points with $D_M(x_i) > \tau$ as anomalies.

4. Mathematical Intuition
Covariance Matrix
The covariance matrix $\Sigma$ captures the relationships between variables. Its inverse
$\Sigma^{-1}$ scales the distance calculation to account for these relationships.

Chi-Squared Distribution
Under the assumption that the data follows a multivariate normal distribution, the
squared Mahalanobis distance $D_M^2$ follows a Chi-squared distribution with $n$ degrees
of freedom (where $n$ is the number of features):

$$D_M^2(x) \sim \chi_n^2$$

This property allows us to set probabilistic thresholds for anomaly detection.

5. Example: Anomaly Detection in Time Series Data

Dataset
Suppose we have a time series dataset with two features:

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{N1} & x_{N2} \end{bmatrix}$$

Steps:
1. Compute the mean vector $\mu$ and covariance matrix $\Sigma$.
2. For each data point $x_i$, compute $D_M(x_i)$.
3. Set a threshold $\tau$ (e.g., the 95th percentile of the Chi-squared distribution).
4. Flag data points with $D_M(x_i) > \tau$ as anomalies.

(A compact end-to-end sketch of these steps appears right after this list.)
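The following sketch, assuming synthetic two-feature data (not the notebook's actual dataset), strings the four steps together with NumPy and SciPy; all variable names and the injected anomalies are illustrative assumptions:

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)

# Step 0: synthetic two-feature series with a few injected spikes (assumption)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.6], [0.6, 1]], size=300)
X[[50, 150, 250]] += [6, -6]  # inject anomalies

# Steps 1-2: mean vector and covariance matrix
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)
Sigma_inv = np.linalg.inv(Sigma)

# Step 3: squared Mahalanobis distance of every point
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

# Step 4: Chi-squared threshold at 95% confidence, 2 degrees of freedom
tau = chi2.ppf(0.95, df=2)

# Step 5: flag anomalies
anomalies = np.where(d2 > tau)[0]
print(anomalies)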

Appendix D : The Chi-Square Distribution and how it is used for setting the threshold in the Mahalanobis method
Why Use the Chi-Squared Distribution?
The squared Mahalanobis distance $D_M^2(x)$ follows a Chi-squared distribution with
$n$ degrees of freedom (where $n$ is the number of features in your data). This is
because:

$$D_M^2(x) = (x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi_n^2$$

This relationship allows us to use the properties of the Chi-squared distribution to set
a threshold for anomaly detection.
Key Properties of the Chi-Squared Distribution

1. Degrees of Freedom:
   The degrees of freedom $n$ correspond to the number of features in the data.
   For example, if your data has 2 features, the squared Mahalanobis distance
   follows $\chi_2^2$.
2. Probabilistic Interpretation:
   The Chi-squared distribution provides a probabilistic framework for determining
   how "unusual" a data point is.
   For a given significance level $\alpha$ (e.g., 0.05 for 95% confidence), you can
   compute the threshold $\tau$ such that:

   $$P(D_M^2(x) \le \tau) = 1 - \alpha$$

   This means that $100(1 - \alpha)\%$ of the data points under normal conditions will
   have a squared Mahalanobis distance less than or equal to $\tau$.
3. Threshold Calculation:
   The threshold $\tau$ is computed using the percent-point function (PPF) of the
   Chi-squared distribution:

   $$\tau = \chi_n^2(1 - \alpha)$$

   Where:
   - $\chi_n^2$ is the Chi-squared distribution with $n$ degrees of freedom
   - $\alpha$ is the significance level (e.g., 0.05 for 95% confidence)
   - $1 - \alpha$ is the confidence level (e.g., 0.95 or 95%)

Why is a Threshold Needed?


The Mahalanobis distance measures how far a point is from the mean of a
multivariate distribution. A threshold is necessary to:
Distinguish anomalies from normal data points
Control false positives and false negatives in anomaly detection
Define a quantitative boundary beyond which points are considered
anomalous
Threshold Calculation Using Chi-Square Distribution
The Mahalanobis distance for a data point $x_i$ in a dataset with mean $\mu$ and
covariance matrix $S$ is given by:

$$MD_i = \sqrt{(x_i - \mu)^T S^{-1} (x_i - \mu)}$$

Since squared Mahalanobis distances approximately follow a Chi-Square ($\chi^2$)
distribution with degrees of freedom equal to the number of features ($p$), we define
the threshold as:

$$MD_i^2 \le \chi_{p,\alpha}^2$$

where:
- $p$ is the number of dimensions (features)
- $\alpha$ is the significance level, which controls how extreme a value must be to be
  considered an anomaly
- $\chi_{p,\alpha}^2$ is the critical value of the Chi-Square distribution at the desired
  confidence level
Choosing the Right Confidence Level
The choice of the confidence level ($1 - \alpha$) affects anomaly detection:
- Lower threshold (e.g., 90% confidence, $\alpha = 0.10$) results in more anomalies
  detected, increasing false positives
- Higher threshold (e.g., 99% confidence, $\alpha = 0.01$) results in fewer anomalies
  detected, increasing false negatives
- A commonly used threshold is 95% confidence ($\alpha = 0.05$)

For example, when $p = 2$, the 95% confidence threshold is:

$$\chi_{2,0.05}^2 \approx 5.99$$

This means any data point with $MD^2 > 5.99$ is flagged as an anomaly.
Implementation in Python
The threshold can be computed using Python with scipy.stats.chi2.ppf :

import numpy as np
import scipy.stats as stats

# Number of features (dimensions)
p = 2

# Significance level (e.g., 95% confidence)
alpha = 0.05

# Compute Chi-Square threshold
threshold = stats.chi2.ppf(1 - alpha, df=p)
print(f"Chi-Square threshold for {p} features at {100*(1-alpha)}% confidence: {threshold}")
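Running this snippet with p = 2 and alpha = 0.05 prints a threshold of approximately 5.99, matching the $\chi_{2,0.05}^2$ value quoted above.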

Conclusion
- The Mahalanobis distance threshold is derived from the Chi-Square distribution
- The confidence level ($1 - \alpha$) determines the strictness of anomaly detection
- A typical threshold is $\chi_{p,0.05}^2$ for 95% confidence
- Proper tuning of $\alpha$ ensures a balance between anomaly detection and false
  positives

This approach makes Mahalanobis distance-based anomaly detection highly
effective in multivariate datasets.

Appendix E : Window Size Selection for Level Shift Detection
Theoretical Guidelines
The choice of window size depends on several factors:
1. General Rules of Thumb:
   - Minimum window size: $w_{min} = 8$ to $12$ observations
   - Maximum window size: $w_{max} = \sqrt{n}$, where $n$ is the total number of observations
   - Common choices: 20-30 observations for daily data


2. CUSUM (Cumulative Sum) Method:
Traditional approach uses window sizes of 8-15 observations
Reference: Page, E. S. (1954) "Continuous Inspection Schemes"
Academic References
Key research papers on window size selection:
1. Statistical Process Control:
Montgomery, D.C. (2009) suggests:
   $$w_{optimal} \approx \frac{\ln(ARL_0)}{2\Delta^2}$$

   where:
   - $ARL_0$ is the Average Run Length under control
   - $\Delta$ is the expected shift magnitude

2. EWMA (Exponentially Weighted Moving Average):
   Lucas and Saccucci (1990) recommend:

   $$\lambda = 1 - \frac{1}{w}$$

   where $\lambda$ is the smoothing parameter
3. Change Point Detection:


Basseville and Nikiforov (1993) suggest window sizes of:
20-40 points for gradual changes
8-15 points for abrupt changes
Practical Guidelines
1. Business Data Frequency:

Daily data: 20-30 days (about one month)


Weekly data: 8-12 weeks (about one quarter)
Monthly data: 6-12 months (half to full year)
2. Expected Shift Characteristics:
Abrupt shifts: Smaller windows (8-15 points)
Gradual shifts: Larger windows (20-40 points)
Seasonal data: Window size > seasonal period
3. Statistical Power Considerations:
   For detecting shifts of magnitude $\delta$:

   $$w_{min} = \frac{4\sigma^2}{\delta^2}$$

   where $\sigma$ is the process standard deviation (this form matches the
   implementation below)

Implementation Example
import numpy as np

def optimal_window_size(n_observations, expected_shift_magnitude, std_dev):
    """
    Calculate optimal window size for level shift detection

    Parameters:
    n_observations: Total number of observations
    expected_shift_magnitude: Expected magnitude of level shift
    std_dev: Standard deviation of the process
    """
    # Minimum window size based on shift magnitude
    w_min = int(4 * (std_dev / expected_shift_magnitude) ** 2)

    # Maximum window size based on series length
    w_max = int(np.sqrt(n_observations))

    # Optimal window size (clamped between 8 and w_max)
    w_opt = min(max(w_min, 8), w_max)

    return w_opt
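A hypothetical usage example (the numbers below are assumptions chosen only to illustrate the call):

# One year of daily data, expecting a level shift of about one standard deviation
w = optimal_window_size(n_observations=365, expected_shift_magnitude=1.0, std_dev=1.0)
print(w)  # 8 -> w_min = 4 is clamped up to the minimum of 8, below w_max = 19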

Key Considerations for Window Size Selection


1. Trade-offs:
Smaller windows: Faster detection, more false alarms
Larger windows: More robust, slower detection
2. Data Characteristics:
Noise level
Expected shift magnitude
Data frequency
Seasonality

3. Business Requirements:
Detection speed needs
False alarm tolerance
Computational resources
The most cited references suggest:
- For quick detection: $w = 8$ to $15$
- For robust detection: $w = 20$ to $30$
- For seasonal data: $w >$ seasonal period

References:
1. Basseville, M., & Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and
Application.
2. Montgomery, D. C. (2009). Statistical Quality Control.
3. Lucas, J. M., & Saccucci, M. S. (1990). Exponentially weighted moving average
control schemes.

End of Tutorial
