
Week3 Module AIL1020 L201

The document outlines a module on Measures of Central Tendency in a statistics course, focusing on descriptive statistics, including mean, median, mode, and variance. It emphasizes Chebyshev’s Inequality, which provides bounds on data deviation from the mean, applicable to datasets without normal distribution. Practical examples illustrate its use in fraud detection and delivery time predictions, demonstrating the maximum proportions of data that may fall outside specified ranges.


AIL1020

Foundations of Statistics & Probability


Instructor
Dr. Rajlaxmi Chouhan
Associate Professor, Department of Electrical Engineering
Head, Center for Education & Technology for Education
IIT Jodhpur

Module 02
Measures of Central Tendency
Exploring data variability
Descriptive Statistics Contd.

Recap

Data organization and visualization

Introduction to Descriptive Statistics

Mean, median, and mode

Population vs. Sample Variance



In this video,

Chebyshev’s Inequality

Apply Chebyshev’s Inequality to datasets with unknown distributions.

Interpret the probability bounds for real-world scenarios



Standard Deviation & Probability




Chebyshev’s Inequality
Chebyshev’s Inequality is a fundamental theorem in probability that bounds how much of the data can lie far from the mean.

Chebyshev’s Inequality states that for any dataset (not necessarily normal) with mean μ and standard deviation σ, and any k > 0,

P(|X − μ| ≥ kσ) ≤ 1/k²

That is, the probability that a value is at least k standard deviations away from the mean is at most 1/k². The bound is informative only for k > 1, since for k ≤ 1 it is at least 1.
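
The statement above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name chebyshev_upper_bound and the synthetic exponential data are assumptions chosen only to illustrate that the bound holds for a clearly non-normal dataset.

```python
import random
import statistics


def chebyshev_upper_bound(k: float) -> float:
    """Upper bound on the share of values at least k standard deviations from the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # 1/k^2 exceeds 1 for k < 1, so cap it: a proportion cannot exceed 1.
    return min(1.0, 1.0 / k**2)


# Synthetic, deliberately skewed (non-normal) data, used purely for illustration.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)  # population standard deviation of this dataset

observed_share = {}
for k in (1.5, 2, 3):
    # Share of values lying at least k standard deviations from the mean.
    observed_share[k] = sum(abs(x - mu) >= k * sigma for x in data) / len(data)
    print(f"k={k}: observed {observed_share[k]:.4f}, bound {chebyshev_upper_bound(k):.4f}")
```

The observed shares are typically far below the bound, which is expected: Chebyshev’s Inequality is a worst-case guarantee, not an estimate.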

Why should you know about Chebyshev’s Inequality?

It does not assume normality, making it broadly applicable.

It gives a worst-case bound: values k or more standard deviations from the mean can make up at most 1/k² of the data, which helps in setting safety margins.

It is widely used in anomaly detection, AI model evaluation, and risk assessment.



Example: Fraud Detection in AI Systems


A company tracks daily transactions of an e-commerce AI system.

The mean transaction value is μ = 200 dollars and the standard deviation is σ = 50 dollars.

Fraudulent transactions usually have extremely high or low values.

E.g. How often will a transaction be more than 3 standard deviations away from the mean?

With k = 3, the bound is 1/3² = 1/9 ≈ 11.11%, and 3 standard deviations from the mean correspond to 200 − 150 = 50 dollars and 200 + 150 = 350 dollars.

At most 11.11% of transactions will be below 50 dollars or above 350 dollars.

AI fraud detection models can use this to set risk thresholds.
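
The arithmetic of this example can be sketched in a few lines. This is an illustrative sketch only: the helper is_suspicious is hypothetical and is not described in the slides as any real fraud model; μ, σ and k are taken from the example.

```python
mu = 200.0    # mean transaction value in dollars (from the example)
sigma = 50.0  # standard deviation in dollars (from the example)
k = 3         # number of standard deviations used as the cut-off

lower = mu - k * sigma  # lower risk threshold
upper = mu + k * sigma  # upper risk threshold
bound = 1 / k**2        # Chebyshev bound on the share at or beyond the thresholds


def is_suspicious(amount: float) -> bool:
    """Hypothetical rule: flag amounts at least k standard deviations from the mean."""
    return abs(amount - mu) >= k * sigma


print(f"Flag transactions at or beyond {lower:.0f} and {upper:.0f} dollars")
print(f"Chebyshev bound on the flagged share: {bound:.2%}")
```

A threshold chosen this way caps the share of flagged transactions at 1/k² regardless of how transaction values are distributed.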
Scenario: AI-powered Delivery System

An AI-driven delivery system predicts estimated delivery times (in minutes) for a food
delivery app.

After analyzing data from thousands of orders, you find:

The mean delivery time (𝜇) is 30 minutes.


The standard deviation (𝜎) is 5 minutes.

Your company wants to guarantee customers that deliveries will be within 20 minutes to 40 minutes most of the time.

Use Chebyshev’s inequality to determine the maximum proportion of deliveries that might fall outside this range.

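Ans: The limits of 20 and 40 minutes each lie 10 minutes from the mean, which is k = 10/5 = 2 standard deviations, so the bound is 1/k² = 1/4. At most 25% of deliveries might take less than 20 minutes or more than 40 minutes.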
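
The same calculation can be wrapped in a small function. This is a sketch under stated assumptions: the name max_share_outside is hypothetical, and for a window that is not symmetric about the mean it uses the smaller of the two distances, which still gives a valid (if looser) bound.

```python
def max_share_outside(mu: float, sigma: float, low: float, high: float) -> float:
    """Chebyshev bound on the share of values at or beyond the window [low, high]."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not (low < mu < high):
        raise ValueError("the window must contain the mean")
    # Any value outside the window is at least this far from the mean.
    distance = min(mu - low, high - mu)
    k = distance / sigma
    # Cap at 1 because a proportion cannot exceed 1 (relevant when k < 1).
    return min(1.0, 1.0 / k**2)


# Delivery scenario: mean 30 minutes, standard deviation 5 minutes, window 20 to 40.
share = max_share_outside(mu=30.0, sigma=5.0, low=20.0, high=40.0)
print(f"At most {share:.0%} of deliveries fall outside the window")
```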

The company considers a delivery "very late" if it takes more than 3 standard deviations above the mean, i.e. more than 30 + 3 × 5 = 45 minutes.

Use Chebyshev’s inequality to find out: at most, what percentage of deliveries will take more than 45 minutes?

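Ans: With k = 3, at most 1/3² = 1/9 ≈ 11.11% of deliveries lie 3 or more standard deviations from the mean in either direction, so at most 11.11% of deliveries might take more than 45 minutes.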
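
As a rough illustration of what this bound means in practice, the sketch below converts it into a count of orders. The batch size of 1,000 orders is a hypothetical figure, and the count assumes the batch itself has the stated mean and standard deviation.

```python
import math

mu = 30.0               # mean delivery time in minutes (from the scenario)
sigma = 5.0             # standard deviation in minutes (from the scenario)
very_late_after = 45.0  # "very late" threshold in minutes

# Number of standard deviations between the threshold and the mean.
k = (very_late_after - mu) / sigma

# Chebyshev bounds both tails together, so it also bounds the upper tail alone.
two_sided_bound = 1 / k**2

orders = 1_000  # hypothetical batch size, for illustration only
max_very_late = math.floor(orders * two_sided_bound)

print(f"k = {k:g}")
print(f"At most {two_sided_bound:.2%} of deliveries exceed {very_late_after:g} minutes")
print(f"In a batch of {orders} orders, at most {max_very_late} can be very late")
```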



Summary
Chebyshev’s Inequality is a powerful tool when data distributions are unknown.

It guarantees that at least 1 − 1/k² of the data lies within k standard deviations of the mean, whatever the distribution.
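
For quick reference, the guaranteed minimum coverage can be tabulated for a few values of k. This is a small sketch; the function name min_share_within and the chosen k values are illustrative assumptions.

```python
def min_share_within(k: float) -> float:
    """Minimum share of data guaranteed to lie within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the guarantee is vacuous, so report 0 rather than a negative value.
    return max(0.0, 1.0 - 1.0 / k**2)


for k in (1.5, 2, 3, 4):
    within = min_share_within(k)
    print(f"k = {k}: at least {within:.2%} within, at most {1 - within:.2%} outside")
```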

Coming up next…
Normal distributions and paired datasets
