Class X Use of Statistics in Data Science
DATA SCIENCE
Class Activity
Survey 10 classmates about their favorite fruit. Create
a tally chart and count how many chose each fruit.
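One way to check a hand-drawn tally is a short Python sketch like the one below. The list of responses is made up purely for illustration; replace it with the answers you actually collect.

```python
from collections import Counter

# Hypothetical answers from 10 classmates (illustrative only, not real survey data).
responses = ["apple", "mango", "apple", "orange", "mango",
             "mango", "apple", "banana", "mango", "orange"]

# Counter counts how many times each fruit appears in the list.
tally = Counter(responses)

# Print one tally mark per vote, most popular fruit first.
for fruit, count in tally.most_common():
    print(f"{fruit:<8}{'|' * count}  ({count})")
```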
WHAT IS DATA SCIENCE?
Data Science is a field that combines statistics,
computer science, and domain knowledge to extract
insights from data. It involves analyzing large sets of
data to discover patterns and make decisions.
DATASET?
A dataset is a collection of related data stored in rows and columns,
similar to a spreadsheet or table.
VARIABLES
Variables are characteristics or attributes that can take different values.
Quantitative: Can be measured (e.g., height, age)
Qualitative: Descriptive (e.g., color, gender)
Activity:
Sort the following into types: Age, Favorite color, Number of pets,
Gender, Height.
UNDERSTANDING POPULATION AND SAMPLE
• Population: The entire group you're interested in studying (e.g., all students in the school).
• Sample: A small part of the population used to represent the whole.
• Importance: Studying the entire population is often impossible, so we use samples to draw conclusions.
Biased sample: Selected in a way that favors certain outcomes (e.g., only asking top students about study habits)
Unbiased sample: Fair and representative of the whole population
SUBSETS
Subsetting means selecting a portion of the dataset for
specific analysis.
Row-Based Subsetting: Select rows based on conditions (e.g., students with marks above 80)
Data Subsetting: Apply both row and column filters based on logical conditions (e.g., boys who scored above 70 in Math only)
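The two kinds of subsetting above can be sketched in plain Python. The student records, names, and field names below are hypothetical and exist only to illustrate the idea; a real dataset would have its own columns.

```python
# Hypothetical student records (names, fields, and marks are made up).
students = [
    {"name": "Asha", "gender": "girl", "math": 85, "science": 78},
    {"name": "Ravi", "gender": "boy", "math": 72, "science": 91},
    {"name": "Meera", "gender": "girl", "math": 64, "science": 88},
    {"name": "Kabir", "gender": "boy", "math": 58, "science": 69},
]

# Row-based subsetting: keep only the rows where Math marks are above 80.
high_scorers = [s for s in students if s["math"] > 80]

# Row and column filters together: boys who scored above 70 in Math,
# keeping only the "name" and "math" columns.
boys_math = [
    {"name": s["name"], "math": s["math"]}
    for s in students
    if s["gender"] == "boy" and s["math"] > 70
]

print(high_scorers)
print(boys_math)
```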
Let’s explore different ways we can represent data!
QUANTITATIVE DATA REPRESENTATION
Quantitative data deals with numbers: things you can count or measure
(e.g., height, marks, number of books).

Fruit      Count
Apples     🍎🍎🍎
Oranges    🍊🍊
Mangoes    🥭🥭🥭🥭
🧠 Why it's useful: Visually fun and easy for young learners.

         Indoor   Outdoor
Boys     3        7
Girls    6        4
🧠 Why it's useful: Helps compare groups (e.g., boys vs. girls).
INTERPRETING TWO-WAY TABLES
Look at the values in rows and columns to find patterns and totals.

         Indoor   Outdoor   Total
Boys     4        6         10
Girls    5        5         10
Total    9        11        20

Joint Frequency = Count at the intersection of a row and column.
• Example: Boys who like outdoor games = 6
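As a rough sketch, the row and column totals of the Indoor/Outdoor table can be computed from the joint frequencies alone. The dictionary layout and variable names are just one possible choice.

```python
# Joint frequencies from the two-way table (rows: group, columns: game type).
joint = {
    "Boys": {"Indoor": 4, "Outdoor": 6},
    "Girls": {"Indoor": 5, "Outdoor": 5},
}
columns = ["Indoor", "Outdoor"]

# Row totals: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in joint.items()}

# Column totals: add down each column.
col_totals = {col: sum(joint[row][col] for row in joint) for col in columns}

# Grand total: every student counted once.
grand_total = sum(row_totals.values())

print(row_totals, col_totals, grand_total)
```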
Frequency table:
Boys     10     20     10     40
Girls    20     15     5      40
Total    30     35     15     80
• All inside numbers = joint frequencies
• Row and column totals = marginal frequencies

Step 2: Find Total Number of Data Points
This is the grand total; here it’s 80 students.
Step 3: Divide Each Joint Frequency by the Grand Total

Relative frequency table:
Boys     10/80 = 0.125    20/80 = 0.25      10/80 = 0.125    0.5
Girls    20/80 = 0.25     15/80 = 0.1875    5/80 = 0.0625    0.5
Total    0.375            0.4375            0.1875           1.0
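The division step can be sketched in a few lines of Python. The counts are the ones from the 80-student table; the column categories are not named here, so the code simply keeps them in order.

```python
# Joint frequencies from the 80-student table, one list per row.
counts = {
    "Boys": [10, 20, 10],
    "Girls": [20, 15, 5],
}

# Grand total: sum of every joint frequency.
grand_total = sum(sum(row) for row in counts.values())

# Relative frequency: each joint frequency divided by the grand total.
relative = {
    label: [count / grand_total for count in row]
    for label, row in counts.items()
}

# Marginal relative frequencies: row sums and column sums.
row_marginals = {label: sum(row) for label, row in relative.items()}
col_marginals = [sum(col) for col in zip(*relative.values())]

print(grand_total)
print(relative)
print(row_marginals, col_marginals)
```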
Your Task:
Girls    6      12     12     30
Total    18     22     20     60
1. Convert the frequency table above into a relative frequency table by dividing each joint frequency by the grand total (60).
2. Fill in the row and column totals in relative terms.
3. Answer this question:
⚬ What percentage of students who walk to school are girls?
DESCRIPTIVE STATISTICS
Descriptive Statistics are methods to summarize and describe features of a dataset.
Key Components:
• Measures of Central Tendency (Mean, Median, Mode)
• Measures of Dispersion (Range, MAD, Variance, Standard Deviation)
• Data Distribution (e.g., bell curves)
MEAN (AVERAGE)
Mean = Sum of all values ÷ Number of values
Example: Scores: 70, 75, 80, 85, 90
Mean = (70 + 75 + 80 + 85 + 90) ÷ 5 = 80
🧠 Tip: Sensitive to extreme values (outliers)
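A minimal Python sketch of the mean formula, using the example scores 70, 75, 80, 85, 90. It computes the mean by hand and then checks it against the standard library's `statistics.mean`.

```python
import statistics

scores = [70, 75, 80, 85, 90]

# Mean by hand: sum of all values divided by the number of values.
mean_by_hand = sum(scores) / len(scores)

# The same calculation using Python's standard library.
mean_by_library = statistics.mean(scores)

print(mean_by_hand, mean_by_library)
```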
PRACTICE QUESTION
The math test scores of 5 students are:
Scores: 68, 74, 80, 85, 93
👉 What is the mean score?
MEDIAN (MIDDLE VALUE)
Median = Middle value when data is arranged in order
Steps:
1. Arrange data in ascending order
2. If odd number of items → middle one is median
3. If even → average of two middle numbers
Example: Data: 12, 15, 20, 22, 30 → Median = 20
Data: 12, 15, 20, 22 → Median = (15 + 20)/2 = 17.5
PRACTICE QUESTION
• The ages of 7 participants in a chess tournament are:
Ages: 12, 15, 11, 14, 13, 12, 14
👉 What is the median age?
• Daily temperatures (in °C) recorded over 6 days:
Temps: 22, 24, 20, 25, 23, 21
👉 What is the median temperature?
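The three median steps can be sketched as a small Python function. The function name `median` is just an illustrative choice, and the sketch assumes the list is not empty. It is run here on the two worked examples, not on the practice data.

```python
def median(values):
    """Return the middle value of a non-empty list of numbers."""
    ordered = sorted(values)          # Step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd count, take the middle one
        return ordered[mid]
    # Step 3: even count, average the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([12, 15, 20, 22, 30])
even_example = median([12, 15, 20, 22])
print(odd_example, even_example)
```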
MODE (MOST FREQUENT VALUE)
Mode = Value that appears most often
Example: Data: 12, 14, 14, 16, 18 → Mode = 14
Data: 10, 20, 30 → No mode (all unique)
Data: 5, 5, 10, 10, 15 → Bimodal (5 and 10)
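Here is one possible Python sketch that follows the same conventions as the three examples: it returns an empty list when every value is unique and returns all tied values when the data is bimodal. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values.

    Returns an empty list when every value appears only once (no mode).
    Assumes the input list is not empty.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([12, 14, 14, 16, 18]))
print(modes([10, 20, 30]))
print(modes([5, 5, 10, 10, 15]))
```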
PRACTICE QUESTION
The number of pets in 10 households
is:
Pets: 1, 2, 2, 3, 4, 2, 1, 3, 5, 2
👉 What is the mode of the data?
HOW TO CHOOSE THE BEST CENTRAL MEASURE
Always look at the shape and nature of the data:
• Use Mean for balanced data
• Use Median if it’s skewed or has outliers
• Use Mode for categories or most common value
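To see why the median is preferred when there are outliers, here is a small sketch. It starts from the balanced scores 70, 75, 80, 85, 90 and adds one extreme value, 500, which is made up for illustration.

```python
import statistics

balanced = [70, 75, 80, 85, 90]
# Hypothetical outlier added to show its effect (not from any real data).
with_outlier = balanced + [500]

mean_balanced = statistics.mean(balanced)
median_balanced = statistics.median(balanced)

mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

# The mean jumps a lot; the median barely moves.
print(mean_balanced, median_balanced)
print(mean_outlier, median_outlier)
```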
DISPERSION
• Range
• Standard Deviation
Range
The range is the difference between the highest and the lowest value in a data set.
For example:
Data Set A: 10, 18, 50, 700, 1000, 5000, 20000, 50000, 100000
Range = 100000 - 10 = 99990
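The same range calculation for Data Set A can be written as a short Python sketch:

```python
data_set_a = [10, 18, 50, 700, 1000, 5000, 20000, 50000, 100000]

# Range: highest value minus lowest value.
data_range = max(data_set_a) - min(data_set_a)

print(data_range)
```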
Mean Absolute Deviation tells us how far, on average, the numbers in a group are from the
average (mean).
Practice Question: Mean Absolute Deviation (MAD)
The number of pages 5 students read over the weekend is:
📖 Pages: 12, 15, 10, 13, 20
🎯 Your Task:
• Find the mean of the data.
• Find the distance of each number from the mean (ignore
negative signs).
• Add all the absolute deviations.
• Divide by how many numbers there are.
• Write the Mean Absolute Deviation (MAD).
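The steps above can be sketched in Python. To leave the practice question for you to solve, this sketch uses the scores 70, 75, 80, 85, 90 from the mean example instead of the pages data. The function name is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean (non-empty list assumed)."""
    mean = sum(values) / len(values)                  # find the mean
    deviations = [abs(x - mean) for x in values]      # distances, ignoring signs
    return sum(deviations) / len(values)              # add them up, then divide


scores = [70, 75, 80, 85, 90]
mad = mean_absolute_deviation(scores)
print(mad)
```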
VARIANCE
Variance is the average of the squared differences between each value and the
average (mean). It shows how spread out the data is.
Formula:
Variance = [ (x₁ - mean)² + (x₂ - mean)² + ... + (xₙ - mean)² ] / n
Where:
x₁, x₂, ..., xₙ = the data values
mean = average of all values
n = total number of values
STANDARD DEVIATION
Standard Deviation is the square root of variance.
It tells us roughly how far a typical value is from the mean, in the same units as the data.
Formula:
Standard Deviation = √Variance
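A minimal Python sketch of both formulas, dividing by n as in the variance formula above. It uses the scores 70, 75, 80, 85, 90 as sample input; the function names are illustrative.

```python
import math


def variance(values):
    """Average of squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


scores = [70, 75, 80, 85, 90]
var = variance(scores)
std_dev = standard_deviation(scores)
print(var, round(std_dev, 2))
```

Python's standard library offers the same divide-by-n calculations as `statistics.pvariance` and `statistics.pstdev`.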
Example: Find Variance & Standard Deviation
Data Set: 4, 6, 6, 8, 10
Step 1: Find the Mean
Mean = (4 + 6 + 6 + 8 + 10) / 5 = 34 / 5 = 6.8

Value    Difference (x - mean)    Square of Difference
4        4 - 6.8 = -2.8           (-2.8)² = 7.84