Class X Use of Statistics in Data Science

The document provides an overview of statistics and data science, explaining their importance in analyzing and interpreting data. It covers key concepts such as datasets, variables, sampling, data representation, and descriptive statistics, along with practical activities for students. Additionally, it introduces methods for summarizing data, including measures of central tendency and dispersion.

CLASS X
DATA SCIENCE

Use of Statistics & Data Science

Understanding Data Through Numbers

WHAT IS STATISTICS?
Statistics is the branch of mathematics that deals with
collecting, organizing, analyzing, and interpreting
numerical data. It helps us make informed decisions
based on data trends and patterns.

Class Activity
Survey 10 classmates about their favorite fruit. Create
a tally chart and count how many chose each fruit.
WHAT IS DATA SCIENCE?
Data Science is a field that combines statistics,
computer science, and domain knowledge to extract
insights from data. It involves analyzing large sets of
data to discover patterns and make decisions.

Example: Online shopping websites use data science to recommend products.
STATISTICAL PROCESS IN DATA SCIENCE
1. Define the problem
2. Collect data
3. Clean and process the data
4. Analyze data using statistics
5. Interpret results
6. Make decisions based on insights
PROBLEM STATEMENT
A problem statement clearly defines what you want to find out using data.
It gives direction to your research.

Class Activity: Create a problem statement such as “Do students in Class 10 get enough sleep?”
WHAT IS A DATASET?
[Diagram labels: Make decisions, Spot trends/patterns, Educate others]
A dataset is a collection of related data stored in rows and columns,
similar to a spreadsheet or table.

Example: Student names, their age, marks in different subjects.
VARIABLES
Variables are characteristics or attributes that can take different values.
• Quantitative: Can be measured (e.g., height, age)
• Qualitative: Descriptive (e.g., color, gender)

Activity:
Sort the following into types: Age, Favorite color, Number of pets,
Gender, Height.
UNDERSTANDING POPULATION AND SAMPLE
• Population: The entire group you're interested in studying (e.g., all students in the school).
• Sample: A small part of the population used to represent the whole.
• Importance: Studying the entire population is often impossible, so we use samples to draw conclusions.

Biased sample: Selected in a way that favors certain outcomes (e.g., only asking top students about study habits)
Unbiased sample: Fair and representative of the whole population
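
A minimal sketch of the difference between an unbiased and a biased sample. It assumes a hypothetical school of 200 students identified by roll numbers 1 to 200; the population size, sample size, and seed are illustrative choices, not part of the lesson. Python's `random.sample` gives every student the same chance of being chosen, while the biased version always picks the same narrow group.

```python
import random

# Hypothetical population: roll numbers 1..200 (an assumption for illustration).
population = list(range(1, 201))

# Unbiased sample: every student has the same chance of being picked.
rng = random.Random(42)  # fixed seed only so the example is repeatable
unbiased_sample = rng.sample(population, 10)

# Biased sample: always picks the same narrow group (the first 10 roll numbers).
biased_sample = population[:10]

print("Unbiased sample:", sorted(unbiased_sample))
print("Biased sample:  ", biased_sample)
```

Running it several times with different seeds shows the unbiased sample changing, while the biased sample never includes anyone beyond roll number 10.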
SUBSETS
Subsetting means selecting a portion of the dataset for
specific analysis.

Why Subset Data?
• To focus on specific groups or categories
• To simplify analysis by filtering relevant information
• To compare different parts of a dataset
Column-Based Subsetting:
Select specific variables (e.g., only names and marks)

Row-Based Subsetting:
Select rows based on conditions (e.g., students with marks above 80)

Data Subsetting:
Apply both row and column filters based on logical conditions (e.g., boys who scored above 70 in Math only)
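
The three kinds of subsetting can be sketched in plain Python. The student names and marks below are made up for illustration, and storing each row as a dictionary is just one possible way to hold a small dataset.

```python
# Hypothetical dataset: each dict is one row, each key is one column.
students = [
    {"name": "Asha",  "gender": "F", "math": 85, "science": 78},
    {"name": "Ravi",  "gender": "M", "math": 72, "science": 90},
    {"name": "Meena", "gender": "F", "math": 91, "science": 88},
    {"name": "Arjun", "gender": "M", "math": 64, "science": 70},
]

# Column-based subsetting: keep only the name and math columns.
names_and_math = [{"name": s["name"], "math": s["math"]} for s in students]

# Row-based subsetting: keep only rows where the math mark is above 80.
above_80 = [s for s in students if s["math"] > 80]

# Both together: boys who scored above 70 in math, showing name and math only.
boys_above_70 = [
    {"name": s["name"], "math": s["math"]}
    for s in students
    if s["gender"] == "M" and s["math"] > 70
]

print(names_and_math)
print([s["name"] for s in above_80])
print(boys_above_70)
```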
Let’s explore different ways we can represent data!
QUANTITATIVE DATA REPRESENTATION
Quantitative data deals with numbers: things you can count or measure (e.g., height, marks, number of books).

Frequency Table
This shows how often each number/value appears in the data.
Example:
Data: 2, 3, 2, 5, 3, 5, 5, 4

Value   Frequency
2       2
3       2
4       1
5       3

🧠 Why it's useful: Helps us see which value is most common.

Dot Plot
Each dot represents one piece of data. Easy to visualize small datasets.
Example:

Marks:  2    3    4    5
        ●●   ●●   ●    ●●●

🧠 Why it's useful: Quickly shows how the data is spread out.
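
As a rough sketch, the frequency table and dot plot for the example data (2, 3, 2, 5, 3, 5, 5, 4) can be produced with `collections.Counter` from Python's standard library. The variable names are illustrative.

```python
from collections import Counter

data = [2, 3, 2, 5, 3, 5, 5, 4]  # the example data from the slide

# Frequency table: count how often each value appears.
frequency = Counter(data)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:<6} {frequency[value]}")

# Dot plot: one dot per data point, grouped by value.
dot_plot = {value: "●" * frequency[value] for value in sorted(frequency)}
for value, dots in dot_plot.items():
    print(value, dots)
```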
QUALITATIVE DATA REPRESENTATION
Qualitative data is categorical: words or labels (e.g., colors, gender, game types).

Pictograph
Uses pictures/icons to show data.
Example: Favorite Fruits
🍎 = 1 vote

Fruit     Count
Apples    🍎🍎🍎
Oranges   🍊🍊
Mangoes   🥭🥭🥭🥭

🧠 Why it's useful: Visually fun and easy for young learners.

Two-Way Frequency Table
Shows the relationship or comparisons between two categories (variables).
Example: Game Preference by Gender

        Indoor  Outdoor
Boys    3       7
Girls   6       4

🧠 Why it's useful: Helps compare groups (e.g., boys vs. girls).
INTERPRETING TWO-WAY TABLES
Look at the values in rows and columns to find patterns and totals.

Example Table:

        Indoor  Outdoor
Boys    4       6
Girls   7       3

Activity:
Using the example table:
How many students prefer outdoor games? 6 + 3 = ?
How many girls prefer indoor games? 7



Two-Way Relative Frequency Tables
A Two-Way Frequency Table is used to display data collected from two categorical variables.
It shows how often combinations of categories occur; this is called the joint frequency. The totals along the rows and columns are called marginal frequencies.

🧪 Example: Favorite Type of Game by Gender

        Indoor  Outdoor  Total
Boys    4       6        10
Girls   5       5        10
Total   9       11       20

Joint Frequency = Count at the intersection of a row and column.
• Example: Boys who like outdoor games = 6

Marginal Frequency = Row or column totals.
• Example: Total number of girls = 10

Joint Frequencies (inside the table):
Boys + Indoor = 4
Boys + Outdoor = 6
Girls + Indoor = 5
Girls + Outdoor = 5

Marginal Frequencies (row/column totals):
Total Boys = 10
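
A short sketch of how the marginal frequencies follow from the joint frequencies, using the game-preference counts from the example. The nested-dictionary layout is an illustrative choice.

```python
# Joint frequencies from the example table (rows: gender, columns: game type).
joint = {
    "Boys":  {"Indoor": 4, "Outdoor": 6},
    "Girls": {"Indoor": 5, "Outdoor": 5},
}

# Marginal frequencies for rows: add across each row.
row_totals = {group: sum(counts.values()) for group, counts in joint.items()}

# Marginal frequencies for columns: add down each column.
column_totals = {}
for counts in joint.values():
    for category, count in counts.items():
        column_totals[category] = column_totals.get(category, 0) + count

# Grand total: the sum of all joint frequencies.
grand_total = sum(row_totals.values())

print(row_totals)     # {'Boys': 10, 'Girls': 10}
print(column_totals)  # {'Indoor': 9, 'Outdoor': 11}
print(grand_total)    # 20
```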
Step-by-Step: Convert to Relative Frequencies

Step 1: Start with a Two-Way Frequency Table
Example: Survey of students’ preferred drink by gender:

        Juice  Soda  Water  Total
Boys    10     20    10     40
Girls   20     15    5      40
Total   30     35    15     80

• All inside numbers = joint frequencies
• Row and column totals = marginal frequencies

Step 2: Find Total Number of Data Points
This is the grand total; here it’s 80 students.

Step 3: Divide Each Joint Frequency by the Grand Total
Relative Frequency = Joint Frequency / Grand Total
Let’s calculate for Boys who chose Soda:
20/80 = 0.25

Step 4: Fill the Two-Way Relative Frequency Table

        Juice          Soda            Water           Total
Boys    10/80 = 0.125  20/80 = 0.25    10/80 = 0.125   0.5
Girls   20/80 = 0.25   15/80 = 0.1875  5/80 = 0.0625   0.5
Total   0.375          0.4375          0.1875          1.0
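
The conversion steps can be sketched in Python using the drink-survey counts from the example. This is one possible way to organize the calculation; the variable names are illustrative.

```python
# Step 1: joint frequencies from the drink survey.
joint = {
    "Boys":  {"Juice": 10, "Soda": 20, "Water": 10},
    "Girls": {"Juice": 20, "Soda": 15, "Water": 5},
}

# Step 2: the grand total is the sum of every joint frequency.
grand_total = sum(sum(counts.values()) for counts in joint.values())

# Step 3: divide each joint frequency by the grand total.
relative = {
    group: {drink: count / grand_total for drink, count in counts.items()}
    for group, counts in joint.items()
}

# Step 4: relative row and column totals (marginal relative frequencies).
row_totals = {group: sum(values.values()) for group, values in relative.items()}
column_totals = {
    drink: sum(relative[group][drink] for group in relative)
    for drink in joint["Boys"]
}

print(grand_total)               # 80
print(relative["Boys"]["Soda"])  # 0.25
print(row_totals)
print(column_totals)
```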
PRACTICE QUESTION
A school surveyed 60 students about their preferred mode of transport to school. The results are shown:

Two-Way Frequency Table:

        Bus  Bike  Walk  Total
Boys    12   10    8     30
Girls   6    12    12    30
Total   18   22    20    60

Your Task:
1. Convert the frequency table above into a relative frequency table by dividing each joint frequency by the grand total (60).
2. Fill in the row and column totals in relative terms.
3. Answer this question:
   ⚬ What percentage of students who walk to school are girls?
DESCRIPTIVE STATISTICS
Descriptive Statistics are methods to summarize and describe features of a dataset.

Key Components:
• Measures of Central Tendency (Mean, Median, Mode)
• Measures of Dispersion (Range, MAD, Variance, Standard Deviation)
• Data Distribution (e.g., bell curves)
MEAN (AVERAGE)
Mean = Sum of all values ÷ Number of values
Example: Scores: 70, 75, 80, 85, 90
Mean = (70 + 75 + 80 + 85 + 90) ÷ 5 = 80
🧠 Tip: Sensitive to extreme values (outliers)
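
A minimal sketch of the mean formula in Python, using the example scores. The second list, with one extreme score, is a made-up illustration of how a single outlier pulls the mean.

```python
def mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)

scores = [70, 75, 80, 85, 90]  # example scores from the slide
print(mean(scores))  # 80.0

# Hypothetical outlier: replace the top score with an extreme value.
scores_with_outlier = [70, 75, 80, 85, 500]
print(mean(scores_with_outlier))  # 162.0
```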

PRACTICE QUESTION
The math test scores of 5 students are:
Scores: 68, 74, 80, 85, 93
👉 What is the mean score?
MEDIAN (MIDDLE VALUE)
Median = Middle value when data is arranged in order
Steps:
1. Arrange data in ascending order
2. If odd number of items → middle one is median
3. If even → average of two middle numbers

Example: Data: 12, 15, 20, 22, 30 → Median = 20
Data: 12, 15, 20, 22 → Median = (15 + 20)/2 = 17.5

PRACTICE QUESTION
• The ages of 7 participants in a chess tournament are:
Ages: 12, 15, 11, 14, 13, 12, 14
👉 What is the median age?
• Daily temperatures (in °C) recorded over 6 days:
Temps: 22, 24, 20, 25, 23, 21
👉 What is the median temperature?
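
The three median steps described in this section can be sketched as a small Python function. It is tried on the two worked examples rather than on the practice data, and it assumes the list is not empty.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)            # Step 1: arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                      # Step 2: odd count, take the middle one
        return ordered[middle]
    # Step 3: even count, average the two middle numbers
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([12, 15, 20, 22, 30]))  # 20
print(median([12, 15, 20, 22]))      # 17.5
```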
MODE (MOST FREQUENT VALUE)
Mode = Value that appears most often
Example: Data: 12, 14, 14, 16, 18 → Mode = 14
Data: 10, 20, 30 → No mode (all unique)
Data: 5, 5, 10, 10, 15 → Bimodal (5 and 10)
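
A sketch of finding the mode with `collections.Counter`. It follows the convention used in the examples here: if every value appears only once, it reports no mode. The function name and the choice to return a list are illustrative, and the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return a list of the most frequent values; empty if every value is unique."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values appear once, so there is no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([12, 14, 14, 16, 18]))  # [14]
print(modes([10, 20, 30]))          # []
print(modes([5, 5, 10, 10, 15]))    # [5, 10]
```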
PRACTICE QUESTION
The number of pets in 10 households is:
Pets: 1, 2, 2, 3, 4, 2, 1, 3, 5, 2
👉 What is the mode of the data?
HOW TO CHOOSE THE BEST CENTRAL MEASURE
Always look at the shape and nature of the data:
• Use Mean for balanced data
• Use Median if it’s skewed or has outliers
• Use Mode for categories or most common value
Measures of Dispersion
• Range
• Mean Absolute Deviation (MAD)
• Variance
• Standard Deviation
Range
It denotes the difference between the highest and the lowest value in a data set.

For example:
Data Set A: 10, 18, 50, 700, 1000, 5000, 20000, 50000, 100000
Range = 100000 - 10 = 99990

Data Set B: 93, 63, 34, 39, 42, 56, 63, 61, 82, 93
Range = 93 - 34 = 59
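
The two range calculations can be checked with a short Python sketch. The function name `data_range` is an illustrative choice (it avoids clashing with Python's built-in `range`).

```python
def data_range(values):
    """Return the difference between the highest and lowest value."""
    return max(values) - min(values)

set_a = [10, 18, 50, 700, 1000, 5000, 20000, 50000, 100000]
set_b = [93, 63, 34, 39, 42, 56, 63, 61, 82, 93]

print(data_range(set_a))  # 99990
print(data_range(set_b))  # 59
```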
Mean Absolute Deviation (MAD)

Step 1: Find the Mean
Step 2: Find the Distance of Each Number from the Mean
Step 3: Add All the Absolute Deviations
Step 4: Divide by How Many Numbers There Are

Mean Absolute Deviation tells us how far the numbers in a group are from the average (mean).
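
The four MAD steps can be sketched as a Python function. The data set 2, 4, 6, 8 is a made-up example chosen so that the arithmetic is easy to follow by hand: the mean is 5, the distances are 3, 1, 1, 3, their sum is 8, and 8 ÷ 4 = 2.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    mean = sum(values) / len(values)               # Step 1: find the mean
    deviations = [abs(x - mean) for x in values]   # Step 2: distance from the mean
    total = sum(deviations)                        # Step 3: add the absolute deviations
    return total / len(values)                     # Step 4: divide by the count

# Hypothetical data chosen for easy arithmetic.
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```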
Practice Question: Mean Absolute Deviation (MAD)
The number of pages 5 students read over the weekend is:
📖 Pages: 12, 15, 10, 13, 20

🎯 Your Task:
• Find the mean of the data.
• Find the distance of each number from the mean (ignore
negative signs).
• Add all the absolute deviations.
• Divide by how many numbers there are.
• Write the Mean Absolute Deviation (MAD).
VARIANCE
Variance tells us how far each number in a group is from the average (mean). It shows how spread out the data is.

Formula:
Variance = [ (x₁ - mean)² + (x₂ - mean)² + ... + (xₙ - mean)² ] / n

Where:
x = each data value
mean = average of all values
n = total number of values
Standard Deviation
Standard Deviation is the square root of variance.
It tells us roughly how far a typical value lies from the mean, in the same units as the data.

Formula:
Standard Deviation = √Variance
Example: Find Variance & Standard Deviation
Data Set: 4, 6, 6, 8, 10

Step 1: Find the Mean
Mean = (4 + 6 + 6 + 8 + 10) / 5 = 34 / 5 = 6.8

Step 2: Find Squared Differences from the Mean

Value   Difference (x - mean)   Square of Difference
4       4 - 6.8 = -2.8          (-2.8)² = 7.84
6       6 - 6.8 = -0.8          (-0.8)² = 0.64
6       6 - 6.8 = -0.8          (-0.8)² = 0.64
8       8 - 6.8 = 1.2           (1.2)² = 1.44
10      10 - 6.8 = 3.2          (3.2)² = 10.24

Step 3: Add the Squared Differences
Total = 7.84 + 0.64 + 0.64 + 1.44 + 10.24 = 20.8

Step 4: Calculate Variance
Variance = 20.8 / 5 = 4.16

Step 5: Find Standard Deviation
Standard Deviation = √4.16 ≈ 2.04
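
The five steps of this example can be reproduced with a short Python sketch. It divides by n, as the variance formula here does; Python's standard library offers the same calculation as `statistics.pvariance` and `statistics.pstdev`, whereas `statistics.variance` divides by n - 1 and would give a different answer.

```python
import math

data = [4, 6, 6, 8, 10]

# Step 1: find the mean.
mean = sum(data) / len(data)

# Step 2: squared difference of each value from the mean.
squared_differences = [(x - mean) ** 2 for x in data]

# Steps 3 and 4: add the squared differences and divide by n.
variance = sum(squared_differences) / len(data)

# Step 5: the standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

# Rounded to two decimal places to hide tiny floating-point differences.
print(round(mean, 2), round(variance, 2), round(standard_deviation, 2))  # 6.8 4.16 2.04
```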
