Statistics and Its Types (v1.0)
Statistics
Statistics is the branch of mathematics that deals with collecting, organizing, analyzing, interpreting, and
presenting data. It helps in making informed decisions based on data.
Types of Statistics
1. Descriptive Statistics
   ✓ Measures of Central Tendency: Mean, Median, Mode
   ✓ Measures of Variability: Range, Variance, Standard Deviation
2. Inferential Statistics
XI – Computer Science (Federal Board) Version: 1.0
1. Descriptive Statistics:
Descriptive statistics is the process of summarizing, organizing, and presenting data so that the important
features can be easily understood. It provides a way to give an overview of the data, highlighting key points
such as averages or trends.
Example:
A teacher collected the test scores of 10 students in your class. The scores are:
Test Scores:
80, 75, 88, 85, 90, 91, 92, 95, 83, 87
1. Summarizing the Data:
✓ Mean (Average):
The mean gives you a general sense of how the class performed overall. It is calculated
by adding up all the scores and dividing by the number of students.
$$\text{Mean} = \frac{80 + 75 + 88 + 85 + 90 + 91 + 92 + 95 + 83 + 87}{10} = \frac{866}{10} = 86.6$$
The average score of the class is 86.6.
✓ Range:
The range shows how spread out the scores are. It is calculated as the difference between
the highest and lowest values in the dataset:
Range = 95 − 75 = 20
The range of scores is 20.
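The two summary figures above can be reproduced with a few lines of Python. This is a minimal sketch; the variable names `scores`, `mean_score`, and `score_range` are illustrative choices, not part of the original notes.

```python
# Test scores of the 10 students from the example
scores = [80, 75, 88, 85, 90, 91, 92, 95, 83, 87]

# Mean: add up all the scores and divide by the number of students
mean_score = sum(scores) / len(scores)

# Range: highest score minus lowest score
score_range = max(scores) - min(scores)

print("Mean =", mean_score)    # Mean = 86.6
print("Range =", score_range)  # Range = 20
```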
2. Organizing the Data:
✓ Order the Scores:
To better understand the distribution of the data, you can arrange the scores in ascending order:
75, 80, 83, 85, 87, 88, 90, 91, 92, 95
Sorted data makes it easier to see the lowest and highest scores, where the middle of the data lies, and how the scores cluster.
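Python's built-in `sorted()` function performs this ordering step. A short sketch, reusing the same score list (the name `ordered` is illustrative):

```python
scores = [80, 75, 88, 85, 90, 91, 92, 95, 83, 87]

# sorted() returns a new list in ascending order and leaves the original list unchanged
ordered = sorted(scores)
print(ordered)                   # [75, 80, 83, 85, 87, 88, 90, 91, 92, 95]

# After sorting, the lowest and highest scores sit at the two ends of the list
print(ordered[0], ordered[-1])   # 75 95

# reverse=True gives descending order instead
print(sorted(scores, reverse=True))
```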
3. Presenting the Data:
✓ Visual Representation:
A line plot displays the students' test scores, with the x-axis representing student numbers and the y-axis showing their scores. The red dashed line marks the mean (86.6), the green dashed line marks the minimum score (75), and the yellow dashed line marks the maximum score (95). This visual makes it easy to see how the scores are distributed and how close each one is to the mean.
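A plot like the one described can be drawn with matplotlib. This is a sketch under assumptions: the colors and dashed reference lines follow the description above, while the axis labels and title are guesses, since the original figure is not available here. It requires matplotlib to be installed.

```python
import matplotlib.pyplot as plt

scores = [80, 75, 88, 85, 90, 91, 92, 95, 83, 87]
student_numbers = list(range(1, len(scores) + 1))  # students numbered 1 to 10
mean_score = sum(scores) / len(scores)

fig, ax = plt.subplots()

# One blue point per student, joined by a line
ax.plot(student_numbers, scores, "bo-", label="Test Scores", markersize=6)

# Horizontal dashed reference lines for the mean, minimum and maximum
ax.axhline(mean_score, color="red", linestyle="--", label=f"Mean ({mean_score:.1f})")
ax.axhline(min(scores), color="green", linestyle="--", label=f"Minimum ({min(scores)})")
ax.axhline(max(scores), color="yellow", linestyle="--", label=f"Maximum ({max(scores)})")

# Assumed labels: the original figure's exact text is not known
ax.set_xlabel("Student Number")
ax.set_ylabel("Test Score")
ax.set_title("Student Test Scores")
ax.legend()

plt.show()
```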
1.1. Measures of Central Tendency
1.1.1. Mean
Definition: The mean is the sum of all the values in a dataset divided by the total number of values.
Formula:
$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$
Example:
Dataset: 5, 7, 10, 15
$$\text{Mean} = \frac{5 + 7 + 10 + 15}{4} = \frac{37}{4} = 9.25$$
Usage: Best suited to data that is roughly symmetric and has no extreme outliers, because a single very large or very small value pulls the mean toward it.
Analogy: Think of the mean as dividing a pizza (total sum) equally among everyone (number of values).
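The mean formula translates directly into code. A minimal sketch; `arithmetic_mean` is a hypothetical helper name, and `statistics.mean` from Python's standard library is shown for comparison. The last line adds one extreme value to show why the usage note warns about outliers.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)

data = [5, 7, 10, 15]
print(arithmetic_mean(data))   # 9.25
print(mean(data))              # 9.25 (standard-library version)

# One extreme value (100) pulls the mean far above most of the data
print(arithmetic_mean([5, 7, 10, 15, 100]))  # 27.4
```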
1.1.2. Median
Definition: The median is the middle value of a dataset when the values are arranged in ascending or descending order.
✓ If the number of values is odd, it’s the middle value.
✓ If even, it’s the average of the two middle values.
Example:
Odd dataset: 3, 7, 8, 9, 10 → Median = 8
Even dataset: 3, 7, 8, 9 → $$\text{Median} = \frac{7 + 8}{2} = 7.5$$
Usage: Best used when the data contains outliers, because extreme values do not pull the median the way they pull the mean.
Analogy: Think of the median as the "middle seat" of a row; it's right in the center.
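The odd and even rules above can be sketched as follows. `find_median` is a hypothetical name; `statistics.median` is the standard-library equivalent. The last call replaces the largest value with an outlier and shows that the median does not change.

```python
from statistics import median

def find_median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    if not values:
        raise ValueError("the median needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of the two middle values

print(find_median([3, 7, 8, 9, 10]))    # 8
print(find_median([3, 7, 8, 9]))        # 7.5
print(median([3, 7, 8, 9]))             # 7.5 (standard-library version)

# Replacing 10 with an outlier (1000) leaves the median at 8
print(find_median([3, 7, 8, 9, 1000]))  # 8
```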
Definition: The mode is the value(s) that occur most frequently in a dataset.
✓ A dataset can have one mode (unimodal), multiple modes (multimodal), or no mode at all.
Example:
Dataset: 2, 3, 3, 5, 7, 8 → Mode = 3
Dataset: 1, 1, 2, 2, 3 → Modes = 1 and 2 (bimodal)
Usage: Useful for categorical data or datasets where repetition is significant.
Analogy: Think of the mode as the "most popular choice" in a survey.
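Counting how often each value occurs is a natural way to find the mode. This sketch uses `collections.Counter`; `find_modes` is a hypothetical helper. One assumption to note: it reports no mode when every value occurs exactly once, whereas the standard library's `statistics.multimode` returns all the values in that case.

```python
from collections import Counter
from statistics import multimode

def find_modes(values):
    """Return every value that occurs most often, or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumption: if every value occurs only once, treat the dataset as having no mode
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(find_modes([2, 3, 3, 5, 7, 8]))   # [3]
print(find_modes([1, 1, 2, 2, 3]))      # [1, 2] (bimodal)
print(find_modes([4, 6, 9]))            # [] (no mode)
print(multimode([1, 1, 2, 2, 3]))       # [1, 2]

# The mode also works for categorical data, such as survey answers
print(find_modes(["red", "blue", "red", "green"]))  # ['red']
```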
Summary Table:
Real-World Applications:
1.2. Measures of Variability
1.2.1. Range
Definition: The difference between the maximum and minimum values in a dataset.
Formula:
$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$
Example:
Dataset: 5, 10, 15, 20
$$\text{Range} = 20 - 5 = 15$$
Usage: A quick way to see the total spread, but sensitive to outliers because it depends only on the two most extreme values.
Analogy: Think of it as the difference between the tallest and shortest person in a group.
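A short sketch of the range, with a hypothetical `data_range` helper (named to avoid shadowing Python's built-in `range`). Adding a single outlier shows how sensitive the measure is.

```python
def data_range(values):
    """Maximum value minus minimum value."""
    if not values:
        raise ValueError("the range needs at least one value")
    return max(values) - min(values)

print(data_range([5, 10, 15, 20]))       # 15
print(data_range([5, 10, 15, 20, 200]))  # 195: one outlier changes the range a lot
```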
1.2.2. Variance
Definition: The average of the squared differences between each data point and the mean. It measures how
far data points spread around the mean.
Key Features:
✓ It gives a measure of how spread out the data points are.
✓ Since differences are squared, the result is in squared units of the original data.
Formula:
$$\text{Variance} = \frac{\sum (x_i - \bar{x})^2}{n}$$
Where $x_i$ = data points, $\bar{x}$ = mean, $n$ = number of data points.
Example:
Dataset: 2, 4, 6 (mean = 4)
$$\text{Variance} = \frac{(2 - 4)^2 + (4 - 4)^2 + (6 - 4)^2}{3} = \frac{4 + 0 + 4}{3} = \frac{8}{3} \approx 2.67$$
Usage: Widely used in statistical analysis, for example as the basis for the standard deviation.
Limitation: Hard to interpret because it is not in the same unit as the data.
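The variance formula can be checked step by step in code. This sketch divides by n, matching the formula in this section (the population variance); Python's `statistics.pvariance` uses the same definition. Note that `statistics.variance` divides by n − 1 instead (the sample variance), so it gives 8 / 2 = 4 for this dataset. `population_variance` is a hypothetical name.

```python
from statistics import pvariance, variance

def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance needs at least one value")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]  # squared distance of each point from the mean
    return sum(squared_diffs) / n

data = [2, 4, 6]
print(round(population_variance(data), 2))  # 2.67
print(round(pvariance(data), 2))            # 2.67 (standard library, divides by n)
print(variance(data))                       # sample variance, divides by n - 1: 8 / 2 = 4
```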
1.2.3. Standard Deviation
Definition: The square root of the variance. It describes the typical distance of data points from the mean.
Key Features:
✓ Strictly, it is the square root of the average squared distance from the mean, not the plain average distance.
✓ Has the same units as the original data, making it easier to interpret.
Formula:
$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$
Example:
From the above variance example:
$$\text{Standard Deviation} = \sqrt{2.67} \approx 1.63$$
Interpretation:
If the standard deviation is small, data points lie close to the mean; if it is large, they are spread farther from it.
Usage: Commonly used for comparing the spread of datasets.
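Taking the square root of the variance gives the standard deviation. A minimal sketch with a hypothetical `population_std_dev` helper, checked against `statistics.pstdev`; the last two lines compare two made-up datasets that share a mean of 4 but differ in spread.

```python
import math
from statistics import pstdev

def population_std_dev(values):
    """Square root of the population variance (squared differences averaged over n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation needs at least one value")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

print(round(population_std_dev([2, 4, 6]), 2))  # 1.63
print(round(pstdev([2, 4, 6]), 2))              # 1.63 (standard-library version)

# Same mean (4), different spread
print(round(population_std_dev([3, 4, 5]), 2))  # 0.82 (values close to the mean)
print(round(population_std_dev([0, 4, 8]), 2))  # 3.27 (values spread out)
```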
2. Inferential Statistics
Inferential statistics is a branch of statistics that uses a sample of data to make predictions,
inferences, or generalizations about a larger population. Instead of examining an entire population, it
relies on analyzing a subset (sample) to draw conclusions.
Key Features of Inferential Statistics
1. Sampling:
✓ Uses a small, representative sample to infer characteristics of the entire population.
✓ Example: Surveying 1,000 people to predict election results for millions.
2. Generalization:
✓ Generalizes findings from the sample to the population, acknowledging uncertainty.
3. Uncertainty and Probability:
✓ Inferences are never 100% certain and are accompanied by confidence levels or margins
of error.
4. Testing Hypotheses:
✓ Often used to test specific hypotheses about populations, like determining whether a new
drug is effective.
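The sampling idea and the margin of error can be simulated. This is an illustrative sketch only: the population of 100,000 satisfaction scores is randomly generated (not real data), and the margin uses the common normal-approximation formula 1.96 × s / √n for a roughly 95% confidence level, which goes slightly beyond what these notes cover. `statistics.stdev` is the sample standard deviation (it divides by n − 1).

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so every run gives the same numbers

# Hypothetical population: 100,000 simulated satisfaction scores from 1 to 10
population = [random.randint(1, 10) for _ in range(100_000)]

# Survey only a random sample of 100 instead of the whole population
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Approximate 95% margin of error for the mean (normal approximation, z = 1.96)
margin = 1.96 * sample_sd / math.sqrt(len(sample))

print(f"Estimate from the sample: {sample_mean:.2f} ± {margin:.2f}")
print(f"Actual population mean:   {statistics.mean(population):.2f}")
```

Changing the seed gives a slightly different sample estimate each time; the margin of error expresses that uncertainty.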
Example
• Scenario: A company wants to know if its new product improves customer satisfaction.
• Approach:
✓ Collect a sample of customer feedback scores (e.g., 100 customers).
✓ Use inferential techniques to determine if the average satisfaction score significantly differs
from the old product.
✓ Infer results for the entire customer base based on the sample.
Analogy
Think of inferential statistics as tasting a spoonful of soup:
✓ Instead of tasting the whole pot, you taste a small spoonful to judge whether the soup is well-seasoned or needs adjustment.
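import matplotlib.pyplot as plt
scores = [80, 75, 88, 85, 90, 91, 92, 95, 83, 87]  # test scores from the example
student_numbers = range(1, len(scores) + 1)  # x-axis: students numbered 1 to 10
plt.plot(student_numbers, scores, 'bo-', label='Test Scores', markersize=6)  # y-axis: test scores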