Introduction To Basic Statistics
o Central tendency: These measures tell you where the "typical" value lies in your data. Examples
include:
o Variability (spread): These measures tell you how spread out your data is. Examples include:
o Standard deviation: How much, on average, each value deviates from the mean.
o Distribution: This describes the overall shape of your data. Is it symmetric? Bell-shaped? Skewed
towards one side? Histograms and boxplots are handy tools for visualizing this.
AR, Lecturer, Statistics Discipline, KU 7
Importance of Descriptive Statistics
• Gain quick insights: Quickly grasp key features of your data
without drowning in individual points.
count of observations falling within each category. This is a simple and efficient way
• Histograms: These visualize the frequency distribution in a bar chart, where the bar
height represents the number of observations for each interval. Histograms are great
for understanding the overall shape of your data, like symmetry or skewness.
percentages, allowing for easier comparison across datasets with different sizes.
• From this table, you can see that most people visited the library between
5 and 16 times in the past year.
• Depends on data and question: Choice of measure depends on the data type
(numerical, categorical) and the research goal.
• Not infallible: It's just one aspect of data; understanding how individual
points deviate from the center is also crucial.
• Weaknesses:
• Sensitive to outliers: a single extreme value can significantly distort the mean.
• Not robust for skewed data: if more values lie on one side of the distribution, the
mean might not accurately reflect the "typical" value.
Example: Calculate the average annual growth rate of an investment that started
at $1000 and grew to $2500 in 5 years.
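A minimal Python sketch of this calculation, assuming growth compounds once per year. The function name `cagr` is illustrative and does not come from the slides.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Constant yearly growth rate that turns start_value into end_value."""
    growth_factor = end_value / start_value      # total growth over the whole period
    return growth_factor ** (1 / years) - 1      # geometric mean of the yearly factors, minus 1


rate = cagr(1000, 2500, 5)
print(f"Average annual growth rate: {rate:.2%}")  # roughly 20.11%
```

Compounding $1000 at this rate for 5 years gives back $2500, which is why the geometric mean (not the arithmetic mean) is the right average here.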
values. $HM = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$
Example: You travel 100 km at 60 km/h for the first half of a journey and 100
km at 40 km/h for the second half. What is your average speed for the entire
trip?
Average speed $= \dfrac{2}{\frac{1}{60} + \frac{1}{40}} = \dfrac{2}{\frac{2+3}{120}} = \dfrac{240}{5} = 48$ km/h
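The same result can be checked in Python. This is a sketch that assumes both legs of the trip are equally long, as the calculation above does; the leg length `leg_km` is a hypothetical value used only for the cross-check.

```python
from statistics import harmonic_mean

speeds_kmh = [60, 40]  # speed on each leg; both legs are assumed to be equally long

# Harmonic mean: n divided by the sum of the reciprocals.
average_speed = harmonic_mean(speeds_kmh)

# Cross-check from first principles: total distance / total time.
leg_km = 100  # hypothetical leg length; any positive value gives the same answer
total_time_h = sum(leg_km / v for v in speeds_kmh)
check = (leg_km * len(speeds_kmh)) / total_time_h

print(round(average_speed, 2), round(check, 2))  # both are 48.0
```

The arithmetic mean of the two speeds would give 50 km/h, which overstates the result because more time is spent on the slower leg.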
Example: Calculate the average exam score in a class where grades have
different weights (e.g., final exam counts for 40%, quizzes for 30%).
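A short Python sketch of a weighted mean. The scores are hypothetical, and the 30% "assignments" component is an assumption added only so that the weights sum to 100%; the slide specifies just the 40% and 30% shares.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical scores; the "assignments" share is assumed, not taken from the slide.
scores = {"final exam": 80, "quizzes": 70, "assignments": 90}
weights = {"final exam": 40, "quizzes": 30, "assignments": 30}

overall = weighted_mean(list(scores.values()), [weights[k] for k in scores])
print(overall)  # 80.0
```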
Different means for different situations
1. Geometric Mean:
• Investing: When calculating the compound annual growth rate (CAGR) of an investment over
multiple years.
• Biology: When measuring cell growth or bacterial reproduction rates (assuming exponential
growth).
2. Harmonic Mean:
• Speed/Rate calculations: Finding the average speed when you travel equal distances at
different speeds.
• Machining: Determining the average cutting speed when using tools with different
diameters.
3. Weighted Mean:
• Grading: Calculating a student's overall grade when different assessments have different weights.
• Surveys: Combining ratings from different groups with varying sample sizes.
• Meta-analysis: Combining results from multiple studies with different sample sizes and
methodologies.
4. Trimmed Mean:
• Pollution measurements: Calculating average air quality when occasional spikes might distort the
regular pattern.
• Sports statistics: Determining an athlete's "typical" performance by excluding their best and worst
scores.
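A minimal sketch of a trimmed mean in Python, assuming the same proportion is cut from each end of the sorted data. The readings are hypothetical and the function name `trimmed_mean` is illustrative.

```python
def trimmed_mean(data, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end of the sorted data."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    k = int(len(ordered) * proportion)       # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical daily air-quality readings with one spike (95).
readings = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

print(sum(readings) / len(readings))   # 24.1  (plain mean, pulled up by the spike)
print(trimmed_mean(readings, 0.1))     # 16.75 (lowest and highest reading dropped)
```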
Median
The median, a vital measure of central tendency, stands apart from the mean by
focusing on the middlemost value in a sorted dataset, rather than the average. It
shines in various situations, especially when outliers or skewed data are present.
Imagine arranging your data like a number line, from least to greatest. The median is
the value that divides the data into two halves, with an equal number of data points
on either side. Here's how it works:
For even-numbered datasets: The median is the average of the two middle values.
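Position of the median in the sorted data: $m = \dfrac{n+1}{2}$
• For grouped data, the class containing the $\dfrac{n}{2}$-th observation is the median class.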
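The rule for ungrouped data can be sketched in Python as follows. The function is an illustrative implementation, and the sample values are hypothetical.

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the (n + 1) / 2-th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the two middle values


print(median([7, 3, 9, 1, 5]))      # 5
print(median([7, 3, 9, 1]))         # 5.0
print(median([1, 2, 3, 4, 1000]))   # 3  (the outlier does not move the median)
```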
Understanding the Mode: Think of the mode as the "popular kid" in the data set.
It's the value that shows up the most often, regardless of where it falls within the
data's spread.
• For unordered data: Simply count the frequency of each value, and the mode is
the one with the highest count.
• For ordered data: Identify the value that repeats the most times.
o Useful for categorical data: Unlike the mean and median, it can be applied to categorical data where
ordering isn't meaningful.
o Highlights patterns: It can reveal dominant categories or preferences within the data.
o Not always unique: A dataset can have multiple modes, or even no mode at all, making it less
informative than the mean or median.
o Sensitive to sample size: A larger sample size is more likely to have a distinct mode, while smaller
samples might be misleading.
o Not representative of central tendency: The mode doesn't necessarily reflect the "typical"
value, especially in skewed or multimodal data.
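A small Python sketch of finding the mode by counting frequencies. It returns every value that ties for the highest count, so a dataset with several modes is visible as such; the data and the function name `modes` are illustrative.

```python
from collections import Counter


def modes(data):
    """Return every value that attains the highest frequency, in order of first appearance."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(["tea", "coffee", "tea", "juice"]))  # ['tea']
print(modes([1, 1, 2, 2, 3]))                    # [1, 2]  (two modes)
print(modes([4, 5, 6]))                          # [4, 5, 6]  (every value ties, so no distinct mode)
```

Because it only counts occurrences, the same function works for categorical data where ordering is not meaningful.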
• $\beta_2 = 3$: Mesokurtic
• $\beta_2 < 3$: Platykurtic
• $\beta_2 > 3$: Leptokurtic
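A hedged Python sketch of this classification. It assumes the usual moment definition $\beta_2 = \mu_4 / \mu_2^2$ (fourth central moment over the squared second central moment), which the slide does not state; the datasets and the tolerance used for the "equal to 3" case are hypothetical.

```python
def beta2(data):
    """Moment coefficient of kurtosis: fourth central moment / (second central moment)^2."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


def classify(b2, tolerance=1e-9):
    """Label a beta_2 value using the three cases above."""
    if abs(b2 - 3) <= tolerance:
        return "mesokurtic"
    return "platykurtic" if b2 < 3 else "leptokurtic"


flat = [1, 2, 3, 4, 5]                     # hypothetical, evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]   # hypothetical, mostly central with two extremes

print(round(beta2(flat), 2), classify(beta2(flat)))      # 1.7 platykurtic
print(round(beta2(peaked), 2), classify(beta2(peaked)))  # 5.0 leptokurtic
```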
The gap between the largest and the smallest value is called the range. Mathematically,
$R(x_1, x_2, \ldots, x_n) = \max(x_1, x_2, \ldots, x_n) - \min(x_1, x_2, \ldots, x_n)$
The unit-invariant relative measure of range is the coefficient of range:
$CR = \dfrac{L - S}{L + S} \times 100$
where $L$ is the largest and $S$ the smallest value.
Interpretation:
• $CR = 0$ means no relative variability
• $CR = 100$ (a ratio of 1) means maximum relative variability
The coefficient of range is mostly used to compare two datasets.
• The range is extremely influenced by outliers.
• Trimmed range: If outliers are present, trim 5% or 10% of the data
from the beginning or end before computing the range.
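• IQR (interquartile range): The difference between the third and the
first quartile.
$IQR = Q_3 - Q_1$
The quartiles sit at the following positions in the sorted data:
• First quartile ($Q_1$): the $\dfrac{n+1}{4}$-th value
• Second quartile ($Q_2$): the $\dfrac{n+1}{2}$-th value
• Third quartile ($Q_3$): the $\dfrac{3(n+1)}{4}$-th value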
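A Python sketch that pulls these range-based measures together, using the $(n+1)$-based quartile positions. Linear interpolation for fractional positions is an assumption (the slides give only the positions), and the data and helper names are hypothetical.

```python
def value_at(ordered, position):
    """Value at a 1-based, possibly fractional position, using linear interpolation."""
    position = min(max(position, 1), len(ordered))   # clamp to the valid range
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def spread_summary(data):
    ordered = sorted(data)
    n = len(ordered)
    smallest, largest = ordered[0], ordered[-1]
    q1 = value_at(ordered, (n + 1) / 4)
    q2 = value_at(ordered, (n + 1) / 2)
    q3 = value_at(ordered, 3 * (n + 1) / 4)
    return {
        "range": largest - smallest,
        # Undefined when largest + smallest == 0.
        "coefficient_of_range": (largest - smallest) / (largest + smallest) * 100,
        "Q1": q1, "Q2": q2, "Q3": q3,
        "IQR": q3 - q1,
    }


values = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]   # hypothetical data, n = 11
summary = spread_summary(values)
print(summary["range"], round(summary["coefficient_of_range"], 2))   # 13 76.47
print(summary["Q1"], summary["Q2"], summary["Q3"], summary["IQR"])   # 4 8 12 8
```

Other quartile conventions exist, so software packages may report slightly different values for the same data.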
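• Variance measures how far each number in the set is from the mean (average), and thus from
every other number in the set.
• For a population, $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
• For a sample, $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
• $SD = \sqrt{\text{Variance}}$
When two datasets differ widely from each other (for example, in scale or units), the SD
alone doesn't help much in comparing their variability. The coefficient of variation (CV)
expresses the SD relative to the mean:
$CV = \dfrac{s}{\bar{x}} \times 100$
NB: The CV does not work if the mean is zero.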
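A minimal Python sketch of these measures, written to mirror the population and sample formulas. The data are hypothetical, and the function names are illustrative rather than taken from the slides.

```python
import math


def population_variance(data):
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)            # divide by N


def sample_variance(data):
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)   # divide by n - 1


def coefficient_of_variation(data):
    """Sample SD as a percentage of the mean; undefined when the mean is zero."""
    x_bar = sum(data) / len(data)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return math.sqrt(sample_variance(data)) / x_bar * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values with mean 5

print(population_variance(data))                    # 4.0
print(round(sample_variance(data), 3))              # 4.571
print(round(math.sqrt(sample_variance(data)), 3))   # 2.138
print(round(coefficient_of_variation(data), 2))     # 42.76
```

Because the CV is a percentage of the mean, it can be compared across datasets measured in different units, which the SD cannot.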