
Lecture 2-3 Data Analysis Location & Dispersion

This document discusses measures of central tendency and dispersion used in data analysis. It defines central tendency as measuring how clustered data is around the mean, and dispersion as measuring variability around the central tendency. Measures of central tendency covered include the arithmetic mean, weighted mean, geometric mean, harmonic mean, and median. Measures of dispersion include range, interquartile range, variance, standard deviation, z-scores, and coefficient of variation. The document also discusses skewness, kurtosis, and the empirical rule for standard deviations.

Uploaded by

Shahadat Hossain

Data Analysis

Central Tendency (Center) and Dispersion (Variability)

⮚ Central tendency: measures the degree to which scores are clustered around the mean of a distribution

⮚ Dispersion: measures the fluctuations (variability) around the characteristics of central tendency
Measures of Center
• A measure along the horizontal axis of
the data that locates the center of the
distribution.
Arithmetic Mean or Average
• The mean of a set of measurements is
the sum of the measurements divided
by the total number of measurements.

$$\bar{x} = \frac{\sum x_i}{n}$$
where n = number of measurements


Example
• The set: 2, 9, 11, 5, 6
$$\bar{x} = \frac{2 + 9 + 11 + 5 + 6}{5} = \frac{33}{5} = 6.6$$

If we were able to enumerate the whole population, the population mean would be called μ (the Greek letter “mu”).
Example:
⚫ Resistance of 5 coils:
3.35, 3.37, 3.28, 3.34, 3.30 ohm.
⚫ The average: (3.35 + 3.37 + 3.28 + 3.34 + 3.30)/5 = 16.64/5 = 3.328 ohm
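As a quick check of the two examples above, the arithmetic mean can be computed with Python's standard `statistics` module. This is only a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# Data taken from the two slide examples
small_set = [2, 9, 11, 5, 6]
coil_resistance_ohm = [3.35, 3.37, 3.28, 3.34, 3.30]

# mean() returns the sum of the values divided by the number of values
mean_small = mean(small_set)
mean_coils = mean(coil_resistance_ohm)

print(mean_small)
print(round(mean_coils, 3))
```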
Weighted Mean
⚫ The weighted mean of the positive real numbers x1, x2, ..., xn with weights w1, w2, ..., wn is defined to be
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Example
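The slide's own example did not survive extraction. As a stand-in, the sketch below applies the weighted-mean definition to hypothetical data (three marks weighted by credit hours); the data and the function name `weighted_mean` are assumptions for illustration only.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical data (not from the lecture): marks and their credit weights
marks = [80, 90, 70]
credits = [3, 1, 2]

# (3*80 + 1*90 + 2*70) / (3 + 1 + 2)
result = weighted_mean(marks, credits)
print(round(result, 2))
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.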
Geometric Mean
⚫ The geometric mean is defined as the positive n-th root of the product of the n observations. Symbolically,
$$G = \sqrt[n]{x_1 x_2 \cdots x_n}$$

⚫ It is often used for a set of numbers whose values are exponential in nature, such as data on the growth of the human population or interest rates of a financial investment.

⚫ Find the geometric mean of the rates of growth: 34, 27, 45, 55, 22, 34
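One way to work this exercise is sketched below. It treats the six rates as plain positive numbers, as the slide does, and compares the standard-library `statistics.geometric_mean` (Python 3.8 or later) with the n-th-root definition. Variable names are illustrative.

```python
from math import prod
from statistics import geometric_mean

growth_rates = [34, 27, 45, 55, 22, 34]

# Library routine
gm_library = geometric_mean(growth_rates)

# Direct use of the definition: n-th root of the product of n observations
gm_by_definition = prod(growth_rates) ** (1 / len(growth_rates))

print(round(gm_library, 2), round(gm_by_definition, 2))
```

Both approaches should agree up to floating-point rounding, and the result lies between 34 and 35.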
Harmonic Mean
⚫ The harmonic mean is the number of observations divided by the sum of the reciprocals of the observations.
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

⚫ Useful for ratios such as speed (= distance/time) etc.

⚫ Exercise: Find the harmonic mean of 1, 2, and 4
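The exercise can be checked with a short sketch. It computes the harmonic mean both with `statistics.harmonic_mean` and directly from the definition (number of observations over the sum of reciprocals); the variable names are illustrative.

```python
from statistics import harmonic_mean

values = [1, 2, 4]

# Library routine
hm_library = harmonic_mean(values)

# Definition: n / (1/x1 + 1/x2 + ... + 1/xn) = 3 / 1.75
hm_by_definition = len(values) / sum(1 / x for x in values)

print(round(hm_library, 3))
```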


Median
• The median of a set of observations is the middle measurement when the observations are ranked from smallest to largest.
• The position of the median is

(n + 1)/2

once the measurements have been ordered.
Example
⚫ The set: 2, 4, 9, 8, 6, 5, 3 (n = 7)
⚫ Sort: 2, 3, 4, 5, 6, 8, 9
⚫ Position: .5(n + 1) = .5(7 + 1) = 4th
Median = 4th ordered measurement = 5

• The set: 2, 4, 9, 8, 6, 5 (n = 6)
• Sort: 2, 4, 5, 6, 8, 9
• Position: .5(n + 1) = .5(6 + 1) = 3.5th
Median = (5 + 6)/2 = 5.5, the average of the 3rd and 4th measurements
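The position rule used in both examples can be sketched as a small function. The name `median_by_position` is hypothetical; the standard-library `statistics.median` gives the same result.

```python
def median_by_position(data):
    """Median via the (n + 1)/2 position rule on the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("data must not be empty")
    position = (n + 1) / 2      # 1-based position in the ordered list
    lower = int(position)       # whole part of the position
    if position == lower:       # odd n: a single middle measurement
        return ordered[lower - 1]
    # even n: average the two measurements on either side of the position
    return (ordered[lower - 1] + ordered[lower]) / 2

odd_median = median_by_position([2, 4, 9, 8, 6, 5, 3])
even_median = median_by_position([2, 4, 9, 8, 6, 5])
print(odd_median, even_median)
```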
Mode
• The mode is the measurement which occurs
most frequently.
• The set: 2, 4, 9, 8, 8, 5, 3
• The mode is 8, which occurs twice
• The set: 2, 2, 9, 8, 8, 5, 3
• There are two modes—8 and 2 (bimodal)
• The set: 2, 4, 9, 8, 5, 3
• There is no mode (each value is unique).
Example
The number of quarts of milk purchased by 25
households:
0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3
3 3 3 4 4 4 5
⚫ Mean?

⚫ Median?

⚫ Mode? (Highest peak)
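The three questions can be answered with the standard `statistics` module. This is a sketch using the 25 values listed above; `multimode` (Python 3.8 or later) is used so that ties, if any, would also be reported.

```python
from statistics import mean, median, multimode

# Quarts of milk purchased by the 25 households
quarts = [0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
          3, 3, 3, 4, 4, 4, 5]

milk_mean = mean(quarts)        # 55 quarts in total over 25 households
milk_median = median(quarts)    # the 13th ordered value
milk_modes = multimode(quarts)  # every value tied for the highest frequency

print(milk_mean, milk_median, milk_modes)
```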


Extreme Values
⚫ The mean is more easily affected by extremely
large or small values than the median.

• The median is often used as a measure of center when the distribution is skewed.
Extreme Values

Symmetric: Mean = Median

Skewed right: Mean > Median

Skewed left: Mean < Median


Measures of Variability
• A measure along the horizontal axis of the data distribution
that describes the spread of the distribution from the center.

● Range
✔ Difference between maximum and minimum values
● Interquartile Range
✔ Difference between third and first quartile (Q3 - Q1)
● Variance
✔ Average of the squared deviations from the mean
● Standard Deviation
✔ Square root of the variance
[Figure: distributions illustrating variability versus no variability]
The Range
• The range, R, of a set of n measurements is the
difference between the largest and smallest
measurements.
• Example: A botanist records the number of
petals on 5 flowers:
5, 12, 6, 8, 14
• The range is R = 14 – 5 = 9.
Quartiles

[Figure: Q1, Q2, and Q3 divide the ordered data into four parts of 25% each]
Percentile
50th Percentile ≡ Median (Q2)
25th Percentile ≡ Lower Quartile (Q1)
75th Percentile ≡ Upper Quartile (Q3)

Interquartile Range:
IQR = Q3 – Q1
• The position of the p-th percentile is (p/100)(n + 1)

• The position of Q1 is 0.25(n + 1)

• The position of Q3 is 0.75(n + 1)

once the measurements have been ordered.


Example
The prices ($) of 18 brands of walking shoes:
40 60 65 65 65 68 68 70 70
70 70 70 70 74 75 75 90 95

Position of Q1 = 0.25(18 + 1) = 4.75

Position of Q3 = 0.75(18 + 1) = 14.25

✔ Q1 is 3/4 of the way between the 4th and 5th ordered measurements, or Q1 = 65 + 0.75(65 - 65) = 65.

✔ Q3 is 1/4 of the way between the 14th and 15th ordered measurements, or
Q3 = 74 + .25(75 - 74) = 74.25
✔ and
IQR = Q3 – Q1 = 74.25 - 65 = 9.25
90th percentile P90
⚫ The position of the 90th percentile is 0.9(18 + 1) = 17.1

The prices ($) of 18 brands of walking shoes:
40 60 65 65 65 68 68 70 70
70 70 70 70 74 75 75 90 95

⚫ P90 is 1/10 of the way between the 17th and 18th ordered measurements, or
P90 = 90 + .10(95 - 90) = 90.5
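The position-and-interpolate procedure used for Q1, Q3, and P90 can be sketched as one function. The name `percentile_by_position` is hypothetical, and clamping positions that fall outside the data to the smallest or largest value is an assumption not stated in the lecture. Other software may use different percentile conventions and give slightly different answers.

```python
def percentile_by_position(data, p):
    """p-th percentile using position (p/100)(n + 1) and linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (p / 100) * (n + 1)   # 1-based, possibly fractional
    k = int(position)                # whole part: index of the lower neighbour
    fraction = position - k          # how far to move toward the next value
    if k < 1:                        # assumption: clamp below the first value
        return ordered[0]
    if k >= n:                       # assumption: clamp above the last value
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])

prices = [40, 60, 65, 65, 65, 68, 68, 70, 70,
          70, 70, 70, 70, 74, 75, 75, 90, 95]

q1 = percentile_by_position(prices, 25)
q3 = percentile_by_position(prices, 75)
p90 = percentile_by_position(prices, 90)
iqr = q3 - q1
print(q1, q3, iqr, round(p90, 2))
```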


The Variance
• The variance is a measure of variability that uses all the measurements. It measures the average squared deviation of the measurements about their mean.
• Flower petals: 5, 12, 6, 8, 14

[Figure: the petal counts plotted on an axis from 4 to 14]
The Variance
• The variance of a population of N measurements is the average of the squared deviations of the measurements about their mean μ.
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

• The variance of a sample of n measurements is the sum of the squared deviations of the measurements about their mean, divided by (n – 1).
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
The Standard Deviation
• In calculating the variance, we squared all of the
deviations, and in doing so changed the scale of
the measurements.
• To return this measure of variability to the
original units of measure, we calculate the
standard deviation, the positive square root of
the variance.
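Two Ways to Calculate the Sample Variance
Use the definition formula:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

x_i      x_i − x̄    (x_i − x̄)²
5        -4          16
12       3           9
6        -3          9
8        -1          1
14       5           25
Sum 45   0           60

With x̄ = 45/5 = 9: s² = 60/(5 − 1) = 15 and s = √15 ≈ 3.87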
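Two Ways to Calculate the Sample Variance
Use the calculation formula:
$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

x_i      x_i²
5        25
12       144
6        36
8        64
14       196
Sum 45   465

s² = (465 − 45²/5)/(5 − 1) = (465 − 405)/4 = 15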
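Both routes to the sample variance can be checked on the flower-petal data with a few lines of Python. This is a sketch with illustrative variable names; `statistics.variance` is included as a cross-check because it also divides by n − 1.

```python
from math import sqrt
from statistics import variance

petals = [5, 12, 6, 8, 14]
n = len(petals)
x_bar = sum(petals) / n

# Definition formula: sum of squared deviations about the mean, over (n - 1)
var_definition = sum((x - x_bar) ** 2 for x in petals) / (n - 1)

# Calculation formula: (sum of x^2 - (sum of x)^2 / n) over (n - 1)
var_shortcut = (sum(x * x for x in petals) - sum(petals) ** 2 / n) / (n - 1)

# Standard deviation: positive square root of the variance
std_dev = sqrt(var_definition)

print(var_definition, var_shortcut, variance(petals), round(std_dev, 2))
```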
Example- ungrouped data
⚫ Sample: The moisture contents (%) of kraft paper are:
6.7, 6.0, 6.4, 6.4, 5.9, and 5.8.

⚫ Sample standard deviation, s = 0.35
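The stated value can be reproduced with `statistics.stdev`, which uses the n − 1 divisor for a sample. A minimal sketch, with illustrative variable names:

```python
from statistics import mean, stdev

moisture_pct = [6.7, 6.0, 6.4, 6.4, 5.9, 5.8]

moisture_mean = mean(moisture_pct)   # sample mean
moisture_s = stdev(moisture_pct)     # sample standard deviation (n - 1 divisor)

print(round(moisture_mean, 2), round(moisture_s, 2))
```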


Using Measures of Center and Spread:
The Empirical Rule
Given a distribution of measurements that is approximately mound-shaped (bell-shaped):

✔ The interval μ ± σ contains approximately 68% of the measurements.
✔ The interval μ ± 2σ contains approximately 95% of the
measurements.
✔ The interval μ ± 3σ contains approximately 99.7% of
the measurements.
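The three percentages come from the normal distribution. As a sketch, assuming an exactly normal distribution, the coverage of μ ± kσ can be computed with the standard-library `statistics.NormalDist` (Python 3.8 or later); the dictionary name `coverage` is illustrative.

```python
from statistics import NormalDist

# Standard normal: mu = 0, sigma = 1, so mu ± k*sigma is simply ±k
standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"within {k} sigma: {probability:.4f}")
```

Real data are only approximately normal, so observed proportions will usually differ somewhat from these values.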
The Empirical Rule: An Example
Measures of Relative Standing
• Where does one particular measurement stand in
relation to the other measurements in the data set?
• How many standard deviations away from the
mean does the measurement lie? This is measured
by the z-score.
$$z = \frac{x - \bar{x}}{s}$$

Suppose s = 2 and x = 9 lies 4 units above the mean. Then z = 4/2 = 2, so x = 9 lies 2 standard deviations from the mean.


z-Scores
• z-scores between –2 and 2 are not unusual. z-scores should not be more than 3 in absolute value. z-scores larger than 3 in absolute value would indicate a possible outlier.

[Figure: z-score axis from -3 to 3. Between -2 and 2: not unusual. Between 2 and 3 in absolute value: somewhat unusual. Beyond 3 in absolute value: outlier.]
Example of z-Scores
Data set 1              Data set 2 (15 replaced by 500)
X     z-Score           X     z-Score
10    -1.28244          10    -0.29204
15    0.625954          500   3.473714
10    -1.28244          10    -0.29204
16    1.007634          16    -0.24593
11    -0.90076          11    -0.28435
17    1.389313          17    -0.23824
14    0.244275          14    -0.2613
13    -0.1374           13    -0.26898
10    -1.28244          10    -0.29204
16    1.007634          16    -0.24593
11    -0.90076          11    -0.28435
17    1.389313          17    -0.23824
14    0.244275          14    -0.2613
13    -0.1374           13    -0.26898
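The two columns of z-scores can be approximately reproduced with the sketch below, which uses the sample mean and sample standard deviation. The function name `z_scores` is hypothetical. The slide's figures for the first data set appear to have been computed from rounded summary statistics, so they may differ slightly in the later decimal places.

```python
from statistics import mean, stdev

def z_scores(data):
    """Return (x - mean) / s for every x, using the sample standard deviation."""
    x_bar = mean(data)
    s = stdev(data)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in data]

base = [10, 15, 10, 16, 11, 17, 14, 13, 10, 16, 11, 17, 14, 13]
with_outlier = [10, 500, 10, 16, 11, 17, 14, 13, 10, 16, 11, 17, 14, 13]

z_base = z_scores(base)
z_outlier = z_scores(with_outlier)

# Flag measurements whose z-score exceeds 3 in absolute value
flagged = [x for x, z in zip(with_outlier, z_outlier) if abs(z) > 3]
print(flagged)
```

A single extreme value inflates both the mean and the standard deviation, which is why every other z-score in the second data set shrinks toward zero.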
Coefficient of Variation(CV)
⚫ When comparing data sets with different units
or widely different means, one should use the
coefficient of variation for comparison instead of the
standard deviation.
⚫ The coefficient of variation can be written as
$$CV = \frac{s}{\bar{x}}$$
⚫ We express CV as a percentage by multiplying by 100:
$$CV = \frac{s}{\bar{x}} \times 100\%$$


⚫ Example: Page 181
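The textbook example referenced here is not reproduced in the slides. As a stand-in, the sketch below computes the CV for two samples that appear earlier in this lecture (flower petal counts and kraft paper moisture content). The function name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV as a percentage: 100 * s / x_bar (sample standard deviation)."""
    x_bar = mean(data)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(data) / x_bar

petals = [5, 12, 6, 8, 14]                     # counts
moisture_pct = [6.7, 6.0, 6.4, 6.4, 5.9, 5.8]  # percent moisture

cv_petals = coefficient_of_variation(petals)
cv_moisture = coefficient_of_variation(moisture_pct)
print(round(cv_petals, 2), round(cv_moisture, 2))
```

Because CV is unit-free, the two samples can be compared directly even though one is a count and the other a percentage.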
Skewness
⚫ Skewness measures the degree of asymmetry exhibited
by the data

⚫ The data can exhibit positive (+ve) or negative (–ve) skewness

⚫ If the mean of the data is greater than its median, the data is positively skewed; and if the mean of the data is
less than its median, the data is negatively skewed

⚫ Mathematically,
Skewness

[Figure: three distributions with mean, median, and mode marked. Negatively skewed: mean < median < mode. Symmetric (not skewed): mean = median = mode. Positively skewed: mode < median < mean.]
Kurtosis
⚫ Kurtosis measures the peakedness of the data relative to the
normal distribution

⚫ Data with a high degree of peakedness is said to be leptokurtic and has a kurtosis value of more than 3

⚫ Flat data has a kurtosis value of less than 3, and it is called platykurtic

⚫ Mathematically,
Kurtosis
⚫ Peakedness of a distribution
⚫ Leptokurtic: high and thin
⚫ Mesokurtic: normal in shape
⚫ Platykurtic: flat and spread out

[Figure: leptokurtic, mesokurtic, and platykurtic curves]
Skewness and Kurtosis
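The formulas on the skewness and kurtosis slides did not survive extraction. One common choice, assumed here, is the moment-based definition: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, where mk is the k-th central moment. This choice is at least consistent with the slides' reference value of 3 for kurtosis, but the lecture may use a different variant. The function names below are hypothetical.

```python
from statistics import mean

def central_moment(data, k):
    """k-th central moment: average of (x - mean)^k over all observations."""
    x_bar = mean(data)
    return sum((x - x_bar) ** k for x in data) / len(data)

def moment_skewness(data):
    # Assumed definition: m3 / m2^(3/2); positive means a longer right tail
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def moment_kurtosis(data):
    # Assumed definition: m4 / m2^2; a normal distribution gives 3
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Data set with one extreme value, from the z-score example
with_outlier = [10, 500, 10, 16, 11, 17, 14, 13, 10, 16, 11, 17, 14, 13]

skew_value = moment_skewness(with_outlier)
kurt_value = moment_kurtosis(with_outlier)
print(round(skew_value, 2), round(kurt_value, 2))
```

For this data set the mean exceeds the median, so a positive skewness agrees with the mean-versus-median rule stated earlier.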

42, 53, 68, 66, 72, 74, 99, 69, 49, 50, 41, 76, 98, 77, 79, 60, 84, 80, 90, 52, 82, 50, 79, 84, 81,
85, 67, 79, 76, 96, 43, 65, 54, 42, 51, 61, 78, 73, 64, 86, 75, 77, 59, 69, 78, 83, 56, 81, 70, 94,
63, 95, 99, 80, 71

N = Σ fi = f1 + f2 + f3 + f4 + f5 + f6
= 5 + 8 + 10 + 15 + 10 + 7 = 55

CI          Mid-Value (x)   Tally Marks       Frequency (f)   Cumulative Frequency
40 - 50     45              IIII              5               5
50 - 60     55              IIII III          8               13
60 - 70     65              IIII IIII         10              23
70 - 80     75              IIII IIII IIII    15              38
80 - 90     85              IIII IIII         10              48
90 - 100    95              IIII II           7               55

Definitions
L = lower limit of the median class
h = magnitude of the median class
fm = frequency of the median class
c = cumulative frequency of the premedian class
f1 = frequency of the modal class
f0 = frequency of the premodal class
f2 = frequency of the postmodal class
L = lower limit of the modal class
h = magnitude of the modal class

Mean
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{1}{N}\left[f_1 x_1 + f_2 x_2 + \cdots + f_n x_n\right] = 71.91$$

Median
$$\text{Median} = L + \frac{h}{f_m}\left(\frac{N}{2} - c\right)$$
N/2 = 55/2 = 27.5, h = 10, c = 23, fm = 15
Median = 70 + 10/15 (27.5 - 23) = 70 + 10/15 × 4.5 = 73

Mode
$$\text{Mode} = L + \frac{h(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)}$$
Mode = 70 + 10 (15 - 10) / [(15 - 10) - (10 - 15)] = 75
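The grouped-data results can be checked with the sketch below, which works from the frequency table rather than the raw values. Variable names are illustrative. The mode step assumes a single modal class that is neither the first nor the last class, which holds for this table but not in general.

```python
# Frequency table from the slide: (lower limit, upper limit, frequency)
classes = [(40, 50, 5), (50, 60, 8), (60, 70, 10),
           (70, 80, 15), (80, 90, 10), (90, 100, 7)]

N = sum(f for _, _, f in classes)

# Mean: frequency-weighted average of the class mid-values
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / N

# Median: L + (h / fm) * (N/2 - c), using the first class whose
# cumulative frequency reaches N/2
cumulative = 0
for lo, hi, f in classes:
    if cumulative + f >= N / 2:
        grouped_median = lo + (hi - lo) / f * (N / 2 - cumulative)
        break
    cumulative += f

# Mode: L + h * (f1 - f0) / ((f1 - f0) - (f2 - f1))
# Assumption: unique modal class with a neighbour on each side
m = max(range(len(classes)), key=lambda i: classes[i][2])
lo, hi, f1 = classes[m]
f0 = classes[m - 1][2]
f2 = classes[m + 1][2]
grouped_mode = lo + (hi - lo) * (f1 - f0) / ((f1 - f0) - (f2 - f1))

print(N, round(grouped_mean, 2), round(grouped_median, 2), grouped_mode)
```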
