
Descriptive Statistics

Research Method Descriptive Statistics of population sampling

Uploaded by

Emmanuel Peters
Copyright: © All Rights Reserved

Chapter 3

Descriptive Statistics
• Distinguish between measures of central tendency, measures of variability, and measures of shape
• Understand the meanings of mean, median, mode, quartile, percentile, and range
• Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation
• Differentiate between sample and population variance and standard deviation
• Understand the meaning of standard deviation as it is applied by using the empirical rule
• Understand box and whisker plots, skewness, and kurtosis
• Measures of central tendency yield information about “particular places or locations in a group of numbers.”
• Common Measures of Location
◦ Mode
◦ Median
◦ Mean
◦ Percentiles
◦ Quartiles
• The mode is the most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal: data sets that have two modes
• Multimodal: data sets that contain more than two modes
• Example: 30, 32, 34, 34, 38, 39, 43, 43, 44, 44, 44, 47, 49
• The mode is 44.
• There are more 44s than any other value.
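A minimal Python sketch of finding the mode for the data set above. The helper name `modes` and the second, bimodal data set are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Data set from the slide.
data = [30, 32, 34, 34, 38, 39, 43, 43, 44, 44, 44, 47, 49]
slide_modes = modes(data)

# Hypothetical bimodal data set (not from the slides): 2 and 3 both occur twice.
bimodal_modes = modes([1, 2, 2, 3, 3])

print(slide_modes)    # [44]
print(bimodal_modes)  # [2, 3]
```

Returning a list rather than a single value lets the same helper report bimodal and multimodal data sets.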


• The median is the middle value in an ordered array of numbers.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values.
• First Procedure
◦ Arrange observations in an ordered array.
◦ If number of terms is odd, the median is the middle term of the ordered array.
◦ If number of terms is even, the median is the average of the middle two terms.
• Second Procedure
◦ The median’s position in an ordered array is given by (n+1)/2.
• Ordered array: 3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median remains at 15.
• If the 3 is replaced by -103, the median remains at 15.
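The first procedure can be sketched in Python as follows. The function name `median` is an illustrative choice; the data are the 17 terms from the ordered array above, and the two modified lists reproduce the replacements described in the last two bullets.

```python
def median(values):
    """Median by the first procedure: sort, then take the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the middle term, i.e. position (n + 1) / 2 in 1-based terms.
        return ordered[mid]
    # Even count: the average of the two middle terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]

base_median = median(data)                    # 15
high_outlier = median(data[:-1] + [100])      # 22 replaced by 100
low_outlier = median([-103] + data[1:])       # 3 replaced by -103

print(base_median, high_outlier, low_outlier)  # 15 15 15
```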
• The mean is the average of a group of numbers
• Applicable for interval and ratio data, not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set
• Population mean:
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
• Sample mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$$
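A short Python sketch of the same arithmetic, using the two data sets from the mean examples. The variable names are illustrative. The population and sample means use the same computation; only the notation (μ and N versus X̄ and n) differs.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


population = [24, 13, 19, 26, 11]      # population example, N = 5
sample = [57, 86, 42, 38, 90, 66]      # sample example, n = 6

population_mean = mean(population)
sample_mean = mean(sample)

print(round(population_mean, 1))  # 18.6
print(round(sample_mean, 3))      # 63.167
```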
• Quartiles are measures of central tendency that divide a group of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at the 50th percentile and equals the median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the data set
[Figure: a data set divided by Q1, Q2, and Q3 into four segments, each containing 25% of the data]
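• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129
• In each case below the location i is a whole number, so the quartile is the average of the values in positions i and i + 1.
• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$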
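The location calculation for the quartile example can be sketched in Python. The slides only show cases where i is a whole number; the branch for a non-whole i (take the value at the next position up) is an assumption based on the common textbook convention, and the function name `percentile` is illustrative.

```python
def percentile(ordered, p):
    """Percentile by location i = (p / 100) * n, for an integer p with 0 < p < 100.

    Whole-number i: average of the values in 1-based positions i and i + 1.
    Otherwise (assumed convention): the value in position int(i) + 1.
    """
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # integer arithmetic avoids float error
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]                   # 0-based index of position int(i) + 1


data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))

print(q1, q2, q3)  # 111.5 118.5 123.5
```

Other quartile conventions exist (for example, interpolation-based ones used by many software packages), so library functions may return slightly different values for the same data.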
• Measures of variability describe the spread or the dispersion of a set of data.
• Common Measures of Variability
◦ Range
◦ Interquartile Range
◦ Mean Absolute Deviation
◦ Variance
◦ Standard Deviation
◦ Z scores
◦ Coefficient of Variation
[Figure: two plots of cash flow around the mean, one with no variability and one with variability]
• The range is the difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: 35, 37, 38, 41, 43, 46, 47, 48
Range = Largest - Smallest = 48 - 35 = 13
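The same calculation as a small Python sketch, using the example data above (variable names are illustrative):

```python
data = [35, 37, 38, 41, 43, 46, 47, 48]

# Range uses only the two extreme values; every other point is ignored.
data_range = max(data) - min(data)

print(data_range)  # 13
```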
• The interquartile range is the range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes
$$\text{Interquartile Range} = Q_3 - Q_1$$
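As a worked instance, assuming the quartiles found for the ordered array 106, 109, 114, 116, 121, 122, 125, 129 in the quartile example of these slides (Q1 = 111.5 and Q3 = 123.5), the formula gives:

```latex
$$\text{Interquartile Range} = Q_3 - Q_1 = 123.5 - 111.5 = 12$$
```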


• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
• Deviations from the mean: -8, -4, +3, +4, +5


• The mean absolute deviation is the average of the absolute deviations from the mean

X        X - μ    |X - μ|
5        -8       +8
9        -4       +4
16       +3       +3
17       +4       +4
18       +5       +5
Total     0       24

$$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$
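A minimal Python sketch that rebuilds the columns of the table above for the data set 5, 9, 16, 17, 18 (variable names are illustrative):

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                     # population mean, 13.0
deviations = [x - mu for x in data]            # -8, -4, +3, +4, +5
absolute_deviations = [abs(d) for d in deviations]

# Signed deviations always sum to zero, which is why absolute values are used.
deviation_total = sum(deviations)
mad = sum(absolute_deviations) / len(data)

print(deviation_total)  # 0.0
print(mad)              # 4.8
```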
• The population variance is the average of the squared deviations from the arithmetic mean

X        X - μ    (X - μ)²
5        -8       64
9        -4       16
16       +3       9
17       +4       16
18       +5       25
Total     0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
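• The population standard deviation is the square root of the variance

X        X - μ    (X - μ)²
5        -8       64
9        -4       16
16       +3       9
17       +4       16
18       +5       25
Total     0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} \approx 5.1$$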
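The population variance and standard deviation for the data set 5, 9, 16, 17, 18 can be checked with a short Python sketch (variable names are illustrative):

```python
import math

data = [5, 9, 16, 17, 18]
N = len(data)

mu = sum(data) / N
squared_deviations = [(x - mu) ** 2 for x in data]   # 64, 16, 9, 16, 25

# Population formulas divide by N, the full population size.
population_variance = sum(squared_deviations) / N
population_std = math.sqrt(population_variance)

print(population_variance)        # 26.0
print(round(population_std, 1))   # 5.1
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.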
• The empirical rule applies when data are normally distributed (or approximately normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7
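The percentages in the table are rounded values for a normal distribution. As a rough check, the sketch below computes them from the standard normal cumulative distribution using `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: round(100 * share_within(k), 1) for k in (1, 2, 3)}

print(coverage)  # {1: 68.3, 2: 95.4, 3: 99.7}
```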
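• The sample variance is the average of the squared deviations from the arithmetic mean, computed with n - 1 rather than N in the denominator

X        X - X̄    (X - X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866   (totals)

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$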
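• The sample standard deviation is the square root of the sample variance

X        X - X̄    (X - X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866   (totals)

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$
$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$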
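A Python sketch of the sample calculation for the four values 2,398, 1,844, 1,539, and 1,311 (variable names are illustrative). The only change from the population version is the n - 1 denominator.

```python
import math

data = [2398, 1844, 1539, 1311]
n = len(data)

x_bar = sum(data) / n                                   # sample mean, 1773.0
sum_of_squares = sum((x - x_bar) ** 2 for x in data)    # 663866.0

# Sample formulas divide by n - 1 instead of n.
sample_variance = sum_of_squares / (n - 1)
sample_std = math.sqrt(sample_variance)

print(round(sample_variance, 2))  # 221288.67
print(round(sample_std, 2))       # 470.41
```

The standard library equivalents are `statistics.variance` and `statistics.stdev`, which also use n - 1.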
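• The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage
• Measurement of relative dispersion
$$C.V. = \frac{\sigma}{\mu}(100)$$
• Example: $\mu_1 = 29$, $\sigma_1 = 4.6$ and $\mu_2 = 84$, $\sigma_2 = 10$
$$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$$
$$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$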
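The two coefficients of variation can be reproduced with a short Python sketch. The function name is illustrative; the inputs are the means and standard deviations from the example above.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)

print(round(cv1, 2))  # 15.86
print(round(cv2, 2))  # 11.9
```

Although the second data set has the larger standard deviation (10 versus 4.6), its relative dispersion is smaller, which is the point of comparing C.V. values rather than raw standard deviations.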
• Skewness
◦ Absence of symmetry
◦ Extreme values in one side of a distribution

[Figure: three distributions, negatively skewed, symmetric (not skewed), and positively skewed, with the positions of the mean, median, and mode marked on each]
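• The coefficient of skewness S is a summary measure for skewness, where $\mu$ is the mean, $M_d$ is the median, and $\sigma$ is the standard deviation
$$S = \frac{3(\mu - M_d)}{\sigma}$$
• If S < 0, the distribution is negatively skewed (skewed to the left).
• If S = 0, the distribution is symmetric (not skewed).
• If S > 0, the distribution is positively skewed (skewed to the right).

• Example 1: $\mu_1 = 23$, $M_{d1} = 26$, $\sigma_1 = 12.3$
$$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$
• Example 2: $\mu_2 = 26$, $M_{d2} = 26$, $\sigma_2 = 12.3$
$$S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$
• Example 3: $\mu_3 = 29$, $M_{d3} = 26$, $\sigma_3 = 12.3$
$$S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$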
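A Python sketch of the three skewness calculations, assuming (as the numbers in the example indicate) that the coefficient is three times the difference between mean and median, divided by the standard deviation. The function and label names are illustrative.

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


def describe(s):
    """Translate the sign of S into the wording used on the slide."""
    if s < 0:
        return "negatively skewed"
    if s > 0:
        return "positively skewed"
    return "symmetric"


# (mean, median, standard deviation) for the three examples.
cases = [(23, 26, 12.3), (26, 26, 12.3), (29, 26, 12.3)]
results = [round(skewness_coefficient(*case), 2) for case in cases]
labels = [describe(s) for s in results]

print(results)  # [-0.73, 0.0, 0.73]
print(labels)   # ['negatively skewed', 'symmetric', 'positively skewed']
```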
• Correlation analysis determines whether a relationship exists between two or more quantitative variables.
• Sometimes such relationships are useful in prediction or in studying cause.
• There are many different correlation coefficients, each applying to particular circumstances and each calculated by a different formula
• The Pearson coefficient is the most frequently used correlation coefficient
$$R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
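• Example

Student    Variable X    Variable Y
A          20            20
B          18            16
C          18            20
D          15            12
E          10            10

$\sum x = 81$, $\sum y = 78$, $\sum x^2 = 1373$, $\sum y^2 = 1300$, $\sum xy = 1328$

$$R = \frac{5(1328) - (81)(78)}{\sqrt{\left[5(1373) - 81^2\right]\left[5(1300) - 78^2\right]}} = \frac{322}{\sqrt{(304)(416)}} = \frac{322}{355.62} \approx 0.91$$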
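The example can be checked with a Python sketch of the computational form of the Pearson coefficient, in which the denominator is the square root of the product of the two bracketed terms. The function and variable names are illustrative; the data are the five student scores above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(a * a for a in xs)
    sum_yy = sum(b * b for b in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


x = [20, 18, 18, 15, 10]   # Variable X for students A to E
y = [20, 16, 20, 12, 10]   # Variable Y for students A to E

r = pearson_r(x, y)
print(round(r, 2))  # 0.91
```

A value this close to +1 indicates a strong positive linear relationship between the two variables in this small sample.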
