Chapter 1 Describing Data
DESCRIPTIVE STATISTICS
Raw data - Data recorded in the sequence in which they are collected and before they are processed or ranked.
Array data - Raw data that are arranged in ascending or descending order.
Construct a frequency distribution table for these data with their relative frequency and percentage.
0.38*100 = 38
Figure 2.2
Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence
Figure 2.8
Relative Frequency    Angle Size
0.27                  360 * 0.27 = 97.2°
0.18                  360 * 0.18 = 64.8°
0.14                  360 * 0.14 = 50.4°
0.14                  360 * 0.14 = 50.4°
0.11                  360 * 0.11 = 39.6°
0.08                  360 * 0.08 = 28.8°
0.08                  360 * 0.08 = 28.8°
1.00                  360°
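The angle sizes in the table can be checked with a short Python sketch. The function name `pie_angles` is an illustrative choice, not something from the slides; the relative frequencies are the ones listed in the table.

```python
# Relative frequencies copied from the table above.
relative_frequencies = [0.27, 0.18, 0.14, 0.14, 0.11, 0.08, 0.08]


def pie_angles(rel_freqs):
    """Return the central angle (in degrees) of each pie slice: 360 * relative frequency."""
    return [round(360 * rf, 1) for rf in rel_freqs]


angles = pie_angles(relative_frequencies)
print(angles)                 # one angle per category
print(round(sum(angles), 1))  # the slices should add up to a full circle
```

Because the relative frequencies sum to 1.00, the angles sum to 360 degrees.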
Figure 2.9
Example 9
A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.
Year    Ridership (in millions)
1990    88.0
1991    85.0
1992    75.7
1993    76.6
1994    75.4
The graph shows a decline in ridership through 1992 and then leveling off for the years 1993 and 1994.
a. Construct a frequency distribution table.
b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.
[Pie chart of the percentage distribution. Categories: Cash, Check, Credit Card, Debit, Other. Slice labels: 6%, 13%, 25%, 25%, 31%]
Solution:
0 1 2 3 4 9 5 2 0 7 6 2 3 1 2 4 3 8 9 2 8
5 3 8 6 1 7 1 4
0 1
1 1 0 0
2 3 3 3 5 6 7 8 8 9 1 2 4 5 6 6
Class Width (class size)
Class width = Upper boundary - Lower boundary
e.g.: Width of the first class = 600.5 - 400.5 = 200
e.g.:
Midpoint of the 1st class = (401 + 600) / 2 = 500.5
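The two class calculations above can be sketched in Python. The function names are illustrative; the numbers are those of the first class in the example (boundaries 400.5 and 600.5, limits 401 and 600).

```python
def class_width(lower_boundary, upper_boundary):
    """Class width = upper boundary - lower boundary."""
    return upper_boundary - lower_boundary


def class_midpoint(lower_limit, upper_limit):
    """Class midpoint = (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


print(class_width(400.5, 600.5))  # 200.0
print(class_midpoint(401, 600))   # 500.5
```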
Example 11 The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.
$\sum f = 30$
$i > \frac{5.1 - 2.0}{6} = 0.52 \approx 0.6$
Relative frequency of a class = Frequency of that class / Sum of all frequencies = $\frac{f}{\sum f}$
Percentage = (Relative frequency) * 100
Total: $\sum f = 40$, sum of relative frequencies = 1.000
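As a minimal sketch of the relative frequency and percentage formulas, the Python below works on a small table of counts. The category names and counts are hypothetical, chosen only so the total is 40; they are not data from the slides.

```python
def relative_frequencies(freqs):
    """Relative frequency of a class = f / (sum of all frequencies)."""
    total = sum(freqs.values())
    return {category: f / total for category, f in freqs.items()}


def percentages(freqs):
    """Percentage = relative frequency * 100."""
    return {category: rf * 100 for category, rf in relative_frequencies(freqs).items()}


# Hypothetical counts (not from the source); they sum to 40.
counts = {"A": 10, "B": 20, "C": 10}

print(relative_frequencies(counts))  # {'A': 0.25, 'B': 0.5, 'C': 0.25}
print(percentages(counts))           # {'A': 25.0, 'B': 50.0, 'C': 25.0}
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on a hand-built table.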
Polygon
A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
Example 13
[Figure for Example 13: vertical axis Frequency, scaled from 0 to 12]
[Table: cumulative frequency distribution; total $\sum f = 40$]
Amount ($)    Number of Responses    Cumulative frequency
0-99          2                      2
100-199       2                      4
200-299       6                      10
300-399       9                      19
400-499       4                      23
500-999       2                      25
Total         25
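A cumulative frequency column is a running total of the class frequencies. The sketch below assumes the usual "less than" cumulative frequency and uses the six class frequencies from the table (2, 2, 6, 9, 4, 2); the variable names are illustrative.

```python
from itertools import accumulate

# Class labels and frequencies from the table above.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-999"]
frequencies = [2, 2, 6, 9, 4, 2]

# Running total: each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:10s} {f:3d} {cf:3d}")
```

The last cumulative value equals the total number of responses, 25.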
[Figure: vertical axis Cumulative frequency; horizontal axis Class Boundaries]
Box-Plot
Describes the data graphically using 5 measurements: smallest value, first quartile (K1), second quartile (median or K2), third quartile (K3) and largest value.
[Figure: box plots marking the smallest value, K1, median, K3 and largest value; the first is for symmetric data]
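Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Where $\sum x$ = the sum of all values, N = the population size, n = the sample size, $\mu$ = the population mean, $\bar{x}$ = the sample mean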
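Example 19 Find the median for the following data: 10 5 19 8 3
Solution:
(1) Arrange the data in ascending order: 3 5 8 10 19
(2) Determine the depth (position) of the median: $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3$
(3) Determine the value of the median. The median is located in the third position of the data set: 3 5 8 10 19
Hence, the median for the above data = 8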
When the number of values is even, the median is the average of the two middle values:
$\text{Median} = \frac{8 + 10}{2} = 9$
Hence, the median for the above data = 9
- The median gives the center of a histogram, with half of the data values to the left of (or, less than) the median and half to the right of (or, more than) the median.
- The advantage of using the median is that it is not influenced by outliers.
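The median steps (sort, locate the middle position, read off or average) can be sketched in Python. The first data set is the one from Example 19; the second, even-sized data set is a hypothetical illustration whose two middle values are 8 and 10.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2                      # step 2: locate the middle
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


print(median([10, 5, 19, 8, 3]))        # 8
print(median([3, 5, 8, 10, 15, 19]))    # 9.0 (hypothetical even-sized data)
```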
Grouped Data
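Population mean: $\mu = \frac{\sum fx}{N}$
Sample mean: $\bar{x} = \frac{\sum fx}{n}$
Where x = the midpoint and f = the frequency of a class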
$\sum fx = 832$
$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$
Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
$\bar{x} = \frac{\sum fx}{n} = \frac{\sum fx}{\sum f} = \frac{974}{100} = 9.74$
Thus, the average number of times people have been to the dentist in the last five years is 9.74.
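The grouped mean can be sketched in Python as the sum of (frequency * midpoint) divided by the total frequency. The slides quote only the totals, so the class midpoints and frequencies below are assumed values, chosen so that the sum of fx is 832 and n is 50 as in the mail-order example.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f


# Assumed classes (not given in the text); they reproduce sum(fx) = 832 and n = 50.
midpoints = [11, 14, 17, 20]
frequencies = [4, 12, 20, 14]

print(round(grouped_mean(midpoints, frequencies), 2))  # 16.64
print(round(974 / 100, 2))                             # 9.74, the dentist example from its totals
```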
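$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$
Where:
n = the total frequency
F = the cumulative frequency before the median class
i = the class width
$f_m$ = the frequency of the median class
$L_m$ = the lower boundary of the median class
With $L_m = 20.5$, $n = 50$, $F = 22$, $f_m = 12$ and $i = 10$:
$\text{Median} = 20.5 + \left(\frac{\frac{50}{2} - 22}{12}\right) 10 = 23$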
Thus, 50 people have been to the dentist fewer than 10.4375 times, and another 50 people more than 10.4375 times, in the last five years.
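The grouped median formula can be sketched as a small Python function. The parameter names are illustrative; the call uses the values from the worked substitution above (lower boundary 20.5, n = 50, F = 22, median class frequency 12, class width 10).

```python
def grouped_median(lower_boundary, n, cum_freq_before, class_freq, class_width):
    """Median = L_m + ((n/2 - F) / f_m) * i for grouped data."""
    return lower_boundary + ((n / 2 - cum_freq_before) / class_freq) * class_width


print(grouped_median(20.5, 50, 22, 12, 10))  # 23.0
```

The term n/2 - F measures how far into the median class the middle observation falls, and dividing by the class frequency converts that into a fraction of the class width.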
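$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where:
$L_{mo}$ = the lower boundary of the modal class
$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the class before the modal class
$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the class after the modal class
i = the class width
Here $L_{mo} = 10.5$ and $i = 10$.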
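A minimal Python sketch of the grouped mode formula follows. It assumes the standard form with the class width as a multiplier. The lower boundary 10.5 and width 10 come from the text; the three frequencies are hypothetical, since the slides do not show them.

```python
def grouped_mode(lower_boundary, freq_before, freq_modal, freq_after, class_width):
    """Mode = L_mo + (d1 / (d1 + d2)) * i for grouped data."""
    d1 = freq_modal - freq_before   # modal class vs. the class before it
    d2 = freq_modal - freq_after    # modal class vs. the class after it
    return lower_boundary + (d1 / (d1 + d2)) * class_width


# Hypothetical frequencies (4 before, 10 in the modal class, 6 after).
mode = grouped_mode(10.5, 4, 10, 6, 10)
print(round(mode, 2))
```

The larger the drop on one side of the modal class, the further the mode is pulled toward the opposite side.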
Figure 2.19
Figure 2.21: Mean, median, and mode for a histogram and frequency distribution curve skewed to the right
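Population
$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
$\sigma = \sqrt{\sigma^2}$
Sample
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
$s = \sqrt{s^2}$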
$\sum x^2 = 35150$
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35150 - \frac{(390)^2}{5}}{5 - 1} = 1182.50$
$s = \sqrt{1182.50} = 34.3875$
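The shortcut formula for the sample variance needs only the sum of the values, the sum of their squares, and n. The Python sketch below recomputes the example from its totals (sum of x squared = 35150, sum of x = 390, n = 5); the function name is an illustrative choice.

```python
import math


def sample_variance_from_sums(sum_x_squared, sum_x, n):
    """Shortcut formula: s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)


s2 = sample_variance_from_sums(35150, 390, 5)
s = math.sqrt(s2)

print(s2)           # 1182.5
print(round(s, 4))  # 34.3875
```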
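Population
$\sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$
$\sigma = \sqrt{\sigma^2}$
Sample
$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$
$s = \sqrt{s^2}$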
$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$
$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$
Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.
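For grouped data the same shortcut applies with each midpoint weighted by its class frequency. The sketch below is one way to compute it in Python. The class midpoints and frequencies are assumed values, not shown in the text; they are chosen to reproduce the quoted totals (sum of fx = 832, sum of fx squared = 14216, n = 50).

```python
import math


def grouped_sample_variance(midpoints, frequencies):
    """s^2 = (sum(f*x^2) - (sum(f*x))^2 / n) / (n - 1), with x the class midpoint."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x ** 2 for x, f in zip(midpoints, frequencies))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Assumed classes (not given in the text) that match the quoted totals.
midpoints = [11, 14, 17, 20]
frequencies = [4, 12, 20, 14]

variance = grouped_sample_variance(midpoints, frequencies)
print(round(variance, 4))             # 7.582
print(round(math.sqrt(variance), 2))  # 2.75
```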
$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = 13.9932$
$s = \sqrt{s^2} = \sqrt{13.9932} = 3.7407$
Given the mean and standard deviation of monthly salary for two groups of workers at ABC company: Group 1: 700 and 20; Group 2: 1070 and 20. Find the CV for each group and determine which group is more dispersed.
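One way to work this through is with the usual definition of the coefficient of variation, CV = (standard deviation / mean) * 100%. The slides do not show the formula, so it is stated here as an assumption. The Python sketch applies it to both groups.

```python
def coefficient_of_variation(mean, std_dev):
    """CV (%) = standard deviation / mean * 100 (standard definition, assumed here)."""
    return std_dev / mean * 100


cv_group1 = coefficient_of_variation(700, 20)
cv_group2 = coefficient_of_variation(1070, 20)

print(round(cv_group1, 2))  # 2.86
print(round(cv_group2, 2))  # 1.87
print("Group 1" if cv_group1 > cv_group2 else "Group 2", "is more dispersed")
```

Both groups have the same standard deviation, but relative to its smaller mean Group 1 shows more dispersion.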
MEASURE OF POSITION
QUARTILE INTERQUARTILE RANGE
o The 2nd quartile is the median of the data set, denoted Q2
o The 3rd quartile is denoted Q3
FORMULA
$\text{Depth of } Q_1 = \frac{n + 1}{4}$
$\text{Depth of } Q_3 = \frac{3(n + 1)}{4}$
For n = 11: $\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(11 + 1)}{4} = 9$
$IQR = Q_3 - Q_1$
$\text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25$
$\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75$
$Q_1 = 79.4 + 0.25(79.9 - 79.4) = 79.525$
$Q_3 = 98.0 + 0.75(103.5 - 98.0) = 102.125$
Therefore, $IQR = Q_3 - Q_1 = 102.125 - 79.525 = 22.6$
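The depth method with interpolation can be sketched in Python as follows. Only the 3rd, 4th, 9th and 10th ordered values (79.4, 79.9, 98.0, 103.5) come from the worked example; the other eight values are hypothetical placeholders added so that the list has 12 entries.

```python
def value_at_depth(sorted_values, depth):
    """Value at a 1-based, possibly fractional depth, using linear interpolation."""
    whole = int(depth)                  # position of the lower neighbour
    fraction = depth - whole            # how far to move toward the next value
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (Q1, Q3, IQR) using depths (n + 1)/4 and 3(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_depth(ordered, (n + 1) / 4)
    q3 = value_at_depth(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


# Positions 3, 4, 9 and 10 are from the text; the remaining values are hypothetical.
data = [70.1, 75.3, 79.4, 79.9, 82.5, 86.0, 90.2, 94.7, 98.0, 103.5, 110.4, 118.9]

q1, q3, iqr = quartiles(data)
print(round(q1, 3), round(q3, 3), round(iqr, 3))  # 79.525 102.125 22.6
```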
$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i$
$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i$
$\frac{n}{4} = \frac{50}{4} = 12.5$
Therefore,
$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i = 20.5 + \left(\frac{45 - 44}{7}\right) 3 = 20.9286$
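The grouped quartile formulas share one shape, so a single Python helper can serve for both Q1 and Q3. In this sketch the target position (n/4 for Q1, 3n/4 for Q3) is passed in directly; the parameter names are illustrative, and the call reproduces the Q3 substitution shown above.

```python
def grouped_quartile(lower_boundary, target_position, cum_freq_before, class_freq, class_width):
    """Q = L + ((target - F) / f) * i, where target is n/4 for Q1 or 3n/4 for Q3."""
    return lower_boundary + ((target_position - cum_freq_before) / class_freq) * class_width


# Values from the worked Q3 example: L = 20.5, 3n/4 = 45, F = 44, f = 7, i = 3.
q3 = grouped_quartile(20.5, 45, 44, 7, 3)
print(round(q3, 4))  # 20.9286
```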
MEASURE OF SKEWNESS
Sk =
EXERCISE
1. A student wants to study the level of satisfaction with the price of a product at Queen Supermarket. She takes a simple random sample of 100 customers and asks them whether they are very satisfied, satisfied, not sure, not satisfied, or very not satisfied. State:
Population: All customers at Queen Supermarket
Sample: 100 customers at Queen Supermarket
Variable: Satisfaction
Type of variable: Qualitative variable
Data value: Very satisfied / satisfied / not sure / not satisfied / very not satisfied