Ch2 Outline
This chapter begins our study of descriptive statistics. Recall from Chapter 1 that when using descriptive statistics we describe a set of data using methods that organize or summarize the data set. For example, descriptive statistics may be used to organize data to show the general shape of the data, where the data tends to concentrate, or to expose extreme or unusual data values. The first procedure we present for organizing and summarizing a set of data is the frequency table.

Frequency Table: A grouping of qualitative data into mutually exclusive categories showing the number of observations in each category.
Recall from Chapter 1 that qualitative data is nonnumeric and can only be classified into distinct categories. There is no particular order to the categories. Examples of qualitative variables include: brand of computer sold by Best Buy (Hewlett Packard, Gateway, Toshiba, Sony, Compaq or Averatec), month of birth (January, ..., December), or the major airlines that fly out of a particular airport (US Airways, Delta, United, and Continental).

To illustrate construction of a frequency table, suppose that each person in a sample of 25 people exiting a local multiplex in June 2006 reported the movie just seen, and that each response was recorded as a code (01 = Cars, 02 = The Da Vinci Code, 03 = Over the Hedge, 04 = The Break-Up, 05 = The Omen).

The frequency table for this data set is shown below. The left column lists the classes for the qualitative variable "movie just seen". Note that the classes are mutually exclusive since only one movie was seen by each person on that visit to the multiplex. The right column shows the number of people observed for each class, or the class frequency. For example, the class frequency for Cars is 7.

Movie                 Number of People
Cars                         7
The Da Vinci Code            8
Over the Hedge               2
The Break-Up                 5
The Omen                     3
Class frequencies can be converted to relative class frequencies in order to show the fraction of the total number of observations for each class. A frequency table is converted to a relative frequency table by dividing each class frequency by the total number of observations. For example, the relative frequency for the number of movie patrons in the sample who just saw Over the Hedge is 0.08, found by dividing 2 by 25. The relative frequency table for the movie data set is represented by the first and third columns in the following table:

Movie                 Number of People    Relative Frequency
Cars                         7                  0.28
The Da Vinci Code            8                  0.32
Over the Hedge               2                  0.08
The Break-Up                 5                  0.20
The Omen                     3                  0.12
Note that the sum of the relative frequency column is always equal to 1.0, just as the sum of the frequency column is always equal to the total number of observations in the data set.
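The conversion from class frequencies to relative frequencies can be checked with a short calculation. The following is a minimal sketch in Python (our choice of language, not part of the chapter), using the class frequencies 7, 8, 2, 5, and 3 from the movie table; the variable names are illustrative only.

```python
# Class frequencies from the movie frequency table in this section.
frequencies = {
    "Cars": 7,
    "The Da Vinci Code": 8,
    "Over the Hedge": 2,
    "The Break-Up": 5,
    "The Omen": 3,
}

# Total number of observations (the sum of the frequency column).
total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {movie: count / total for movie, count in frequencies.items()}

for movie, count in frequencies.items():
    print(f"{movie:<20}{count:>4}{relative[movie]:>8.2f}")

# The relative frequency column should sum to 1.0.
print("Sum of relative frequencies:", round(sum(relative.values()), 2))
```

Dividing by the total inside a single expression keeps the two columns in step: if a class frequency changes, every relative frequency is recomputed from the new total.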
A relative frequency bar chart for the movie data is also shown. The difference between the two charts is the scale of the vertical axis. The vertical axis for the relative frequency bar chart represents the fraction of the total number of observations in each class.

[Bar charts of the movie data: Frequency and Relative Frequency on the vertical axes]
The pie chart is another common graph used to portray information for qualitative data.

Pie Chart: A chart that shows the proportion or percent that each class represents of the total.

As an example, suppose that Danny's Fun Center would like to determine the contribution to total revenue made by each of the center's activities last year. The information is given in the table.

Activity            Revenue (in $1,000)    Percent of Revenue
Laser Tag                  125                    10
Miniature Golf             225                    18
Raceway                    250                    20
Video Games                312.5                  25
Food                       275                    22
Batting Cages               62.5                   5
Total                     1250                   100
To plot the pie chart, after drawing a circle (pie) we put 0 at the top and go around the circle in increments of 5. To plot the percent of total revenue earned by laser tag, draw a line from 0 to the center of the circle and another line from the center to 10. Next, add the 18 percent for miniature golf: 10 + 18 = 28. Draw a line from the center to 28; the slice between 10 and 28 represents the percentage of the total earned by miniature golf. This process is continued for the remaining items.

[Pie chart: percent of revenue by activity, for example Raceway 20%, Food 22%, Batting Cages 5%]

The pie reveals that food, video games, raceway and miniature golf account for most of the revenue. Video games is the biggest contributor, generating somewhat more revenue than food, raceway, and miniature golf. Laser tag and batting cages contribute much smaller portions of the revenue.

Pie charts can also be used to display relative frequency data. For example, relative frequencies for the movie data, expressed as percentages, are shown in the pie chart below.
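The slice boundaries described above are simply a running total of the percents. The sketch below, in Python, assumes the revenue figures for Danny's Fun Center read as 125, 225, 250, 312.5, 275, and 62.5 (in $1,000); the names `revenue`, `percent`, and `slice_end` are ours and purely illustrative.

```python
# Revenue by activity in $1,000, from the Danny's Fun Center table.
revenue = {
    "Laser Tag": 125.0,
    "Miniature Golf": 225.0,
    "Raceway": 250.0,
    "Video Games": 312.5,
    "Food": 275.0,
    "Batting Cages": 62.5,
}

total = sum(revenue.values())

# Percent of total revenue contributed by each activity.
percent = {activity: 100 * amount / total for activity, amount in revenue.items()}

# Running total of the percents: each value is the point on the 0-to-100
# scale around the circle where that activity's slice ends.
slice_end = {}
running = 0.0
for activity, share in percent.items():
    running += share
    slice_end[activity] = running
    print(f"{activity:<16}{share:>6.1f}%   slice ends at {running:>5.1f}")
```

The laser tag slice ends at 10 and the miniature golf slice at 28, matching the 10 + 18 = 28 step in the text; the last slice closes the circle at 100.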
[Pie chart of the movie data: Cars 28%, The Da Vinci Code 32%, Over the Hedge 8%, The Break-Up 20%, The Omen 12%]
Steps for constructing a frequency distribution:

1. Decide on the number of classes.
2. Determine the class interval or width.
3. Set the individual class limits.
4. Tally the observations into the appropriate classes.
5. Count the number of tallies (items) in each class.

As an example, the lengths of service, in years, of a sample of seventeen employees are:

4   6   6   5   6   3   8   2   2   4   3   10   8   3   6   4   7

The seventeen observations are referred to as raw data or ungrouped data. To organize the lengths of service into a frequency distribution:
Frequency Distribution

Length of service     Tallies    Number of employees
1 up to 3 years       //                 2
3 up to 5 years       //////             6
5 up to 7 years       /////              5
7 up to 9 years       ///                3
9 up to 11 years      /                  1
Total                                   17
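The tallying step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `years` is our reading of the seventeen sample values (the order does not matter), and the first lower limit of 1 and the interval of 2 are taken from the class limits in the table.

```python
# Lengths of service in years for the seventeen employees
# (assumed reading of the sample; order is irrelevant for tallying).
years = [4, 6, 6, 5, 6, 3, 8, 2, 2, 4, 3, 10, 8, 3, 6, 4, 7]

lower_limit = 1   # lower limit of the first class
interval = 2      # class interval (width)
num_classes = 5

# Each class runs from its lower limit "up to" (not including) the next one.
classes = [
    (lower_limit + i * interval, lower_limit + (i + 1) * interval)
    for i in range(num_classes)
]

# Tally: count the observations that fall into each class.
frequency = [sum(1 for y in years if lo <= y < hi) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequency):
    print(f"{lo} up to {hi} years".ljust(20), ("/" * f).ljust(8), f)
print("Total".ljust(29), sum(frequency))
```

Using "lower limit up to, but not including, upper limit" means every observation lands in exactly one class, which is the reason the text prefers "4 up to 6" over overlapping limits.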
How many classes should there be? A common guideline is from 5 to 15. Having too few or too many classes gives little insight into the data. A rule for determining the number of classes is given later in this section.

The class interval is the size or width of the class. The class interval may be a value such as 3, 5, 10, 15, 20, 50, 100, 1,000, and so on. More formally, the class interval can be defined as follows:

Class Interval: The difference between the limits of two consecutive classes.

The class interval can be approximated by text formula [2-1]:

Class interval (i) = (highest value - lowest value) / number of classes, or i = (H - L) / k     [2-1]

Where:
i is the class interval.
H is the highest observed value.
L is the lowest observed value.
k is the number of classes.

If we apply the formula to our example, then H = 10, L = 2, and k = 5. We get a class interval of 2, found by:

i = (10 - 2) / 5 = 8 / 5 = 1.6, which is rounded to 2.
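Formula [2-1] is easy to verify numerically. The following Python sketch assumes the example values H = 10, L = 2, and k = 5; the function name `class_interval` is hypothetical, and rounding up with `math.ceil` is our assumption about how 1.6 becomes 2 (the text says only that it is "rounded").

```python
import math


def class_interval(highest, lowest, num_classes):
    """Approximate class interval from formula [2-1]: i = (H - L) / k."""
    return (highest - lowest) / num_classes


raw = class_interval(highest=10, lowest=2, num_classes=5)

# Assumption: round up to the next whole number so that k classes of this
# width are wide enough to cover the whole range of the data.
rounded = math.ceil(raw)

print("Computed interval:", raw)
print("Interval used:", rounded)
```

Rounding up rather than down is the safer choice here: five classes of width 1 would span only 5 years and could not hold observations ranging from 2 to 10.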
Each class has a lower class limit and an upper class limit. The lower limit of the first class is usually slightly below the smallest value in the data set and, if possible, is a multiple of the class interval.

In the previous example, the smallest number of years of service is 2. We selected 1, which is slightly below 2, as the lower limit of the first class. The lower limit of the second class is 3 years, and so on. The number of tallies or observations that occurs in each class is called the class frequency.

Class Frequency: The number of observations in each class.

In the example, the class frequency of the lowest class is 2. For the next higher class it is 6. The class midpoint divides a class into two equal parts.

Class Midpoint: The point halfway between the lower limits of two consecutive classes.

The class midpoint is computed by adding the lower limits of consecutive classes and dividing the result by two. In the example, the class midpoint of the 5 up to 7 class is 6, found by (5 + 7)/2. The class interval is the distance between the lower limits of two consecutive classes. It is 2, found by subtracting 1 (the lower limit of the first class) from 3 (the lower limit of the second class).
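As a quick check of the midpoint and interval definitions, the sketch below (Python, with illustrative names) works from the lower class limits 1, 3, 5, 7, and 9 of the length-of-service example. The extra value 11 is the lower limit of the class that would follow the last one, which is needed to find the last midpoint.

```python
# Lower limits of the five classes, plus the lower limit (11) of the class
# that would come next, so every class has a "consecutive" neighbour.
lower_limits = [1, 3, 5, 7, 9, 11]

# Class midpoint: halfway between the lower limits of consecutive classes.
midpoints = [(a + b) / 2 for a, b in zip(lower_limits, lower_limits[1:])]

# Class interval: distance between the lower limits of consecutive classes.
interval = lower_limits[1] - lower_limits[0]

print("Midpoints:", midpoints)
print("Class interval:", interval)
```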
Some suggestions for constructing a frequency distribution:

1. Your professional judgment can determine the number of classes.

2. Too many classes or too few classes might not reveal the basic shape of the distribution.

3. A general rule is that it is best to use at least 5 and not more than 15 classes when constructing a frequency distribution.

4. The 2 to the k rule is also used to determine the number of classes. To estimate the number of classes we select the smallest integer (whole number) k such that 2^k ≥ n, where n is the total number of observations. Suppose a set of data has 60 observations. If we try k = 5 we get 2^5 = 32, which is less than 60, so we try k = 6 and get 2^6 = 64, which is greater than 60. Thus the recommended number of classes is 6. The table is based on the 2 to the k rule.

   2 to the k Rule for Number of Classes

   Total Number of Observations    Recommended Number of Classes
         9 to 16                               4
        17 to 32                               5
        33 to 64                               6
        65 to 128                              7
       129 to 256                              8
       257 to 512                              9
       513 to 1,024                           10

5. The lower limit of the first class should be an even multiple of the class interval. Suppose a sample of weight losses ranged from 25 pounds to 64 pounds. We want to organize the weight losses into a frequency distribution with an interval of 6 pounds. The lower limit of the first class would be 24, found by multiplying 4, the even multiple, by 6, the class interval. Obviously this suggestion was not followed in the above example for length of service. Keep in mind that these are only suggestions, not rules.

6. Avoid overlapping stated class limits. Class limits such as 4-6 and 6-8 should not be used. Use 4 up to 6, then 6 up to 8. This way you can determine in which class to tally 6.

7. Try to avoid open-ended classes. Open-ended classes cause serious graphing problems and make it difficult to calculate various measures described in Chapter 3.
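The 2 to the k rule from the suggestions above can be expressed as a short search for the smallest suitable k. This Python sketch is illustrative only: the function name `recommended_classes` is ours, and we assume the rule is read as "smallest k with 2^k at least n", which agrees with the worked example of 60 observations.

```python
def recommended_classes(n):
    """Return the smallest whole number k such that 2**k >= n.

    Assumption: the comparison is "greater than or equal to"; the worked
    example in the text (n = 60 gives k = 6) is consistent with this reading.
    """
    k = 1
    while 2 ** k < n:
        k += 1
    return k


# The worked example from the text, and the length-of-service sample.
print("60 observations:", recommended_classes(60), "classes")
print("17 observations:", recommended_classes(17), "classes")
```

For the seventeen employees the rule recommends 5 classes, which is the number used in the length-of-service example.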
Histogram
The simplest type of statistical chart is called a histogram.

Histogram: A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.

For the length of service for the sample of seventeen employees a histogram would appear as shown in the figure. Note that to plot the bar for the 5 up to 7 years class (which has a midpoint of 6 years), we drew lines vertically from 5 and from 7 years to 5 employees on the Y-axis and then connected the end points by a straight line. The histogram provides an easily interpreted visual representation of a frequency distribution.

[Histogram: Length of Service on the horizontal axis, Frequency on the vertical axis]

Frequency Polygon
A second type of chart used to portray a frequency distribution is the frequency polygon.

Frequency Polygon: A graph that consists of line segments connecting the points formed by the intersection of the class midpoints and the class frequency.

For the frequency polygon, the assumption is that the observations in any class interval are represented by the class midpoint. A dot is placed at the class midpoint opposite the number of frequencies in that class. For the distribution of years of service, make the first plot by selecting 2 years on the X-axis (the midpoint) and then go vertically on the Y-axis to 2 and place a dot. This process is continued for all classes. Then connect the dots in order. Normal practice is to anchor the frequency polygon to the X-axis. This is accomplished by extending the lines to the midpoint of the class below the lowest class (0) and to the midpoint of the class above the highest class (12).
[Frequency polygon: Years of Service on the horizontal axis (0 to 12), Number of Employees on the vertical axis]
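The points of the frequency polygon, including the two anchor points on the X-axis, can be listed with a short Python sketch. The midpoints 2, 4, 6, 8, 10 and frequencies 2, 6, 5, 3, 1 come from the length-of-service example; the variable names are ours.

```python
# Class midpoints and class frequencies for the length-of-service example.
midpoints = [2, 4, 6, 8, 10]
frequency = [2, 6, 5, 3, 1]
interval = 2  # class interval

# One dot per class: (class midpoint, class frequency).
dots = list(zip(midpoints, frequency))

# Anchor the polygon to the X-axis at the midpoints of the empty classes
# just below the lowest class and just above the highest class.
left_anchor = (midpoints[0] - interval, 0)
right_anchor = (midpoints[-1] + interval, 0)

polygon = [left_anchor] + dots + [right_anchor]

for x, y in polygon:
    print(f"years = {x:>2}, employees = {y}")
```

Connecting these points in order gives the polygon; the anchors fall at 0 and 12 years, as stated in the text.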
A cumulative frequency distribution is graphically portrayed in a cumulative frequency polygon. A cumulative frequency polygon reports the number and percent of observations that are less than a given value.

Cumulative Frequency Polygon: A graph that consists of line segments connecting the points formed by the intersection of the class endpoints and the class cumulative frequency.
Before we can draw a cumulative frequency polygon, we must convert the frequency distribution to a cumulative frequency distribution. To construct a cumulative frequency distribution, we add the frequencies from the lowest class to the frequency of the next highest class. We add this sum to the frequency of the next class, etc.

Cumulative Frequency Distribution

Length of service     Class        Cumulative
(in years)            Frequency    Frequency     Found by
1 up to 3 years           2            2         2
3 up to 5 years           6            8         2 + 6
5 up to 7 years           5           13         8 + 5
7 up to 9 years           3           16         13 + 3
9 up to 11 years          1           17         16 + 1
The cumulative frequencies are plotted on the vertical axis (Y-axis) and the lengths of service on the X-axis. It may be helpful to plot the cumulative frequencies on the left side of the vertical axis and the percent of the total on the right side, as shown in the polygon.

[Cumulative frequency polygon: Length of Service on the horizontal axis, Number of Employees on the left vertical axis, Percent of Employees on the right vertical axis]
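The running sums in the cumulative frequency distribution, and the percent-of-total scale for the right-hand axis, can be reproduced with the sketch below. It is written in Python and assumes the class frequencies 2, 6, 5, 3, 1 and the upper class limits 3, 5, 7, 9, 11 from the length-of-service example; the names are illustrative.

```python
from itertools import accumulate

# Upper class limits (the class endpoints used on the X-axis) and the
# class frequencies for the length-of-service example.
upper_limits = [3, 5, 7, 9, 11]
frequency = [2, 6, 5, 3, 1]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequency))

# Percent of the total, for the right-hand vertical axis.
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

for limit, c, p in zip(upper_limits, cumulative, cumulative_percent):
    print(f"less than {limit:>2} years: {c:>2} employees ({p:.1f}%)")
```

Each line reads as "this many employees have less than the given number of years of service", which is exactly what the cumulative frequency polygon displays.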