
Ch2 Outline

This document outlines how to construct frequency distributions to summarize quantitative data. It discusses grouping data into classes and counting the number of observations in each class. The class interval, limits, midpoints and frequencies are defined. Guidelines are provided for determining the number of classes and ensuring equal class intervals.

Uploaded by

Mohamed Arafa
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 2 Outline

Introduction

This chapter begins our study of descriptive statistics. Recall from Chapter 1 that when using descriptive statistics we describe a set of data using methods that organize or summarize the data set. For example, descriptive statistics may be used to organize data to show the general shape of the data, where the data tend to concentrate, or to expose extreme or unusual data values. The first procedure we present for organizing and summarizing a set of data is the frequency table.

Frequency Table: A grouping of qualitative data into mutually exclusive categories showing the number of observations in each category.

Recall from Chapter 1 that qualitative data is nonnumeric and can only be classified into distinct categories. There is no particular order to the categories. Examples of qualitative variables include: brand of computer sold by Best Buy (Hewlett Packard, Gateway, Toshiba, Sony, Compaq, or Averatec), month of birth (January, ..., December), or the major airlines that fly out of a particular airport (US Airways, Delta, United, and Continental).

To illustrate construction of a frequency table, suppose that a data set reports the movie just seen for each person in a sample of 25 people exiting a local multiplex in June 2006. Each response is recorded as a code: 01 = Cars, 02 = The Da Vinci Code, 03 = Over the Hedge, 04 = The Break-Up, 05 = The Omen.

The frequency table for this data set is shown below. The left column lists the classes for the qualitative variable "movie just seen". Note that the classes are mutually exclusive since only one movie was seen by each person on that visit to the multiplex. The right column shows the number of people observed for each class, or the class frequency. For example, the class frequency for Cars is 7.

Movie                  Number of People
Cars                          7
The Da Vinci Code             8
Over the Hedge                2
The Break-Up                  5
The Omen                      3

Class frequencies can be converted to relative class frequencies in order to show the fraction of the total number of observations for each class. A frequency table is converted to a relative frequency table by dividing each class frequency by the total number of observations. For example, the relative frequency for the number of movie patrons in the sample who just saw Over the Hedge is 0.08, found by dividing 2 by 25. The relative frequency table for the movie data set is represented by the first and third columns in the following table:

Movie                  Number of People    Relative Frequency
Cars                          7                  0.28
The Da Vinci Code             8                  0.32
Over the Hedge                2                  0.08
The Break-Up                  5                  0.20
The Omen                      3                  0.12

Note that the sum of the relative frequency column is always equal to 1.0, just as the sum of the frequency column is always equal to the total number of observations in the data set.
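The conversion from class frequencies to relative frequencies can be sketched in a few lines of Python. This is an illustration only, not part of the original outline: the names `movie_counts` and `relative_frequencies` are assumptions, and the counts are the class frequencies from the movie table.

```python
from fractions import Fraction

# Class frequencies from the movie frequency table.
movie_counts = {
    "Cars": 7,
    "The Da Vinci Code": 8,
    "Over the Hedge": 2,
    "The Break-Up": 5,
    "The Omen": 3,
}

def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts.values())
    return {label: Fraction(freq, total) for label, freq in counts.items()}

rel = relative_frequencies(movie_counts)
for label, share in rel.items():
    print(f"{label:<18} {float(share):.2f}")
```

Exact fractions are used here so that the relative frequencies add to exactly 1, which mirrors the note about the column sum.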

Graphic Presentation of Qualitative Data


A bar chart is the graph most often used to present qualitative data. Typically, the horizontal axis displays the different classes of the variable and the vertical axis shows the frequency or relative frequency for each class. The height of each bar represents the frequency (or relative frequency) for its class. The bars are of uniform width and there is a gap between adjacent bars.

Bar Chart: A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.

The frequency bar chart for the movie data is shown below:

[Frequency Bar Chart: Movie on the horizontal axis (Cars, Da Vinci Code, Over the Hedge, The Break-Up, The Omen); Frequency on the vertical axis, scaled 0 to 10]

A relative frequency bar chart for the movie data is also shown. The difference between the two charts is the scale of the vertical axis. The vertical axis for the relative frequency bar chart represents the fraction of the total number of observations in each class.

[Relative Frequency Bar Chart: Movie on the horizontal axis (Cars, Da Vinci Code, Over the Hedge, The Break-Up, The Omen); Relative Frequency on the vertical axis, scaled 0 to 0.4]

The pie chart is another common graph used to portray information for qualitative data.

Pie Chart: A chart that shows the proportion or percent that each class represents of the total.

As an example, suppose that Danny's Fun Center would like to determine the contribution to total revenue made by each of the center's activities last year. The information is given in the table.

Activity           Revenue (in $1,000)    Percent of Revenue
Laser Tag                 125.0                   10
Miniature Golf            225.0                   18
Raceway                   250.0                   20
Video Games               312.5                   25
Food                      275.0                   22
Batting Cages              62.5                    5
Total                   1,250.0                  100

[Pie Chart of Fun Center Revenue: Laser Tag 10%, Miniature Golf 18%, Raceway 20%, Video Games 25%, Food 22%, Batting Cages 5%]

To plot the pie chart, after drawing a circle (pie) we put 0 at the top and go around the circle in increments of 5. To plot the percent of total revenue earned by laser tag, draw a line from 0 to the center of the circle and another line from the center to 10. Then, 10 + 18 = 28, so the next line is drawn from the center to 28. This slice represents the percentage of the total earned by miniature golf. This process is continued for the remaining items.

The pie reveals that food, video games, raceway, and miniature golf account for most of the revenue. Video games is the biggest contributor, generating somewhat more revenue than food, raceway, and miniature golf. Laser tag and batting cages contribute much smaller portions of the revenue.

Pie charts can also be used to display relative frequency data. For example, relative frequencies for the movie data, expressed as percentages, are shown in the pie chart below.

[Pie chart of the movie data: Cars 28%, The Da Vinci Code 32%, Over the Hedge 8%, The Break-Up 20%, The Omen 12%]
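The slice boundaries for the Fun Center pie chart come from a running total of the percentages (10, then 10 + 18 = 28, and so on). The following Python sketch illustrates that arithmetic; the variable names are assumptions made for this example, and the revenue figures are those of the Fun Center table.

```python
from itertools import accumulate

# Revenue by activity, in thousands of dollars (Danny's Fun Center table).
revenue = {
    "Laser Tag": 125.0,
    "Miniature Golf": 225.0,
    "Raceway": 250.0,
    "Video Games": 312.5,
    "Food": 275.0,
    "Batting Cages": 62.5,
}

total = sum(revenue.values())
percents = [100 * amount / total for amount in revenue.values()]

# Running totals mark where each slice ends on the 0-to-100 scale.
boundaries = list(accumulate(percents))

start = 0.0
for activity, end in zip(revenue, boundaries):
    print(f"{activity:<15} from {start:5.1f} to {end:5.1f}")
    start = end
```

Each slice starts where the previous one ended, so the last boundary is always 100.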

Frequency Distributions for Quantitative Data


A frequency distribution is a useful statistical tool for organizing a set of quantitative data in various ways. It can be used to show the shape of the data, where the data are concentrated, and to detect extreme values.

Frequency Distribution: A grouping of data into mutually exclusive classes showing the number of observations in each.

The steps to follow in developing a frequency distribution are:
1. Decide on the number of classes.
2. Determine the class interval or width.
3. Set the individual class limits.
4. Tally the observations into the appropriate classes.
5. Count the number of tallies (items) in each class.

[Table: Length of Service (in years), seventeen raw values ranging from 2 to 10]

As an example, the lengths of service, in years, were recorded for a sample of seventeen employees. The seventeen observations are referred to as raw data or ungrouped data. To organize the lengths of service into a frequency distribution:
1. We decided to have five classes.
2. We used a class width of 2.
3. We used classes 1 up to 3, 3 up to 5, and so on.
4. Tally the lengths of service into the appropriate classes.
5. Count the number of tallies in each class as shown.

Frequency Distribution

Lengths of service     Tallies     Number of employees
1 up to 3 years        //                   2
3 up to 5 years        //////               6
5 up to 7 years        /////                5
7 up to 9 years        ///                  3
9 up to 11 years       /                    1
Total                                      17
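The tallying and counting steps can be sketched in Python. The function name `tally` and the sample values below are assumptions made for illustration: the sample is a hypothetical set of seventeen lengths of service chosen to reproduce the class frequencies in the table, not the original raw data.

```python
def tally(values, first_lower, width, k):
    """Count how many values fall in each 'lower up to upper' class."""
    counts = [0] * k
    for v in values:
        index = int((v - first_lower) // width)  # class that contains v
        if not 0 <= index < k:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts

# Hypothetical sample of seventeen lengths of service (illustrative only).
sample = [4, 3, 2, 10, 6, 6, 5, 6, 3, 8, 2, 4, 3, 8, 6, 4, 7]

counts = tally(sample, first_lower=1, width=2, k=5)
for i, count in enumerate(counts):
    lower = 1 + 2 * i
    print(f"{lower} up to {lower + 2} years  {'/' * count:<7} {count}")
```

Because each class runs from its lower limit "up to" (but not including) the next lower limit, a value such as 3 is tallied in the 3 up to 5 class, never in 1 up to 3.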

How many classes should there be? A common guideline is from 5 to 15. Having too few or too many classes gives little insight into the data. A rule for determining the number of classes is given later in this chapter. The class interval is the size or width of the class. The class interval may be a value such as 3, 5, 10, 15, 20, 50, 100, 1,000, and so on. More formally, the class interval can be defined as follows:

Class Interval: The difference between the limits of two consecutive classes.

The class interval can be approximated by text formula [2-1]:

Class Interval   $i \ge \dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$   or   $i \ge \dfrac{H - L}{k}$   [2-1]

Where:
i is the class interval.
H is the highest observed value.
L is the lowest observed value.
k is the number of classes.

If we apply the formula to our example, then H = 10, L = 2, and k = 5. We get a class interval of 2, found by:

$i \ge \dfrac{10 - 2}{5} = \dfrac{8}{5} = 1.6$, which is rounded to 2.
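Formula [2-1] can be checked with a short Python sketch. The function name `class_interval` is an assumption, and rounding up with `math.ceil` is one simple way to pick a convenient width; the text itself only says the result is rounded to 2.

```python
import math

def class_interval(highest, lowest, k):
    """Right-hand side of formula [2-1]: (H - L) / k."""
    return (highest - lowest) / k

# Length-of-service example: H = 10, L = 2, k = 5.
raw = class_interval(highest=10, lowest=2, k=5)
width = math.ceil(raw)  # round up so the k classes cover the whole range

print(raw, width)
```

Rounding down instead would give a width of 1, and five classes of width 1 could not cover the range from 2 to 10.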

Each class has a lower class limit and an upper class limit. The lower limit of the first class is usually slightly below the smallest value in the data set and, if possible, is a multiple of the class interval.

In the previous example, the smallest number of years of service is 2. We selected 1, which is slightly below 2, as the lower limit of the first class. The lower limit of the second class is 3 years, and so on. The number of tallies or observations that occurs in each class is called the class frequency.

Class Frequency: The number of observations in each class.

In the example, the class frequency of the lowest class is 2. For the next higher class it is 6. The class midpoint divides a class into two equal parts.

Class Midpoint: The point halfway between the lower limits of two consecutive classes.

The class midpoint is computed by adding the lower limits of consecutive classes and dividing the result by two. In the example, the class midpoint of the 5 up to 7 class is 6, found by (5 + 7)/2. The class interval is the distance between the lower limits of two consecutive classes. It is 2, found by subtracting 1 (the lower limit of the first class) from 3 (the lower limit of the second class).
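The midpoint and interval calculations can be illustrated with a brief Python sketch. The list name `lower_limits` is an assumption; its last entry, 11, is the upper limit of the final class, which plays the role of the next lower limit.

```python
# Lower limits of the five length-of-service classes, plus the
# upper limit of the last class (11) to close the final interval.
lower_limits = [1, 3, 5, 7, 9, 11]

# Midpoint: halfway between the lower limits of consecutive classes.
midpoints = [(a + b) / 2 for a, b in zip(lower_limits, lower_limits[1:])]

# Class interval: distance between consecutive lower limits.
interval = lower_limits[1] - lower_limits[0]

print(midpoints)
print(interval)
```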

Suggestions on Constructing Frequency Distributions


When constructing frequency distributions, follow these guidelines:

1. The class intervals used in the frequency distribution should be equal. Unequal class intervals present problems in graphically portraying the distribution. However, in some situations unequal class intervals may be necessary in order to avoid a large number of empty classes.

2. Text formula [2-1] is based on the number of classes, and is useful for determining the class interval.

Class Interval   $i \ge \dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$   or   $i \ge \dfrac{H - L}{k}$   [2-1]

3. Your professional judgment can determine the number of classes. Too many classes or too few classes might not reveal the basic shape of the distribution. A general rule is that it is best to use at least 5 and not more than 15 classes when constructing a frequency distribution.

4. The 2 to the k rule is also used to determine the number of classes. To estimate the number of classes we select the smallest integer (whole number) k such that $2^k \ge n$, where n is the total number of observations. Suppose a set of data has 60 observations. If we try k = 5 we get $2^5 = 32$, which is less than 60, so we try $2^6 = 64$, which is greater than 60. Thus the recommended number of classes is 6. The table is based on the 2 to the k rule.

2 to the k Rule for Number of Classes

Total Number of Observations    Recommended Number of Classes
9 to 16                                      4
17 to 32                                     5
33 to 64                                     6
65 to 128                                    7
129 to 256                                   8
257 to 512                                   9
513 to 1,024                                10

5. The lower limit of the first class should be an even multiple of the class interval. Suppose a sample of weight losses ranged from 25 pounds to 64 pounds. We want to organize the weight losses into a frequency distribution with an interval of 6 pounds. The lower limit of the first class would be 24, found by multiplying 4, the even multiple, by 6, the class interval. Obviously this suggestion was not followed in the above example for length of service. Keep in mind that these are only suggestions, not rules.

6. Avoid overlapping stated class limits. Class limits such as 4-6 and 6-8 should not be used. Use 4 up to 6, then 6 up to 8. This way you can determine in which class to tally 6.

7. Try to avoid open-ended classes. Open-ended classes cause serious graphing problems and make it difficult to calculate various measures described in Chapter 3.
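The 2 to the k rule is easy to express in code. The Python sketch below is illustrative; the function name `classes_by_2_to_k` is an assumption, and it uses the "greater than or equal to" form of the rule, which is what the table implies (16 observations fall in the 4-class row).

```python
def classes_by_2_to_k(n):
    """Return the smallest whole number k such that 2**k >= n."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 0
    while 2 ** k < n:
        k += 1
    return k

for n in (16, 17, 60, 513):
    print(n, classes_by_2_to_k(n))
```

For the 60 observations in the example the function returns 6, and for the seventeen lengths of service it returns 5, the number of classes used earlier.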

Relative Frequency Distribution


It is often helpful to know the fraction of the total number of observations that appear in each class, or the relative class frequencies.

Relative Class Frequency: Shows the fraction of the total number of observations in each class.

The relative class frequency is found by dividing each of the class frequencies by the total number of observations. Using the distribution of the lengths of service of the seventeen employees, the relative frequency for the 1 up to 3-year class is 0.1176, found by 2/17 = 0.1176 = 12%. Thus 12% of the employees had 1 up to 3 years of service. The relative frequencies for the remaining classes are shown.

Relative Frequency Distribution

Length of service (in years)    Number of employees    Relative Frequency    Found by
1 up to 3 years                          2                   0.1176            2/17
3 up to 5 years                          6                   0.3529            6/17
5 up to 7 years                          5                   0.2941            5/17
7 up to 9 years                          3                   0.1765            3/17
9 up to 11 years                         1                   0.0588            1/17
Total                                   17                   0.9999
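The relative frequency column totals 0.9999 rather than 1.0000 because each entry is rounded to four decimal places. The Python sketch below, with illustrative variable names that are not from the original outline, reproduces the rounded entries as whole ten-thousandths so the arithmetic stays exact.

```python
from fractions import Fraction

frequencies = [2, 6, 5, 3, 1]   # class frequencies, lengths of service
total = sum(frequencies)        # 17 employees

# Each relative frequency rounded to four decimals, held as ten-thousandths.
rounded = [round(Fraction(f * 10_000, total)) for f in frequencies]

print([r / 10_000 for r in rounded])
print(sum(rounded) / 10_000)
```

The rounded entries add to 9,999 ten-thousandths, while the unrounded fractions 2/17 through 1/17 add to exactly 1.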

Graphic Presentation of a Frequency Distribution


To get reader attention, a frequency distribution is often portrayed graphically as a histogram, a frequency polygon, or a cumulative frequency polygon.

Histogram

The simplest type of statistical chart is called a histogram.

Histogram: A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.

For the length of service for the sample of seventeen employees, a histogram would appear as shown. Note that to plot the bar for the 5 up to 7 years class (which has a midpoint of 6 years), we drew lines vertically from 5 and from 7 years to 5 employees on the Y-axis and then connected the end points by a straight line. The histogram provides an easily interpreted visual representation of a frequency distribution.

[Histogram: Length of Service on the X-axis (classes 1 up to 3 through 9 up to 11 years); Frequency on the Y-axis]

Frequency Polygon

A second type of chart used to portray a frequency distribution is the frequency polygon.

Frequency Polygon: A graph that consists of line segments connecting the points formed by the intersection of the class midpoints and the class frequency.

For the frequency polygon, the assumption is that the observations in any class interval are represented by the class midpoint. A dot is placed at the class midpoint opposite the number of frequencies in that class. For the distribution of years of service, make the first plot by selecting 2 years on the X-axis (the midpoint) and then go vertically on the Y-axis to 2 and place a dot. This process is continued for all classes. Then connect the dots in order. Normal practice is to anchor the frequency polygon to the X-axis. This is accomplished by extending the lines to the midpoint of the class below the lowest class (0) and to the midpoint of the class above the highest class (12).

[Frequency Polygon: Years of Service on the X-axis (0 to 12); Number of Employees on the Y-axis (0 to 7)]
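The points of the frequency polygon, including the two anchor points on the X-axis, can be listed with a short Python sketch. The variable names are assumptions made for this illustration; the class limits and frequencies are those of the length-of-service distribution.

```python
lower_limits = [1, 3, 5, 7, 9]   # lower limit of each class, in years
width = 2                        # class interval
frequencies = [2, 6, 5, 3, 1]    # number of employees in each class

# One dot per class: (class midpoint, class frequency).
midpoints = [low + width / 2 for low in lower_limits]
points = list(zip(midpoints, frequencies))

# Anchor the polygon to the X-axis one class width beyond each end.
anchored = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

for x, y in anchored:
    print(x, y)
```

The first and last points fall at 0 and 12 years with a frequency of zero, which matches the anchoring described above.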

Cumulative Frequency Distributions


A cumulative frequency distribution reports the number and percent of observations that are less than a given value.

Cumulative Frequency Distribution: A grouping of data into mutually exclusive classes showing the number of observations at or below the upper limit of each class.

A cumulative frequency distribution is graphically portrayed in a cumulative frequency polygon. A cumulative frequency polygon reports the number and percent of observations that are less than a given value.

Cumulative Frequency Polygon: A graph that consists of line segments connecting the points formed by the intersection of the class endpoints and the class cumulative frequency.

Before we can draw a cumulative frequency polygon, we must convert the frequency distribution to a cumulative frequency distribution. To construct a cumulative frequency distribution, we add the frequencies from the lowest class to the frequency of the next highest class. We add this sum to the frequency of the next class, and so on.

Cumulative Frequency Distribution

Length of service (in years)    Class Frequency    Cumulative Frequency    Found by
1 up to 3 years                        2                    2               2
3 up to 5 years                        6                    8               2 + 6
5 up to 7 years                        5                   13               8 + 5
7 up to 9 years                        3                   16               13 + 3
9 up to 11 years                       1                   17               16 + 1

The cumulative frequencies are plotted on the vertical axis (Y-axis) and the lengths of service on the X-axis. It may be helpful to plot the cumulative frequencies on the left side of the vertical axis and the percent of the total on the right side, as shown in the polygon.

[Cumulative Frequency Polygon: Length of Service on the X-axis (1 to 11 years); Number of Employees on the left vertical axis; Percent of Employees on the right vertical axis (0 to 100)]
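The running totals in the cumulative frequency distribution can be reproduced with Python's `itertools.accumulate`. This is a sketch for illustration; the variable names are assumptions, and the percent column corresponds to the right-hand axis of the polygon.

```python
from itertools import accumulate

upper_limits = [3, 5, 7, 9, 11]   # upper limit of each class, in years
frequencies = [2, 6, 5, 3, 1]     # class frequencies

# Running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative frequency as a percent of all observations.
total = cumulative[-1]
percent = [100 * c / total for c in cumulative]

for limit, c, p in zip(upper_limits, cumulative, percent):
    print(f"less than {limit:>2} years: {c:>2} employees ({p:.1f}%)")
```

The last cumulative frequency always equals the total number of observations, so the last percent is always 100.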
