2379 Chapter7 1 (Part1)
c) At a blood donation center, the cholesterol level was recorded for 100 donors.
Part 1: Frequency Distributions
Below we introduce various graphical methods for summarizing a data set.
a) Bar Charts
A bar chart is a graphical method used for categorical variables. The vertical bars have the same width and
are separated by some arbitrary space. The heights of the bars represent the frequencies for each category of the
variable. Alternatively, one can draw a bar chart of the relative frequencies:
The relative frequency = (the frequency) / n,
where n is the total number of observations.
Example 1. (Color of poinsettias) A sample of 25 poinsettias was classified according to color as follows:
MAT 2379, Introduction to Biostatistics, Section 7.1 (Part 1)
b) Histograms
A histogram is a graphical method used for quantitative variables. The scale of the variable determines the
placement of the bars. Usually, the vertical bars have the same width. Unlike in a bar chart, the bars are not
separated by any space.
b1) Histograms for discrete variables. In this case, the width of the bars is (usually) 1 unit.
Example 2. (piglets) A company that owns a large number of pig farms is interested in the distribution of the
number X of surviving piglets per sow. X is a discrete variable which can take the values: 0, 1, . . . , 20. A sample of
10 sows is selected. We record the observed values of X for this sample:
i     1  2  3  4  5  6  7  8  9  10
x_i   5  8  6  5  9  7  6  9  9  8
Below is the table of frequencies and relative frequencies for this data:
Number of Surviving Piglets   Frequency   Relative Frequency
5                             2           0.2
6                             2           0.2
7                             1           0.1
8                             2           0.2
9                             3           0.3
Total                         10          1.0
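The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names (x, freq, rel_freq) are illustrative choices and not part of the course notes. It counts how often each value occurs and divides each count by the sample size n.

```python
from collections import Counter

# Observed numbers of surviving piglets for the 10 sows in Example 2.
x = [5, 8, 6, 5, 9, 7, 6, 9, 9, 8]

n = len(x)          # sample size
freq = Counter(x)   # frequency of each observed value

# relative frequency = frequency / n
rel_freq = {value: freq[value] / n for value in sorted(freq)}

# Print one row per observed value, then the totals.
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>2}  {rel_freq[value]:.1f}")
print(f"Total  {n:>2}  {sum(rel_freq.values()):.1f}")
```

The relative frequencies always add up to 1, which is a quick check that no observation was missed.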
Here are the histograms of frequencies and relative frequencies:
[Figure: two histograms of the number of surviving piglets, one of the frequencies and one of the relative frequencies.]
b2) Histograms for continuous variables. In this case, we have to arrange the data into groups (called bins)
and count how many values lie in each bin. One has to select the number of bins (this is a delicate issue, which will
not be discussed here). Usually the bins have the same width.
Example 3. (height) A sample of 15 college students was asked how tall they were. Here are the data (in inches):
66.5 61.2 63.9 62.7 65.1 68.7 64.3 73.3 69.3 66.5 70.1 71.3 68.1 67.4 66.7
We arrange the data into 7 bins of equal width: bin 1 contains the values between 61 and 63, bin 2 contains the
values between 63 and 65, etc. The following table gives the frequencies and the relative frequencies:
Height (inches)   Frequency   Relative Frequency
61-63             2           2/15
63-65             2           2/15
65-67             4           4/15
67-69             3           3/15
69-71             2           2/15
71-73             1           1/15
73-75             1           1/15
Total             15          1.0
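The binning step can be sketched in Python as follows. The names (start, width, n_bins, counts) are illustrative, and the sketch assumes each bin includes its left endpoint and excludes its right endpoint; the notes do not state a convention, and none of the 15 values falls exactly on a bin boundary, so the choice does not affect the counts here.

```python
# Heights (in inches) of the 15 students in Example 3.
heights = [66.5, 61.2, 63.9, 62.7, 65.1, 68.7, 64.3, 73.3,
           69.3, 66.5, 70.1, 71.3, 68.1, 67.4, 66.7]

start, width, n_bins = 61, 2, 7   # bins: 61-63, 63-65, ..., 73-75
n = len(heights)

# Count how many values lie in each bin.
counts = [0] * n_bins
for h in heights:
    k = int((h - start) // width)  # index of the bin containing h (assumed left-closed)
    counts[k] += 1

# Print each bin with its frequency and relative frequency (frequency / n).
for k, c in enumerate(counts):
    lo = start + k * width
    print(f"{lo}-{lo + width}  {c}  {c}/{n}")
print(f"Total  {sum(counts)}  {sum(counts) / n:.1f}")
```

Changing start, width, or n_bins gives a different table and a different histogram for the same data, which is why the choice of bins matters.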
Here are the histograms of the frequencies and relative frequencies:
[Figure: two histograms of the heights, one of the frequencies and one of the relative frequencies. A margin note reads "skewed to the right".]