
Descriptive Statistics

Uploaded by Em Marasigan
Copyright © All Rights Reserved

Descriptive Statistics
In this chapter, you will study numerical
and graphical ways to describe and display
your data. This area of statistics is
called "Descriptive Statistics." You will
learn how to calculate, and even more
importantly, how to interpret these
measurements and graphs.
• A statistical graph is a tool
that helps you learn about the
shape or distribution of a
sample or a population.
• A graph can be a more
effective way of presenting
data than a mass of numbers
because we can see where
the data cluster and where there
are only a few data values.
• Types of graphs used to summarize and
organize data include the dot plot, the
bar graph, the
histogram, the stem-and-leaf plot,
the frequency polygon (a type of
broken line graph), the pie chart,
and the box plot.
Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs
One simple graph,
the stem-and-leaf
graph or stemplot,
comes from the field of
exploratory data
analysis. It is a good
choice when the data
sets are small.
To create the plot, divide each observation of data
into a stem and a leaf. The leaf is the final
significant digit, and the stem is made up of the digits
before it. For example, 23 has stem two and
leaf three. The number 432 has stem 43 and leaf
two. Likewise, the number 5,432 has stem 543 and
leaf two. The decimal 9.3 has stem nine and leaf
three. Write the stems in a vertical line from
smallest to largest. Draw a vertical line to the right
of the stems. Then write the leaves in increasing
order next to their corresponding stem.
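The splitting rule can be sketched in Python. This is a minimal illustration, assuming the leaf is the last digit at a chosen "leaf unit" (1 for whole numbers, 0.1 for values with one decimal place); the function name `split_stem_leaf` and its `leaf_unit` parameter are hypothetical, not from the slides.

```python
def split_stem_leaf(value, leaf_unit=1):
    """Split a non-negative value into (stem, leaf).

    The leaf is the final digit at the given leaf unit; the stem is
    every digit to its left. round() absorbs floating-point noise in
    value / leaf_unit (for example when dividing 9.3 by 0.1).
    """
    scaled = round(value / leaf_unit)  # 9.3 with unit 0.1 becomes 93
    stem, leaf = divmod(scaled, 10)    # 93 -> stem 9, leaf 3
    return stem, leaf


# The four examples from the text.
for value, unit in [(23, 1), (432, 1), (5432, 1), (9.3, 0.1)]:
    stem, leaf = split_stem_leaf(value, unit)
    print(f"{value}: stem {stem}, leaf {leaf}")
```

Working at an explicit leaf unit keeps the same rule for whole numbers and decimals.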
For Susan Dean's spring pre-calculus class, scores for the first exam were as follows (smallest
to largest):
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90;
92; 94; 94; 94; 94; 96; 100

Table 2.1 Stem-and-Leaf Graph
Stem | Leaf
3 | 3
4 | 299
5 | 355
6 | 1378899
7 | 2348
8 | 03888
9 | 0244446
10 | 0
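As a check on the hand-built plot, the Python sketch below groups the exam scores by stem. It assumes whole-number data (leaf = last digit), and `stemplot` is a hypothetical helper name; it keeps every stem between the smallest and largest so that any gaps would stay visible.

```python
from collections import defaultdict

# First-exam scores from Susan Dean's class (31 values).
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]


def stemplot(values):
    """Return {stem: [leaves]} for whole numbers, leaves in increasing order.

    Every stem from the smallest to the largest is included, even with
    no leaves, so gaps in the data stay visible.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # the last digit is the leaf
        groups[stem].append(leaf)
    lo, hi = min(groups), max(groups)
    return {s: groups.get(s, []) for s in range(lo, hi + 1)}


plot = stemplot(scores)
for stem, leaves in plot.items():
    print(f"{stem:>4} | {''.join(map(str, leaves))}")
```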
The stemplot is a quick way to graph data and gives an exact picture
of the data. You want to look for an overall pattern and any outliers.
An outlier is an observation of data that does not fit the rest of the
data. It is sometimes called an extreme value. When you graph an
outlier, it will appear not to fit the pattern of the graph. Some
outliers are due to mistakes (for example, writing down 50 instead of
500) while others may indicate that something unusual is happening.
It takes some background information to explain outliers, so we will
cover them in more detail later.
The data are the distances (in kilometers) from a home to local
supermarkets. Create a stemplot using the data:

1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5;
4.7; 4.8; 5.5; 5.6; 6.5; 6.7; 12.3

Do the data seem to have any concentration of values?

Stem | Leaf
1 | 15
2 | 357
3 | 23358
4 | 025578
5 | 56
6 | 57
7 |
8 |
9 |
10 |
11 |
12 | 3

•The value 12.3 may be an outlier. Values appear to
concentrate at three and four kilometers.
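For one-decimal data the stem is the whole-number part and the leaf is the tenths digit. The Python sketch below assumes exactly that and keeps empty stems, which is what makes the long gap before 12.3 visible. `stemplot_tenths` is a hypothetical helper name.

```python
from collections import defaultdict

# Distances (km) from a home to local supermarkets.
distances_km = [1.1, 1.5, 2.3, 2.5, 2.7, 3.2, 3.3, 3.3, 3.5, 3.8, 4.0,
                4.2, 4.5, 4.5, 4.7, 4.8, 5.5, 5.6, 6.5, 6.7, 12.3]


def stemplot_tenths(values):
    """Stemplot for one-decimal data: stem = whole part, leaf = tenths digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(round(v * 10), 10)  # 12.3 -> 123 -> (12, 3)
        groups[stem].append(leaf)
    # Keep empty stems so the gap before an isolated value shows up.
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}


plot = stemplot_tenths(distances_km)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {''.join(map(str, leaves))}")

# Stems with no leaves; a long run of them separates 12.3 from the rest.
empty = [s for s, leaves in plot.items() if not leaves]
print("empty stems:", empty)
```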
•A side-by-side stem-and-leaf plot allows a comparison of two
data sets in two columns. In a side-by-side stem-and-leaf plot, two
sets of leaves share the same stem. The leaves are to the left and the
right of the stems; the leaves on the left are written in decreasing
order, so that on both sides they increase moving away from the stem.
•Tables 2.4 and 2.5 show the ages of presidents at their inauguration and at
their death. Construct a side-by-side stem-and-leaf plot using these
data.
•Table 2.4 Presidential Ages at Inauguration
President | Age | President | Age | President | Age
Washington | 57 | Lincoln | 52 | Hoover | 54
J. Adams | 61 | A. Johnson | 56 | F. Roosevelt | 51
Jefferson | 57 | Grant | 46 | Truman | 60
Madison | 57 | Hayes | 54 | Eisenhower | 62
Monroe | 58 | Garfield | 49 | Kennedy | 43
J. Q. Adams | 57 | Arthur | 51 | L. Johnson | 55
Jackson | 61 | Cleveland | 47 | Nixon | 56
Van Buren | 54 | B. Harrison | 55 | Ford | 61
W. H. Harrison | 68 | Cleveland | 55 | Carter | 52
Tyler | 51 | McKinley | 54 | Reagan | 69
Polk | 49 | T. Roosevelt | 42 | G.H.W. Bush | 64
Taylor | 64 | Taft | 51 | Clinton | 47
Fillmore | 50 | Wilson | 56 | G. W. Bush | 54
Pierce | 48 | Harding | 55 | Obama | 47
Buchanan | 65 | Coolidge | 51 | |
•Table 2.5 Presidential Age at Death
President | Age | President | Age | President | Age
Washington | 67 | Lincoln | 56 | Hoover | 90
J. Adams | 90 | A. Johnson | 66 | F. Roosevelt | 63
Jefferson | 83 | Grant | 63 | Truman | 88
Madison | 85 | Hayes | 70 | Eisenhower | 78
Monroe | 73 | Garfield | 49 | Kennedy | 46
J. Q. Adams | 80 | Arthur | 56 | L. Johnson | 64
Jackson | 78 | Cleveland | 71 | Nixon | 81
Van Buren | 79 | B. Harrison | 67 | Ford | 93
W. H. Harrison | 68 | Cleveland | 71 | Reagan | 93
Tyler | 71 | McKinley | 58 | |
Polk | 53 | T. Roosevelt | 60 | |
Taylor | 65 | Taft | 72 | |
Fillmore | 74 | Wilson | 67 | |
Pierce | 64 | Harding | 57 | |
Buchanan | 77 | Coolidge | 60 | |
•Solution
Ages at Inauguration | Stem | Ages at Death
998777632 | 4 | 69
8777766655554444422111110 | 5 | 35678
9854421110 | 6 | 003344567778
 | 7 | 01112347889
 | 8 | 01358
 | 9 | 0033
Line Graph

Another type of graph that is useful for specific data values is a line
graph. In this kind of line graph, the x-axis (horizontal
axis) consists of data values and
the y-axis (vertical axis) consists
of frequency points. The frequency
points are connected using line
segments.
•In a survey, 40 mothers were asked how many times per week a
teenager must be reminded to do his or her chores. The results
are shown below.

Number of times teenager is reminded | Frequency
0 | 2
1 | 5
2 | 8
3 | 14
4 | 7
5 | 4
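The line graph for this table is just the (data value, frequency) points joined in order. The plain-Python sketch below lists those points and the segments between consecutive ones, and checks that the frequencies account for all 40 mothers; the variable names are illustrative.

```python
# Frequency table from the survey of 40 mothers.
reminders = [0, 1, 2, 3, 4, 5]   # x-axis: times reminded per week
frequency = [2, 5, 8, 14, 7, 4]  # y-axis: number of mothers

# Each (data value, frequency) pair is one point on the line graph.
points = list(zip(reminders, frequency))
# Consecutive points are joined by line segments.
segments = list(zip(points, points[1:]))

n = sum(frequency)
print("points:", points)
print("segments:", len(segments))
print("total responses:", n)
```

If matplotlib happens to be installed (an optional assumption; the slides do not use it), `matplotlib.pyplot.plot(reminders, frequency, marker="o")` draws the same connected points.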
Bar Graphs

Bar graphs consist of bars that are
separated from each other. The bars
can be rectangles or they can be
rectangular boxes (used in three-dimensional
plots), and they can be
vertical or horizontal. The bar
graph for the Facebook example below has age
groups represented on the x-axis and
proportions on the y-axis.
•By the end of 2011, Facebook had over 146 million users in the United
States. The table shows three age groups, the number of users in each age
group, and the proportion (%) of users in each age group. Construct a bar
graph using this data.

Age groups | Number of Facebook users | Proportion (%) of Facebook users
13–25 | 65,082,280 | 45%
26–44 | 53,300,200 | 36%
45–64 | 27,885,100 | 19%
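The bar heights are each group's share of all users. The Python sketch below recomputes them from the user counts in the table; the dictionary layout is an illustrative choice. The total comes to 146,267,580 (consistent with "over 146 million"), and to one decimal place the proportions are 44.5%, 36.4%, and 19.1%, which the table reports as whole percentages.

```python
# User counts per age group, from the table.
users = {
    "13-25": 65_082_280,
    "26-44": 53_300_200,
    "45-64": 27_885_100,
}

total = sum(users.values())
# Proportion of all users in each group; this is the height of each bar.
proportions = {age: count / total for age, count in users.items()}

print(f"total users: {total:,}")
for age, p in proportions.items():
    print(f"{age}: {p:.1%}")
```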
Activity:

The following data show the distances (in miles) from the
homes of off-campus statistics students to the college.
Create a stemplot using the data and identify any outliers:
• 0.5; 0.7; 1.1; 1.2; 1.2; 1.3; 1.3; 1.5; 1.5; 1.7; 1.7; 1.8; 1.9;
2.0; 2.2; 2.5; 2.6; 2.8; 2.8; 2.8; 3.5; 3.8; 4.4; 4.8; 4.9; 5.2;
5.5; 5.7; 5.8; 8.0
Histogram
• A histogram consists of contiguous (adjoining)
boxes. It has both a horizontal axis and a
vertical axis. The horizontal axis is labeled with
what the data represent (for instance, distance
from your home to school). The vertical axis is
labeled either frequency or relative
frequency (or percent frequency or
probability). The graph will have the same
shape with either label. The histogram (like the
stemplot) can give you the shape of the data,
the center, and the spread of the data.
• The relative frequency is equal to the frequency for an observed value
of the data divided by the total number of data values in the sample.
(Remember, frequency is defined as the number of times an answer
occurs.) If:
• f = frequency
• n = total number of data values (or the sum of the individual frequencies), and
• RF = relative frequency,
• then:
RF=f / n
For example, if three students in
Mr. Ahab's English class of 40
students received scores from 90% to
100%, then f = 3, n = 40,
and RF = f/n = 3/40 = 0.075.
Thus, 7.5% of the students received
90–100%. Scores of 90–100% are
quantitative measures.
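The relative-frequency formula RF = f/n translates directly into code. A minimal Python sketch, using the Mr. Ahab example; the function name `relative_frequency` is hypothetical.

```python
def relative_frequency(f, n):
    """RF = f / n: the share of the n data values that fall in a class."""
    if n <= 0:
        raise ValueError("n must be positive")
    return f / n


# 3 of Mr. Ahab's 40 students scored 90-100%.
rf = relative_frequency(3, 40)
print(rf, f"{rf:.1%}")
```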
• To construct a histogram, first decide how
many bars or intervals, also called classes, will represent the data.
• For clarity, many histograms consist of five to 15 bars or classes.
• Choose a starting point for the first interval to be less than the
smallest data value.
• A convenient starting point is a lower value carried out to one
more decimal place than the value with the most decimal places.
• For example, if the value with the most decimal places is 6.1 and this
is the smallest value, a convenient starting point is 6.05 (6.1 – 0.05 =
6.05). We say that 6.05 has more precision. If the value with the most
decimal places is 2.23 and the lowest value is 1.5, a convenient starting
point is 1.495 (1.5 – 0.005 = 1.495). If the value with the most decimal
places is 3.234 and the lowest value is 1.0, a convenient starting point is
0.9995 (1.0 – 0.0005 = 0.9995).
If all the data happen to be integers and the smallest
value is two, then a convenient starting point is 1.5 (2 –
0.5 = 1.5). Also, when the starting point and other
boundaries are carried to one additional decimal place,
no data value will fall on a boundary. The next two
examples go into detail about how to construct a
histogram using continuous data and how to create a
histogram using discrete data.
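The "convenient starting point" rule (smallest value minus half a unit in one more decimal place than the most precise value) can be sketched with Python's `decimal` module. This assumes the data are passed as strings, so that written precision survives ("1.0" counts as one decimal place); `convenient_start` is a hypothetical name.

```python
from decimal import Decimal


def convenient_start(values):
    """Smallest value minus half a unit in the next decimal place.

    Values are strings so their written decimal places are kept,
    e.g. "1.0" has one decimal place even though it equals 1.
    """
    decs = [Decimal(v) for v in values]
    # Most decimal places among the values (exponent -2 means 2 places).
    places = max(-d.as_tuple().exponent for d in decs)
    # 0.5 for integers, 0.05 for one decimal place, 0.005 for two, ...
    half_unit = Decimal(5) * Decimal(10) ** -(places + 1)
    return min(decs) - half_unit


print(convenient_start(["6.1", "7.2"]))    # 6.05
print(convenient_start(["1.5", "2.23"]))   # 1.495
print(convenient_start(["1.0", "3.234"]))  # 0.9995
print(convenient_start(["2", "5", "9"]))   # 1.5
```

`Decimal` is used instead of floats so the results print exactly as in the text rather than with binary rounding noise.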
•EXAMPLE

•The following data are the heights (in inches to the nearest half inch) of 100 male
semiprofessional soccer players. The heights are continuous data, since height is
measured.
60; 60.5; 61; 61; 61.5; 63.5; 63.5; 63.5
64; 64; 64; 64; 64; 64; 64; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5
66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5;
66.5; 66.5; 66.5; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5;
67.5; 67.5; 67.5
68; 68; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5
70; 70; 70; 70; 70; 70; 70.5; 70.5; 70.5; 71; 71; 71
72; 72; 72; 72.5; 72.5; 73; 73.5
74
•The smallest data value is 60. Since the value with the most decimal places has
one decimal place (for instance, 61.5), we want our starting point to have two decimal
places. Since the numbers 0.5, 0.05, 0.005, etc. are convenient numbers, use
0.05 and subtract it from 60, the smallest value, for the convenient starting point.

•60 – 0.05 = 59.95 which is more precise than, say, 61.5 by one decimal place.
The starting point is, then, 59.95.

•The largest value is 74, so 74 + 0.05 = 74.05 is the ending value.


Next, calculate the width of each bar or class
interval. To calculate this width, subtract the
starting point from the ending value and divide
by the number of bars (you must choose the
number of bars you desire). Suppose you
choose eight bars.

(74.05 − 59.95) / 8 = 14.1 / 8 = 1.7625 ≈ 1.76
•NOTE
•We will round up to two and make each bar or class
interval two units wide. Rounding up to two is one
way to prevent a value from falling on a boundary.
Rounding to the next number is often necessary even
if it goes against the standard rules of rounding. For
this example, using 1.76 as the width would also
work. A guideline that is followed by some for the
number of bars or class intervals is to take the square
root of the number of data values and then round to
the nearest whole number, if necessary. For example,
if there are 150 values of data, take the square root of
150 and round to 12 bars or intervals.
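The width calculation and the square-root guideline are both one-line formulas. The Python sketch below uses the soccer-height numbers; `class_width` and `sqrt_rule` are hypothetical helper names, and `math.ceil` stands in for "round up to the next whole number" as done in the note.

```python
import math


def class_width(start, end, bars):
    """Raw width (end - start) / bars of each bar or class interval."""
    return (end - start) / bars


def sqrt_rule(n):
    """Guideline: about sqrt(n) bars, rounded to the nearest whole number."""
    return round(math.sqrt(n))


raw = class_width(59.95, 74.05, 8)
print(round(raw, 4))   # 1.7625
print(math.ceil(raw))  # 2, the rounded-up width used in the example
print(sqrt_rule(150))  # 12
```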
The boundaries are:
•59.95
•59.95 + 2 = 61.95
•61.95 + 2 = 63.95
•63.95 + 2 = 65.95
•65.95 + 2 = 67.95
•67.95 + 2 = 69.95
•69.95 + 2 = 71.95
•71.95 + 2 = 73.95
•73.95 + 2 = 75.95
The heights 60 through 61.5 inches are in the interval 59.95–61.95. The
heights that are 63.5 are in the interval 61.95–63.95. The heights that
are 64 through 64.5 are in the interval 63.95–65.95. The heights 66
through 67.5 are in the interval 65.95–67.95. The heights 68 through
69.5 are in the interval 67.95–69.95. The heights 70 through 71 are in
the interval 69.95–71.95. The heights 72 through 73.5 are in the
interval 71.95–73.95. The height 74 is in the interval 73.95–75.95.
•The following histogram
displays the heights on
the x-axis and relative
frequency on the y-axis.
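The counts behind this histogram can be tabulated in Python. The sketch below assumes the value-to-count table was transcribed from the 100 listed heights, and uses the starting point 59.95, width 2, and eight bars from the example; all variable names are illustrative. Because every boundary ends in .95, no height lands on a boundary, so simple floor division assigns each height to exactly one interval.

```python
from collections import Counter

# Heights (inches) of the 100 players, entered as value -> count.
height_counts = {
    60: 1, 60.5: 1, 61: 2, 61.5: 1, 63.5: 3, 64: 7, 64.5: 8,
    66: 10, 66.5: 11, 67: 12, 67.5: 7, 68: 2, 69: 10, 69.5: 5,
    70: 6, 70.5: 3, 71: 3, 72: 3, 72.5: 2, 73: 1, 73.5: 1, 74: 1,
}
heights = [h for h, c in height_counts.items() for _ in range(c)]

start, width, bars = 59.95, 2, 8
# Boundaries 59.95, 61.95, ..., 75.95 (rounded to drop float noise).
edges = [round(start + i * width, 2) for i in range(bars + 1)]

freq = Counter()
for h in heights:
    # Index of the interval that contains h.
    freq[int((h - start) // width)] += 1

n = len(heights)
for i in range(bars):
    print(f"{edges[i]:.2f}-{edges[i + 1]:.2f}: f = {freq[i]:3d}, RF = {freq[i] / n:.2f}")
```

This gives frequencies 5, 3, 15, 40, 17, 12, 7, and 1, which sum to 100; dividing each by 100 gives the relative frequencies plotted on the y-axis.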
•EXAMPLE 2.8
•Create a histogram for the following data: the number of books bought by 50 part-
time college students at ABC College.
•The number of books is discrete data, since books are counted.
1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1
2; 2; 2; 2; 2; 2; 2; 2; 2; 2
3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3
4; 4; 4; 4; 4; 4
5; 5; 5; 5; 5
6; 6
•Eleven students buy one book. Ten
students buy two books. Sixteen
students buy three books. Six
students buy four books. Five
students buy five books. Two
students buy six books.

•Because the data are integers,
subtract 0.5 from 1, the smallest
data value, and add 0.5 to 6, the
largest data value. Then the starting
point is 0.5 and the ending value is
6.5.
•Next, calculate the width of each bar or class interval. If the data are
discrete and there are not too many different values, a width that places
the data values in the middle of the bar or class interval is the most
convenient. Since the data consist of the numbers 1, 2, 3, 4, 5, 6, and the
starting point is 0.5, a width of one places the 1 in the middle of the
interval from 0.5 to 1.5, the 2 in the middle of the interval from 1.5 to
2.5, the 3 in the middle of the interval from 2.5 to 3.5, the 4 in the
middle of the interval from _______ to _______, the 5 in the middle of
the interval from _______ to _______, and the _______ in the middle of
the interval from _______ to _______ .
•Solution 1
•3.5 to 4.5
•4.5 to 5.5
•6
•5.5 to 6.5

•Calculate the number of bars as follows:

•(6.5 − 0.5) / number of bars = 1

•where 1 is the width of a bar. Therefore, number of bars = (6.5 − 0.5) / 1 = 6.
•The following histogram
displays the number of
books on the x-axis and the
frequency on the y-axis.
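For discrete data the same steps reduce to centering each whole number in a bar of width one. The Python sketch below assumes the book counts listed above and reproduces the starting point, ending value, number of bars, and frequency per bar; all names are illustrative.

```python
from collections import Counter

# Books bought by 50 part-time students (value repeated by its count).
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

start = min(books) - 0.5  # 0.5
end = max(books) + 0.5    # 6.5
width = 1                 # centers each whole number in its bar
bars = round((end - start) / width)  # (6.5 - 0.5) / 1 = 6

counts = Counter(books)
for i in range(bars):
    lo = start + i * width
    value = lo + width / 2  # the whole number in the middle of this bar
    print(f"{lo:.1f}-{lo + width:.1f} (value {value:g}): frequency {counts[int(value)]}")
```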
TRY IT!

The following data are the shoe sizes of 50 male students. The sizes
are discrete data since shoe size is measured in whole and half units
only. Construct a histogram and calculate the width of each bar or
class interval. Suppose you choose six bars.

9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5;
10.5; 10.5; 10.5
11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5;
11.5; 11.5; 11.5; 11.5
12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14
