
Chapter 2

The document discusses using R to analyze various datasets. It provides examples of making graphs like bar graphs and histograms to visualize distributions of data on topics like color preferences, municipal solid waste breakdown, spam types, glucose levels, tornado damage, carbon dioxide emissions, fish recruitment, rainwater acidity, oil well production, time spent studying, guinea pig survival times, student grades and IQ scores.

Using R

Document download link: https://sites.google.com/a/itam.tdt.edu.vn/thachthanhtien/khoahocsusong

1.16 Least-favorite colors. Refer to the previous exercise. The same study also asked people about
their least-favorite color. Here are the results: orange, 30%; brown, 23%; purple, 13%; yellow, 13%;
gray, 12%; green, 4%; white, 4%; red, 1%; black, 0%; and blue, 0%. Make a bar graph of these
percents and write a summary of the results.
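A minimal sketch of how the bar graph in 1.16 might be drawn in base R. The percents are the ones quoted in the exercise; the object name `least_favorite` and the axis labels are illustrative choices, not part of the original exercise.

```r
# Percents of respondents naming each color as least favorite (from 1.16).
least_favorite <- c(orange = 30, brown = 23, purple = 13, yellow = 13,
                    gray = 12, green = 4, white = 4, red = 1,
                    black = 0, blue = 0)

# Each respondent named one color, so the percents should add to 100.
total_percent <- sum(least_favorite)
print(total_percent)

# The values are already listed from tallest to shortest bar.
# las = 2 turns the color names sideways so all ten labels fit.
barplot(least_favorite,
        ylab = "Percent of respondents",
        main = "Least-favorite colors",
        las = 2)
```

The written summary asked for in the exercise still has to come from reading the graph; the code only draws it.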

1.17 Ages of survey respondents. The survey about color preferences reported the age distribution of
the people who responded. Here are the results:

Age group (years)   1–18   19–24   25–35   36–50   51–69   70 and over
Count                 10      97      70      36      14             5
(a) Add the counts and compute the percents for each age group.
(b) Make a bar graph of the percents.
(c) Describe the distribution.
(d) Explain why your bar graph is not a histogram.
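Parts (a) and (b) of 1.17 could be carried out along these lines in base R. This is a sketch, not the only approach: the counts come from the table in the exercise, while the variable names and the plain hyphens in the group labels are illustrative choices.

```r
# Counts by age group, from the table in 1.17.
counts <- c("1-18" = 10, "19-24" = 97, "25-35" = 70,
            "36-50" = 36, "51-69" = 14, "70 and over" = 5)

# (a) Add the counts and convert each one to a percent of the total.
total <- sum(counts)
percents <- 100 * counts / total
print(total)
print(round(percents, 1))

# (b) Bar graph of the percents, one bar per age group.
barplot(percents,
        xlab = "Age group (years)",
        ylab = "Percent of respondents")
```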

1.18 Garbage. The formal name for garbage is “municipal solid waste.” The table gives a breakdown
of the materials that made up American municipal solid waste.

(a) Make a bar graph of the percents. The graph gives a clearer picture of the main contributors to
garbage if you order the bars from tallest to shortest.
(b) If you use software, also make a pie chart of the percents. Comparing the two graphs, notice that it
is easier to see the small differences among “Food scraps,” “Plastics,” and “Yard trimmings” in the bar
graph.

1.19 Spam. Email spam is the curse of the Internet. Here is a compilation of the most common types of
spam:

Make two bar graphs of these percents, one with bars ordered as in the table (alphabetical) and the
other with bars in order from tallest to shortest. Comparisons are easier if you order the bars by height.
A bar graph ordered from tallest to shortest bar is sometimes called a Pareto chart, after the Italian
economist who recommended this procedure.

1.20 Women seeking graduate and professional degrees. The table on the next page gives the
percents of women among students seeking various graduate and professional degrees:
(a) Explain clearly why we cannot use a pie chart to display these data.
(b) Make a bar graph of the data. (Comparisons are easier if you order the bars by height.)

1.23 Diabetes and glucose. People with diabetes must monitor and control their blood glucose level.
The goal is to maintain “fasting plasma glucose” between about 90 and 130 milligrams per deciliter
(mg/dl). Here are the fasting plasma glucose levels for 18 diabetics enrolled in a diabetes control class,
five months after the end of the class:

141 158 112 153 134  95  96  78 148
172 200 271 103 172 359 145 147 255

Make a stemplot of these data and describe the main features of the distribution. (You will want to trim
and also split stems.) Are there outliers? How well is the group as a whole achieving the goal for
controlling glucose levels?
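One possible starting point for 1.23 in base R is the built-in `stem()` function. The data are the 18 values listed in the exercise; the name `glucose_class` and the choice `scale = 2` are assumptions made for illustration. `stem()` picks its own leaf unit, so the display may not match a hand-drawn stemplot with trimmed and split stems exactly.

```r
# Fasting plasma glucose (mg/dl) for the 18 diabetics in the class (1.23).
glucose_class <- c(141, 158, 112, 153, 134, 95, 96, 78, 148,
                   172, 200, 271, 103, 172, 359, 145, 147, 255)

# Stemplot; a larger scale spreads the data over more lines (split stems).
stem(glucose_class, scale = 2)

# How many of the 18 fall inside the 90 to 130 mg/dl goal?
in_goal <- sum(glucose_class >= 90 & glucose_class <= 130)
print(in_goal)
print(length(glucose_class))
```

Trying a few values of `scale` is a reasonable way to find a display that shows the shape and any outliers clearly.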

1.24 Compare glucose of instruction and control groups. The study described in the previous
exercise also measured the fasting plasma glucose of 16 diabetics who were given individual
instruction on diabetes control. Here are the data:
128 195 188 158 227 198 163 164
159 128 283 226 223 221 220 160

Make a back-to-back stemplot to compare the class and individual instruction groups. How do the
distribution shapes and success in achieving the glucose control goal compare?
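Base R has no built-in back-to-back stemplot, so the sketch below builds one by hand for 1.24. The helper `back_to_back()` is hypothetical (written for this illustration, not taken from any package). It trims the units digit, uses the hundreds digit as the stem and the tens digit as the leaf, and splits each stem into two rows (leaves 0 to 4 and 5 to 9).

```r
glucose_class <- c(141, 158, 112, 153, 134, 95, 96, 78, 148,
                   172, 200, 271, 103, 172, 359, 145, 147, 255)
glucose_individual <- c(128, 195, 188, 158, 227, 198, 163, 164,
                        159, 128, 283, 226, 223, 221, 220, 160)

# Hypothetical helper: returns one text row per split stem.
back_to_back <- function(left, right) {
  row_key <- function(x) {
    tens <- x %/% 10                       # trim the units digit: 141 -> 14
    2 * (tens %/% 10) + (tens %% 10 >= 5)  # two rows per stem
  }
  all_keys <- row_key(c(left, right))
  keys <- seq(min(all_keys), max(all_keys))
  leaves <- function(x, key, decreasing = FALSE) {
    leaf <- ((x %/% 10) %% 10)[row_key(x) == key]  # tens digit is the leaf
    paste(sort(leaf, decreasing = decreasing), collapse = "")
  }
  vapply(keys, function(k) {
    # Left-hand leaves read outward from the stem, so sort them decreasing.
    sprintf("%10s | %d | %s",
            leaves(left, k, decreasing = TRUE), k %/% 2, leaves(right, k))
  }, character(1))
}

plot_rows <- back_to_back(glucose_class, glucose_individual)
writeLines(c("     class | s | individual", plot_rows))
```

The class group is on the left and the individual-instruction group on the right; each stem appears twice because the stems are split.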

1.28 Tornado damage. The states differ greatly in the kinds of severe weather that afflict them. Table
1.5 shows the average property damage caused by tornadoes per year over the period from 1950 to 1999
in each of the 50 states and Puerto Rico. (To adjust for the changing buying power of the
dollar over time, all damages were restated in 1999 dollars.)
(a) What are the top five states for tornado damage? The bottom five?
(b) Make a histogram of the data, by hand or using software, with classes “0 ≤ damage < 10,”
“10 ≤ damage < 20,” and so on. Describe the shape, center, and spread of the distribution. Which states
may be outliers? (To understand the outliers, note that most tornadoes in largely rural states such as
Kansas cause little property damage. Damage to crops is not counted as property damage.)
(c) If you are using software, also display the “default” histogram that your software makes when you
give it no instructions. How does this compare with your graph in (b)?

1.30 Carbon dioxide from burning fuels. Burning fuels in power plants or motor vehicles emits
carbon dioxide (CO2), which contributes to global warming. Table 1.6 displays CO2 emissions per
person from countries with population at least 20 million.

(a) Why do you think we choose to measure emissions per person rather than total CO2
emissions for each country?
(b) Display the data of Table 1.6 in a graph. Describe the shape, center, and spread of the distribution.
Which countries are outliers?

1.31 California temperatures. Table 1.7 contains data on the mean annual temperatures (degrees
Fahrenheit) for the years 1951 to 2000 at two locations in California: Pasadena and Redding.
Make time plots of both time series and compare their main features. You can see why discussions of
climate change often bring disagreement.

1.34 Fish in the Bering Sea. “Recruitment,” the addition of new members to a fish population, is an
important measure of the health of ocean ecosystems. Here are data on the recruitment of rock sole in
the Bering Sea between 1973 and 2000:

(a) Make a graph to display the distribution of rock sole recruitment, then describe the pattern and any
striking deviations that you see.
(b) Make a time plot of recruitment and describe its pattern. As is often the case with time series
data, a time plot is needed to understand what is happening.

1.36 Acidity of rainwater. Changing the choice of classes can change the appearance of a
histogram. Here is an example in which a small shift in the classes, with no change in the number of
classes, has an important effect on the histogram. The data are the acidity levels (measured by pH)
in 105 samples of rainwater. Distilled water has pH 7.00. As the water becomes more acidic, the pH
goes down. The pH of rainwater is important to environmentalists because of the problem of acid
rain.

(a) Make a histogram of pH with 14 classes, using class boundaries 4.2, 4.4,…, 7.0. How many modes
does your histogram show? More than one mode suggests that the data contain groups that have
different distributions.
(b) Make a second histogram, also with 14 classes, using class boundaries 4.14, 4.34,…, 6.94. The
classes are those from (a) moved 0.06 to the left. How many modes does the new histogram show?
(c) Use your software’s histogram function to make a histogram without specifying the number of
classes or their boundaries. How does the software’s default histogram compare with those in (a) and
(b)?
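The three histograms in 1.36 differ only in their class boundaries, which `hist()` accepts through its `breaks` argument. The sketch below sets up the two sets of boundaries from parts (a) and (b). The pH data are not reproduced in this excerpt, so the function `compare_ph_histograms()` and its argument `ph` are hypothetical placeholders for a numeric vector holding the 105 measurements.

```r
# (a) 15 boundaries 4.2, 4.4, ..., 7.0 define 14 classes of width 0.2.
breaks_a <- seq(4.2, 7.0, length.out = 15)

# (b) The same classes moved 0.06 to the left: 4.14, 4.34, ..., 6.94.
breaks_b <- breaks_a - 0.06

# Hypothetical helper: `ph` stands for the vector of 105 pH values.
# hist() stops with an error if any value lies outside the given breaks.
compare_ph_histograms <- function(ph) {
  old_par <- par(mfrow = c(3, 1))   # stack the three histograms
  on.exit(par(old_par))             # restore the plotting layout afterwards
  hist(ph, breaks = breaks_a, xlab = "pH",
       main = "(a) boundaries 4.2, 4.4, ..., 7.0")
  hist(ph, breaks = breaks_b, xlab = "pH",
       main = "(b) boundaries 4.14, 4.34, ..., 6.94")
  hist(ph, xlab = "pH", main = "(c) software default")
  invisible(NULL)
}
```

Stacking the plots on a shared layout makes it easier to compare the number of modes across the three choices of classes.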

1.39 Oil wells. How much oil the wells in a given field will ultimately produce is key information in
deciding whether to drill more wells. Here are the estimated total amounts of oil recovered from 64
wells in the Devonian Richmond Dolomite area of the Michigan basin, in thousands of barrels:

Graph the distribution and describe its main features.

1.41 Time spent studying. Do women study more than men? We asked the students in a large first-
year college class how many minutes they studied on a typical weeknight. Here are the responses of
random samples of 30 women and 30 men from the class:

(a) Examine the data. Why are you not surprised that most responses are multiples of 10 minutes? We
eliminated one student who claimed to study 30,000 minutes per night. Are there any other responses
you consider suspicious?
(b) Make a back-to-back stemplot of these data. Report the approximate midpoints of both groups.
Does it appear that women study more than men (or at least claim that they do)?

1.42 Guinea pigs. Table 1.8 gives the survival times in days of 72 guinea pigs after they were injected
with tubercle bacilli in a medical experiment. Make a suitable graph and describe the shape, center,
and spread of the distribution of survival times. Are there any outliers?

1.43 Grades and self-concept. Table 1.9 presents data on 78 seventh-grade students in a rural
midwestern school. The researcher was interested in the relationship between the students’ “self-
concept” and their academic performance. The data we give here include each student’s grade point
average (GPA), score on a standard IQ test, and gender, taken from school records. Gender is coded as
F for female and M for male. The students are identified only by an observation number (OBS). The
missing OBS numbers show that some students dropped out of the study. The final variable is each
student’s score on the Piers-Harris Children’s Self-Concept Scale, a psychological test administered by
the researcher.
(a) How many variables does this data set contain? Which are categorical variables and which are
quantitative variables?
(b) Make a stemplot of the distribution of GPA, after rounding to the nearest tenth of a point.

(c) Describe the shape, center, and spread of the GPA distribution. Identify any suspected outliers
from the overall pattern.
(d) Make a back-to-back stemplot of the rounded GPAs for female and male students. Write a brief
comparison of the two distributions.

1.44 Describe the IQ scores. Make a graph of the distribution of IQ scores for the seventh-grade
students in Table 1.9. Describe the shape, center, and spread of the distribution, as well as any
outliers. IQ scores are usually said to be centered at 100. Is the midpoint for these students close to 100,
clearly above, or clearly below?

1.45 Describe the self-concept scores. Based on a suitable graph, briefly describe the distribution of
self-concept scores for the students in Table 1.9. Be sure to identify any suspected outliers.

1.46 The Boston Marathon. Women were allowed to enter the Boston Marathon in 1972. The
following table gives the times (in minutes, rounded to the nearest minute) for the winning women
from 1972 to 2006.
Make a graph that shows change over time. What overall pattern do you see? Have times stopped
improving in recent years? If so, when did improvement end?
