Final QMM
Introduction to Statistics
D 1. The complete collection of all entities under study is called the __________.
E A. sample
Term B. parameter
C. statistic
D. population
E A. parameter
Term B. sample
C. population
D. statistic
2 Test Bank
E A. statistic
BApp B. population
C. parameter
D. sample
E A. statistic
BApp B. population
C. parameter
D. sample
E A. statistic
BApp B. sample
C. population
D. parameter
E A. statistic
BApp B. sample
C. population
D. parameter
Chapter 1: Introduction to Statistics 3
E A. population
BApp B. sample
C. statistic
D. parameter
E A. population
BApp B. sample
C. parameter
D. statistic
B 9. When a person collects information from the entire population, this is called a
_______.
E A. sample
Term B. census
C. statistic
D. parameter
B 10. Manuel Banales, Marketing Director of Plano Power Plants, Inc.'s Electrical
Division, is leading a study to identify and assess the relative importance of
product features. Manuel directs his staff to design a survey questionnaire for
distribution to all of Plano’s 954 customers. Manuel is ordering a ____________.
D 11. Manuel Banales, Marketing Director of Plano Power Plants, Inc.'s Electrical
Division, is leading a study to identify and assess the relative importance of
product features. Manuel directs his staff to design a survey questionnaire for
distribution to 100 of Plano’s 954 customers. Manuel is ordering a
____________.
A 14. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1999." Pinky is ordering a
__________________.
C 15. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "every tenth
payroll voucher issued since January 1, 1999." Pinky is ordering a
__________________.
C 16. Upon discovering an improperly adjusted drill press, Jack Joyner, Director of
Quality Control, ordered a 100% inspection of all castings drilled on the evening
shift. Jack is ordering a ___________________.
A 17. Upon discovering an improperly adjusted drill press, Jack Joyner, Director of
Quality Control, ordered an inspection of "every fifth casting drilled on the
evening shift." Jack is ordering a ___________________.
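The distinction drawn in questions 14 through 17, between inspecting every item and inspecting every k-th item, can be sketched in Python. The voucher IDs and their count below are hypothetical, since the questions do not say how many vouchers or castings exist.

```python
# Hypothetical voucher IDs; the real number of vouchers is not given in the questions.
vouchers = list(range(1, 101))

# Census: inspect each and every voucher.
census = vouchers

# Systematic sample: inspect every tenth voucher (the 10th, 20th, ...).
every_tenth = vouchers[9::10]

print(len(census))       # 100
print(every_tenth[:3])   # [10, 20, 30]
```

Changing the slice step from 10 to 5 gives the "every fifth casting" inspection in the same way.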
M A. sample statistics
Term B. population parameters
C. descriptive measures
D. inferential statistics
M A. ~
Term B. #
C. µ
D. ∞
M A. parameter
BApp B. population
C. sample
D. statistic
C 21. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). The average number of absences per employee,
computed from the personnel data of all employees, is a ________________.
M A. population
BApp B. sample
C. parameter
D. statistic
A 22. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). The most appropriate symbol for the average
number of absences per employee, computed from the personnel data of all
employees, is ________________.
M A. µ
Term B. #
C. ~
D. ∞
D 23. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1999 to determine the percentage of
irregular vouchers." The percentage which Pinky ordered is a
__________________.
M A. sample statistic
BApp B. sample parameter
C. sorted order
D. population parameter
M A. Greek letters
Term B. Roman letters
C. ordinal data
D. interval data
M A. S
Term B. ~
C. µ
D. ∞
M A. parameter
BApp B. population
C. sample
D. statistic
A 27. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "every tenth
payroll voucher issued since January 1, 1999 and a calculation of the percentage
of irregular vouchers in this group." The percentage which Pinky ordered is a
__________________.
M A. sample statistic
BApp B. sample parameter
C. sorted order
D. population parameter
D 28. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). Personnel records of 50 employees are selected for
analysis. (The plant employs 250 people.) For this study, the average number of days
absent for these 50 employees is a ________________.
M A. population
BApp B. sample
C. parameter
D. statistic
E A. interval level
Term B. ordinal level
C. nominal level
D. ratio level
D 30. Which of the following operations is meaningful for processing nominal data?
M A. addition
Term B. multiplication
C. ranking
D. counting
A 31. Which scale of measurement has these two properties: linear distance is
meaningful and the location of origin (zero) is arbitrary?
E A. interval level
Term B. ordinal level
C. nominal level
D. ratio level
D 32. Which scale of measurement has these two properties: linear distance is
meaningful and the location of origin (zero) is absolute (natural)?
E A. interval level
Term B. ordinal level
C. nominal level
D. ratio level
E A. nominal level
BApp B. ordinal level
C. interval level
D. ratio level
E A. nominal level
BApp B. ordinal level
C. interval level
D. ratio level
C 35. Which of the following operations is meaningful for processing ordinal data, but
is meaningless for processing nominal data?
M A. addition
Term B. multiplication
C. ranking
D. counting
E A. nominal level
BApp B. ordinal level
C. interval level
D. ratio level
C 37. A consumer has been asked to rank five cars based upon their desirability. This
level of measurement is _______.
M A. nominal
BApp B. ratio
C. ordinal
D. interval
C 38. Morningstar Mutual Funds analyzes the risk and performance of mutual funds.
Each mutual fund is assigned an overall rating of one to five stars. One star is the
lowest rating, and five stars is the highest rating. This level of measurement is
__________.
E A. nominal
BApp B. ratio
C. ordinal
D. interval
D 39. A level of data measurement that has an absolute zero is called _______.
E A. nominal
Term B. ordinal
C. interval
D. ratio
A 40. A person has decided to code a particular set of sales data. A value of 0 is
assigned if the sales occurred on a weekday, and a value of 1 means it happened
on a weekend. This is an example of _______.
A 41. Members of the accounting department's clerical staff were asked to rate their
supervisor's leadership style as either (1) authoritarian or (2) participatory. This is
an example of _____________.
B 42. A market research analyst has asked consumers to rate the appearance of a new
package on a scale of 1 to 5. A 1 means that the appearance is awful while a 5
means that it is excellent. The level of this data is usually considered _______.
M A. nominal
BApp B. ordinal
C. interval
D. ratio
A 43. The social security number of employees would be an example of what level of
data measurement?
E A. nominal
BApp B. ordinal
C. interval
D. ratio
D 44. The dollar sales of a restaurant is an example of what level of data measurement?
M A. nominal
BApp B. ordinal
C. interval
D. ratio
D 45. Grades on a test range from 0 to 100. This level of data is _______.
E A. nominal
App B. ordinal
C. interval
D. ratio
C 46. If it were not for the existence of an "absolute zero," ratio data would be
considered the same as _______.
E A. nominal
Term B. ordinal
C. interval
D. descriptive data
D 47. Scholastic Aptitude Test scores are an example of what type of measurement
scale?
M A. nominal
App B. ordinal
C. interval
D. ratio
C 48. Which types of data are normally used with parametric statistics?
B 49. Which types of data are normally used with nonparametric statistics?
B 50. Using data from a group to generalize to a larger group involves the use of
_______.
M A. descriptive statistics
Term B. inferential statistics
C. population derivation
D. sample persuasion
B 51. A student makes an 82 on the first test in a statistics course. From this, she
assumes that her average at the end of the semester (after other tests) will be about
82. This is an example of _______.
M A. descriptive statistics
App B. inferential statistics
C. nonparametric statistics
D. wishful thinking
C 52. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Life tests performed on a sample
of 100 batteries indicated an average life of seven years under normal usage.
Jessica recommended a six-year warranty period for the new model. This is an
example of _____________.
M A. descriptive statistics
BApp B. nonparametric statistics
C. inferential statistics
D. nominal data
D 53. Upon discovering an improperly adjusted drill press, Jack Joyner, Director of
Quality Control, ordered an inspection of "every fifth casting drilled on the
evening shift." Less than 1% of the castings were defective; so, Jack released the
evening shift's production to assembly. This is an example of _______________.
M A. nonparametric statistics
BApp B. nominal data
C. descriptive statistics
D. inferential statistics
C 54. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1999." Five percent of the payroll
vouchers contained material errors. This is an example of _______________.
H A. nonparametric statistics
BApp B. nominal data
C. descriptive statistics
D. inferential statistics
B 55. A new salesperson is paid a commission on each sale. This person made $2,000
his first month on the job. From this he concludes that he will make $24,000
during his first year. This is an example of _______.
E A. descriptive statistics
BApp B. inferential statistics
C. nonparametric statistics
D. nominal data
A 56. A statistics instructor collects information about the background of his students.
About 30% have taken economics and about 40% have taken accounting. There
are 23 male students and 27 female students in this class. This is an example of
_______.
E A. descriptive statistics
App B. inferential statistics
C. nonparametric statistics
D. nominal data
B 57. A market researcher is interested in determining the average income for families
in Duval County, Florida. To accomplish this, she takes a random sample of 400
families from the county and uses the data gathered from these families to
estimate the average income for families of the entire county. This process is an
example of _______.
H A. descriptive statistics
BApp B. inferential statistics
C. intermediate statistics
D. a census
A 58. The Universal Pulp Company has a plant in Portland, Oregon. Management has
decided to determine the average number of sick days taken per worker in this
plant in 1991. To accomplish this, the management gathers records on all the
workers in the plant and averages the number of sick days taken in 1991 by each
worker. This process is using _______.
M A. descriptive statistics
BApp B. inferential statistics
C. company-wide statistics
D. locale-specific statistics
D 59. The Magnolia Swimming Pool Company wants to determine the average number
of years it takes before a major repair is required on one of the pools that the
company constructs. The president of the company asks Rick Johnson, a
company accountant, to randomly contact fifty families that built Magnolia pools
in the past ten years and determine how long it was in each case until a major
repair. The information will then be used to estimate the average number of years
until a major repair for all pools sold by Magnolia. The average based on the data
gathered from the fifty families can best be described as a _______.
M A. sample
BApp B. population
C. parameter
D. statistic
D 60. The Chamber of Commerce wants to assess its membership's opinions of the
North American Free Trade Agreement. One hundred of the 2,000 members are
randomly selected and contacted via telephone. Seventy-five reported an overall
favorable opinion, and twenty-five reported an overall unfavorable opinion. The
proportion, 0.75, is a ___________.
M A. sample
BApp B. population
C. parameter
D. statistic
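As a quick check on the arithmetic in question 60, the proportion can be computed directly from the figures given in the question. The variable names are illustrative only.

```python
members = 2000     # size of the full membership (the population)
contacted = 100    # members randomly selected and reached (the sample)
favorable = 75     # favorable opinions among those contacted

# A proportion computed from a sample describes the sample, so it is a statistic.
sample_proportion = favorable / contacted

# Share of the membership that was actually surveyed.
sampling_fraction = contacted / members

print(sample_proportion)   # 0.75
print(sampling_fraction)   # 0.05
```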
C 61. What proportion of San Diego voters favor trade restrictions with China? In an
effort to determine this, a research team calls every registered voter in San Diego
and successfully contacts them. The proportion from the data gathered from the
calls most likely is a _______.
M A. sample
App B. population
C. parameter
D. statistic
D 62. A researcher wants to know what the average variation is in altimeters of small,
privately owned airplanes. The task of determining this is expensive and time
consuming, if even possible, given the large number of such airplanes. The
researcher decides to use government records to randomly locate the owners of ten
such planes and then get permission to test the altimeters. When the researcher is
done, he will use the data gathered from the group of ten to reach conclusions
about all small, privately owned airplanes. This process can best be described as
_______.
H A. data statistics
App B. research statistics
C. descriptive statistics
D. inferential statistics
C 63. A researcher wants to know what the average variation is in altimeters of small,
privately owned airplanes. The task of determining this is expensive and time
consuming, if even possible, given the large number of such airplanes. The
researcher decides to use government records to randomly locate the owners of ten
such planes and then get permission to test the altimeters. When the researcher is
done, he will use the data gathered from the group of ten to reach conclusions
about all small, privately owned airplanes. The data gathered on the group of ten
airplanes is best described as _______.
H A. measurements
App B. data
C. statistics
D. parameters
C 64. How much inventory do Christmas tree sales lots keep? A researcher goes from
location to location around the city counting the number of trees in each lot.
These numbers most likely represent what level of data?
M A. nominal
BApp B. ordinal
C. ratio
D. interval
B 65. During the Valentine's season, different offices in a company are encouraged to
decorate their doors. A committee then goes around and ranks the doors
according to how well decorated they are. The best door gets a ranking of one, the
second best gets a ranking of two, etc. The numbers of these rankings represent
which level of data?
M A. interval
BApp B. ordinal
C. nominal
D. ratio
A 66. A large manufacturing company in Indianapolis produces valves for the chemical
industry. According to specifications, one particular valve is supposed to have a
five-inch opening on the side. Quality control inspectors take random samples of
these valves just after the hole is bored. They measure the size of the hole in an
effort to determine if the machine is out-of-adjustment. The measurement of the
diameter of the hole represents which level of data?
M A. ratio
BApp B. nominal
C. ordinal
D. interval
E A. interval
BApp B. nominal
C. ordinal
D. ratio
M A. ratio
BApp B. interval
C. ordinal
D. nominal
C 69. A business is attempting to find the best small town in the United States in which
to relocate. As part of the investigation, the elevations of all small towns in the
United States are researched. Some towns are located high in the Rockies with
elevations over 8,000 feet. There are even some towns located in the south central
valley of California with elevations below sea level (for example, elevations of
around -100 feet). These elevations can best be described as what level of data?
M A. nominal
BApp B. ordinal
C. interval
D. ratio
E A. nominal
BApp B. ordinal
C. interval
D. ratio
D 71. The apartment vacancy rate is often used as an indicator of a community’s need
for residential housing construction. The apartment vacancy rate is best described
as what level of measurement?
E A. nominal
BApp B. ordinal
C. interval
D. ratio
M A. ratio
BApp B. interval
C. ordinal
D. nominal
D 73. A business is attempting to find the best small town in the United States in which
to relocate. As part of the investigation, the availability of vocational, technical
education in small towns in the United States is researched. Vocational,
technical education is available within fifty miles of some small towns; for others it
is not. The availability of vocational, technical education can best be described as
what level of data?
M A. ratio
BApp B. interval
C. ordinal
D. nominal
B 74. Colleges and universities often assign numbers as student identification numbers.
These numbers are best categorized as what level of data?
E A. interval
App B. nominal
C. ordinal
D. ratio
E A. nominal
App B. ordinal
C. interval
D. ratio
M A. nominal
App B. ordinal
C. interval
D. ratio
C 77. Financial institutions are often ranked by the volume of deposits. This ranking is
what level of measurement?
E A. ratio
BApp B. interval
C. ordinal
D. nominal
A 78. A chemical plant in Louisiana has a tank that holds a particular chemical. Every
day, a technician records the meter reading on the side of the tank which tells how
much volume of fluid there is in the tank. Most likely, these volume readings are
what level of data?
H A. ratio
BApp B. interval
C. ordinal
D. nominal
C 79. Most evening news weather reports in the United States still use Fahrenheit
temperature readings to convey the warmth of the air in various locations across
the country. These Fahrenheit temperature readings would most likely be
categorized as what level of data?
M A. ordinal
App B. ratio
C. interval
D. nominal
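One way to see why Fahrenheit readings have meaningful distances but an arbitrary zero is to convert two readings to Celsius and compare ratios. The two readings below are hypothetical and chosen only for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9

# Two hypothetical readings, chosen only for illustration.
warm_f, cool_f = 80.0, 40.0
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The ratio of two readings depends on where the scale puts zero,
# so "twice as warm" is not a meaningful statement.
ratio_f = warm_f / cool_f
ratio_c = warm_c / cool_c
print(round(ratio_f, 2), round(ratio_c, 2))

# Differences survive the unit change up to a constant factor of 5/9.
diff_f = warm_f - cool_f
diff_c = warm_c - cool_c
print(round(diff_c / diff_f, 4))
```

The ratio of the readings is about 2 in Fahrenheit but about 6 in Celsius, while the difference between them scales by the same 5/9 factor for any pair of readings.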
D 80. Suppose you want to monitor the success or failure of a day trader in the stock
market. To do this, you assume that the trader starts the day with zero earnings.
At the end of the day, you determine how many dollars the trader made or lost.
These dollar measurements are most likely which level of data?
M A. unit
BApp B. nominal
C. ordinal
D. interval
A 81. At the Olympics, the first three places in each event are awarded a medal. The
winner of an event is awarded a 1 to represent first place, the runner-up is
awarded a 2, and the third place finisher is awarded a 3. These numbers, 1, 2, and
3 are what level of data measurement?
E A. ordinal
App B. nominal
C. ratio
D. interval
E A. interval
Term B. ratio
C. nominal
D. ordinal
D 83. Which type of statistics requires that the data be at least interval or ratio level data?
M A. inferential statistics
Term B. descriptive statistics
C. nonparametric statistics
D. parametric statistics
B 84. Nominal and ordinal data are sometimes classified as _______.
H A. metric data
Term B. nonmetric data
C. descriptive data
D. inferential data
D 85. Which of the levels of data measurement has the highest usage potential? That
is, which level of data can be analyzed in more ways than the other
levels of data?
E A. nominal
Term B. ordinal
C. interval
D. ratio
C 86. Moody's Investors Service uses nine ratings of corporate bonds to help potential
investors assess their risk.
Rating Meaning
Aaa Best quality
Aa High quality
A Higher medium quality
Baa Lower medium quality
Ba Possess speculative elements
B Lack characteristics of desirable investment
Caa Poor standing
Ca Speculative in a high degree
C Extremely poor prospects
The level of data measurement in Moody's bond ratings is ______________.
M A. ratio
BApp B. interval
C. ordinal
D. nominal
B 87. Standard & Poor's Corporation uses nine ratings of corporate bonds to help
potential investors assess their risk.
Rating Meaning
AAA Highest grade
AA High grade
A Upper medium quality
BBB Medium grade
BB Lower Medium grade
B Speculative
CCC Outright speculations
CC Outright speculations
C Income bonds on which no interest is being paid
DDD In default, with rating indicating relative salvage value
DD In default, with rating indicating relative salvage value
D In default, with rating indicating relative salvage value
The level of data measurement in Standard & Poor's bond ratings is
______________.
M A. ratio
BApp B. ordinal
C. interval
D. nominal
D 88. Mac User's magazine rates products for the Apple Macintosh on a scale from one
mouse to five mice. One mouse indicates low value/performance. Five mice
indicate highest value/performance. The level of data measurement in these ratings
is ______________.
M A. nominal
App B. interval
C. ratio
D. ordinal
D 89. During a strategy planning session, the executives of Plano Power Plants, Inc.
identified thirty-one significant threats to the future health of the corporation.
Toward the end of the session, Paul Pearson, a management consultant, asked the
executives to rate each threat on a scale of to . The level of data
measurement in Paul's threat rating is ______________.
M A. nominal
BApp B. interval
C. ratio
D. ordinal
A 90. The RSACi system uses five levels (0, 1, 2, 3, and 4) to provide consumers with
information about the level of sex, nudity, violence, offensive language (vulgar or
hate-motivated) in Web sites. A level 4 site may have crude, vulgar language, or
extreme hate speech; while a level 0 site would not have even the mildest
expletives. The level of data measurement in RSACi rating system is ______.
M A. ordinal
BApp B. interval
C. ratio
D. nominal
B 91. The ranking of a company in the Fortune 500 is an example of ______ level of
data measurement.
M A. ratio
BApp B. ordinal
C. categorical
D. nominal
M A. ratio
BApp B. ordinal
C. rank order
D. nominal
A 93. The United States trade balance is an example of ______ level of data
measurement.
M A. ratio
BApp B. ordinal
C. rank order
D. nominal
D 94. The telephone area code of clients in the United States is an example of ______
level of data measurement.
M A. ratio
BApp B. ordinal
C. rank order
D. nominal
M A. ratio
BApp B. interval
C. rank order
D. nominal
A 96. Per capita income for a geographic region is calculated by dividing the total personal
income of all persons residing in the region by the number of people residing in the
region. Per capita income is an example of ______ level of data measurement.
M A. ratio
BApp B. ordinal
C. interval
D. nominal
M A. nominal
BApp B. ordinal
C. ratio
D. interval
M A. σ
Term B. @
C. &
D. ∞
CHAPTER TWO
C 1. A financial analyst has randomly selected 200 companies from those traded on the
NYSE. At the end of each trading day, the analyst records the closing price for
each of the 200 companies. These 200 measurements are an example of
__________.
E A. an ogive
Term B. grouped data
C. raw data
D. a stem and leaf plot
C 2. If data are grouped into intervals and the number of items in each group is listed,
this could be called a _______.
E A. ogive
Term B. histogram
C. frequency distribution
D 5. If the individual class frequency is divided by the total frequency, the result is the
_______.
M A. midpoint frequency
Term B. cumulative frequency
C. stem and leaf plot
D. relative frequency
E A. an ogive
Term B. a histogram
C. a frequency polygon
D. a stem and leaf plot
E A. 3 and 5
Term B. 7 and 9
C. 5 and 15
D. 1 and 25
B 9. One advantage of a stem and leaf plot over a frequency distribution is that
_______.
B 10. One rule that must always be followed in constructing frequency distributions is
that _______.
A 11. One rule that must always be followed in constructing frequency distributions is
that _______.
D 12. Which of the following is best to show the percentage of a total budget that is
spent on each category of items?
E A. histogram
Term B. ogive
C. stem and leaf chart
D. pie chart
E A. 22
Calc B. 11
C. 10.5
D. 11.5
E A. 47
Calc B. 20
C. 22.5
D. 23
E A. 15
Calc B. 7.5
C. 3
D. 1.5
E A. 10
Calc B. 20
C. 15
D. none of the above
Chapter 2: Charts and Graphs 31
E A. 0.15
Calc B. 0.30
C. 0.10
D. none of the above
B 19. Consider the following frequency distribution:
Class Interval Frequency
10-under 20 15
20-under 30 25
30-under 40 10
What is the cumulative frequency of the second class interval?
E A. 25
Calc B. 40
C. 15
D. 50
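The cumulative frequency asked for in question 19 is a running total of the class frequencies. A minimal sketch using the table from the question:

```python
from itertools import accumulate

classes = ["10-under 20", "20-under 30", "30-under 40"]
frequencies = [15, 25, 10]

# Running total: each class adds its own frequency to everything before it.
cumulative = list(accumulate(frequencies))

for label, cum in zip(classes, cumulative):
    print(label, cum)
# 10-under 20 15
# 20-under 30 40
# 30-under 40 50
```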
E A. 10
Calc B. 20
C. 30
D. 40
D 21. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the midpoint of the last class?
E A. 80
Calc B. 100
C. 95
D. 90
C 22. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the relative frequency of the second class?
E A. 0.45
Calc B. 0.90
C. 0.225
D. 0.75
C 23. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the cumulative frequency of the third class?
E A. 80
Calc B. 0.40
C. 155
D. 75
A 24. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the approximate range of the number of phone calls arriving each hour?
E A. 80
Calc B. 200
C. 20
D. 100
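Questions 21 through 24 all draw on the same switchboard table. The sketch below computes the four quantities they ask about: class midpoints, relative frequencies, cumulative frequencies, and the approximate range. The tuple layout is an illustrative choice, not something the test bank prescribes.

```python
from itertools import accumulate

# (lower endpoint, upper endpoint, frequency) for each class in the table
classes = [(20, 40, 30), (40, 60, 45), (60, 80, 80), (80, 100, 45)]

frequencies = [freq for _, _, freq in classes]
total = sum(frequencies)

# Midpoint: halfway between the class endpoints.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]

# Relative frequency: class frequency divided by total frequency.
relative = [freq / total for freq in frequencies]

# Cumulative frequency: running total of class frequencies.
cumulative = list(accumulate(frequencies))

# Approximate range: upper endpoint of the last class minus
# lower endpoint of the first class.
approx_range = classes[-1][1] - classes[0][0]

print(total)         # 200
print(midpoints)     # [30.0, 50.0, 70.0, 90.0]
print(relative)      # [0.15, 0.225, 0.4, 0.225]
print(cumulative)    # [30, 75, 155, 200]
print(approx_range)  # 80
```

The range is only approximate because grouped data hide the actual smallest and largest observations.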
M A. 3
Calc B. 4
C. 5
D. 9
M A. 0.4
Calc B. 0.25
C. 0.20
D. 4
M A. 50
App B. 58
C. 59
D. 100
M A. 0
App B. 10
C. 7
D. 2
M A. close to 40
App B. close to 50
C. equal to 45
D. between 41 and 44
M A. 5
App B. 9
C. 13
D. 14
E A. histograms
Term B. pie charts
C. ogives
D. frequency polygons
B 32. An instructor has decided to graphically represent the grades on a test. The
instructor uses a plus/minus grading system (i.e. she gives grades of A-, B+, etc.).
Which of the following would provide the most information for the students?
M A. a histogram
App B. a stem and leaf plot
C. a cumulative frequency distribution
D. a frequency distribution
M A. 2
App B. 3
C. 4
D. 10
B 34. The difference between the highest number and the lowest number in a set of data
is called the _______.
E A. difference
Term B. range
C. polygonal frequency
D. relative frequency
C 35. A person has decided to construct a frequency distribution for a set of data
containing 60 numbers. The lowest number is 23 and the highest number is 68. If
5 classes are used, the class width should be approximately _______.
E A. 4
Calc B. 12
C. 9
D. 5
B 36. A person has decided to construct a frequency distribution for a set of data
containing 60 numbers. The lowest number is 23 and the highest number is 68. If
7 classes are used, the class width should be approximately _______.
E A. 6
Calc B. 7
C. 9
D. 11
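The approximate class width in questions 35 and 36 comes from dividing the range by the number of classes. The sketch below assumes the common convention of rounding up to a whole number so that the classes cover the full range; the questions themselves only ask for an approximate width.

```python
import math

low, high = 23, 68
data_range = high - low   # 45

for n_classes in (5, 7):
    raw_width = data_range / n_classes
    # Assumed convention: round up so the classes span the whole range.
    width = math.ceil(raw_width)
    print(n_classes, round(raw_width, 2), width)
# 5 9.0 9
# 7 6.43 7
```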
D 37. A frequency distribution was developed. The lower endpoint of the first class is
9.30, and the midpoint is 9.35. What is the upper endpoint of this class?
E A. 9.50
Calc B. 9.60
C. 9.70
D. 9.40
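Question 37 can be worked backwards from the midpoint definition. The sketch below uses `Decimal` rather than floats so that the two-decimal values print exactly.

```python
from decimal import Decimal

lower = Decimal("9.30")
midpoint = Decimal("9.35")

# midpoint = (lower + upper) / 2, so upper = 2 * midpoint - lower
upper = 2 * midpoint - lower

print(upper)  # 9.40
```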
C 38. The cumulative frequency for a class is 27. The cumulative frequency for the next
(non-empty) class will be _______.
E A. less than 27
App B. equal to 27
C. greater than 27
D. 27 minus the next class frequency
B 39. Which of the following would be most helpful if you wished to construct a pie
chart?
E A. a frequency distribution
App B. a relative frequency distribution
C. a cumulative frequency distribution
D. an ogive
B 40. A person has constructed a frequency distribution for the grades on a test. This
person is not sure how to do this, and thus only 7 classes were developed, and
each class width was set at 10 units. If the lowest possible score is 0 and the
highest possible score is 100, which of the following is true?
A 41. In a histogram, the highest bar represents the class with _______.
C 42. The following class intervals for a frequency distribution were developed to
provide information regarding the starting salaries for students graduating from a
particular school:
Salary ($1,000s)    Number of Graduates
18-under 21         -
21-under 25         -
24-under 27         -
29-under 30         -
Before data was collected, someone questioned the validity of this arrangement.
Which of the following represents a problem with this set of intervals?
C 43. The following class intervals for a frequency distribution were developed to
provide information regarding the starting salaries for students graduating from a
particular school:
Salary ($1,000s)    Number of Graduates
18-under 21         -
21-under 25         -
24-under 27         -
29-under 30         -
Before data was collected, someone questioned the validity of this arrangement.
Which of the following represents a problem with this set of intervals?
D 44. The following class intervals for a frequency distribution were developed to
provide information regarding the starting salaries for students graduating from a
particular school:
Salary ($1,000s)    Number of Graduates
18-under 21         -
21-under 25         -
24-under 27         -
29-under 30         -
Before data was collected, someone questioned the validity of this arrangement.
Which of the following represents a problem with this set of intervals?
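The problems with the salary intervals in questions 42 through 44 can be found mechanically. The sketch below is one possible check, with a hypothetical helper name. It flags overlapping classes, gaps between classes, and unequal class widths.

```python
# Class intervals from the table, as (lower, upper) in $1,000s.
# "under" means the upper endpoint is excluded from the class.
intervals = [(18, 21), (21, 25), (24, 27), (29, 30)]

def check_intervals(intervals):
    """Return a list of problems found in a sorted list of class intervals."""
    problems = []
    # Compare each class with the one that follows it.
    for (lo1, hi1), (lo2, hi2) in zip(intervals, intervals[1:]):
        if lo2 < hi1:
            problems.append(f"overlap: {lo1}-under {hi1} and {lo2}-under {hi2}")
        elif lo2 > hi1:
            problems.append(f"gap: nothing covers {hi1}-under {lo2}")
    widths = {hi - lo for lo, hi in intervals}
    if len(widths) > 1:
        problems.append(f"unequal class widths: {sorted(widths)}")
    return problems

for problem in check_intervals(intervals):
    print(problem)
# overlap: 21-under 25 and 24-under 27
# gap: nothing covers 27-under 29
# unequal class widths: [1, 3, 4]
```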
E A. 12
Calc B. 20
C. 40
D. 10
E A. 30
Calc B. 50
C. 18
D. 12
E A. 90
Calc B. 80
C. 0.9
D. 54
E A. 100
Calc B. 150
C. 25
D. 250
E A. 0.45
Calc B. 0.70
C. 0.30
D. 0.33
E A. 25
Calc B. 45
C. 70
D. 250
E A. 100
Calc B. 25
C. 300
D. 400
E A. 15
Calc B. 350
C. 300
D. 200
B 56. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The relative frequency of the first class interval is _________.
E A. 0.50
BCalc B. 0.33
C. 0.40
D. 0.27
C 57. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The cumulative frequency of the second class interval is _________.
E A. 1,500
BCalc B. 500
C. 900
D. 1,000
D 58. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The approximate range of the data is _________.
E A. 1,500
BCalc B. 2
C. 400
D. 10
D 59. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The midpoint of the first class interval is _________.
E A. 500
BCalc B. 2
C. 1.5
D. 1
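The quantities asked for in questions 56 through 59 all follow from the same frequency table. A minimal Python sketch, using only the class limits and counts shown above (the variable names are illustrative and not part of the test bank):

```python
# Class limits ("lo-under hi") and voucher counts from the frequency table.
classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
frequencies = [500, 400, 300, 200, 100]

total = sum(frequencies)  # all vouchers inspected

# Relative frequency: class count divided by the total count.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the class counts.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

# Class midpoint: average of the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Approximate range: upper limit of the last class minus lower limit of the first.
approx_range = classes[-1][1] - classes[0][0]

print(round(relative[0], 2), cumulative[1], midpoints[0], approx_range)
```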
D 60. The staffs of the accounting and the quality control departments rated their
respective supervisor's leadership style as either (1) authoritarian or (2)
participatory. Sixty-eight percent of the accounting staff rated their supervisor
"authoritarian," and thirty-two percent rated him "participatory." Forty percent of
the quality control staff rated their supervisor "authoritarian," and sixty percent
rated her "participatory." The best graphic depiction of these data would be two
___________________.
E A. histograms
BApp B. frequency polygons
C. ogives
D. pie charts
B 61. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Accelerated life tests were
performed on a sample of 100 batteries, and the following relative frequency
distribution was compiled.
Battery Life Relative Frequency
(months)
40-under 50 0.05
50-under 60 0.10
60-under 70 0.25
70-under 80 0.50
80-under 100 0.10
The number of batteries in 40-under 50 interval was _________.
E A. 45
BCalc B. 5
C. 10
D. 15
A 62. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Accelerated life tests were
performed on a sample of 100 batteries, and the following relative frequency
distribution was compiled.
Battery Life Relative Frequency
(months)
40-under 50 0.05
50-under 60 0.10
60-under 70 0.25
70-under 80 0.50
80-under 100 0.10
The number of batteries in 60-under 70 interval was _________.
E A. 25
BCalc B. 65
C. 40
D. 60
D 63. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Accelerated life tests were
performed on a sample of 100 batteries, and the following relative frequency
distribution was compiled.
Battery Life Relative Frequency
(months)
40-under 50 0.05
50-under 60 0.10
60-under 70 0.25
70-under 80 0.50
80-under 100 0.10
The number of batteries which lasted less than 60 months was _________.
M A. 10
BCalc B. 55
C. 5
D. 15
C 64. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings is drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter Number of Holes
(inches)
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
The percentage of holes under 1" in diameter was _____________.
E A. 33%
BCalc B. 60%
C. 67%
D. 50%
B 65. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings is drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter Number of Holes
(inches)
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
The number of holes under 1" in diameter was _____________.
M A. 20
BCalc B. 60
C. 25
D. 30
B 66. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings is drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter Number of Holes
(inches)
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
The midpoint of the third class interval is _____________.
E A. 1.025
BCalc B. 25
C. 35
D. 0.975
C 67. The U.S. PC market is very competitive. In 1998 unit-shipment market shares
were: Dell 13.4%; Compaq 15.0%; Gateway 8.2%; Hewlett-Packard 8.4%; IBM
8.9%; and others 46.1%.
The best graphic depiction of these data would be ___________________.
E A. a histogram
BApp B. a frequency polygon
C. a pie chart
D. an ogive
B 68. The U.S. PC market is very competitive. In 1998 unit-shipment market shares
were: Dell 13.4%; Compaq 15.0%; Gateway 8.2%; Hewlett-Packard 8.4%; IBM
8.9%; and others 46.1%. In 1999 unit-shipment market shares were: Dell 17.1%;
Compaq 15.3%; Gateway 9.3%; Hewlett-Packard 8.2%; IBM 7.6%; and others
42.5%.
The best graphic depiction of these data would be ___________________.
E A. a pie chart
BApp B. two pie charts
C. a histogram
D. two histograms
E A. a pie chart
BApp B. a histogram
C. a frequency polygon
D. an ogive
E A. a histogram
BApp B. two histograms
C. a pie chart
Market shares read from the pie charts:
Company   1999   2000
A         33%    35%
B         45%    46%
C         22%    19%
Which of the following is true?
A 72. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts. Total sales for this
industry were $1.5 billion in 1999 and $1.8 billion in 2000.
Market shares read from the pie charts:
Company   1999   2000
A         33%    35%
B         45%    46%
C         22%    19%
Company C’s sales in 2000 were ___________.
E A. $342 million
BApp B. $630 million
C. $675 million
D. $828 million
D 73. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts. Total sales for this
industry were $1.5 billion in 1999 and $1.8 billion in 2000.
Market shares read from the pie charts:
Company   1999   2000
A         33%    35%
B         45%    46%
C         22%    19%
Company B’s sales in 1999 were ___________.
E A. $342 million
BApp B. $630 million
C. $675 million
D. $828 million
A 74. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts.
Market shares read from the pie charts:
Company   1999   2000
A         33%    35%
B         45%    46%
C         22%    19%
Which of the following MAY BE a false statement?
E
BCalc
76. Liz Chapa manages a portfolio of 250 common stocks. Her staff compiled the
following frequency distribution of dividends received (in $/share) during the
previous year.
Dividends Number of Stocks
($/share)
$0-under $0.50 25
0.50-under 1.00 50
1.00-under 1.50 100
1.50-under 2.00 50
2.00-under 2.50 25
Construct a frequency polygon of the dividend frequency distribution on the
following grid.
E
BCalc
77. Liz Chapa manages a portfolio of 250 common stocks. Her staff compiled the
following frequency distribution of dividends received (in $/share) during the
previous year.
Dividends ($/share) Number of Stocks
$0-under $0.50 25
0.50-under 1.00 50
1.00-under 1.50 100
1.50-under 2.00 50
2.00-under 2.50 25
E
BCalc
78. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings is drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter (inches) Number of Holes
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
E
BCalc
79. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings is drilled and inspected. Measurements from the
ninety holes were compiled to form the following frequency distribution.
Hole Diameter (inches) Number of Holes
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
E
BCalc
80. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings is drilled and inspected. Measurements from the
ninety holes were compiled to form the following frequency distribution.
Hole Diameter (inches) Number of Holes
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
Construct a cumulative frequency ogive on the following grid.
E
BCalc
B 81. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.
E A. 200
BCalc B. 500
C. 300
D. 100
D 82. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.
The percentage of sales transactions on Saturday that were under $100 each was
_____________.
M A. 100
BCalc B. 10
C. 80
D. 20
C 83. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.
The percentage of sales transactions on Saturday that were at least $100 each was
_____________.
M A. 100
BCalc B. 10
C. 80
D. 20
C 84. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.
The percentage of sales transactions on Saturday that were between $100 and
$150 was _____________.
M A. 20%
BCalc B. 40%
C. 60%
D. 80%
D 85. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and a histogram of sales transactions by dollar value of the transactions. Friday's
histogram follows.
E A. 50
BCalc B. 100
C. 150
D. 200
C 86. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and a histogram of sales transactions by dollar value of the transactions. Friday's
histogram follows.
E A. 100
BCalc B. 200
C. 300
D. 400
D 87. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.
The total number of walk-in customers included in the study was _________.
E A. 100
BCalc B. 250
C. 300
D. 450
A 88. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.
The percentage of walk-in customers waiting one minute or less was _________.
E A. 22%
BCalc B. 11%
C. 67%
D. 10%
B 89. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.
The percentage of walk-in customers waiting more than 6 minutes was ______.
E A. 22%
BCalc B. 11%
C. 67%
D. 10%
C 90. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.
The percentage of walk-in customers waiting between 1 and 6 minutes was ___.
M A. 22%
BCalc B. 11%
C. 67%
D. 10%
D 91. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a frequency histogram of waiting time for walk-in customers.
E A. 20
BCalc B. 30
C. 100
D. 180
B 92. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a frequency histogram of waiting time for walk-in customers.
E A. 20
BCalc B. 30
C. 100
D. 180
B 93. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.
[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage, 0% to 100%. Horizontal axis: Market Capitalization ($1,000,000).]
E A. 200
BCalc B. 26
C. 12
D. 60
C 94. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.
[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage, 0% to 100%. Horizontal axis: Market Capitalization ($1,000,000).]
E A. 38
BCalc B. 26
C. 62
D. 43
A 95. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.
[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage, 0% to 100%. Horizontal axis: Market Capitalization ($1,000,000).]
E A. 38
BCalc B. 26
C. 62
D. 43
B 96. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.
[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage, 0% to 100%. Horizontal axis: Market Capitalization ($1,000,000).]
E A. 38
BCalc B. 85
C. 62
D. 15
D 97. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.
[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage, 0% to 100%. Horizontal axis: Market Capitalization ($1,000,000).]
E A. 38
BCalc B. 85
C. 62
D. 15
B 98. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a frequency histogram of market capitalization of the 937 corporations
listed on the American Stock Exchange in January 2000.
[Figure: frequency histogram. Horizontal axis: Market Capitalization ($1,000,000), $100 to $500.]
E A. 50
BCalc B. 100
C. 700
D. 800
D 99. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a frequency histogram of market capitalization of the 937 corporations
listed on the American Stock Exchange in January 2000.
[Figure: frequency histogram. Vertical axis: Number of Issues, 0 to 600. Horizontal axis: Market Capitalization ($1,000,000), $100 to $500.]
E A. 50
BCalc B. 100
C. 700
D. 800
B 100. Destiny Houston needs to prepare several graphics for a report on competition
and growth of Internet advertising. Which Excel feature is most useful for this
task?
E A. Solver
Term B. Chart Wizard
C. Data Analysis
D. Pivot Table
Quantitative methods for
Management
Model paper
• A survey in which customers taste five
different brands of ice cream, and rank their
favorites from 1 to 5, would be an example of
which type of scale of measurement?
– Ordinal
– Nominal
– Interval
– Ratio
• State whether the following question provides
qualitative or quantitative data and
indicate the appropriate measurement scale -
What is your age?
– Qualitative, ratio
– Quantitative, ratio
– Qualitative, nominal
– Quantitative, ordinal
• Abel Alonzo, Director of Human Resources, is
exploring the causes of employee absenteeism at
Batesville Bottling during the last operating year
(January 1, 1999 through December 31, 1999). For
this study, the set of all employees who worked at
Batesville Bottling during the last operating year is
a(n) ____________________.
c. cluster sampling
d. judgment sampling
• Stratified random sampling is a method of
selecting a sample in which
a. the sample is first divided into strata, and then
random samples are taken from each stratum
b. various strata are selected from the sample
c. the population is first divided into strata, and
then random samples are drawn from each
stratum
d. None of these alternatives is correct.
• A tabular representation of the payoffs for a
decision problem is a
• a. decision tree
• b. payoff table
• c. matrix
• d. sequential matrix
• For a decision alternative, the weighted
average of the payoffs is known as
a. the expected value of perfect information
b. the expected value
c. the expected probability
d. perfect information
• The number of degrees of freedom for the
appropriate chi-square distribution in a test of
independence is
• a. n-1
• b. K-1
• c. number of rows minus 1 times number of
columns minus 1
• d. a chi-square distribution is not used
• In order to determine whether or not a particular medication was
effective in curing the common cold, one group of patients was given the
medication, while another group received sugar pills. The results of the
study are shown below. We are interested in determining whether or not
the medication was effective in curing the common cold. The test statistic
is
1. State whether the following question provides qualitative or quantitative data and
indicate the appropriate measurement scale - Are you male or female?
a. Qualitative, ratio
b. Quantitative, ratio
c. Qualitative, nominal
d. Quantitative, ordinal
2. State whether the following question provides qualitative or quantitative data and
indicate the appropriate measurement scale - How long have you been in your present
job or position?
a. Qualitative, ratio
b. Quantitative, ratio
c. Qualitative, nominal
d. Quantitative, ordinal
a) 12 b) 8 c) 7 d) 7.5
4. The hourly wages of a sample of 130 system analysts are given below.
mean = 60 range = 20
median = 74
a. 0.30%
b. 30%
c. 5.4%
d. 54%
5. The variance of a sample of 169 observations equals 576. The standard deviation of the
sample equals
a. 13
b. 24
c. 576
d. 28,461
a. mode
b. mean
c. 50th percentile
7. A sample selected in such a manner that each sample of size n has the same probability of
being selected is
a. a convenience sample
b. a judgment sample
c. nonprobabilistic sampling
a. judgment sampling
b. convenience sampling
c. cluster sampling
9. An uncertain future event affecting the consequence, or payoff, associated with a decision is
known as
a. unconditional probability
b. unknown probability
c. chance event
d. uncertain probability
a. decision nodes
b. chance nodes
c. marginal nodes
d. conditional nodes
12. Below you are given a payoff table involving three states of nature and two decision
alternatives.
Alternative S1 S2 S3
A 80 45 -20
B 40 50 15
The probability that S1 will occur is 0.1; the probability that S2 will occur is 0.6. The
recommended decision based on the expected value criterion is
a. A
b. B
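The expected value criterion in question 12 can be checked with a short sketch. It assumes the three states of nature are exhaustive, so P(S3) = 1 - 0.1 - 0.6, and that larger payoffs are better; the dictionary and variable names are illustrative only.

```python
# Payoff table: one row per decision alternative, columns S1, S2, S3.
payoffs = {
    "A": [80, 45, -20],
    "B": [40, 50, 15],
}

p_s1, p_s2 = 0.1, 0.6
# Assumption: S1, S2, S3 are the only states, so their probabilities sum to 1.
probabilities = [p_s1, p_s2, 1 - p_s1 - p_s2]

# Expected value = probability-weighted average of an alternative's payoffs.
expected = {
    alt: sum(p * v for p, v in zip(probabilities, values))
    for alt, values in payoffs.items()
}

# Assumption: payoffs are gains, so the largest expected value is preferred.
best = max(expected, key=expected.get)

for alt, ev in expected.items():
    print(f"EV({alt}) = {ev:.1f}")
print("Recommended:", best)
```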
13. The degrees of freedom for a contingency table with 6 rows and 3 columns is
a. 18
b. 15
c. 6
d. 10
14. If the coefficient of determination is a positive value, then the coefficient of correlation
b. must be zero
15. Regression analysis is a statistical procedure for developing a mathematical equation that
describes how
16. Given below are five observations collected in a regression study on two variables x
(independent variable) and y (dependent variable). Develop the least squares estimated
regression equation
x y
10 7
20 5
30 4
40 2
50 1
a. Y = 8.3 - 0.15x
b. Y = 9 + 0.15x
c. Y = 8.3 + 0.15x
d. Y = 9 - 0.15x
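For question 16, the least squares slope and intercept follow from the usual formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. The sketch below applies them to the five observations in plain Python; the variable names are illustrative.

```python
# Five (x, y) observations from the question.
x = [10, 20, 30, 40, 50]
y = [7, 5, 4, 2, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of cross-products and sum of squared deviations of x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"Y = {b0:.1f} + ({b1:.2f})x")
```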
17. As the sample size increases, the margin of error
a. increases
b. decreases
c. stays the same
18. In computing the standard error of the mean, the finite population correction factor is used
when
b. N/n 0.05
d. n/N 30
19. Z is a standard normal random variable. What is the value of Z if the area to the right of Z is
0.9803?
a. -2.06
b. 0.4803
c. 0.0997
d. 3.06
20. For a standard normal distribution, the probability of obtaining a z value between -2.4 to -2.0
is
a. 0.4000
b. 0.0146
c. 0.0400
d. 0.5000
21. For a standard normal distribution, the probability of obtaining a z value of less than 1.6 is
a. 0.1600
b. 0.0160
c. 0.0016
d. 0.9452
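Questions 19 through 21 are normally answered from a printed z table. As a cross-check, the same values can be obtained with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); this is only a sketch and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Q19: area to the right of z is 0.9803, so the area to the left is 1 - 0.9803.
z_19 = z.inv_cdf(1 - 0.9803)

# Q20: P(-2.4 < Z < -2.0) is the difference of two cumulative probabilities.
p_20 = z.cdf(-2.0) - z.cdf(-2.4)

# Q21: P(Z < 1.6) is a single cumulative probability.
p_21 = z.cdf(1.6)

print(round(z_19, 2), round(p_20, 4), round(p_21, 4))
```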
22. The number of electrical outages in a city varies from day to day. Assume that the number of
electrical outages (x) in the city has the following probability distribution.
x f(x)
0 0.80
1 0.15
2 0.04
3 0.01
The mean and the standard deviation for the number of electrical outages (respectively) are
c. 3 and 0.01
d. 0 and 0.8
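For a discrete distribution like the one in question 22, the mean is Σ x·f(x) and the variance is Σ (x - mean)²·f(x). A small sketch of that arithmetic (variable names are illustrative):

```python
from math import sqrt

# Probability distribution of daily electrical outages.
x_values = [0, 1, 2, 3]
probs = [0.80, 0.15, 0.04, 0.01]

# Mean (expected value): probability-weighted sum of the outcomes.
mean = sum(x * p for x, p in zip(x_values, probs))

# Variance: probability-weighted squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, probs))
std_dev = sqrt(variance)

print(f"mean = {mean:.2f}, standard deviation = {std_dev:.4f}")
```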
23. Assume that you have a binomial experiment with p = 0.4 and a sample size of 50. The
variance of this distribution is
a. 20
b. 12
c. 3.46
d. 144
24. In a binomial experiment the probability of success is 0.06. What is the probability of two
successes in seven trials?
a. 0.0036
b. 0.0600
c. 0.0555
d. 0.2800
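The binomial probability in question 24 is C(n, k)·p^k·(1 - p)^(n - k). A minimal sketch using `math.comb` (Python 3.8 or later), with illustrative variable names:

```python
from math import comb

n, k, p = 7, 2, 0.06  # trials, successes, probability of success

# Number of ways to place k successes among n trials, times the
# probability of any one such arrangement.
prob = comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(prob, 4))
```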
25. If P(A) = 0.62, P(B) = 0.47, and P(A ∪ B) = 0.88, then P(A ∩ B) =
a. 0.2914
b. 1.9700
c. 0.6700
d. 0.2100
Quantitative Methods in
Management
Term II
4 credits
MGT 408
Business Statistics
A First course
David M.Levine
Kathryn A.Szabat
David F.Stephan
P.K.Viswanathan
PEARSON PUBLICATIONS 7e
Additional Readings
• Statistics for Business and Economics- Anderson, Sweeney , Williams
• A survey reported that women were more likely than men to cite seeing photos
or videos, sharing with many people at once, seeing entertaining or funny
posts, learning about ways to help others, and receiving support from
people in their network as reasons to use Facebook.
• Data are facts about the world and are constantly reported as
numbers by an ever increasing number of sources.
• They can count on other people’s summaries of data and hope they
are correct.
• They can develop their own capability and insight into data by
learning about statistics and its application to business.
Statistics Is Evolving So Businesses Can Use The
Vast Amount Of Data Available
Decision
making
Knowledge
Information
DATA
What is Statistics?
“Statistics is a way to get information from data”
Statistics
Data Information
The word Statistics derived from the Latin word ‘status’
meaning a state
Statistics is a tool for creating new understanding from a set of
numbers.
Statistics – A way of thinking
Methods that allow you to work with data effectively
Methods that help you make better decisions
DEFINITION
STATISTICS
COLLECTION
COMPILATION
CLASSIFICATION
PRESENTATION
ANALYSIS &
INTERPRETATION OF DATA
Statistics
• Art and Science of Collecting and Understanding DATA:
• DATA = Recorded Information
• e.g., Sales, Productivity, Quality, Costs, Return, …
• Why? Because you want:
• Best use of imperfect information:
• e.g., 50,000 customers, 1,600 workers, 386,000 transactions,…
• Good decisions in uncertain conditions:
• e.g., new product launch: Fail? OK? Make you rich?
• Competitive Edge
• e.g., for you and your business!
To Properly Apply Statistics Follow A Framework To Minimize
Possible Errors
DCOVA
• Big data
• Collections of data that cannot be easily browsed or analyzed using traditional
methods.
• Use information systems’ methods to collect and process data sets of all sizes,
including very large data sets that would otherwise be hard to examine efficiently
* The total number of data values in a complete data set is the number of elements
multiplied by the number of variables.
Data, Data Sets,
Elements, Variables, and Observations
Variables
Element
Names Stock Annual Earn/
Company Exchange Sales($M) Share($)
Data Set
How Many Variables?
• Univariate data set: One variable measured for each
elementary unit
• e.g., Sales for the top 30 computer companies.
• Can do: Typical summary, diversity, special features
• Bivariate data set: Two variables
• e.g., Sales and # Employees for top 30 computer firms
• Can also do: relationship, prediction
• Multivariate data set: Three or more variables
• e.g., Sales, # Employees, Inventories, Profits, …
• Can also do: predict one from all other variables
Types of Variables
Categorical (qualitative) variables have values that can only be placed
into categories, such as “yes” and “no.”
Ordinal Ratio
• Students of a university are classified by the school in which they are enrolled
using a nonnumeric label such as Business, Humanities, Education, and so on.
• Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes
Business, 2 denotes Humanities, 3 denotes Education, and so on).
• Students of a university are classified by their class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
• Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Example of Ordinal Measurement
[Figure: six numbered entrants (1 to 6) at a finish line, illustrating rank order.]
Ordinal Data
1 2 3 4 5
Numbers or Categories?
• Quantitative Variable: Meaningful numbers
• e.g., Sales, # Employees
• Can add, rank, count
• Qualitative Variable: Categories
• Ordinal Variable: Categories with meaningful ordering
• e.g., Bond rating (AA, A, B, …), Diamonds (VSI, SI, …)
• Can rank, count
• Nominal Variable: categories without meaningful ordering
• e.g., State, Type of business, Field of study
• Can count
Interval Level Data
• Distances between consecutive integers are equal
• The data have the properties of ordinal data, and the interval between
observations is expressed in terms of a fixed unit of measure.
• Interval data are always numeric.
Data
• Categorical (defined categories)
• Examples: Marital Status, Political Party, Eye Color
• Numerical
• Discrete (counted items)
• Examples: Number of Children, Defects per hour
• Continuous (measured characteristics)
• Examples: Weight, Voltage
Example
Nominal
Data Level, Operations,
and Statistical Methods
Statistical
Data Level Meaningful Operations
Methods
Elementary unit
defined by “year” Quantitative data
Stock Market – Time Series
• Dow Jones Stock Index, monthly since 1928
Dow Jones Industrial Stock Market Index, Monthly from 1928 to early 2011
[Figure: time-series plot. Vertical axis: index value, 0 to 16,000. Horizontal axis: Year, 1920 to 2010.]
Basic Vocabulary of Statistics
POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a characteristic
of a population.
STATISTIC
A statistic is a numerical measure that describes a characteristic of
a sample.
Population vs. Sample
Population Sample
Subset
Parameter Statistic
Populations have parameters; samples have statistics.
Parameters are descriptive measures of a population; statistics are descriptive measures of a sample.
σ² denotes population variance
σ denotes population standard deviation
Symbols for
Sample Statistics
• Collect data
• e.g., Survey
• Present data
• e.g., Tables and graphs
• Characterize data
• e.g., Sample mean = (∑ X_i) / n
Inferential Statistics
• Estimation
• e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
• e.g., Test the claim that the
population mean weight is 120
pounds
Drawing conclusions about a large group of individuals based on a subset of the
large group.
Descriptive Statistics
Most of the statistical information in newspapers,
magazines, company reports, and other
publications consists of data that are summarized
and presented in a form that is easy to
understand.
Population
Sample
Inference
Statistic
Parameter
Select a
random sample
Sources of data collection
Collecting Data Correctly Is A Critical Task
Need to avoid data flawed by biases,
ambiguities, or other types of errors.
Secondary Sources: The person performing data analysis is not the data collector
Analyzing census data
Examining data from print journals or data published on the internet.
Government data: economics and demographics
Media reports – TV, newspapers, Internet
Companies that specialize in gathering data
Sources of data fall into five categories
• Data distributed by an organization or an individual
TABULAR
DIAGRAMS
GRAPHS
• TABULATION
SPECIMEN OF A TABLE
Total Grand
Total
Foot Note
Sources
DESCRIPTIVE STATISTICS:
ORGANIZING AND VISUALIZING
VARIABLES
CHAPTER 2
Descriptive Statistics:
Tabular and Graphical
Presentations
• Summarizing Categorical Data
Summarizing Quantitative Data
Tallying Data
One categorical variable: summary table
Two categorical variables: contingency table
Organizing Categorical Data: Summary Table
A summary table tallies the frequencies or percentages of items in a set
of categories so that you can see differences between categories.
Drink Frequency
Coke 7
Pepsi 1
Mirinda 3
7 Up 4
Total 15
Frequency Distribution…
Soft Drink   Frequency   Relative frequency   Percent frequency
Coke         7           0.46                 46
Pepsi        1           0.07                 7
Mirinda      3           0.20                 20
7 Up         4           0.27                 27
Total        15          1.00                 100
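The relative and percent frequency columns come from dividing each count by the total and multiplying by 100. A minimal sketch using the soft drink counts above (names are illustrative). Values computed this way and rounded to two decimals can differ in the last digit from a table that was adjusted so the column sums to exactly 1.00.

```python
# Tallied counts from the summary table.
counts = {"Coke": 7, "Pepsi": 1, "Mirinda": 3, "7 Up": 4}

total = sum(counts.values())

# Relative frequency = count / total; percent frequency = relative * 100.
relative = {drink: n / total for drink, n in counts.items()}
percent = {drink: 100 * r for drink, r in relative.items()}

print(f"{'Soft Drink':<12}{'Freq':>6}{'Rel.':>8}{'Pct.':>8}")
for drink, n in counts.items():
    print(f"{drink:<12}{n:>6}{relative[drink]:>8.2f}{percent[drink]:>8.0f}")
```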
Frequency Distribution
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below
average, or poor. The ratings provided by a sample of 20 guests are:
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn
Rating          Relative Frequency   Percent Frequency
Poor            .10                  10
Below Average   .15                  15
Average         .25                  25
Above Average   .45                  45
Excellent       .05                  5
Total           1.00                 100
For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.
A Contingency Table Helps Organize Two or More
Categorical Variables
• Used to study patterns that may exist between
the responses of two or more categorical
variables
Numerical Data
You must give attention to selecting the appropriate number of class groupings for the
table, determining a suitable width of a class grouping, and establishing the
boundaries of each class grouping to avoid overlapping.
The number of classes depends on the number of values in the data. With a larger
number of values, typically there are more classes. In general, a frequency
distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (Highest value–
Lowest value) of the data by the number of class groupings desired.
Organizing Numerical Data:
Frequency Distribution Example
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits):
Class 1: 10 but less than 20
Class 2: 20 but less than 30
Class 3: 30 but less than 40
Class 4: 40 but less than 50
Class 5: 50 but less than 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
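The steps above can be sketched in Python using the 20 raw values from this example. Rounding the class width up to the next multiple of 10, and starting the first class at a multiple of the width, are assumptions chosen to reproduce the class boundaries listed above; they are not a general rule.

```python
import math

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

ordered = sorted(data)                    # step 1: ordered array
data_range = ordered[-1] - ordered[0]     # step 2: range
num_classes = 5                           # step 3: chosen number of classes
raw_width = data_range / num_classes      # step 4: range / number of classes

# Assumption: "round up" means the next multiple of 10.
width = math.ceil(raw_width / 10) * 10

# Assumption: the first class starts at a multiple of the width.
start = (ordered[0] // width) * width
lowers = [start + i * width for i in range(num_classes)]

# Each class is "lower but less than lower + width".
counts = [sum(1 for v in data if lo <= v < lo + width) for lo in lowers]
midpoints = [lo + width / 2 for lo in lowers]

for lo, mid, c in zip(lowers, midpoints, counts):
    print(f"{lo} but less than {lo + width}: midpoint {mid:.0f}, frequency {c}")
```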
HOURS   NO. OF WORKERS
LESS THAN 10 5
LESS THAN 30 15
LESS THAN 60 30
LESS THAN 90 50
• MORE THAN CUMULATIVE FREQUENCY SERIES
10 – 19 17
20 – 29 15
30 – 39 12
40 – 49 10
• EXCLUSIVE CLASS INTERVAL
REVENUE (RS.)   NO. OF PRODUCTS
100 – 200 15
200 – 300 20
300 – 400 10
400 – 500 5
TOTAL 50
• OPEN END CLASS INTERVAL
Ages of a Sample of Managers from Urban Child Care Centers in the United States
42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution
of Child Care Manager’s
Ages
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
Data Range
42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Smallest = 23, Largest = 74
Range = Largest - Smallest = 74 - 23 = 51
Number of Classes and Class
Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive
summarization.
• More than 15 classes leave too much detail.
• Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
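The range and approximate class width can be recomputed from the 50 manager ages. The sketch below uses 6 classes and a final width of 10, matching the slide; the choice of 10 as the "convenient number" is a judgment call rather than something the code derives.

```python
# Ages of the 50 child care managers, row by row.
ages = [
    42, 26, 32, 34, 57,
    30, 58, 37, 50, 30,
    53, 40, 30, 47, 49,
    50, 40, 32, 31, 40,
    52, 28, 23, 35, 25,
    30, 36, 32, 26, 50,
    55, 30, 58, 64, 52,
    49, 33, 43, 46, 32,
    61, 31, 30, 40, 60,
    74, 37, 29, 43, 54,
]

data_range = max(ages) - min(ages)        # largest minus smallest
num_classes = 6
approx_width = data_range / num_classes   # starting point for the class width

class_width = 10  # rounded up to a convenient number, as on the slide

print(f"Range = {max(ages)} - {min(ages)} = {data_range}")
print(f"Approximate class width = {approx_width}")
print(f"Class width used = {class_width}")
```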
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30      6           .12
30-under 40      18          .36
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24
40-under 50      11          35
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50
Class Midpoints, Relative Frequencies, and
Cumulative Frequencies
Relative Cumulative
Class Interval Frequency Midpoint Frequency Frequency
20-under 30 6 25 .12 6
30-under 40 18 35 .36 24
40-under 50 11 45 .22 35
50-under 60 11 55 .22 46
60-under 70 3 65 .06 49
70-under 80 1 75 .02 50
Total 50 1.00
Cumulative Relative
Frequencies
Cumulative
Relative Cumulative Relative
Class Interval Frequency Frequency Frequency Frequency
20-under 30 6 .12 6 .12
30-under 40 18 .36 24 .48
40-under 50 11 .22 35 .70
50-under 60 11 .22 46 .92
60-under 70 3 .06 49 .98
70-under 80 1 .02 50 1.00
Total 50 1.00
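The relative, cumulative, and cumulative relative columns in the tables above all follow from the class frequencies. A minimal Python sketch (variable names are illustrative assumptions):

```python
from itertools import accumulate

classes = ["20-under 30", "30-under 40", "40-under 50",
           "50-under 60", "60-under 70", "70-under 80"]
freq = [6, 18, 11, 11, 3, 1]

total = sum(freq)                        # 50 managers
rel = [f / total for f in freq]          # relative frequency
cum = list(accumulate(freq))             # cumulative frequency
cum_rel = [c / total for c in cum]       # cumulative relative frequency

for row in zip(classes, freq, rel, cum, cum_rel):
    print("{:<12} {:>3} {:>5.2f} {:>3} {:>5.2f}".format(*row))
```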
Cumulative Distributions
Summary Table For One Variable
Contingency Table For Two Variables
The “Vital
Few”
Visualizing Categorical Data:
Side By Side Bar Charts DCOVA
The side by side bar chart represents the data from a contingency table.
                 No Errors   Errors    Total
Small Amount     50.75%      30.77%    47.50%
Medium Amount    29.85%      61.54%    35.00%
Large Amount     19.40%      7.69%     17.50%
Total            100.0%      100.0%    100.0%
[Figure: side-by-side bar chart "Invoice Size Split Out By Errors & No Errors"; bars for Errors and No Errors at each invoice size (Large, Medium, Small); horizontal axis 0.0% to 70.0%]
Ordered Array
Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
Stem-and-Leaf Display
DCOVA
[Figure: histogram; vertical axis: Frequency; horizontal axis: class midpoints 5, 15, 25, 35, 45, 55]
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class)
Visualizing Numerical Data:
The Polygon
DCOVA
Two Numerical Variables: Scatter Plot, Time-Series Plot
Visualizing Two Numerical
Variables: The Scatter Plot
DCOVA
Scatter plots are used for numerical data consisting of paired
observations taken from two numerical variables
Volume per Day   Cost per Day
33               160
38               167
42               170
50               188
55               195
60               200
[Figure: scatter plot of Cost per Day (vertical axis) against Volume per Day (horizontal axis, 20 to 70)]
Visualizing Two Numerical
Variables: The Time Series Plot
DCOVA
• A Time-Series Plot is used to study patterns
in the values of a numeric variable over time
[Figure: time-series plot; horizontal axis: Year (1994 to 2006)]
Organizing Many Categorical Variables: The
Multidimensional Contingency Table
DCOVA
• A multidimensional contingency table is constructed by tallying
the responses of three or more categorical variables.
• Chartjunk
An Example of Selective Summarization, These Two
Summarizations Tell Totally Different Stories
DCOVA
Summarization 1:
Company   Change from Prior Year
A         +7.2%
B         +24.4%
C         +24.9%
D         +24.8%
E         +12.5%
F         +35.1%
G         +29.7%
Summarization 2:
Company   Year 1    Year 2    Year 3
A         -22.6%    -33.2%    +7.2%
B         -4.5%     -41.9%    +24.4%
C         -18.5%    -31.5%    +24.9%
D         -29.4%    -48.1%    +24.8%
E         -1.9%     -25.3%    +12.5%
F         -1.6%     -37.8%    +35.1%
G         +7.4%     -13.6%    +29.7%
How Obvious Is It That Both Pie Charts Summarize The Same Data?
DCOVA
[Figure: paired charts by class (FR, SO, JR, SR) and by quarter (Q1 to Q4)]
Graphical Errors: No Zero Point on the
Vertical Axis
DCOVA
Total 30 20 35 15 100
Crosstabulation
Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample
are a split-level style and priced at less than
$200,000.
• Only three homes in the sample are an A-Frame
style and priced at $200,000 or more.
Crosstabulation
Frequency
Example: Finger Lakes Homes distribution
for the
price range
variable
Total 30 20 35 15 100
Price
Rating 10 - 19 20 - 29 30 - 39 Total
Good 1 2 1 4
25% 50% 25% 100%
Very Good 2 2 0 4
50% 50% 100%
Excellent 2 0 2 4
50% 50% 100%
Total 5 4 3 12
Cross Tabulation …
Problem: In a study of job satisfaction across 4 occupations, higher
scores indicate higher satisfaction. Provide a
cross tabulation of occupation and satisfaction score.
CLASS INTERVAL   0 – 5   5 – 10   10 – 15   15 – 20
0 – 10 1 - 2 -
10 – 20 4 3 - -
20 – 30 - - 1 -
30 – 40 2 - 1 -
Scatter Diagram and Trendline
x
Scatter Diagram
A Negative Relationship
y
x
Scatter Diagram
No Apparent Relationship
y
x
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of y = Number of
Interceptions Points Scored
1 14
3 24
2 18
1 17
3 30
Scatter Diagram
[Figure: scatter diagram of Number of Points Scored (y, 0 to 35) against Number of Interceptions (x, 0 to 4)]
Tabular and Graphical Data Methods
Categorical Data Quantitative Data
[Figure: plot against Parts Cost ($), horizontal axis 50 to 110, with the point (89.5, 76) labeled]
Histogram – more insight
[Figure: histogram of Tune-up Parts Cost; vertical axis: Frequency (0 to 18); horizontal axis: Parts Cost ($) in classes 50−59, 60−69, 70−79, 80−89, 90−99, 100-110]
Histograms Showing Skewness
Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: symmetric histogram; vertical axis: Relative Frequency (0 to .35)]
Histograms Showing Skewness
Frequency Distribution…
Example – BMW manufactures racing cars and
has gathered the following information on the number of
models of engines in different size categories used
in the racing market it serves.
(Page :33-98)
• Categorical data can be graphically represented by using a(n)
• a. histogram
• b. frequency polygon
• c. ogive
• d. bar chart
• A questionnaire provides 58 yes, 42 no and 20 no-opinion answers.
• In the construction of a pie chart , how many degrees would be in the section
of the pie showing the Yes answers?
• How many degrees would be in the section of the pie showing the No
answers.
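A pie-chart sector is the category's share of the total multiplied by 360 degrees. The questionnaire question above can be checked with a short Python sketch (the dictionary layout is an illustrative assumption):

```python
responses = {"Yes": 58, "No": 42, "No opinion": 20}

total = sum(responses.values())  # 120 answers in all

# Each sector's angle is its relative frequency times 360 degrees.
degrees = {answer: 360 * count / total for answer, count in responses.items()}

for answer, angle in degrees.items():
    print(f"{answer}: {angle:.0f} degrees")
```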
Quantitative Methods in
Management
Day-4
Recap..
• Introduction
• Definition
• Terms and terminologies
– Population, sample, parameter, statistic, element, data,
datasets, variable,
• Types of statistics
• Types of data
• Types of variables – qualitative, quantitative and
time series
• Levels of measurements
• Application of statistics in business
• Sources of data
Organizing and visualizing variables
• Tables
– Frequency distribution
– Relative frequency distribution
– Relative percent frequency distribution
– Cumulative frequency distribution
– Univariate
– Bivariate / cross tabulation
• Diagrams
– Bar charts
– Pie charts
• Graphs
– Histogram
– Frequency polygon
– Frequency curve
– Cumulative frequency curve ( Ogive)
• EDA
– Stem and leaf plot
– Scatter diagram
– Dot plots
– Pareto chart
Numerical descriptive statistics
Day 3
Pg. 99-148
[Figure: dot diagrams of a data set with the center marked X, built up over four slides]
• MCT
• MD
• Positive or Negative (SKEW)
Objectives
– For a sample of size n:
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
where n is the sample size and $X_1, \ldots, X_n$ are the observed values
Population Mean
$\mu = \dfrac{\sum X}{N} = \dfrac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \dfrac{24 + 13 + 19 + 26 + 11}{5} = \dfrac{93}{5} = 18.6$
Measures of Central Tendency:
The Mean (con’t) DCOVA
Data 11, 12, 13, 14, 15: Mean = $\dfrac{11 + 12 + 13 + 14 + 15}{5} = \dfrac{65}{5} = 13$
Data 11, 12, 13, 14, 20: Mean = $\dfrac{11 + 12 + 13 + 14 + 20}{5} = \dfrac{70}{5} = 14$
Properties of AM
• Sum of deviations from AM is ZERO
• Sum of squares of deviation taken from AM
will be minimum
• Combined mean
• It is affected by change of scale and change of
origin
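The listed properties of the arithmetic mean can be checked numerically. The sketch below uses the two five-value data sets from the mean example; the scale factor 2 and shift 3 are arbitrary illustrative choices.

```python
from statistics import mean

data = [11, 12, 13, 14, 15]
am = mean(data)  # arithmetic mean, 13

# Property 1: deviations from the mean sum to zero.
deviations = [x - am for x in data]


# Property 2: the sum of squared deviations is smallest when taken about the mean.
def sum_sq(center):
    return sum((x - center) ** 2 for x in data)


# Property 3: combined mean of two groups from their sizes and means.
group_a = [11, 12, 13, 14, 15]
group_b = [11, 12, 13, 14, 20]
combined = (len(group_a) * mean(group_a) + len(group_b) * mean(group_b)) / (
    len(group_a) + len(group_b)
)

# Property 4: a change of scale (times 2) and origin (plus 3) changes the mean the same way.
scaled = [2 * x + 3 for x in data]

print(sum(deviations), sum_sq(am), sum_sq(12), combined, mean(scaled))
```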
Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.
Weighted Mean
$\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$
where:
xi = value of observation i
wi = weight for observation i
Weighted mean
Purchase   Cost per Pound ($)   Number of Pounds
1 3.00 1200
2 3.40 500
3 2.80 2750
4 2.90 1000
5 3.25 800
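For the five purchases in the table, the weighted mean cost per pound uses the number of pounds as the weights. A minimal Python sketch (variable names are illustrative):

```python
cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]   # x_i
pounds = [1200, 500, 2750, 1000, 800]             # w_i

# Weighted mean: sum of w_i * x_i divided by sum of w_i.
weighted_mean = sum(w * x for w, x in zip(pounds, cost_per_pound)) / sum(pounds)

# Unweighted mean of the five prices, for comparison.
simple_mean = sum(cost_per_pound) / len(cost_per_pound)

print(f"weighted mean = {weighted_mean:.2f}, simple mean = {simple_mean:.2f}")
```

The weighted mean is lower than the simple mean because the largest purchase was made at the lowest price.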
Median = 13 for both data sets (11, 12, 13, 14, 15 and 11, 12, 13, 14, 20)
Median position = $\dfrac{n+1}{2}$ position in the ordered data
• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two
middle numbers
Note that $\dfrac{n+1}{2}$ is not the value of the median, only the position of
the median in the ranked data
Percentiles
• Measures of central tendency that divide a group
of data into 100 parts
• At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data lie
above the nth percentile
Q1 Q2 Q3
Q2: $i = \dfrac{50}{100}(8) = 4$, so $Q_2 = \dfrac{116 + 121}{2} = 118.5$
Q3: $i = \dfrac{75}{100}(8) = 6$, so $Q_3 = \dfrac{122 + 125}{2} = 123.5$
Measures of Central Tendency:
The Mode
DCOVA
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical
data
• There may be no mode
• There may be several modes
[Figure: two dot plots; in the first (values 0 to 14) Mode = 9; in the second (values 0 to 6) there is No Mode]
Mode
• The most frequently occurring value in a data
set
• Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Measures of Central Tendency:
Review Example
DCOVA
House Prices:
$2,000,000
$ 500,000
$ 300,000
$ 100,000
$ 100,000
Sum $ 3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
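The three measures for the house price data can be reproduced with Python's standard `statistics` module; this is a quick check, not part of the original slides.

```python
from statistics import mean, median, mode

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

avg = mean(house_prices)      # pulled upward by the $2,000,000 house
mid = median(house_prices)    # middle value of the ranked data
most = mode(house_prices)     # most frequent value

print(f"mean = {avg:,}, median = {mid:,}, mode = {most:,}")
```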
Measures of Central Tendency:
Which Measure to Choose?
DCOVA
The mean is generally used, unless extreme
values (outliers) exist.
The median is often used, since the median is
not sensitive to extreme values. For example,
median home prices may be reported for a
region; it is less sensitive to outliers.
In some situations it makes sense to report
both the mean and the median.
Measures of Central Tendency:
Summary
DCOVA
Central Tendency
• Arithmetic Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
• Median: middle value in the ordered array
• Mode: most frequently observed value
Empirical formula
Variability
No Variability
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread or the
dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
Measures of Variation
Variation DCOVA
Same center,
different variation
Measures of Variation:
The Range
DCOVA
Simplest measure of variation
Difference between the largest and the smallest values:
Example:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Range = 13 - 1 = 12
Measures of Variation:
Why The Range Can Be Misleading
DCOVA
Does not account for how the data are
distributed
7 8 9 10 11 12 7 8 9 10 11 12
Range = 12 - 7 = 5 Range = 12 - 7 = 5
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
40 43 45 48
Interquartile Range
Interquartile Range = $Q_3 - Q_1$
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \dfrac{\sum X}{N} = \dfrac{65}{5} = 13$
• Deviations from the mean: -8, -4, 3, 4, 5
[Figure: deviations -8, -4, +3, +4, +5 drawn from the mean µ on a number line from 0 to 20]
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
X       X − µ    |X − µ|
5       -8       +8
9       -4       +4
16      +3       +3
17      +4       +4
18      +5       +5
Total   0        24
$M.A.D. = \dfrac{\sum |X - \mu|}{N} = \dfrac{24}{5} = 4.8$
Population Variance
• Average of the squared deviations from the
arithmetic mean
X       X − µ    (X − µ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total   0        130
$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{130}{5} = 26.0$
Population Standard Deviation
• Square root of the
variance
X       X − µ    (X − µ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total   0        130
$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{130}{5} = 26.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$
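The mean absolute deviation, population variance, and population standard deviation for the data set 5, 9, 16, 17, 18 can be reproduced step by step. A minimal sketch (variable names are illustrative):

```python
from math import sqrt

data = [5, 9, 16, 17, 18]
N = len(data)

mu = sum(data) / N                               # population mean, 13
deviations = [x - mu for x in data]              # -8, -4, 3, 4, 5

mad = sum(abs(d) for d in deviations) / N        # mean absolute deviation
variance = sum(d ** 2 for d in deviations) / N   # population variance
sd = sqrt(variance)                              # population standard deviation

print(f"mu = {mu}, MAD = {mad}, variance = {variance}, sd = {sd:.1f}")
```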
Measures of Variation:
The Sample Variance
DCOVA
• Average (approximately) of squared
deviations of values from the mean
– Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
Measures of Variation:
The Sample Standard Deviation
DCOVA
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
$S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Measures of Variation:
The Standard Deviation
DCOVA
Steps for Computing Standard Deviation
n=8 Mean = X = 16
11 12 13 14 15 16 17 18 19 20 21
S = 3.338
A 15% 3%
B 15% 7%
Measures of Variation:
The Coefficient of Variation
DCOVA
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare the variability of two or
more sets of data measured in different units
$CV = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\%$
Measures of Variation:
Comparing Coefficients of Variation
DCOVA
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$CV_B = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
Measures of Variation:
Comparing Coefficients of Variation (con’t)
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock C:
– Average price last year = $8
– Standard deviation = $2
$CV_C = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$2}{\$8} \cdot 100\% = 25\%$
Stock C has a much smaller standard deviation but a much higher coefficient of variation
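The three stock comparisons reduce to one small function. A sketch, with the helper name `coefficient_of_variation` chosen here for illustration:

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100


# (average price last year, standard deviation) for each stock
stocks = {"A": (50, 5), "B": (100, 5), "C": (8, 2)}

cv = {name: coefficient_of_variation(s, avg) for name, (avg, s) in stocks.items()}

for name, value in cv.items():
    print(f"Stock {name}: CV = {value:.0f}%")
```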
Coefficient of Variation
$\mu_1 = 29$, $\sigma_1 = 4.6$
$\mu_2 = 84$, $\sigma_2 = 10$
$C.V._1 = \dfrac{\sigma_1}{\mu_1}(100) = \dfrac{4.6}{29}(100) = 15.86$
$C.V._2 = \dfrac{\sigma_2}{\mu_2}(100) = \dfrac{10}{84}(100) = 11.90$
Measures of shapes
skewness
Shape of a Distribution
DCOVA
Skewness Statistic: < 0 (left-skewed), 0 (symmetric), > 0 (right-skewed)
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
Leptokurtic
Mesokurtic
Platykurtic
Shape of a Distribution -- Kurtosis measures how sharply the
curve rises approaching the center of the distribution
DCOVA
Sharper Peak
Than Bell-Shaped
(Kurtosis > 0)
Bell-Shaped
(Kurtosis = 0)
Flatter Than
Bell-Shaped
(Kurtosis < 0)
RELATIVE LOCATION
Z score
Chebyshev's inequality
Empirical rule
Relative location – Z score
* In addition to measures of location, variability, and
shape, we are also interested in the relative location of
values within a data set.
$z_i = \dfrac{x_i - \bar{x}}{s}$
The larger the absolute value of the Z-score, the farther the
data value is from the mean.
Locating Extreme Outliers:
Z-Score DCOVA
$Z = \dfrac{X - \bar{X}}{S}$
A score of 620 is 1.3 standard deviations above the mean and would not
be considered an outlier.
z-Scores
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
z-Scores
Example: Apartment Rents
• z-Score of Smallest Value (425)
$z = \dfrac{x_i - \bar{x}}{s} = \dfrac{425 - 490.80}{54.74} = -1.20$
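The same calculation can be applied to any rent in the data set. The sketch below takes the mean (490.80) and standard deviation (54.74) as given on the slide instead of recomputing them; the function name `z_score` is an illustrative choice.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


x_bar, s = 490.80, 54.74   # sample mean and standard deviation from the slide

z_smallest = z_score(425, x_bar, s)   # smallest rent
z_largest = z_score(615, x_bar, s)    # largest rent

print(f"z(425) = {z_smallest:.2f}, z(615) = {z_largest:.2f}")
```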
[Figure: distribution over x with the axis marked at µ – 3σ, µ – 2σ, µ – 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ]
The Empirical Rule
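• Approximately 68% of the data in a bell-
shaped distribution lies within one standard
deviation of the mean, or µ ± 1σ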
The Empirical Rule
• Approximately 95% of the data in a bell-
shaped distribution lies within two standard
deviations of the mean, or µ ± 2σ
• Approximately 99.7% of the data in a bell-
shaped distribution lies within three standard
deviations of the mean, or µ ± 3σ
[Figure: bell-shaped curves showing 95% of the data within µ ± 2σ and 99.7% within µ ± 3σ]
Using the Empirical Rule
Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a
standard deviation of 90. Then,
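Under the empirical rule, the approximate ranges for this Math SAT example follow directly from the mean and standard deviation. A short sketch (the dictionary layout is an illustrative assumption):

```python
mean, sd = 500, 90

# Approximate share of a bell-shaped distribution within k standard deviations.
coverage = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {k: (mean - k * sd, mean + k * sd) for k in coverage}

for k, (low, high) in intervals.items():
    print(f"About {coverage[k]} of scores fall between {low} and {high}")
```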
– Examples:
At withi
least n
• Approximately one-fourth, or
25%, of the observations are
between adjacent numbers
in a five-number summary.
Box Plot
Minimum Q1 Q2 Q3 Maximum
Five-Number
Summary
Example: Apartment Rents
Lowest Value = 425 First Quartile = 445
Median = 475
Third Quartile = 525 Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
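The five-number summary for the apartment rents can be reproduced with the position rule used earlier for quartiles: compute i = (p/100)n; if i is a whole number, average the values in positions i and i + 1, otherwise round i up. That rounding rule is an assumption about the method behind the slide values; other software may use different conventions and give slightly different quartiles.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """p-th percentile (0 < p < 100) by the position rule described above."""
    ordered = sorted(values)
    i = p / 100 * len(ordered)
    if i == int(i):                       # whole number: average positions i and i + 1
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(i) - 1]      # otherwise round up to the next position


five_number = (min(rents), percentile(rents, 25), percentile(rents, 50),
               percentile(rents, 75), max(rents))
iqr = five_number[3] - five_number[1]

print(five_number, "IQR =", iqr)
```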
Box Plot
[Figure: box plot on an axis from 400 to 625, marking Q1 = 445, Q2 = 475, Q3 = 525]
Box Plot
Data: 3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925
Mean 3540
Standard Error 47.81989569
Median 3505
Mode 3480
Standard Deviation 165.6529779
Sample Variance 27440.90909
Kurtosis 1.718883645
Skewness 1.091108688
Range 615
Minimum 3310
Maximum 3925
Sum 42480
Count 12
General Descriptive Stats
Using Microsoft Excel Data
Analysis Tool
1. Select Data.
2. Select Data Analysis.
3. Select Descriptive Statistics
and click OK.
General Descriptive Stats
Using Microsoft Excel DCOVA
4. Enter the cell range.
5. Check the Summary
Statistics box.
6. Click OK
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
House Prices
Mean 600000
Standard Error 357770.8764
Median 300000
Mode 100000
Standard Deviation 800000
Sample Variance 640,000,000,000
Kurtosis 4.1301
Skewness 2.0068
Range 1900000
Minimum 100000
Maximum 2000000
Sum 3000000
Count 5
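Several lines of this output can be reproduced by hand. The sketch below assumes the bias-adjusted sample formulas for skewness and kurtosis that Excel's SKEW and KURT functions are documented to use; the slides themselves do not state those formulas.

```python
from math import sqrt
from statistics import mean, stdev

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
n = len(prices)

x_bar = mean(prices)
s = stdev(prices)                 # sample standard deviation (n - 1 divisor)
std_error = s / sqrt(n)           # standard error of the mean

z = [(x - x_bar) / s for x in prices]

# Adjusted sample skewness (assumed to match Excel's SKEW).
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Adjusted sample excess kurtosis (assumed to match Excel's KURT).
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * sum(v ** 4 for v in z) \
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

print(f"s = {s:.0f}, SE = {std_error:.4f}, "
      f"skewness = {skewness:.4f}, kurtosis = {kurtosis:.4f}")
```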
Minitab Output DCOVA
Minitab descriptive statistics output using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Descriptive Statistics: House Price
Total
Variable Count Mean SE Mean StDev Variance Sum Minimum
House Price 5 600000 357771 800000 6.40000E+11 3000000 100000
N for
Variable Median Maximum Range Mode Mode Skewness Kurtosis
House Price 300000 2000000 1900000 100000 2 2.01 4.13
Distribution Shape and
The Boxplot DCOVA
[Figure: three box plots, each marked Q1, Q2, Q3]
Boxplot Example
DCOVA
[Figure: box plot example on an axis from 0 to 27]
Sample statistics versus population parameters
Measure              Population Parameter   Sample Statistic
Mean                 $\mu$                  $\bar{X}$
Variance             $\sigma^2$             $S^2$
Standard Deviation   $\sigma$               $S$
Measuring two variables
Covariance
Correlation
We Discuss Two Measures Of The Relationship Between
Two Numerical Variables
$\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
• Only concerned with the strength of the relationship
• No causal effect is implied
Interpreting Covariance
DCOVA
• Covariance between two variables:
cov(X,Y) > 0 X and Y tend to move in the same direction
cov(X,Y) < 0 X and Y tend to move in opposite directions
cov(X,Y) = 0 X and Y have no linear relationship (zero covariance alone does not establish independence)
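$r = \dfrac{\operatorname{cov}(X, Y)}{S_X S_Y}$
where
$\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$, $S_X = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$, $S_Y = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$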
Features of the
Coefficient of Correlation
DCOVA
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Both ρ and r have the following features:
– Unit free
– Range between –1 and 1
– The closer to –1, the stronger the negative linear relationship
– The closer to 1, the stronger the positive linear relationship
– The closer to 0, the weaker the linear relationship
Scatter Plots of Sample Data with Various
Coefficients of Correlation
[Figure: scatter plots of Y against X for r = -1, r = -.6, r = +1, r = +.3, and r = 0]
The Coefficient of Correlation Using Microsoft Excel
Function
DCOVA
Test #1 Score Test #2 Score Correlation Coefficient
78 82 0.7332 =CORREL(A2:A11,B2:B11)
92 88
86 91
83 90
95 92
85 85
91 89
76 81
88 96
79 77
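The value returned by `=CORREL(A2:A11,B2:B11)` can be reproduced from the sample covariance and the two sample standard deviations. A minimal Python sketch (variable names are illustrative):

```python
from math import sqrt

test1 = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]
test2 = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]
n = len(test1)

x_bar = sum(test1) / n
y_bar = sum(test2) / n

# Sample covariance and sample standard deviations (n - 1 divisor).
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(test1, test2)) / (n - 1)
s_x = sqrt(sum((x - x_bar) ** 2 for x in test1) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in test2) / (n - 1))

r = cov_xy / (s_x * s_y)   # sample coefficient of correlation

print(f"r = {r:.4f}")
```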
The Coefficient of Correlation Using Microsoft Excel
Data Analysis Tool
1. Select Data
DCOVA
2. Choose Data Analysis
3. Choose Correlation &
Click OK
The Coefficient of
Correlation
Using Microsoft Excel DCOVA
r = .733
[Figure: "Scatter Plot of Test Scores"; Test #2 Score (vertical axis, 70 to 100) against Test #1 Score (horizontal axis, 70 to 100)]
There is a relatively strong positive linear
relationship between test score #1 and test
score #2.
Students who scored high on the first test
tended to score high on the second test.
Pitfalls in Numerical
Descriptive Measures
DCOVA
• Data analysis is objective
– Should report the summary measures
that best describe and communicate the
important aspects of the data set
Models with DVD Player Price Models without DVD Player Price
Sony HT-1800DP $450 Pioneer HTP-230 $300
Pioneer HTD-330DV 300 Sony HT-DDW750 300
Sony HT-C800DP 400 Kenwood HTB-306 360
Panasonic SC-HT900 500 RCA RT-2600 290
Panasonic SC-MTI 400 Kenwood HTB-206 300
• Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theatre unit?
• Compute the range, variance, and standard deviation for the two samples. What does
this information tell you about the prices for models with and without a DVD player?
Price with DVD player Price without DVD player
Count 5 Count 5
• The following data were used to construct the histograms of the number
of days required to fill orders for Dawson Supply, Inc., and J.C. Clark
Distributors
• Use the range and standard deviation to support that Dawson Supply
provides the more consistent and reliable delivery times.
dawson clark
Range 2 Range 8
Minimum 9 Minimum 7
Maximum 11 Maximum 15
Count 10 Count 10
• The following times were recorded by the quarter-mile and mile runners
of a university track team (times are in minutes).
Quarter-Mile Times: .92 .98 1.04 .90 .99
Mile Times: 4.52 4.35 4.60 4.70 4.50
After viewing this sample of running times, one of the coaches commented
that the quarter milers turned in the more consistent times. Use the standard
deviation and the coefficient of variation to summarize the variability in the
data. Does the use of the coefficient of variation indicate that the coach’s
statement should be qualified?
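One way to work this exercise is to compute both measures for each event. The sketch below uses sample standard deviations (n - 1 divisor); the helper name `summarize` is an illustrative choice.

```python
from statistics import mean, stdev

quarter_mile = [0.92, 0.98, 1.04, 0.90, 0.99]
mile = [4.52, 4.35, 4.60, 4.70, 4.50]


def summarize(times):
    """Return (sample standard deviation, coefficient of variation in %)."""
    s = stdev(times)
    return s, s / mean(times) * 100


s_quarter, cv_quarter = summarize(quarter_mile)
s_mile, cv_mile = summarize(mile)

print(f"Quarter-mile: s = {s_quarter:.4f}, CV = {cv_quarter:.1f}%")
print(f"Mile:         s = {s_mile:.4f}, CV = {cv_mile:.1f}%")
```

The quarter-mile times have the smaller standard deviation but the larger coefficient of variation, so relative to their mean they are the less consistent of the two.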
•A statistics student made the
following grades on 5 tests: 84, 78,
88, 72, and 72.
What is the median grade?
(a) 78
(b) 80
(c) 88
(d) 72
Quantitative Methods in
Management
Day-5
Simple Regression
Page: 430-445
Recap..
• Introduction
• Definition
• Terms and terminologies
• Types of statistics
• Types of data
• Levels of measurements
• Application of statistics in business
• Sources of data
Organizing and visualizing variables
• Tables
• Frequency distribution
• Relative frequency distribution
• Relative percent frequency distribution
• Cumulative frequency distribution
• Univariate
• Bivariate / cross tabulation
• Diagrams
• Bar charts
• Pie charts
• Graphs
• Histogram
• Frequency polygon
• Frequency curve
• Cumulative frequency curve ( Ogive)
• EDA
• Stem and leaf plot
• Scatter diagram
• Dot plots
• Pareto chart
Numerical descriptive statistics
Measures of location
Measures of dispersion
Measures of shapes
Kurtosis
Relative location
- Z score
- Chebyshev's inequality
- Empirical rule
Exploratory data analysis
- Five number summary
- Box plot
Relationship between two variables
- Co variance
- correlation
Simple linear regression
Chapter 12
Learning Objectives
• How to use regression analysis to predict the
value of a dependent variable based on an
independent variable
• The meaning of the regression coefficients b0
and b1
• Measures of variation ( SSE, SSR, SST)
• Coefficient of determination
Steps
• Plot the scatter diagram
• Identify the independent and dependent variables
• Fit a regression line by estimating b0 and b1
• Estimate the value ( predict Y^)
• Measures of variation
• SSR
• SSE
• SST
• Coefficient of determination
• Correlation: $r = (\text{sign of } b_1)\sqrt{r^2}$
Correlation vs. Regression
• A scatter plot can be used to show the relationship between
two variables
• Correlation analysis is used to measure the strength of the
association (linear relationship) between two variables
• Correlation is only concerned with strength of the relationship
• No causal effect is implied with correlation
Introduction to
Regression Analysis
• Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least
one independent variable
• Explain the impact of changes in an independent variable on the
dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the
dependent variable
Regression Analysis
• Regression analysis is a tool for building
mathematical and statistical models that characterize
relationships between a dependent (ratio) variable
and one or more independent, or explanatory
variables (ratio or categorical), all of which are
numerical.
• Simple linear regression involves a single
independent variable.
• Multiple regression involves two or more
independent variables.
Simple Linear Regression Model
[Figure: scatter plot of Sales (vertical axis, 0 to 100) against Advertising (horizontal axis, 0 to 50)]
Larger (smaller) values of sales tend to be
associated with larger (smaller) values of
advertising.
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a
straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear
relationship.
The line represents the nature of the relationship on average.
Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships and curvilinear relationships]
Types of Relationships
(continued)
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]
Types of Relationships
(continued)
[Figure: scatter plot against X showing no relationship]
Simple Linear Regression Model
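$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where:
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
Linear component: $\beta_0 + \beta_1 X_i$
Random error component: $\varepsilon_i$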
Simple Linear Regression Model
(continued)
[Figure: plot of $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ showing the observed value of Y for Xi, the random error εi, slope = β1, and intercept = β0]
Simple Linear Regression
Equation (Prediction Line)
The simple linear regression equation provides an estimate of the
population regression line
$\hat{Y}_i = b_0 + b_1 X_i$
where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
The Least Squares Method
b0 and b1 are obtained by finding the values
that minimize the sum of the squared differences
between Y and Ŷ:
$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum (Y_i - (b_0 + b_1 X_i))^2$
y = β 0 + β 1x + ε
where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: regression line of E(y) against x with intercept β0; slope β1 is positive]
Simple Linear Regression Equation
[Figure: regression line of E(y) against x with intercept β0; slope β1 is negative]
Simple Linear Regression Equation
No Relationship
[Figure: regression line of E(y) against x]
Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
• The sample statistics b0 and b1 provide estimates of β0 and β1 in the estimated regression equation
Least Squares Method
• Least Squares Criterion
$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable
for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable
for the ith observation
Least Squares Method
• Slope for the Estimated Regression Equation
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
Least Squares Method
$b_0 = \bar{y} - b_1 \bar{x}$
Simple Linear Regression
Example: Reed Auto Sales
Number of Number of
TV Ads (x) Cars Sold (y)
1 14
3 24
2 18
1 17
3 27
Σx = 10   Σy = 100
$\bar{x} = 2$   $\bar{y} = 20$
Columns required to calculate b1 and b0
X Y XY X2
Estimated Regression Equation
Slope for the Estimated Regression Equation
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{20}{4} = 5$
[Figure: scatter plot of Cars Sold (vertical axis, 0 to 25) against TV Ads (horizontal axis, 0 to 4) with the fitted line y = 5x + 10]
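The least squares estimates for the Reed Auto Sales data can be reproduced directly from the slope and intercept formulas. A minimal sketch (the function name `predict` is an illustrative choice):

```python
tv_ads = [1, 3, 2, 1, 3]            # x
cars_sold = [14, 24, 18, 17, 27]    # y
n = len(tv_ads)

x_bar = sum(tv_ads) / n             # 2
y_bar = sum(cars_sold) / n          # 20

# Numerator and denominator of the slope formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)

b1 = s_xy / s_xx                    # slope
b0 = y_bar - b1 * x_bar             # intercept


def predict(x):
    """Estimated number of cars sold for x TV ads."""
    return b0 + b1 * x


print(f"y-hat = {b0:g} + {b1:g}x; predicted sales for 3 ads = {predict(3):g}")
```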
Columns required to calculate Measures of
variation
X | Y | XY | X2 | Ŷ = … + …X | Y − Ŷ | (Y − Ŷ)2 | (Y − Ȳ) | (Y − Ȳ)2
Σ(Y − Ŷ)2 = SSE; Σ(Y − Ȳ)2 = SST
Coefficient of Determination
• Relationship Among SST, SSR, SSE
SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Coefficient of Determination
r2 = SSR/SST
where:
SSR = sum of squares due to regression
SST = total sum of squares
• Goodness of fit
• Perfect fit: SSR = SST, or SSR/SST = 1
• A poorer fit results in larger values of SSE (the poorest fit occurs when SSR = 0 and SSE = SST)
Coefficient of Determination
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
b1 = the slope of the estimated regression
equation $\hat{y} = b_0 + b_1 x$
$r_{xy} = +\sqrt{.8772} = +.9366$
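The values .8772 and +.9366 can be traced back to the sums of squares for the Reed Auto Sales data. The sketch below takes the fitted line ŷ = 10 + 5x as given and computes SSE, SST, SSR, r² and r (variable names are illustrative):

```python
from math import copysign, sqrt

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                       # fitted line y-hat = 10 + 5x

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))   # error
sst = sum((y - y_bar) ** 2 for y in cars_sold)                # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                  # regression

r_squared = ssr / sst
r_xy = copysign(sqrt(r_squared), b1)   # takes the sign of the slope

print(f"SSE = {sse}, SST = {sst:g}, SSR = {ssr:g}, "
      f"r^2 = {r_squared:.4f}, r = {r_xy:+.4f}")
```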
Simple Linear Regression Example
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Analysis of Variance
Source DF SS MS F P
Regression 1 18935 18935 11.08 0.010
Residual Error 8 13666 1708
Total 9 32600
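The remaining columns of the ANOVA table follow from the sums of squares and degrees of freedom. A short sketch using the values in the Excel output; the p-value (Significance F) is not recomputed here because that would need an F distribution function outside the standard library.

```python
ss_regression = 18934.9348
ss_residual = 13665.5652
df_regression, df_residual = 1, 8

ss_total = ss_regression + ss_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total     # coefficient of determination

print(f"SST = {ss_total:.4f}, MSE = {ms_residual:.4f}, "
      f"F = {f_statistic:.4f}, r^2 = {r_squared:.4f}")
```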
Simple Linear Regression Example: Graphical
Representation
Predicted house price = 98.25 + 0.1098(2000)
= 317.85
The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850
Simple Linear Regression Example:
Making Predictions
• When using a regression model for prediction, only
predict within the relevant range of data
[Figure: scatter plot of House Price ($1000s) against Square Feet (0 to 3000), with the relevant range for interpolation marked]
Do not try to extrapolate beyond the range of observed X’s
Measures of Variation
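$SST = \sum (Y_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2$
[Figure: fitted line showing, at $X_i$, the deviations of $Y_i$ and $\hat{Y}_i$ from $\bar{Y}$]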
Coefficient of Determination, r2
• The coefficient of determination is the portion of
the total variation in the dependent variable that is
explained by variation in the independent variable
• The coefficient of determination is also called r-
squared and is denoted as r2
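note: $0 \le r^2 \le 1$
[Figure: scatter plots of Y against X with r2 = 1]
Examples of r2 Values
[Figure: scatter plot of Y against X with 0 < r2 < 1]
Examples of r2 Values
[Figure: scatter plot of Y against X with r2 = 0]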
DAY 6
Course content
Chapter Page number content
1 11-32 Introduction, variables, levels of measurement, types of statistics
2 33-98 Organizing and visualizing variables
3 99-148 Numerical descriptive measures – categorical / numerical – Measures of central
tendencies, measures of dispersion, skewness, kurtosis, measures of relations –
co variance and correlation
12 430-446 Simple linear regression, estimating bo and b1, measures of variations, SST, SSR
and SSE, coefficient of determination, coefficient of correlation.
4 149-182 Basic probability
5,6 183-232 Discrete probability distributions – binomial and poisson
Continuous probability distribution – Normal
7 234-257 Sampling distribution
8 258-293 Confidence interval – mean, proportion and determining sample size
9 294-304 Fundamentals of testing of hypothesis
11 402-415 Chi square test
15 15-1 to 15-17 Decision analysis
RECAP
• Introduction – definition, types of statistics, levels of
measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and
correlation
• Regression analysis – simple, measures of variations ( SSE,
SSR, SST, coefficient of determination and coefficient of
correlation)
INFERENCE STATISTICS
• Preliminary concepts on probability and random
variables, theoretical distributions
• Sampling distribution
• Estimation and
• Testing of hypothesis
Probability
• Concepts
• Definition - different ways of assigning probability.
• Understand and apply marginal, union, joint, and
conditional probabilities.
• Solve problems using the laws of probability including
the laws of addition, multiplication and conditional
probability
• Revise probabilities using Bayes’ rule.
CERTAIN/ REAL UNCERTAIN/ABSTRACT
• Survey • Experiment
• Data • Events
Marks % # of students
0 – 25 45
25 – 50 280
50 – 75 205
75 –100 30
Introduction…
• Assuming the next exam is equally tough and
there is the same % of dull and bright students, she
can conclude that the % of students in the 4
classes of marks would be
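The percentages implied by the marks table are relative frequencies, which then serve as estimated probabilities for the next exam. A minimal sketch (the dictionary layout is an illustrative assumption):

```python
students = {"0 – 25": 45, "25 – 50": 280, "50 – 75": 205, "75 – 100": 30}

total = sum(students.values())   # 560 students

# Relative frequency of each class, used as its estimated probability.
probability = {marks: count / total for marks, count in students.items()}

for marks, p in probability.items():
    print(f"{marks}: {p:.1%}")
```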
0 .5 1
Probability:
• Simple event
• An event described by a single characteristic
• e.g., A red card from a deck of cards
• Joint event
• An event described by two or more characteristics
• e.g., An ace that is also red from a deck of cards
• Complement of an event A (denoted A’)
• All events that are not part of event A
• e.g., All cards that are not diamonds
Sample Space
The Sample Space is the collection of all possible events
e.g. All 6 faces of a die:
         Ace   Not Ace   Total
Black    2     24        26
Red      2     24        26
Total    4     48        52
• Decision Trees
[Figure: decision tree for the sample space, a full deck of 52 cards, with branch counts 2, 24, 2, 24]
Definition
• Classical method of assigning probability (rules and
laws)
• Axiomatic
Assessing Probability
There are three approaches to assessing
the probability of an uncertain event:
1. a priori -- based on prior knowledge of the process
   probability of occurrence = X / T = (number of ways the event can occur) / (total number of elementary outcomes)
2. empirical probability
   probability of occurrence = (number of ways the event can occur) / (total number of elementary outcomes)
   (Approaches 1 and 2 assume all outcomes are equally likely.)
3. subjective probability
   based on a combination of an individual’s past experience,
   personal opinion, and analysis of a particular situation
Example of a priori probability
$\frac{X}{T} = \frac{12 \text{ face cards}}{52 \text{ total cards}} = \frac{3}{13}$
Empirical probability
• 383 of 751 business graduates were employed in the
past. The probability that a particular graduate will be
employed in his or her major area is 383/751 = 0.51
or 51%.
Example – an analyst on share prices may opine that the price of Reliance share has a 20%
probability of increasing by Rs.500 in the next 2 months
• Estimating the probability that a person wins a jackpot lottery.
• Estimating the probability that the GM will lose its first ranking in car sales.
Axiomatic ( basic rules)
• Probability lies between 0 and 1
• P(sure event) = 1
• P( impossible event) = 0
• P(AUB) = P(A) +P(B) – P(A∩B)
Definitions
Simple vs. Joint Probability
• Simple Probability refers to the probability of a
simple event.
• ex. P(King)
• ex. P(Spade)
January Days
Wednesdays
Organizing & Visualizing Events
(continued)
            Jan.   Not Jan.   Total
Wed.          4       48        52
Not Wed.     27      286       313
Total        31      334       365
• Decision Trees
[Figure: decision tree for the sample space, all days in 2015, showing the number of sample space outcomes on each branch: 4, 27, 48, 286]
Definition: Simple Probability
• Simple Probability refers to the probability of a
simple event.
• ex. P(Jan.)
• ex. P(Wed.)
P(Jan.) = 31 / 365
Definition: Joint Probability
• Joint Probability refers to the probability of an
occurrence of two or more events (joint event).
• ex. P(Jan. and Wed.)
• ex. P(Not Jan. and Not Wed.)
A = Weekday; B = Weekend;
C = January; D = Spring;
example:
A = aces; B = black cards;
C = diamonds; D = hearts
Y
X
Law of Addition
X Y
Rules of Probability- Addition
theorem
• The probability of the entire sample space is 1
• 0 ≤ p(A) ≤ 1
Color
Type Red Black Total
Ace 2 2 4
Non-Ace 24 24 48
Total 26 26 52
Marginal Probability Example
$P(\text{Ace}) = P(\text{Ace and Red}) + P(\text{Ace and Black}) = \frac{2}{52} + \frac{2}{52} = \frac{4}{52}$
Color
Type Red Black Total
Ace 2 2 4
Non-Ace 24 24 48
Total 26 26 52
Marginal & Joint Probabilities In
A Contingency Table
Event
Event B1 B2 Total
A1 P(A1 and B1) P(A1 and B2) P(A1)
Wed. 4 48 52
Not Wed. 27 286 313
$P(\text{Wed.}) = P(\text{Jan. and Wed.}) + P(\text{Not Jan. and Wed.}) = \frac{4}{365} + \frac{48}{365} = \frac{52}{365}$
Wed. 4 48 52
Not Wed. 27 286 313
CD No CD Total
CD No CD Total
P(A | B) = P(A)
• Events A and B are independent when the probability of
one event is not affected by the fact that the other event
has occurred
Multiplication Rules
• Where B1, B2, …, Bk are k mutually exclusive and collectively exhaustive events
Problem Contingency Table
Counts
                     AT&T   IBM   Total
Telecommunication     40     10     50
Computers             20     30     50
Total                 60     40    100
Probabilities
                     AT&T   IBM   Total
Telecommunication    .40    .10    .50
Computers            .20    .30    .50
Probability that a project is undertaken by IBM given it is a telecommunications project:
$P(\text{IBM} \mid T) = \frac{P(\text{IBM} \cap T)}{P(T)} = \frac{0.10}{0.50} = 0.2$
Carpenter 0 2 4 3 1
Lawyer 6 2 1 1 0
Therapist 0 5 2 1 2
System 2 1 4 3 0
Analyst
Problem ….
• Develop a joint probability table
• What is the p that one of the participants had a
score in the 80s
• What is the p of a score in the 80s given he was a
therapist
• What is the p that one of the participants was a
lawyer
• What is the p that one of the participants was a
lawyer and received a score under 50
• What is the p of a score under 50 given that he is
a lawyer
• What is the p of being a lawyer given that his
score is under 50
• What is the p of a score of 70 or higher
Problem
• Joint Probability table
Yes          45   55   60   50
No           35   45   35   45
No opinion    5    5    5    5
Problem …
What is the probability that a consumer selected at
random
• Preferred the brand = 210/390
• Preferred the brand and was from Chennai = 60/390
• Preferred the brand given that he was from Chennai =
60/100
• Given that a consumer preferred the brand, what
is the p that he was from Mumbai = 50/210
Bayes’ Theorem
• Bayes’ Theorem is used to revise previously
calculated probabilities based on new information.
$P(B_i \mid A) = \frac{P(A \mid B_i)P(B_i)}{P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_k)P(B_k)}$
• where:
Bi = ith event of k mutually exclusive and collectively
exhaustive events
A = new event that might impact P(Bi)
Bayes’ Theorem …
Prior Probabilities
New Information
Bayesian Theorem
Posterior Probabilities
Bayes’ Theorem Example
• A drilling company has estimated a 40% chance of
striking oil for their new well.
• A detailed test has been scheduled for more
information. Historically, 60% of successful wells
have had detailed tests, and 20% of unsuccessful
wells have had detailed tests.
• Given that this well has been scheduled for a
detailed test, what is the probability
that the well will be successful?
Bayes’ Theorem Example
(continued)
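Let S = successful well, U = unsuccessful well, D = detailed test scheduled.
$P(S \mid D) = \frac{P(D \mid S)P(S)}{P(D \mid S)P(S) + P(D \mid U)P(U)}$
$= \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.2)(0.6)}$
$= \frac{0.24}{0.24 + 0.12} = 0.667$
Sum = 0.36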
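The drilling calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `bayes_posterior` and its argument names are illustrative choices, not part of the course material.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(B_i | A) for each i, given priors P(B_i) and likelihoods P(A | B_i)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)  # P(A), the denominator of Bayes' theorem
    return [j / total for j in joints]

# Drilling example: B_1 = successful well, B_2 = unsuccessful well
posterior = bayes_posterior(priors=[0.4, 0.6], likelihoods=[0.6, 0.2])
print([round(p, 3) for p in posterior])  # [0.667, 0.333]
```

The same function accepts any number of mutually exclusive and collectively exhaustive events, as long as the priors sum to 1.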
Problem 13
The probabilities of three events A, B and C occurring
are
p(A) = .35 p(B) = .45 p(C) = 0.2
Given that A, B or C has occurred, the
probabilities of another event, X, occurring are
p(X|A) = .8 p(X|B) = .65 p(X|C) = 0.3
.35
.20
.3 .3*.2 = .06 .06/.6325=.0949
.5
.5
.60 .5*.6 = .30 .71
.5
.2
.03 .006 .2727
• Counting Rule 1:
• If any one of k different mutually exclusive and
collectively exhaustive events can occur on each of n
trials, the number of possible outcomes is equal to
$k^n$
• Example
• If you roll a fair die 3 times then there are $6^3 = 216$ possible
outcomes
Counting Rules
(continued)
• Counting Rule 2:
• If there are $k_1$ events on the first trial, $k_2$ events on the
second trial, … and $k_n$ events on the nth trial, the number
of possible outcomes is
$(k_1)(k_2)\cdots(k_n)$
• Example:
• You want to go to a park, eat at a restaurant, and see a movie.
There are 3 parks, 4 restaurants, and 6 movie choices. How
many different possible combinations are there?
• Answer: (3)(4)(6) = 72 different possibilities
Counting Rules
(continued)
• Counting Rule 3:
• The number of ways that n items can be arranged in order
is
$n! = (n)(n - 1)\cdots(1)$
• Example:
• You have five books to put on a bookshelf. How many different
ways can these books be placed on the shelf?
• Answer: 5! = (5)(4)(3)(2)(1) = 120 different possibilities
• Counting Rule 4:
• Permutations: The number of ways of arranging X objects
selected from n objects in order is
$_nP_X = \frac{n!}{(n - X)!}$
• Example:
• You have five books and are going to put three on a bookshelf. How
many different ways can the books be ordered on the bookshelf?
• Answer: $_nP_X = \frac{n!}{(n - X)!} = \frac{5!}{(5 - 3)!} = \frac{120}{2} = 60$
Counting Rules
(continued)
• Counting Rule 5:
• Combinations: The number of ways of selecting X objects
from n objects, irrespective of order, is
$_nC_X = \frac{n!}{X!(n - X)!}$
• Example:
• You have five books and are going to select three to read.
How many different combinations are there, ignoring the order
in which they are selected?
• Answer: $_nC_X = \frac{n!}{X!(n - X)!} = \frac{5!}{3!(5 - 3)!} = \frac{120}{(6)(2)} = 10$ different possibilities
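The five counting rules can be reproduced with Python's standard `math` module (`math.perm` and `math.comb` need Python 3.8 or later). The variable names below are illustrative only; the numbers are the ones used in the slide examples.

```python
import math

rule1 = 6 ** 3              # k^n: a die (k = 6) rolled n = 3 times
rule2 = 3 * 4 * 6           # (k1)(k2)(k3): parks, restaurants, movies
rule3 = math.factorial(5)   # n!: five books arranged on a shelf
rule4 = math.perm(5, 3)     # permutations: order three of five books
rule5 = math.comb(5, 3)     # combinations: choose three of five books

print(rule1, rule2, rule3, rule4, rule5)  # 216 72 120 60 10
```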
Chapter Summary
• Discussed basic probability concepts
• Sample spaces and events, contingency tables, Venn diagrams, simple
probability, and joint probability
Chapter 5
Random
Variables
Related to frequency distributions by simply replaces the actual numbers (frequencies) with the
• A probability distribution for a discrete random variable is a mutually exclusive listing of all
possible numerical outcomes for that variable and a probability of occurrence associated with each
outcome.
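Example: toss 2 coins; X = number of heads
Outcomes   X   Probability
T T        0   1/4 = 0.25
T H, H T   1   2/4 = 0.50
H H        2   1/4 = 0.25
[Figure: bar chart of probability against X = 0, 1, 2]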
Discrete Random Variable
• A random variable that assumes a finite number of
values or an infinite sequence of values such as 0, 1,
2…. is a discrete random variable
variable
not ‘counted’
Continuous Random Variable
• Example
• Temperature between 29°C and 30°C can be 29.1, 29.5
or 29.9
• Time between customer arrivals at a bank
• Current Ratio of a motorcycle distributorship
• Elapsed time between arrivals of bank customers
• Percent of the labor force that is unemployed
Discrete random variable            Continuous random variable
• (X, p(x))                         • (x, f(x))
• PMF (probability mass function)   • PDF (probability density function)
• ΣP(x) = 1                         • ∫f(x)dx = 1
• E(X)                              • E(X)
• V(X)                              • V(X)
• Decide which of the following distributions are probability
distributions:
a. The distribution takes the values -2,-1 ,0,1 and P(-2) =-0.5, P(-1) =
0.7, P(0) = 0.2 and P(1) = 0.6
b. The distribution takes the values 1,2,3,4 and corresponding
probabilities are 0.1,0.2,0.25,0.3
c. The distribution takes the values 20,30,40,50 with corresponding
probabilities as 0.1,0.2,0.3,0.4
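One way to check the three candidates is to test the two requirements of a discrete probability distribution: every probability lies between 0 and 1, and the probabilities sum to 1. The sketch below assumes that reading of the exercise; the helper name `is_probability_distribution` is hypothetical.

```python
import math

def is_probability_distribution(probs):
    """True if every probability is in [0, 1] and the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0)

a = [-0.5, 0.7, 0.2, 0.6]   # contains a negative probability
b = [0.1, 0.2, 0.25, 0.3]   # sums to 0.85
c = [0.1, 0.2, 0.3, 0.4]    # sums to 1

results = [is_probability_distribution(p) for p in (a, b, c)]
print(results)  # [False, False, True]
```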
Discrete Random Variables
Expected Value (Measuring Center)
• Expected Value (or mean) of a discrete
random variable (Weighted Average)
$\mu = E(X) = \sum_{i=1}^{N} X_i P(X_i)$
• Example: Toss 2 coins, X = # of heads, compute expected value of X:
X   P(X)
0   0.25
1   0.50
2   0.25
$E(X) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2 P(X_i)}$
where:
E(X) = Expected value of the discrete random variable X
Xi = the ith outcome of X
P(Xi) = Probability of the ith occurrence of X
Discrete Random Variables
Measuring Dispersion
(continued)
$\sigma = \sqrt{\sum [X_i - E(X)]^2 P(X_i)}$
$E(X)$   $E(X^2)$
• $V(X) = E(X^2) - [E(X)]^2$
Distribution of Daily Crises
Number of Crises   Probability
0                  0.37
1                  0.31
2                  0.18
3                  0.09
4                  0.04
5                  0.01
[Figure: bar chart of probability (0 to 0.5) against number of crises (0 to 5)]
Mean of the Crises Data Example
$\mu = E(X) = \sum X \cdot P(X) = 1.15$
X       P(X)   X•P(X)
0       .37    .00
1       .31    .31
2       .18    .36
3       .09    .27
4       .04    .16
5       .01    .05
Total          1.15
[Figure: bar chart of probability (0 to 0.5) against number of crises (0 to 5)]
Variance & SD - Crises Data
$\sigma^2 = \sum (X - \mu)^2 \cdot P(X) = 1.41$
$\sigma = \sqrt{\sigma^2} = \sqrt{1.41} = 1.19$
X       P(X)   (X-µ)    (X-µ)²   (X-µ)²•P(X)
0       .37    -1.15    1.32     .49
1       .31    -0.15    0.02     .01
2       .18    0.85     0.72     .13
3       .09    1.85     3.42     .31
4       .04    2.85     8.12     .32
5       .01    3.85     14.82    .15
Total                            1.41
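The crises table can be recomputed directly from the definitions of E(X) and the variance. This is a small sketch with an illustrative helper name, `mean_and_variance`; it uses only the probabilities listed in the table above.

```python
def mean_and_variance(values, probs):
    """Expected value and variance of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var

crises = [0, 1, 2, 3, 4, 5]
probs = [0.37, 0.31, 0.18, 0.09, 0.04, 0.01]

mu, var = mean_and_variance(crises, probs)
sd = var ** 0.5
print(round(mu, 2), round(var, 2), round(sd, 2))  # 1.15 1.41 1.19
```

The same function applies to the other exercises in this part, for example the interruptions-per-day table.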
Discrete Variables Expected Value (Measuring
Center)
• Expected Value (or mean) of a discrete
variable (Weighted Average)
$\mu = E(X) = \sum_{i=1}^{N} x_i P(X = x_i)$
$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [x_i - E(X)]^2 P(X = x_i)}$
where:
E(X) = Expected value of the discrete variable X
xi = the ith outcome of X
P(X=xi) = Probability of the ith occurrence of X
Discrete Variables:
Measuring Dispersion (continued)
$\sigma = \sqrt{\sum_{i=1}^{N} [x_i - E(X)]^2 P(X = x_i)}$
Interruptions Per Day In
Computer Network (xi)   Probability P(X = xi)   [xi – E(X)]²          [xi – E(X)]² P(X = xi)
0                       0.35                    (0 – 1.4)² = 1.96     (1.96)(0.35) = 0.686
1                       0.25                    (1 – 1.4)² = 0.16     (0.16)(0.25) = 0.040
2                       0.20                    (2 – 1.4)² = 0.36     (0.36)(0.20) = 0.072
3                       0.10                    (3 – 1.4)² = 2.56     (2.56)(0.10) = 0.256
4                       0.05                    (4 – 1.4)² = 6.76     (6.76)(0.05) = 0.338
5                       0.05                    (5 – 1.4)² = 12.96    (12.96)(0.05) = 0.648
σ² = 2.04, σ = 1.4283
Problem
• An auto dealer determines the demand he can
expect for autos during a 1-month period. The
probability of demand for 50, 55, 60 & 65 cars
sold per month is 0.15, 0.2, 0.3 and 0.35. Find
the expected value
Mean 114 81
SD 42 29.14
CV 36.84% 35.97%
Return – Share 1
Risk – Share 2
Problem
The probability distribution for the # of TV sets
per household is
X 0 1 2 3 4 5
P(X) .01 .23 .41 .2 .1 .05
If Walters purchases stock whenever the expected rate of return exceeds 10 per cent, will he
purchase the stock, according to these data? What is your suggestion to Walters?
• Answer= Yes, he will purchase the stock because he carefully studies any potential investment.
Exercises
• 5.5
• 5.6
• 5.7
• 5.8
page no. 187,188
Probability Distributions
Probability
Distributions
Binomial Normal
Poisson
For any distribution
• When to apply
• Prob mass function/ density function
• Range
• Parameter
• Constants/ characteristics
• Simple problem
• Given parameter calculate probability
• Given probability, parameters get the random variable ( INVERSE)
• Expected value : E(X) = NP(X=x)
Binomial Probability Distribution
A fixed number of observations, n
e.g., 15 tosses of a coin; ten light bulbs taken from a warehouse
Each observation is categorized as to whether or not
the “event of interest” occurred
e.g., head or tail in each toss of a coin; defective or not defective light bulb
Since these two categories are mutually exclusive and collectively
exhaustive
When the probability of the event of interest is represented as π, then the
probability of the event of interest not occurring is 1 - π
Constant probability for the event of interest occurring (π) for each
observation
Probability of getting a tail is the same each time we toss the coin
Binomial Probability Distribution
(continued)
• Possible ways: HHT, HTH, THH, so there are three ways you
can get two heads.
$_nC_X = \frac{n!}{X!(n - X)!}$
where:
n! =(n)(n - 1)(n - 2) . . . (2)(1)
X! = (X)(X - 1)(X - 2) . . . (2)(1)
0! = 1 (by definition)
Counting Techniques
Rule of Combinations
• How many possible 3 scoop combinations could you create at
an ice cream parlor if you have 31 flavors to select from?
• The total choices is n = 31, and we select X = 3.
E(X) = N P(X=x)
Binomial Distribution
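• Probability function: $P(X) = \frac{n!}{X!(n - X)!} p^X q^{n - X}$ for $0 \le X \le n$, where $q = 1 - p$
• Mean value: $\mu = n \cdot p$
• Variance and standard deviation: $\sigma^2 = n \cdot p \cdot q$ and $\sigma = \sqrt{\sigma^2} = \sqrt{n \cdot p \cdot q}$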
Example:
Calculating a Binomial Probability
What is the probability of one success in five
observations if the probability of an event of
interest is .1?
X = 1, n = 5, and π = 0.1
$P(X = 1) = \frac{n!}{X!(n - X)!} \pi^X (1 - \pi)^{n - X}$
$= \frac{5!}{1!(5 - 1)!} (0.1)^1 (1 - 0.1)^{5 - 1}$
$= (5)(0.1)(0.9)^4$
$= 0.32805$
The Binomial Distribution
Example
Suppose the probability of purchasing a defective
computer is 0.02. What is the probability of
purchasing 2 defective computers in a group of 10?
X = 2, n = 10, and π = .02
$P(X = 2) = \frac{n!}{X!(n - X)!} \pi^X (1 - \pi)^{n - X}$
$= \frac{10!}{2!(10 - 2)!} (.02)^2 (1 - .02)^{10 - 2}$
$= (45)(.0004)(.8508)$
$= .01531$
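Both worked examples follow the same formula, so they can be verified with one short function. The sketch below is an illustration only: `binomial_pmf` is a hypothetical helper, and `pi` stands for the probability π of the event of interest.

```python
import math

def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial variable with n trials and success probability pi."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

p_one_in_five = binomial_pmf(1, 5, 0.1)     # one success in five observations
p_two_in_ten = binomial_pmf(2, 10, 0.02)    # two defective computers in ten

print(round(p_one_in_five, 5))  # 0.32805
print(round(p_two_in_ten, 5))   # 0.01531
```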
The Binomial Distribution
Shape
• The shape of the binomial distribution depends on the values of π and n
[Figure: two bar charts of P(X) against X = 0 to 5. Here, n = 5 and π = .1; here, n = 5 and π = .5]
The Binomial Distribution
Using Binomial Tables
n = 10
x F π=.20 π=.25 π=.30 π=.35 π=.40 π=.45 π=.50
0 F 0.1074 0.0563 0.0282 0.0135 0.0060 0.0025 0.0010 10
1 F 0.2684 0.1877 0.1211 0.0725 0.0403 0.0207 0.0098 9
2 F 0.3020 0.2816 0.2335 0.1757 0.1209 0.0763 0.0439 8
3 F 0.2013 0.2503 0.2668 0.2522 0.2150 0.1665 0.1172 7
4 F 0.0881 0.1460 0.2001 0.2377 0.2508 0.2384 0.2051 6
5 F 0.0264 0.0584 0.1029 0.1536 0.2007 0.2340 0.2461 5
6 F 0.0055 0.0162 0.0368 0.0689 0.1115 0.1596 0.2051 4
7 F 0.0008 0.0031 0.0090 0.0212 0.0425 0.0746 0.1172 3
8 F 0.0001 0.0004 0.0014 0.0043 0.0106 0.0229 0.0439 2
9 F 0.0000 0.0000 0.0001 0.0005 0.0016 0.0042 0.0098 1
10 F 0.0000 0.0000 0.0000 0.0000 0.0001 0.0003 0.0010 0
• Mean
$\mu = E(x) = n\pi$
Variance and Standard Deviation
$\sigma^2 = n\pi(1 - \pi)$
$\sigma = \sqrt{n\pi(1 - \pi)}$
Where n = sample size
π = probability of the event of interest for any trial
(1 – π) = probability of no event of interest for any trial
The Binomial Distribution
Characteristics
Examples
n = 5, π = 0.1:
$\mu = n\pi = (5)(.1) = 0.5$
$\sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{(5)(.1)(1 - .1)} = 0.6708$
n = 5, π = 0.5:
$\mu = n\pi = (5)(.5) = 2.5$
$\sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{(5)(.5)(1 - .5)} = 1.118$
[Figure: bar charts of P(X) against X = 0 to 5 for each case]
Using Excel For The
Binomial Distribution
Problem
• Find the probability of getting
i) exactly 3 heads in 4 tosses of a biased coin,
where p(H) = ¾ and p(T) = ¼
$P(X = 3) = {}_4C_3 (3/4)^3 (1/4)^1 = 0.421875$
ii) At least 3 heads: $p(X \ge 3) = 0.421875 + (3/4)^4 = .738$
iii) No more than 2 heads: $p(X \le 2) = 1 - .738 = .262$
Problem
• Assume that on an average, 1 telephone line out of
5 is busy. What is the probability that if 3
randomly selected telephone numbers are called
$P(X) = \frac{e^{-\lambda} \lambda^X}{X!}$
where:
X = number of events in an area of opportunity
λ = expected number of events
e = base of the natural logarithm system (2.71828...)
Poisson Distribution Characteristics
• Mean
µ=λ
Variance and Standard Deviation
$\sigma^2 = \lambda$
$\sigma = \sqrt{\lambda}$
where λ = expected number of events
Using Poisson Tables
Example with λ = 0.50:
$P(X = 2) = \frac{e^{-\lambda} \lambda^X}{X!} = \frac{e^{-0.50}(0.50)^2}{2!} = 0.0758$
Using Excel For The
Poisson Distribution
Graph of Poisson Probabilities
λ = 0.50
X   P(X)
0   0.6065
1   0.3033
2   0.0758
3   0.0126
4   0.0016
5   0.0002
6   0.0000
7   0.0000
P(X = 2) = 0.0758
[Figure: bar chart of P(x) against x = 0 to 7 for λ = 0.50]
Poisson Distribution Shape
• The shape of the Poisson Distribution depends
on the parameter λ :
[Figure: bar charts of P(x) against x for λ = 0.50 (x = 0 to 7) and λ = 3.00 (x = 1 to 12)]
Types of problem in Poisson distribution
• Given mean, find the probability of X=x
• Given: n and p , find the probability of X=x
• Given : probability and mean, find X ( inverse)
• Given, N, mean and find expected value
Problem
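On average, 1 in 400 items is defective.
Out of 100 items chosen, what is the
probability that there are more than 3
defectives?
p = 1/400, n = 100, λ = np = 0.25
P(X > 3) = 1 – [p(0) + p(1) + p(2) + p(3)]
$= 1 - e^{-.25}\left(\frac{.25^0}{0!} + \frac{.25^1}{1!} + \frac{.25^2}{2!} + \frac{.25^3}{3!}\right)$
= 1 – .778801 (1 + .25 + .03125 + .002604)
= 1 – (.778801 × 1.283854)
= 1 – .999867
≈ 0.000133 (about 1.3 in 10,000)
(The answer is a small difference between two numbers close to 1, so it is sensitive to rounding. Using $e^{-.25} = e^{-.20} \times e^{-.05} = .8187 \times .9512 = .7787$ gives about 0.0002 instead.)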
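Because this tail probability is the difference of two numbers very close to 1, a hand calculation with a rounded value of e^(-0.25) can shift the answer noticeably. A short check in full floating-point precision is sketched below; `poisson_pmf` is an illustrative helper, not a library function.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 100 * (1 / 400)  # lambda = n * p = 0.25
p_more_than_3 = 1 - sum(poisson_pmf(x, lam) for x in range(4))

print(round(p_more_than_3, 6))  # 0.000133
```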
Problem
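A factory produces an item in packets of 10.
The probability that an item is defective is
.2%. Find the number of packets having 2
defective items in a consignment of 10000
packets
Answer: about 2 packets ($\lambda = np = 10 \times .002 = .02$; $10000 \times P(X = 2) = 10000 \times \frac{e^{-.02}(.02)^2}{2!} \approx 1.96$)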
Poisson Approximation
of the Binomial Distribution
• Binomial probabilities are difficult to
calculate when n is large.
• Under certain conditions binomial
probabilities may be approximated by
Poisson probabilities.
Use λ = n ⋅ p.
Customer Dissatisfaction Survey –
Airline Passengers
Complaints per 100,000
Southwest 0.25
Alaska Air 0.54
Delta 0.79
US Airways 0.84
Continental 1.02
Tower Air 1.91
Northwest 2.21
If 100,000 boarded passengers were contacted, what is
the probability that exactly 3 of them logged a
complaint? λ = 1.08 (the mean of the seven complaint rates above)
$P(X = 3) = \frac{(1.08)^3 e^{-1.08}}{3!} = 0.0713$
7.13% of the time, 3 would have logged complaints
• On the average, six people per hour use a self-service
banking facility during the prime shopping hours in a
department store. What is the probability that
a. Exactly six people will use the facility during a randomly
selected hour?
b. Fewer than five people will use the facility during a
randomly selected hour.
c. No one will use the facility during a 10-minute interval?
d. No one will use the facility during a 5-minute interval?
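The four parts of this exercise differ only in the mean λ, which scales with the length of the interval (6 per hour, so 1 per 10 minutes and 0.5 per 5 minutes). The sketch below works through them under that assumption; `poisson_pmf` is an illustrative helper.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

rate_per_hour = 6

a = poisson_pmf(6, rate_per_hour)                          # exactly six in an hour
b = sum(poisson_pmf(x, rate_per_hour) for x in range(5))   # fewer than five in an hour
c = poisson_pmf(0, rate_per_hour * 10 / 60)                # none in 10 minutes (lambda = 1)
d = poisson_pmf(0, rate_per_hour * 5 / 60)                 # none in 5 minutes (lambda = 0.5)

print(round(a, 4), round(b, 4), round(c, 4), round(d, 4))  # 0.1606 0.2851 0.3679 0.6065
```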
* X ~P(λ) st p(x=1) = P(x=2)
$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$
Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
µ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
Many Normal Distributions
μ X
The Standardized Normal
$Z = \frac{X - \mu}{\sigma}$
The Z distribution always has mean = 0 and standard
deviation = 1
The Standardized Normal
Probability Density Function
• The formula for the standardized normal probability
density function is
$f(Z) = \frac{1}{\sqrt{2\pi}} e^{-(1/2)Z^2}$
Z
0
Values above the mean have positive Z-values, values below the mean have negative Z-values
Problem
The mean length of time spent on a training program
is 500 hrs and this normally distributed random
variable has a SD of 100 hrs
What is the probability that a participant will take
• More than 500 hrs (0.50)
• Between 500 & 600 hrs (.3413)
• Between 550 & 650 hrs (.2417)
• Between 420 & 570 hrs (.5461)
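These answers can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative. The computed values may differ from the table-based answers in the last decimal place, because the table rounds each area to four places.

```python
from statistics import NormalDist

training = NormalDist(mu=500, sigma=100)  # hours spent on the training program

more_than_500 = 1 - training.cdf(500)
between_500_600 = training.cdf(600) - training.cdf(500)
between_550_650 = training.cdf(650) - training.cdf(550)
between_420_570 = training.cdf(570) - training.cdf(420)

for p in (more_than_500, between_500_600, between_550_650, between_420_570):
    print(round(p, 4))
```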
2) A project yields an average cash flow of Rs. 500 lakhs with a standard
deviation of Rs. 60 lakhs. Calculate the following probabilities.
(i) Cash flow will be more than Rs. 560 lakhs
(ii) Cash flow will be less than Rs. 420 lakhs
(iii) Cash flow will be between Rs. 460 and Rs. 540 lakhs
(iv) Cash flow will be more than Rs. 680 lakhs
Solution:
Let X = cash flow in Rs. lakhs
$\bar{X}$ = Rs. 500 lakhs
σ = Rs. 60 lakhs
(i) Cash flow will be more than Rs. 560 lakhs
$P(X \ge 560) = P\left(Z \ge \frac{X - \bar{X}}{\sigma}\right) = P\left(Z \ge \frac{560 - 500}{60}\right)$
= P (Z ≥ 1)
= 0.5 – 0.3413
= 0.1587
(ii) Cash flow will be less than Rs. 420 lakhs
$P(X \le 420) = P\left(Z \le \frac{X - \bar{X}}{\sigma}\right) = P\left(Z \le \frac{420 - 500}{60}\right)$
= P (Z ≤ −1.33)
= (Area from -∞ to 0) - (Area from -1.33 to 0)
= 0.5 – 0.4082
= 0.0918
(iii) Cash flow will be between Rs. 460 and Rs. 540 lakhs
$P(460 \le X \le 540) = P\left(\frac{460 - 500}{60} \le Z \le \frac{540 - 500}{60}\right)$
= P (−0.67 ≤ Z ≤ 0.67)
= 0.2486 + 0.2486
= 0.4972
(iv) Cash flow will be more than Rs. 680 lakhs
$P(X \ge 680) = P\left(Z \ge \frac{X - \bar{X}}{\sigma}\right) = P\left(Z \ge \frac{680 - 500}{60}\right)$
= P (Z ≥ 3)
= 0.5 – 0.4987
= 0.0013
Problem
A normal variable has a mean of 10 and SD 5.
What is the probability that the normal
variable will take a value in the interval 0.2 to
19.8?
P(0.2 < X < 19.8)
= p[((0.2 – 10)/5) < Z < ((19.8 – 10)/5)]
= p(-1.96 < Z < 1.96)
= 2 * .4750
= .9500
Finding Probabilities of the
Standard Normal Distribution:
P(0 < Z < 1.56)
Standard Normal Probabilities
[Figure: standard normal distribution curve f(z) against Z, with the area between 0 and 1.56 marked]
z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
Look in row labeled 1.5 and column labeled .06 to find P(0 ≤ z ≤ 1.56) = 0.4406
Given a Normal Probability
Find the X Value
X = µ + Zσ
Finding the X value for a Known
Probability
Example:
• Let X represent the time it takes (in seconds) to download an
image file from the internet.
• Suppose X is normal with mean 8.0 and standard deviation
5.0
• Find X such that 20% of download times are less than X.
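[Figure: normal curve with area 0.2000 in the lower tail; the X axis is marked ? and 8.0, the Z axis is marked ? and 0]
Find the Z value for 20% in the Lower Tail
1. Find the Z value for the known probability
• 20% area in the lower tail is consistent with a Z value of -0.84
[Table: portion of the standardized normal probability table, columns Z … .03 .04 .05]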
X = µ + Zσ
= 8.0 + ( −0.84)5.0
= 3.80
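The inverse lookup can also be done without a table, using `NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later). This is a sketch with illustrative variable names. The result is slightly below 3.80 because the table value Z = -0.84 is itself rounded.

```python
from statistics import NormalDist

download_time = NormalDist(mu=8.0, sigma=5.0)  # seconds

z = NormalDist().inv_cdf(0.20)       # Z value with 20% of the area below it
x = download_time.inv_cdf(0.20)      # same as mu + z * sigma

print(round(z, 2), round(x, 2))
```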
Chapter 7
Chapter 7 : 234-257
Sampling and Sampling Distributions
Learning Objectives
Does the selected sample represent the characteristics of the whole bunch?
• Sample
• Subset of the population
• Sampling
Finite Population:
If the population consists of a finite number of
individuals, then it is called a Finite Population.
Infinite Population:
In a statistical survey aimed at determining average
per capita income of the people in a city, all earning
individuals in the city form the population.
Why Sample?
• Selecting a sample is less time-consuming than
selecting every item in the population (census).
[Figure: types of samples. Nonprobability samples: judgment, convenience. Probability samples: simple random, systematic, stratified, cluster]
Types of Samples:
Nonprobability Sample
• In a nonprobability sample, items included are
chosen without regard to their probability of
occurrence.
• In convenience sampling, items are selected based only on
the fact that they are easy, inexpensive, or convenient to
sample.
• In a judgment sample, you get the opinions of pre-selected
experts in the subject matter.
Types of Samples:
Probability Sample
• In a probability sample, items in the sample
are chosen on the basis of known probabilities.
Probability Samples: Simple Random, Systematic, Stratified, Cluster
Probability Sample:
Simple Random Sample
• Every individual or item from the frame has an
equal chance of being selected
9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8
5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6
8 0 8 8 0 6 3 1 7 1 4 2 8 7 7 6 6 8 3 5 6 0 5 1 5 7 0 2 9 6 5 0 0 2 6 4 5 5 8 7
8 6 4 2 0 4 0 8 5 3 5 3 7 9 8 8 9 4 5 4 6 8 1 3 0 9 1 2 5 3 8 8 1 0 4 7 4 3 1 9
6 0 0 9 7 8 6 4 3 6 0 1 8 6 9 4 7 7 5 8 8 9 5 3 5 9 9 4 0 0 4 8 2 6 8 3 0 6 0 6
5 2 5 8 7 7 1 9 6 5 8 5 4 5 3 4 6 8 3 4 0 0 9 9 1 9 9 7 2 9 7 6 9 4 8 1 5 9 4 1
8 9 1 5 5 9 0 5 5 3 9 0 6 8 9 4 8 6 3 7 0 7 9 5 5 4 7 0 6 2 7 1 1 8 2 6 4 4 9 3
Probability Sample:
Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k
individuals: k=N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter
[Figure: frame of N = 40 individuals with n = 4 and k = 10; the first group is marked]
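The four steps above can be sketched in Python as follows. The function name `systematic_sample` and the use of a seeded `random.Random` instance are choices made for this illustration; the sketch also assumes N is a multiple of n, as in the N = 40, n = 4 example.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick one random item from the first group of k, then every kth item after it."""
    k = len(frame) // n           # group size, k = N / n
    start = rng.randrange(k)      # random position inside the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 41))        # N = 40 individuals, numbered 1 to 40
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)
```

Whatever the random start, consecutive selections are exactly k = 10 positions apart.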
Probability Sample:
Stratified Sample
• Divide population into two or more subgroups (called strata) according to
some common characteristic
• A simple random sample is selected from each subgroup, with sample sizes
proportional to strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling population of voters, stratifying
across racial or socio-economic lines.
Population
Divided
into 4
strata
Probability Sample
Cluster Sample
• Population is divided into several “clusters,” each representative of the
population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a
cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where
certain election districts are selected and sampled.
Population
divided into
16 clusters. Randomly selected
clusters for sample
Multi Stage Sampling
• Sampling carried out in stages
• Material regarded as being made up of a number
of I stage sampling units, each of which is made
up of a number of II Stage units and so on
• Example – sample of 5000 households in
Karnataka
• I Stage – State – divided into District
• II Stage – Districts – divided into villages
• III Stage – Villages – divided into households
Judgement Sampling
• Choice of a sample depends exclusively on the
judgement of the investigator
• Quality of the sample depends exclusively on the
judgement of the person selecting the sample
Convenience Sampling
• Elements are included in the sample without pre
specified or known probability of being selected –
convenience of researcher
• A convenient chunk or slice of the population is
taken
• Example
• From telephone directories
• Professor conducting research may use student
volunteers
Quota Sampling
• Quotas are set based on given criteria, but
within the quotas, the sample is judgmental
• Example
• If out of 100 people to be interviewed, 60 are to be
housewives, 25 farmers, 15 children less than 15 years
Biased Sampling
• Picking a sample by choosing people who would
have very strong feelings on the issue
Snowball Sampling
• Survey subjects are selected based on referral
from other survey respondents
Probability Sample:
Comparing Sampling Methods
• Simple random sample and Systematic sample
• Simple to use
• May not be a good representation of the population’s
underlying characteristics
• Stratified sample
• Ensures representation of individuals across the entire
population
• Cluster sample
• More cost effective
• Less efficient (need larger sample to acquire the same level
of precision)
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit good responses
• Sampling error – always exists
Sampling Vs. Non Sampling Error
• Sampling Error
• As sample results are based on partial or
incomplete analysis of the population features,
any statistical inference based on the sample may
not always be correct
• Non sampling error
• Incorrect enumeration of population
• Non random selection of samples
• Use of faulty questionnaire
• Wrong editing, coding or analysis
Types of Survey Errors
• Coverage error or selection bias
• Exists if some groups are excluded from the frame and have no
chance of being selected
• Sampling error
• Variation from sample to sample will always exist
• Measurement error
• Due to weaknesses in question design, respondent error, and
interviewer’s effects on the respondent (“Hawthorne effect”)
Types of Survey Errors
(continued)
• For example, suppose you sample 50 students from your college regarding
their mean GPA. If you obtained many different samples of 50, you will
compute a different mean for each sample. We are interested in the
distribution of all potential mean GPA we might calculate for any given
sample of 50 students.
Developing a
Sampling Distribution
• Random variable, X,
is age of individuals
• Values of X: 18, 20,
22, 24 (years)
Developing a
Sampling Distribution
(continued)
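$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$
[Figure: Uniform Distribution. Bar chart of P(x) (axis 0 to .3) against x = 18 (A), 20 (B), 22 (C), 24 (D)]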
Developing a
Sampling Distribution
(continued)
Now consider all possible samples of size n=2
16 possible samples (sampling with replacement)
1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24
16 Sample Means
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
Developing a
Sampling Distribution
(continued)
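Sampling Distribution of All Sample Means
$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$
$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = 1.58$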
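Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: bar chart of P(X) against X = 18 (A), 20 (B), 22 (C), 24 (D) for the population, and bar chart of P(X̄) against X̄ = 18 to 24 for the sample means; both vertical axes run from 0 to .3]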
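The comparison between the population and the distribution of sample means can be reproduced by enumerating all 16 samples of size 2 drawn with replacement. The sketch below uses only the standard library; the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]   # ages of individuals A, B, C, D
n = 2

# Every ordered sample of size n, drawn with replacement, and its mean
sample_means = [mean(sample) for sample in product(population, repeat=n)]

pop_mean, pop_sd = mean(population), pstdev(population)
means_mean, means_sd = mean(sample_means), pstdev(sample_means)

print(len(sample_means))                       # 16
print(pop_mean, round(pop_sd, 3))              # population: 21 and 2.236
print(means_mean, round(means_sd, 2))          # sample means: 21 and 1.58
```

The standard deviation of the sample means equals the population standard deviation divided by the square root of n, which is the standard error of the mean.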
Sampling Distribution…
• The function used – Mean or SD – is the Sample
Statistic
• The SD of the distribution of the sample statistic
is the Standard Error of the Statistic
• The expected value is regarded as the true value
and any deviation is regarded as error of
estimation due to sampling effects
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same
population will yield different sample means
• A measure of the variability in the mean from sample to
sample is given by the Standard Error of the Mean:
(This assumes that sampling is with replacement or
sampling is without replacement from an infinite population)
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
• Note that the standard error of the mean decreases as the
sample size increases
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normally distributed with mean μ
and standard deviation σ, the sampling distribution
of $\bar{X}$ is also normally distributed with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Z-value for Sampling Distribution
of the Mean
• Z-value for the sampling distribution of $\bar{X}$:
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
where: $\bar{X}$ = sample mean
µ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties
• $\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)
[Figure: Normal Population Distribution centered at µ, above the Normal Sampling Distribution centered at µx̄ (has the same mean).]
Sampling Distribution Properties
(continued)
• As n increases, $\sigma_{\bar{x}}$ decreases
[Figure: sampling distributions centered at µ for a smaller sample size and a larger sample size.]
Sample Mean Sampling Distribution:
If the Population is not Normal
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution
becomes almost normal regardless of the shape of the population.
Sample Mean Sampling Distribution:
If the Population is not Normal
(continued)
Sampling distribution properties:
• Central Tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution above the Sampling Distribution (becomes normal as n increases), drawn for a smaller and a larger sample size.]
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal
• For fairly symmetric distributions, n > 15 will
usually give a sampling distribution that is almost
normal
• For normal population distributions, the sampling
distribution of the mean is always normally
distributed
Example
• Suppose a population has mean μ = 8 and standard
deviation σ = 3. Suppose a random sample of size n
= 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?
Solution:
• Even if the population is not normally distributed,
the central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{x}$ is approximately
normal
• … with mean $\mu_{\bar{x}} = 8$
• … and standard deviation
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Example
(continued)
Solution (continued):
$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right)$
$= P(-0.4 < Z < 0.4) = 0.3108$
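The same probability can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_sample_mean_between` is an illustrative assumption, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(lo, hi, mu, sigma, n):
    """P(lo < X-bar < hi), treating X-bar as normal with SD sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)


# Values from the example: mu = 8, sigma = 3, n = 36.
p = prob_sample_mean_between(7.8, 8.2, mu=8, sigma=3, n=36)
print(round(p, 4))
```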
• Finite population multiplier: $\sqrt{\frac{N - n}{N - 1}}$
• Find the sampling fraction n/N; if it is < .05, the
finite multiplier is NOT to be used
• Also called the Finite Correction Factor
Standard Error of Mean
• Infinite Population
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• Finite Population
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Problem 4
• In a sample of 25 observations from a normal
distribution with mean 98.6 and SD 17.2, what
is P(92 < $\bar{x}$ < 102)?
• σ = 17.2, µ = 98.6, n = 25
• σ / √n = 3.44
• P(92 < $\bar{x}$ < 102)
= P(-1.92 < z < 0.99)
= .4726 + .3389
= .8115
Problem 5
• The auditor of a credit card company knows
that on an average, the daily balance of any
given customer is 112 and the SD 56. From 50
randomly selected accounts what is the
probability that the sample average daily
balance is
• < 100 ( .0643)
• Between 100 and 130 ( .9241)
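The two answers quoted for Problem 5 can be reproduced by mimicking a z-table lookup. The sketch below is one possible check; it assumes the z values are rounded to two decimals, as they would be when read from a printed table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 56, 50          # values given in Problem 5
standard_error = sigma / sqrt(n)    # about 7.92

# Round z to two decimals, as a printed normal table would require.
z_100 = round((100 - mu) / standard_error, 2)
z_130 = round((130 - mu) / standard_error, 2)

std_normal = NormalDist()           # mean 0, SD 1
p_below_100 = std_normal.cdf(z_100)
p_100_to_130 = std_normal.cdf(z_130) - std_normal.cdf(z_100)

print(z_100, z_130)
print(round(p_below_100, 4), round(p_100_to_130, 4))
```

Without the rounding step the probabilities shift slightly in the fourth decimal place, which is the usual gap between table-based and software answers.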
Problem 6
• From a population of 125 items, with mean of
105 and SD 17, 64 items are chosen, what is
the standard error of the mean?
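N = 125 n = 64
µ = 105 σ = 17
n/N = 0.512 > .05, so the finite multiplier is used:
SE = (17/√64) × √((125 − 64)/(125 − 1)) = 2.125 × 0.7014 ≈ 1.49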
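A small helper makes the finite-population rule explicit. This is a sketch under the rule stated earlier (apply the multiplier only when the sampling fraction n/N is at least .05); the function name and the optional `N` argument are illustrative assumptions.

```python
from math import sqrt


def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the mean, with the finite population multiplier
    applied when a population size N is given and n / N >= 0.05."""
    se = sigma / sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se


# Problem 6: N = 125, n = 64, sigma = 17.
se_finite = standard_error_of_mean(17, 64, N=125)
se_infinite = standard_error_of_mean(17, 64)
print(round(se_finite, 2), se_infinite)
```

For Problem 6 the finite-population value comes out at about 1.49, against 2.125 if the population were treated as infinite.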
Population Proportions
π = the proportion of the population having
some characteristic
• Sample proportion ( p ) provides an estimate
of π:
$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
• 0≤ p≤1
• p is approximately distributed as a normal distribution when
n is large
(assuming sampling with replacement from a finite population or without
replacement from an infinite population)
Sampling Distribution of p
• Approximated by a normal distribution if:
  • nπ ≥ 5 and
  • n(1 − π) ≥ 5
where $\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
(where π = population proportion)
[Figure: Sampling Distribution, P(p) against p from 0 to 1.]
Z-Value for Proportions
Standardize p to a Z value with the formula:
$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
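The standardization can be sketched in a few lines. The numbers used here (π = 0.40, n = 200, p = 0.45) are illustrative assumptions and the function name is hypothetical; the function also enforces the nπ ≥ 5 and n(1 − π) ≥ 5 conditions stated above.

```python
from math import sqrt
from statistics import NormalDist


def z_for_proportion(p, pi, n):
    """Z value for a sample proportion p, given population proportion pi."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("need n*pi >= 5 and n*(1 - pi) >= 5")
    sigma_p = sqrt(pi * (1 - pi) / n)   # standard error of the proportion
    return (p - pi) / sigma_p


# Illustrative (assumed) inputs, not taken from the slides.
z = z_for_proportion(p=0.45, pi=0.40, n=200)
prob = NormalDist().cdf(z) - 0.5        # P(0.40 <= p <= 0.45)
print(round(z, 2), round(prob, 3))
```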
Example
[Figure: the Sampling Distribution is standardized to the Standardized Normal Distribution; the shaded area is 0.4251.]
Chapter 8
[Figure: Samples A, B and E drawn from one population, each with its own sample size n, sample mean X̄ and standard deviation s.]
In reality, the sample mean is just one of many possible sample
means drawn from the population, and is rarely equal to µ.
Estimation
Point Estimation
In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.
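[Figure: a confidence interval, with the Point Estimate between the Lower Confidence Limit and the Upper Confidence Limit; the distance between the limits is the width of the confidence interval.]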
Point Estimates
Population parameter | Point estimate
Mean µ               | $\bar{X}$
Proportion π         | p
Estimator
• The sample mean, $\bar{X}$, is the most common estimator
of the population mean
• The sample variance, $s^2$, is the most common
estimator of the population variance
• The sample standard deviation, s, is the most
common estimator of the population standard
deviation
• The sample proportion, p, is the most common
estimator of the population proportion
Types
• Point Estimate – a single number (a single-valued
estimate) used to estimate a population parameter.
A single element chosen from a sampling
distribution.
Conveys little information about the actual value
of the population parameter or about the accuracy of
the estimate
Types
• Interval Estimate – Range of values
An interval or range of values believed to include
the unknown population parameter.
Associated with the interval is a measure of the
confidence we have that the interval does indeed
contain the parameter of interest.
Properties of a good estimator
• Property of unbiasedness
• Expected value of the estimator is equal to the parameter
being estimated.
• Property of efficiency
• Smallest variance
• Property of sufficiency
• Use as much information as possible from the sample
• Property of consistency
• As the sample size increases, the estimate tends toward the parameter value
Point Estimate
• The sample mean is the best estimator of the
population mean
• The sample SD is the best estimator of the
population SD
• Sample proportion is the best estimator of the
population proportion
Problem 1
From the following data find the point estimates of
the population mean and the population SD
5 8 10 7 10 14
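One way to work Problem 1 is with Python's standard `statistics` module. This is a minimal sketch; it assumes the point estimate of the population SD is the sample standard deviation with the n − 1 divisor, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

data = [5, 8, 10, 7, 10, 14]   # sample from Problem 1

x_bar = mean(data)   # point estimate of the population mean
s = stdev(data)      # point estimate of the population SD (n - 1 divisor)

print(x_bar, round(s, 3))
```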
Problem 2
A survey question for a sample of 150 individuals
yielded 75 YES responses, 55 NO responses and
20 NO OPINIONS
What is the point estimate of the proportion in the
population who respond
(i) Yes (ii) No (iii) No Opinion
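For Problem 2 each point estimate is simply a sample proportion, X/n. A short sketch, with the dictionary layout chosen only for illustration:

```python
# Response counts from Problem 2.
responses = {"Yes": 75, "No": 55, "No Opinion": 20}

n = sum(responses.values())   # sample size
point_estimates = {answer: count / n for answer, count in responses.items()}

for answer, p in point_estimates.items():
    print(f"{answer}: {p:.4f}")
```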
Problem 3
A bank wants to determine the number of tellers
during lunch rush on Fridays. Data on the
number of people who entered the bank
between 11 am and 1 pm on Friday over the
last 3 months is:
242 275 289 306
342 385 279 245
269 305 394 328
Find point estimates of mean & SD of population
from which the sample was drawn
Problem 4
In a sample of 400 textile workers, 184 expressed
extreme dissatisfaction regarding a prospective
plan to modify working conditions. Because
this dissatisfaction was strong enough to allow
management to interpret plan reaction as being
highly positive, they were curious about the
proportion of total workers harboring this
sentiment. Give a point estimate of this
proportion
Confidence Intervals
Sample
General Formula
• The general formula for all confidence
intervals is:
Point Estimate ± (Critical Value)(Standard Error)
Where:
• Point Estimate is the sample statistic estimating the population
parameter of interest
• Confidence Level
• The confidence that the interval will
contain the unknown population
parameter
• A percentage (less than 100%)
Confidence Level, (1-α)
(continued)
• Suppose confidence level = 95%
• Also written (1 - α) = 0.95, (so α = 0.05)
• A relative frequency interpretation:
• 95% of all the confidence intervals that can be
constructed will contain the unknown true parameter
• A specific interval either will contain or will not
contain the true parameter
• No probability involved in a specific interval
Confidence Intervals
Confidence Intervals for:
• Population Mean
  • σ Known
  • σ Unknown
• Population Proportion
Confidence Interval for μ
(σ Known)
• Assumptions
• Population standard deviation σ is known
• Population is normally distributed
• If population is not normal, use large sample
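• Confidence interval estimate:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where $\bar{X}$ is the point estimate,
$Z_{\alpha/2}$ is the normal distribution critical value for a probability of α/2 in each tail, and
$\sigma / \sqrt{n}$ is the standard error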
Finding the Critical Value, Zα/2
• Consider a 95% confidence interval:
1 − α = 0.95, so α = 0.05
α/2 = 0.025 in each tail
$Z_{\alpha/2} = \pm 1.96$
Confidence Level | Confidence Coefficient, 1 − α | Zα/2 value
80%              | 0.80                          | 1.28
90%              | 0.90                          | 1.645
95%              | 0.95                          | 1.96
98%              | 0.98                          | 2.33
99%              | 0.99                          | 2.58
99.8%            | 0.998                         | 3.09
99.9%            | 0.999                         | 3.29
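Critical values like these can also be computed rather than looked up. A minimal sketch using the standard library's `NormalDist.inv_cdf`; the function name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value: leaves alpha/2 in each tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```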
Intervals and Level of Confidence
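Intervals extend from $\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
(1-α)x100% of intervals constructed contain µ; (α)x100% do not.
[Figure: Sampling Distribution of the Mean centered at µx̄ = µ, with area 1 − α in the middle and α/2 in each tail, above confidence intervals built from different sample means x1, x2, ...]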
Confidence Intervals
Example
• Solution:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
$= 2.20 \pm 1.96\,(0.35/\sqrt{11})$
$= 2.20 \pm 0.2068$
$1.9932 \le \mu \le 2.4068$
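The interval in this solution can be reproduced with a short function. This is a sketch of the σ-known interval only; the function name is hypothetical and the inputs (X̄ = 2.20, σ = 0.35, n = 11, 95% confidence) are the ones appearing in the solution.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence_level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)        # critical value * standard error
    return x_bar - margin, x_bar + margin


lower, upper = z_confidence_interval(2.20, 0.35, 11)
print(round(lower, 4), round(upper, 4))
```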
Interpretation
Do You Ever Truly Know σ?
• Probably not!
• If you truly know µ there would be no need to gather a sample to estimate it.
Confidence Interval for μ
(σ Unknown)
d.f. = n - 1
Degrees of Freedom (df)
Idea: Number of observations that are free to vary
after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0,
then X3 must be 9
(i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary
for a given mean)
Student’s t Distribution
Note: t → Z as n increases
• t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal
[Figure: the Standard Normal curve (t with df = ∞) plotted with the t curves for df = 13 and df = 5.]
Student’s t Table
Upper Tail Area
df | .25   | .10   | .05
1  | 1.000 | 3.078 | 6.314
Let: n = 3, so df = n - 1 = 2
α = 0.10, so α/2 = 0.05
[Table: t values for 10 d.f., 20 d.f. and 30 d.f. compared with Z (∞ d.f.) at each confidence level.]
Note: t → Z as n increases
Example of t distribution confidence interval
A random sample of n = 25 has X = 50 and
S = 8. Form a 95% confidence interval for μ
46.698 ≤ µ ≤ 53.302
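The arithmetic behind this interval can be sketched as follows. Python's standard library has no t quantile function, so the critical value is passed in; 2.0639 is the upper-tail 0.025 value for 24 d.f. as read from a t table, and both the function name and that constant name are assumptions made for illustration.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_critical):
    """Confidence interval for the mean when sigma is unknown.

    t_critical must be looked up for n - 1 degrees of freedom.
    """
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin


T_025_DF24 = 2.0639   # from a t table: upper-tail area 0.025, 24 d.f.
lower, upper = t_confidence_interval(x_bar=50, s=8, n=25, t_critical=T_025_DF24)
print(round(lower, 3), round(upper, 3))
```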
Problem
From a population with SD 1.65, a sample of 32
items resulted in 34.8 as an estimate of the
mean. Find the SE of the mean. Compute an
interval estimate that should include the
population mean 99.7% of the time
$\sigma_{\bar{x}} = 1.65/\sqrt{32} = .292$
Interval estimate: 34.8 ± 2.97(.292) = 34.8 ± .867
33.93 to 35.67
Problem
Estimate the mean life of windshield wiper
blades under typical driving conditions for CL
of 95%
SD 6 months
Sample size = 100
Mean = 21 months
Confidence Intervals for the
Population Proportion, π
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
• We will estimate this with sample data:
$\sqrt{\frac{p(1 - p)}{n}}$
Confidence Interval Endpoints
• Upper and lower confidence limits for the population
proportion are calculated with the formula
$p \pm Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
• where
• Zα/2 is the standard normal value for the level of confidence desired
• p is the sample proportion
• n is the sample size
• Note: must have np > 5 and n(1-p) > 5
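The endpoint formula above can be sketched in code. The sample used here (25 items with the characteristic out of 100) is an assumed illustration and the function name is hypothetical; the check on np and n(1 − p) mirrors the note above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence_level=0.95):
    """Confidence interval for a population proportion."""
    p = successes / n
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("need np > 5 and n(1 - p) > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Assumed illustrative sample: 25 of 100 items have the characteristic.
lower, upper = proportion_confidence_interval(25, 100)
print(round(lower, 4), round(upper, 4))
```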
Example
Determining Sample Size for the Mean
Sampling error (margin of error), e:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, so $e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Determining Sample Size
(continued)
Determining Sample Size for the Mean
$e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$. Now solve for n to get
$n = \frac{Z_{\alpha/2}^2 \sigma^2}{e^2}$
Determining Sample Size
(continued)
$n = \frac{Z^2 \sigma^2}{e^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19$
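The calculation can be sketched as a small function. Rounding up to the next whole observation is an assumption added here (the usual convention, since a fractional sample size is not possible); the function name is illustrative.

```python
from math import ceil


def sample_size_for_mean(z, sigma, e):
    """Smallest n with margin of error at most e: n = z^2 sigma^2 / e^2."""
    raw_n = (z * sigma / e) ** 2
    return raw_n, ceil(raw_n)


# Values from the calculation above: Z = 1.645, sigma = 45, e = 5.
raw_n, required_n = sample_size_for_mean(z=1.645, sigma=45, e=5)
print(round(raw_n, 2), required_n)
```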
Determining Sample Size for the Proportion
Solution:
For 95% confidence, use Zα/2 = 1.96
e = 0.03
p = 0.12, so use this to estimate π
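With these inputs the required sample size follows from the usual formula n = Z²π(1 − π)/e². That formula is stated here as an assumption, since it does not survive in the text above; the function name is illustrative, and the result is rounded up to a whole observation.

```python
from math import ceil


def sample_size_for_proportion(z, pi_estimate, e):
    """n = z^2 * pi * (1 - pi) / e^2, rounded up to a whole observation."""
    raw_n = z ** 2 * pi_estimate * (1 - pi_estimate) / e ** 2
    return raw_n, ceil(raw_n)


# Inputs from the solution: Z = 1.96, e = 0.03, p = 0.12 as the estimate of pi.
raw_n, required_n = sample_size_for_proportion(z=1.96, pi_estimate=0.12, e=0.03)
print(round(raw_n, 2), required_n)
```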
Two-tailed test
A significance test that rejects the null
hypothesis if the sample mean is significantly higher
or lower than the hypothesized population mean (i.e.
there are two rejection regions).
H0 : µ = µ0
H1 : µ ≠ µ0
Terminologies
Significance level
Complementary concept to confidence limits.
Probability of committing a TYPE 1 error, namely
rejecting the null hypothesis when in reality it is true.
There is no single standard or universal level of
significance for testing hypotheses.
The higher the significance level, the higher the
probability of rejecting a null hypothesis when it is
true.
Set up a Significance level
• The confidence with which an experimenter
rejects or retains H0 depends on the level of
significance involved
• α = 5%
• 5% chance that H0 is rejected when it should be
accepted
• 95% confident that we have made the right decision
• Willing to accept a 5% chance of being wrong to reject
H0
Suitable Test Statistic
DECISION  | CONDITION: H0 True                          | CONDITION: H0 False
Accept H0 | Correct Decision, Confidence Level (1 - α)  | TYPE 2 Error (β)
Reject H0 | TYPE 1 Error (α)                            | Correct Decision, Power of Test (1 - β)
Type 1 error, α
Is the error of rejecting a null hypothesis when it is
true.
Type II error, β
Is the error of accepting a null hypothesis when it is
actually false.
Step 4 : calculation
Step 5 : inference
• Classical approach : if table value > calculated value – accept Ho
• P value approach : if p> α accept Ho.
FLOW CHART FOR HYPOTHESIS TESTING
State H0 as well as H1
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O = the observed frequency of any value
E = the expected frequency of any value
The obtained value from the formula is
compared with the value from χ2 table for a
given significance level and the number of
degrees of freedom.
Example:
Are technical support calls equal across all days of the week? (i.e., do calls follow a
uniform distribution?)
Sample data for 10 days per day of week:
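$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$
• Reject H0 if $\chi^2 > \chi^2_{\alpha}$
(with k – 1 degrees of freedom)
[Figure: chi-square distribution with the rejection region of area α to the right of χ2α; do not reject H0 to the left, reject H0 to the right.]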
Chi-Square Test Statistic
Contingency Tables
Situations involving multiple population
proportions
Used to classify sample observations according
to two or more characteristics
Also called a cross-tabulation table.
Example:
The following data concerning industrial accidents and absentees
classified according to the types of employee.
$\chi^2 = \sum \frac{(O - E)^2}{E}$ with d.f. = (r − 1)(c − 1)
where:
O = observed frequency
E = expected frequency
r = number of rows
c = number of columns
Contingency Analysis
Example:
Left-Handed vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female
With expected frequencies of 585 and 415 in each row:
$\chi^2 = \frac{(620 - 585)^2}{585} + \frac{(380 - 415)^2}{415} + \frac{(550 - 585)^2}{585} + \frac{(450 - 415)^2}{415}$
= (1225)(0.001709 + 0.002409 + 0.001709 + 0.002409)
i.e. χ2cal = 10.0891
Tabulated χ2 for (2-1)(2-1)=1 d.f. at 5% level of significance is 3.841 i.e.
χ2tab=3.841.
Here we see that χ2cal>χ2tab (10.0891>3.841)⇒Ho is rejected i.e. it is highly
significant at 5% level of significance. Thus we conclude that nature of area is
related to voting preference in the election.
Alternative procedure: To calculate the value χ2, we can use the following
formula:
      |     |     | Total
      | a   | b   | a+b
      | c   | d   | c+d
Total | a+c | b+d | N = a+b+c+d
N = 620+380+550+450=2000
$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)} = \frac{2000(620 \times 450 - 380 \times 550)^2}{1000 \times 1170 \times 830 \times 1000} = 10.09165$
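Both routes to χ2 can be checked with a short script. This is a sketch in plain Python, assuming the observed 2 × 2 counts are 620, 380, 550 and 450 as in the total above; the function names are illustrative, and the 3.841 critical value is the tabulated one quoted for 1 d.f. at the 5% level.

```python
def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells, E = row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    return chi2


def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut formula for a 2 x 2 table: N(ad - bc)^2 / product of margins."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))


observed = [[620, 380], [550, 450]]
chi2_cells = chi_square_statistic(observed)
chi2_shortcut = chi_square_2x2_shortcut(620, 380, 550, 450)

CRITICAL_5PCT_1DF = 3.841   # tabulated value for 1 d.f. at the 5% level
reject_h0 = chi2_cells > CRITICAL_5PCT_1DF
print(round(chi2_cells, 4), round(chi2_shortcut, 4), reject_h0)
```

The two formulas agree once the cell-by-cell terms are carried at full precision; small differences in hand calculations come from rounding the reciprocals.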
What is a Hypothesis
Of a test?
• A hypothesis is an assumption about the
population parameter.
• A parameter is a characteristic of the
population, like its mean or variance.
• The parameter must be identified before
analysis.
Example: "I assume the mean AGE of this class is 50!!! Am I correct? TEST IT!"
• Steps:
• State the Null Hypothesis
• State its opposite, the Alternative Hypothesis
• Hypotheses are mutually exclusive &
exhaustive
• Sometimes it is easier to form the
alternative hypothesis first.
Hypothesis Testing Process
• Assume the population mean age is 50 (Null Hypothesis).
• Draw a sample from the population: the sample mean is 20.
• Is $\bar{X} = 20 \cong \mu = 50$? No, not likely!
• REJECT the Null Hypothesis.
Reason for Rejecting H0
[Figure: Sampling Distribution of the Sample Mean, centered at the hypothesized population mean µ = 50, with the sample mean 20 far out in the tail.]
• Our sample mean (20) falls in the tails! It’s not likely, so we reject the
null hypothesis H0 that µ = 50.
[Figure: rejection regions bounded by the critical value; α is the “area” of the rejection region.]
Level of Significance, α and
the Rejection Region
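• One tail (left) test: H0: µ = 0, H1: µ < 0; rejection region of area α in the left tail
• One tail (right) test: H0: µ = 0, H1: µ > 0; rejection region of area α in the right tail
• Two tails test: H0: µ = 0, H1: µ ≠ 0; rejection region of area α/2 in each tail
The rejection regions are bounded by the critical value(s).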
Errors in Making Decisions
• Type I Error
• Reject Null Hypothesis when it is True (“False
Positive”)
• Has Serious Consequences
• Probability of Type I Error Is α
• Called Level of Significance
• Type II Error
• Do Not Reject Null Hypothesis when it is
False (“False Negative”)
• Probability of Type II Error Is β (Power = 1 − β)
α &β Have an Inverse
Relationship
Reduce probability of
one error and the
other one goes up.
0
• Used to Make Rejection Decision
• Used to determine optimal strategies where a decision maker is faced with several
decision alternatives and an uncertain, or risky, pattern of future events.
• Decision – Definition
Choosing from
alternatives
Determination of
payoff
Identification of all
courses of action
(Strategies)
S2 (10) | 50  | 0   | 100 | 200
S3 (15) | 100 | 50  | 0   | 100
S4 (20) | 150 | 100 | 50  | 0
Decision Making Under Certainty
Manager knows which event will occur
pick the alternative with the best payoff
Possible Future Demand
Alternative Low High
Small facility 200 270
Large facility 160 800
Do nothing 0 0
Investment       | Improving economy | Stable economy | Worsening economy
Conservative     | 30 m              | 5 m            | -10 m
Speculative      | 40 m              | 10 m           | -30 m
Counter cyclical | -10 m             | 0              | 15 m
• If the prior probabilities for improving economy, stable economy and worsening
economy are 0.1, 0.5 and 0.4, which investment would Warren consider?
Solution
E(conservative) = 0.1*30 + 0.5*5 - 0.4*10 = 1.5
E(speculative) = 0.1*40 + 0.5*10 - 0.4*30 = -3
E(counter cyclical) = -0.1*10 + 0.5*0 + 0.4*15 = 5
The counter cyclical investment has the highest expected payoff.
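The expected-value comparison can be sketched as below. The payoffs (in millions) follow the payoff table, including 15 for the counter cyclical investment in a worsening economy, which is an assumption where the hand calculation uses 10; the dictionary layout and names are illustrative.

```python
# Prior probabilities of the three economic conditions.
probabilities = {"improving": 0.1, "stable": 0.5, "worsening": 0.4}

# Payoffs in millions for each investment under each condition.
payoffs = {
    "Conservative": {"improving": 30, "stable": 5, "worsening": -10},
    "Speculative": {"improving": 40, "stable": 10, "worsening": -30},
    "Counter cyclical": {"improving": -10, "stable": 0, "worsening": 15},
}

# Expected value = sum of probability * payoff over the conditions.
expected_values = {
    name: sum(probabilities[state] * value for state, value in row.items())
    for name, row in payoffs.items()
}
best = max(expected_values, key=expected_values.get)

for name, ev in expected_values.items():
    print(f"{name}: {ev:.1f}")
print("Best by expected value:", best)
```

With 10 in place of 15 the counter cyclical expected value drops to 3, which still ranks first.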
Decision Trees