
Final QMM

Uploaded by

JATIN
CHAPTER ONE

Introduction to Statistics

D 1. The complete collection of all entities under study is called the __________.

E A. sample
Term B. parameter
C. statistic
D. population

B 2. A portion (subset) of the entities under study is called the __________.

E A. parameter
Term B. sample
C. population
D. statistic

2 Test Bank

B 3. Manuel Banales, Marketing Director of Plano Power Plants, Inc.'s Electrical
Division, is leading a study to identify and assess the relative importance of
product features. Manuel directs his staff to design a survey questionnaire for
distribution to all of Plano’s 954 customers. For this study, the set of 954
customers is a ________________.

E A. statistic
BApp B. population
C. parameter
D. sample

D 4. Manuel Banales, Marketing Director of Plano Power Plants, Inc.'s Electrical
Division, is leading a study to identify and assess the relative importance of
product features. Manuel directs his staff to design a survey questionnaire for
distribution to 100 of Plano’s 954 customers. For this study, the set of 100
customers is a ________________.

E A. statistic
BApp B. population
C. parameter
D. sample

C 5. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by surveying all 1,500 industrial customers. For this study, the
set of 1,500 industrial customers is a ________________.

E A. statistic
BApp B. sample
C. population
D. parameter

B 6. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by selecting a focus group of 40 industrial customers. For this
study, the set of 40 industrial customers is a ________________.

E A. statistic
BApp B. sample
C. population
D. parameter

A 7. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). For this study, the set of all employees who worked
at Batesville Bottling during the last operating year is a ________________.

E A. population
BApp B. sample
C. statistic
D. parameter

B 8. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). Personnel records for 50 of the plant's 250
employees are selected for analysis. For this study, the group of 50 employees is a
__________.

E A. population
BApp B. sample
C. parameter
D. statistic
B 9. When a person collects information from the entire population, this is called a
_______.

E A. sample
Term B. census
C. statistic
D. parameter

B 10. Manuel Banales, Marketing Director of Plano Power Plants, Inc.'s Electrical
Division, is leading a study to identify and assess the relative importance of
product features. Manuel directs his staff to design a survey questionnaire for
distribution to all of Plano’s 954 customers. Manuel is ordering a ____________.

M A. statistic from the customers
BApp B. census of the customers
C. sample of the customers
D. sorting of the customers

D 11. Manuel Banales, Marketing Director of Plano Power Plants, Inc.'s Electrical
Division, is leading a study to identify and assess the relative importance of
product features. Manuel directs his staff to design a survey questionnaire for
distribution to 100 of Plano’s 954 customers. Manuel is ordering a
____________.

M A. statistic from the customers
BApp B. census of the customers
C. sorting of the customers
D. sample of the customers

B 12. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by surveying all 1,500 industrial customers. Sue is ordering a
__________.

M A. statistic from the industrial customers
BApp B. census of the industrial customers
C. sample of the industrial customers
D. sorting of the industrial customers

D 13. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by selecting a focus group of 40 industrial customers. Sue is
ordering a __________.

M A. statistic from the industrial customers
BApp B. census of the industrial customers
C. sorting of the industrial customers
D. sample of the industrial customers

A 14. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1999." Pinky is ordering a
__________________.

M A. census of the payroll records
BApp B. statistic from the payroll records
C. sample of the payroll records
D. sorting of the payroll records

C 15. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "every tenth
payroll voucher issued since January 1, 1999." Pinky is ordering a
__________________.

M A. census of the payroll records
BApp B. parameter from the payroll records
C. sample of the payroll records
D. sorting of the payroll records

C 16. Upon discovering an improperly adjusted drill press, Jack Joyner, Director of
Quality Control, ordered a 100% inspection of all castings drilled on the evening
shift. Jack is ordering a ___________________.

M A. sample of the castings
BApp B. sorting of the castings
C. census of the castings
D. statistic from the castings

A 17. Upon discovering an improperly adjusted drill press, Jack Joyner, Director of
Quality Control, ordered an inspection of "every fifth casting drilled on the
evening shift." Jack is ordering a ___________________.

M A. sample of the castings
BApp B. sorting of the castings
C. census of the castings
D. parameter from the castings
B 18. Greek letters are commonly used to represent _______.

M A. sample statistics
Term B. population parameters
C. descriptive measures
D. inferential statistics

C 19. Which of the following symbols is used to represent a population parameter?

M A. ~
Term B. #
C. µ
D. ∞

A 20. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by surveying all 1,500 industrial customers. One question on the
survey asked the customers to rate “Merchandise is delivered on time” on a scale
of 1 to 5, with 1 meaning “never” and 5 meaning “always”. The average response
of the 1,500 customers to this question is a _________________.

M A. parameter
BApp B. population
C. sample
D. statistic

C 21. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). The average number of absences per employee,
computed from the personnel data of all employees, is a ________________.

M A. population
BApp B. sample
C. parameter
D. statistic

A 22. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). The most appropriate symbol for the average
number of absences per employee, computed from the personnel data of all
employees, is ________________.

M A. µ
Term B. #
C. ~
D. ∞

D 23. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1999 to determine the percentage of
irregular vouchers." The percentage which Pinky ordered is a
__________________.

M A. sample statistic
BApp B. sample parameter
C. sorted order
D. population parameter

B 24. Statistics are usually represented by _______.

M A. Greek letters
Term B. Roman letters
C. ordinal data
D. interval data

A 25. Which of the following symbols is used to represent a sample statistic?

M A. S
Term B. ~
C. µ
D. ∞

D 26. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by selecting a focus group of 40 industrial customers. One
question asked the focus group customers to rate “Merchandise is delivered on
time” on a scale of 1 to 5, with 1 meaning “never” and 5 meaning “always”. The
average response of the 40 customers to this question is a _________________.

M A. parameter
BApp B. population
C. sample
D. statistic

A 27. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "every tenth
payroll voucher issued since January 1, 1999 and a calculation of the percentage
of irregular vouchers in this group." The percentage which Pinky ordered is a
__________________.

M A. sample statistic
BApp B. sample parameter
C. sorted order
D. population parameter

D 28. Abel Alonzo, Director of Human Resources, is exploring the causes of employee
absenteeism at Batesville Bottling during the last operating year (January 1, 1999
through December 31, 1999). Personnel records of 50 employees are selected for
analysis. (The plant employs 250.) For this study, the average number of days
absent for these 50 employees is a ________________.

M A. population
BApp B. sample
C. parameter
D. statistic

C 29. The lowest level of data measurement is _______.

E A. interval level
Term B. ordinal level
C. nominal level
D. ratio level

D 30. Which of the following operations is meaningful for processing nominal data?

M A. addition
Term B. multiplication
C. ranking
D. counting

A 31. Which scale of measurement has these two properties: linear distance is
meaningful and the location of origin (zero) is arbitrary?

E A. interval level
Term B. ordinal level
C. nominal level
D. ratio level

D 32. Which scale of measurement has these two properties: linear distance is
meaningful and the location of origin (zero) is absolute (natural)?

E A. interval level
Term B. ordinal level
C. nominal level
D. ratio level

A 33. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by surveying all 1,500 industrial customers. One question on the
survey asked the customers “Which of the following best describes your primary
business: a. manufacturing, b. wholesaler, c. retail, d. service.” The measurement
level for this question is _________________.

E A. nominal level
BApp B. ordinal level
C. interval level
D. ratio level

A 34. A question in a survey of microcomputer users asked “Which operating system do
you use most often: a. Apple OS 7, b. MS DOS, c. MS Windows 95, d. Unix.”
The measurement level for this question is _________________.

E A. nominal level
BApp B. ordinal level
C. interval level
D. ratio level

C 35. Which of the following operations is meaningful for processing ordinal data, but
is meaningless for processing nominal data?

M A. addition
Term B. multiplication
C. ranking
D. counting

B 36. Sue Taylor, Director of Global Industrial Sales, is concerned by a deteriorating
sales trend. Specifically, the number of customers is stable at 1,500, but they are
purchasing less each year. She orders her staff to search for causes of the
downward trend by surveying all 1,500 industrial customers. One question on the
survey asked the customers “How many people does your company employ: a. 0 -
25, b. 26 - 100, c. 101 - 1000, d. 1001+.” The measurement level for this question
is __________.

E A. nominal level
BApp B. ordinal level
C. interval level
D. ratio level

C 37. A consumer has been asked to rank five cars based upon their desirability. This
level of measurement is _______.

M A. nominal
BApp B. ratio
C. ordinal
D. interval

C 38. Morningstar Mutual Funds analyzes the risk and performance of mutual funds.
Each mutual fund is assigned an overall rating of one to five stars. One star is the
lowest rating, and five stars is the highest rating. This level of measurement is
__________.

E A. nominal
BApp B. ratio
C. ordinal
D. interval

D 39. A level of data measurement that has an absolute zero is called _______.

E A. nominal
Term B. ordinal
C. interval
D. ratio

A 40. A person has decided to code a particular set of sales data. A value of 0 is
assigned if the sales occurred on a weekday, and a value of 1 means it happened
on a weekend. This is an example of _______.

E A. nominal level data
BApp B. ordinal level data
C. interval level data
D. ratio level data

A 41. Members of the accounting department's clerical staff were asked to rate their
supervisor's leadership style as either (1) authoritarian or (2) participatory. This is
an example of _____________.

E A. nominal level data
BApp B. ordinal level data
C. interval level data
D. ratio level data

B 42. A market research analyst has asked consumers to rate the appearance of a new
package on a scale of 1 to 5. A 1 means that the appearance is awful while a 5
means that it is excellent. The level of this data is usually considered _______.

M A. nominal
BApp B. ordinal
C. interval
D. ratio

A 43. The social security number of employees would be an example of what level of
data measurement?

E A. nominal
BApp B. ordinal
C. interval
D. ratio

D 44. The dollar sales of a restaurant is an example of what level of data measurement?

M A. nominal
BApp B. ordinal
C. interval
D. ratio

D 45. Grades on a test range from 0 to 100. This level of data is _______.

E A. nominal
App B. ordinal
C. interval
D. ratio
C 46. If it were not for the existence of an "absolute zero," ratio data would be
considered the same as _______.

E A. nominal
Term B. ordinal
C. interval
D. descriptive data

D 47. Scholastic Aptitude Test scores are an example of what type of measurement
scale?

M A. nominal
App B. ordinal
C. interval
D. ratio

C 48. Which types of data are normally used with parametric statistics?

M A. ratio and ordinal
Term B. nominal and ordinal
C. interval and ratio
D. interval and ordinal

B 49. Which types of data are normally used with nonparametric statistics?

M A. ratio and ordinal
Term B. nominal and ordinal
C. interval and ratio
D. interval and ordinal

B 50. Using data from a group to generalize to a larger group involves the use of
_______.

M A. descriptive statistics
Term B. inferential statistics
C. population derivation
D. sample persuasion

B 51. A student makes an 82 on the first test in a statistics course. From this, she
assumes that her average at the end of the semester (after other tests) will be about
82. This is an example of _______.

M A. descriptive statistics
App B. inferential statistics
C. nonparametric statistics
D. wishful thinking

C 52. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Life tests performed on a sample
of 100 batteries indicated an average life of seven years under normal usage.
Jessica recommended a six-year warranty period for the new model. This is an
example of _____________.

M A. descriptive statistics
BApp B. nonparametric statistics
C. inferential statistics
D. nominal data

D 53. Upon discovering an improperly adjusted drill press, Jack Joyner, Director of
Quality Control, ordered an inspection of "every fifth casting drilled on the
evening shift." Less than 1% of the castings were defective; so, Jack released the
evening shift's production to assembly. This is an example of _______________.

M A. nonparametric statistics
BApp B. nominal data
C. descriptive statistics
D. inferential statistics

C 54. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1999." Five percent of the payroll
vouchers contained material errors. This is an example of _______________.

H A. nonparametric statistics
BApp B. nominal data
C. descriptive statistics
D. inferential statistics

B 55. A new salesperson is paid a commission on each sale. This person made $2,000
his first month on the job. From this he concludes that he will make $24,000
during his first year. This is an example of _______.

E A. descriptive statistics
BApp B. inferential statistics
C. nonparametric statistics
D. nominal data

A 56. A statistics instructor collects information about the background of his students.
About 30% have taken economics and about 40% have taken accounting. There
are 23 male students and 27 female students in this class. This is an example of
_______.

E A. descriptive statistics
App B. inferential statistics
C. nonparametric statistics
D. nominal data

B 57. A market researcher is interested in determining the average income for families
in Duval County, Florida. To accomplish this, she takes a random sample of 400
families from the county and uses the data gathered from these families to
estimate the average income for families of the entire county. This process is an
example of _______.

H A. descriptive statistics
BApp B. inferential statistics
C. intermediate statistics
D. a census

A 58. The Universal Pulp Company has a plant in Portland, Oregon. Management has
decided to determine the average number of sick days taken per worker in this
plant in 1991. To accomplish this, the management gathers records on all the
workers in the plant and averages the number of sick days taken in 1991 by each
worker. This process is using _______.

M A. descriptive statistics
BApp B. inferential statistics
C. company-wide statistics
D. locale-specific statistics

D 59. The Magnolia Swimming Pool Company wants to determine the average number
of years it takes before a major repair is required on one of the pools that the
company constructs. The president of the company asks Rick Johnson, a
company accountant, to randomly contact fifty families that built Magnolia pools
in the past ten years and determine how long it was in each case until a major
repair. The information will then be used to estimate the average number of years
until a major repair for all pools sold by Magnolia. The average based on the data
gathered from the fifty families can best be described as a _______.

M A. sample
BApp B. population
C. parameter
D. statistic

D 60. The Chamber of Commerce wants to assess its membership's opinions of the
North American Free Trade Agreement. One hundred of the 2,000 members are
randomly selected and contacted via telephone. Seventy-five reported an overall
favorable opinion, and twenty-five reported an overall unfavorable opinion. The
proportion, 0.75, is a ___________.

M A. sample
BApp B. population
C. parameter
D. statistic
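
The distinction tested in question 60 can be sketched in a few lines of Python. The figures (2,000 members, 100 sampled, 75 favorable) come from the question itself; the variable names are illustrative assumptions, and the population proportion is left unset because the question never supplies it.

```python
# Figures taken from question 60: 100 of 2,000 members were sampled,
# and 75 of those sampled reported a favorable opinion.
population_size = 2000
sample_size = 100
favorable_in_sample = 75

# A statistic is computed from the sample only.
sample_proportion = favorable_in_sample / sample_size

# The matching parameter would require the opinions of all 2,000 members,
# which were not collected, so it stays unknown here.
population_proportion = None

print(f"sample proportion (statistic): {sample_proportion}")
print(f"share of the population sampled: {sample_size / population_size}")
```

Because only 100 of the 2,000 members were contacted, 0.75 describes the sample and serves as an estimate of the unknown population proportion.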

C 61. What proportion of San Diego voters favor trade restrictions with China? In an
effort to determine this, a research team calls every registered voter in San Diego
and successfully contacts them. The proportion from the data gathered from the
calls most likely is a _______.

M A. sample
App B. population
C. parameter
D. statistic

D 62. A researcher wants to know what the average variation is in altimeters of small,
privately owned airplanes. The task of determining this is expensive and time
consuming, if even possible, given the large number of such airplanes. The
researcher decides to use government records to randomly locate the owners of ten
such planes and then get permission to test the altimeters. When the researcher is
done, he will use the data gathered from the group of ten to reach conclusions
about all small, privately owned airplanes. This process can best be described as
_______.

H A. data statistics
App B. research statistics
C. descriptive statistics
D. inferential statistics

C 63. A researcher wants to know what the average variation is in altimeters of small,
privately owned airplanes. The task of determining this is expensive and time
consuming, if even possible, given the large number of such airplanes. The
researcher decides to use government records to randomly locate the owners of ten
such planes and then get permission to test the altimeters. When the researcher is
done, he will use the data gathered from the group of ten to reach conclusions
about all small, privately owned airplanes. The data gathered on the group of ten
airplanes is best described as _______.

H A. measurements
App B. data
C. statistics
D. parameters

C 64. How much inventory do Christmas tree sales lots keep? A researcher goes from
location to location around the city counting the number of trees in each lot.
These numbers most likely represent what level of data?

M A. nominal
BApp B. ordinal
C. ratio
D. interval

B 65. During the Valentine's season, different offices in a company are encouraged to
decorate their doors. A committee then goes around and ranks the doors
according to how well decorated they are. The best door gets a ranking of one, the
second best gets a ranking of two, etc. The numbers of these rankings represent
which level of data?

M A. interval
BApp B. ordinal
C. nominal
D. ratio

A 66. A large manufacturing company in Indianapolis produces valves for the chemical
industry. According to specifications, one particular valve is supposed to have a
five-inch opening on the side. Quality control inspectors take random samples of
these valves just after the hole is bored. They measure the size of the hole in an
effort to determine if the machine is out-of-adjustment. The measurement of the
diameter of the hole represents which level of data?

M A. ratio
BApp B. nominal
C. ordinal
D. interval

B 67. A marketing demographic survey is undertaken to determine the market potential
for a new product. One of the questions asked is: What type of residence do you
live in? Respondents are offered several possible answers including: house,
apartment, or condominium. In order to computerize the survey answers, the
responses are coded as a 1 if the answer is "house", a 2 if the answer is
"apartment", and a 3 if the answer is "condominium". These numbers, 1, 2, and
3, are an example of what level of data?

E A. interval
BApp B. nominal
C. ordinal
D. ratio

C 68. A marketing survey is conducted to ascertain the potential of several new
products. A series of focus groups is used to conduct this survey. At the end of
one of the sessions, the group members are asked to rank the remaining eight
products in order of desirability. A one indicates the most favored product and an
eight is awarded to the least desirable. The numbers are examples of what level of
data?

M A. ratio
BApp B. interval
C. ordinal
D. nominal

C 69. A business is attempting to find the best small town in the United States in which
to relocate. As part of the investigation, the elevations of all small towns in the
United States are researched. Some towns are located high in the Rockies with
elevations over 8,000 feet. There are even some towns located in the south central
valley of California with elevations below sea level (for example, elevations of
around -100 feet). These elevations can best be described as what level of data?

M A. nominal
BApp B. ordinal
C. interval
D. ratio

D 70. The unemployment rate is often used as an indicator of a community’s economic
vitality. An unemployment rate is best described as what level of measurement?

E A. nominal
BApp B. ordinal
C. interval
D. ratio

D 71. The apartment vacancy rate is often used as an indicator of a community’s need
for residential housing construction. The apartment vacancy rate is best described
as what level of measurement?

E A. nominal
BApp B. ordinal
C. interval
D. ratio

A 72. An investor evaluates the performance of her portfolio by calculating the
portfolio’s rate of return. The portfolio’s rate of return can best be described as
what level of data?

M A. ratio
BApp B. interval
C. ordinal
D. nominal

D 73. A business is attempting to find the best small town in the United States in which
to relocate. As part of the investigation, the availability of vocational, technical
education in small towns in the United States is researched. Vocational,
technical education is available within fifty miles of some small towns; for others it
is not. The availability of vocational, technical education can best be described as
what level of data?

M A. ratio
BApp B. interval
C. ordinal
D. nominal

B 74. Colleges and universities often assign numbers as student identification numbers.
These numbers are best categorized as what level of data?

E A. interval
App B. nominal
C. ordinal
D. ratio

B 75. Undergraduate university students are usually classified as freshmen, sophomores,
juniors, and seniors. If numeric codes of 1, 2, 3, and 4 are used to represent these
four categories, the level of data measurement is ________________.

E A. nominal
App B. ordinal
C. interval
D. ratio

B 76. At many universities, students are assigned a grade of A, B, C, D, or F at the
completion of each course. These letter grades are translated to 4, 3, 2, 1, and 0
grade (quality) points, respectively. The level of data measurement is
____________.

M A. nominal
App B. ordinal
C. interval
D. ratio

C 77. Financial institutions are often ranked by the volume of deposits. This ranking is
what level of measurement?

E A. ratio
BApp B. interval
C. ordinal
D. nominal

A 78. A chemical plant in Louisiana has a tank that holds a particular chemical. Every
day, a technician records the meter reading on the side of the tank which tells how
much volume of fluid there is in the tank. Most likely, these volume readings are
what level of data?

H A. ratio
BApp B. interval
C. ordinal
D. nominal

C 79. Most evening news weather reports in the United States still use Fahrenheit
temperature readings to convey the warmth of the air in various locations across
the country. These Fahrenheit temperature readings would most likely be
categorized as what level of data?

M A. ordinal
App B. ratio
C. interval
D. nominal

D 80. Suppose you want to monitor the success or failure of a day trader in the stock
market. To do this, you assume that the trader starts the day with zero earnings.
At the end of the day you determine how many dollars the trader gained or lost at the end
of the day. These dollar measurements are most likely which level of data?

M A. unit
BApp B. nominal
C. ordinal
D. interval

A 81. At the Olympics, the first three places in each event are awarded a medal. The
winner of an event is awarded a 1 to represent first place, the runner-up is
awarded a 2, and the third place finisher is awarded a 3. These numbers, 1, 2, and
3 are what level of data measurement?

E A. ordinal
App B. nominal
C. ratio
D. interval

B 82. The highest level of data is _______.

E A. interval
Term B. ratio
C. nominal
D. ordinal

D 83. Which type of statistics requires that the data be at least interval or ratio level data?

M A. inferential statistics
Term B. descriptive statistics
C. nonparametric statistics
D. parametric statistics
B 84. Nominal and ordinal data are sometimes classified as _______.

H A. metric data
Term B. nonmetric data
C. descriptive data
D. inferential data

D 85. Which of the levels of data measurement has the highest usage potential? That
is, if you have this level of data, you can analyze it in more ways than with other
levels of data.

E A. nominal
Term B. ordinal
C. interval
D. ratio

C 86. Moody's Investors Service uses nine ratings of corporate bonds to help potential
investors assess their risk.

Rating   Meaning
Aaa      Best quality
Aa       High quality
A        Higher medium quality
Baa      Lower medium quality
Ba       Possess speculative elements
B        Lack characteristics of desirable investment
Caa      Poor standing
Ca       Speculative in a high degree
C        Extremely poor prospects

The level of data measurement in Moody's bond ratings is ______________.

M A. ratio
BApp B. interval
C. ordinal
D. nominal

B 87. Standard & Poor's Corporation uses nine ratings of corporate bonds to help
potential investors assess their risk.

Rating   Meaning
AAA      Highest grade
AA       High grade
A        Upper medium quality
BBB      Medium grade
BB       Lower medium grade
B        Speculative
CCC      Outright speculations
CC       Outright speculations
C        Income bonds on which no interest is being paid
DDD      In default, with rating indicating relative salvage value
DD       In default, with rating indicating relative salvage value
D        In default, with rating indicating relative salvage value

The level of data measurement in Standard & Poor's bond ratings is
______________.

M A. ratio
BApp B. ordinal
C. interval
D. nominal

D 88. Mac User's magazine rates products for the Apple Macintosh on a scale from one
mouse to five mice. One mouse indicates low value/performance. Five mice
indicate highest value/performance. The level of data measurement in these ratings is
______________.

M A. nominal
App B. interval
C. ratio
D. ordinal

D 89. During a strategy planning session, the executives of Plano Power Plants, Inc.
identified thirty-one significant threats to the future health of the corporation.
Toward the end of the session, Paul Pearson, a management consultant, asked the
executives to rate each threat on a scale of to . The level of data
measurement in Paul's threat rating is ______________.

M A. nominal
BApp B. interval
C. ratio
D. ordinal

A 90. The RSACi system uses five levels (0, 1, 2, 3, and 4) to provide consumers with
information about the level of sex, nudity, violence, and offensive language (vulgar or
hate-motivated) in Web sites. A level 4 site may have crude, vulgar language, or
extreme hate speech, while a level 0 site would not have even the mildest
expletives. The level of data measurement in the RSACi rating system is ______.

M A. ordinal
BApp B. interval
C. ratio
D. nominal

B 91. The ranking of a company in the Fortune 500 is an example of ______ level of
data measurement.

M A. ratio
BApp B. ordinal
C. categorical
D. nominal

D 92. A company's Federal Tax I.D. is an example of ______ level of data
measurement.

M A. ratio
BApp B. ordinal
C. rank order
D. nominal

A 93. The United States trade balance is an example of ______ level of data
measurement.

M A. ratio
BApp B. ordinal
C. rank order
D. nominal

D 94. The telephone area code of clients in the United States is an example of ______
level of data measurement.

M A. ratio
BApp B. ordinal
C. rank order
D. nominal

A 95. A corporation's Price/Earnings ratio (P/E) is calculated by dividing its current
earnings per share into current market price per share. P/E is an example of
______ level of data measurement.

M A. ratio
BApp B. interval
C. rank order
D. nominal

A 96. Per capita income for a geographic region is calculated by dividing the number of
people residing in the region into the total personal income of all persons residing
in the region. Per capita income is an example of ______ level of data measurement.

M A. ratio
BApp B. ordinal
C. interval
D. nominal

B 97. Classifying households as low-income, middle-income, or high-income is an
example of ______ level of data measurement.

M A. nominal
BApp B. ordinal
C. ratio
D. interval

A 98. Which of the following symbols is used to represent a population parameter?

M A. σ
Term B. @
C. &
D. ∞
CHAPTER TWO

Charts and Graphs

C 1. A financial analyst has randomly selected 200 companies from those traded on the
NYSE. At the end of each trading day, the analyst records the closing price for
each of the 200 companies. These 200 measurements are an example of
__________.

E A. an ogive
Term B. grouped data
C. raw data
D. a stem and leaf plot

C 2. If data are grouped into intervals and the number of items in each group is listed,
this could be called a _______.

E A. ogive
Term B. histogram
C. frequency distribution

C 3. A graphical representation of a frequency distribution is called a _______.

E A. stem and leaf plot
Term B. ogive
C. histogram
D. pie chart

A 4. The width of a class interval in a frequency distribution will be approximately
equal to the range divided by _______.

M A. the number of class intervals
Term B. the highest number in the data set
C. the lowest number in the data set
D. the midpoint of the middle class

D 5. If the individual class frequency is divided by the total frequency, the result is the
_______.

M A. midpoint frequency
Term B. cumulative frequency
C. stem and leaf plot
D. relative frequency

A 6. A cumulative frequency polygon is also called _______.

E A. an ogive
Term B. a histogram
C. a frequency polygon
D. a stem and leaf plot

B 7. A histogram can be described as _______.

E A. a graphical depiction of an ogive
Term B. a vertical bar chart
C. a vertical stem and leaf plot
D. a three dimensional pie chart

C 8. The number of class intervals in a frequency distribution is usually between
_______.

E A. 3 and 5
Term B. 7 and 9
C. 5 and 15
D. 1 and 25

B 9. One advantage of a stem and leaf plot over a frequency distribution is that
_______.

E A. it contains more class intervals
Term B. the values of the original data are retained
C. the class midpoints are used as the stem
D. the class midpoints are used as the leaf

B 10. One rule that must always be followed in constructing frequency distributions is
that _______.

E A. the number of classes must be less than 10
Term B. each data point can only fall into one class
C. the width of each class is equal to the range
D. the number of intervals must be an odd number

A 11. One rule that must always be followed in constructing frequency distributions is
that _______.

E A. adjacent classes must not overlap
Term B. the midpoint of each class must be a whole number
C. the width of each class is equal to the range
D. the number of intervals must be an odd number

D 12. Which of the following is best to show the percentage of a total budget that is
spent on each category of items?

E A. histogram
Term B. ogive
C. stem and leaf chart
D. pie chart

B 13. A cumulative frequency distribution would provide _______.

E A. a graph of a frequency distribution
Term B. a running total of the frequencies in the classes
C. the proportion of the total frequencies which fall into each class
D. a very cloudy picture of the frequencies

B 14. What is the midpoint of the class interval 10 - under 12?

E A. 22
Calc B. 11
C. 10.5
D. 11.5

C 15. What is the midpoint of the class interval 20 - under 25?

E A. 47
Calc B. 20
C. 22.5
D. 23

B 16. What is the midpoint of the class interval 6 - under 9?

E A. 15
Calc B. 7.5
C. 3
D. 1.5
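
The midpoint calculation in questions 14 through 16 can be sketched in Python, assuming the usual convention that the midpoint of a class "lower-under upper" is the average of its two endpoints. The function name class_midpoint is illustrative and does not come from the test bank.

```python
def class_midpoint(lower, upper):
    """Midpoint of the class 'lower-under upper': the average of its endpoints."""
    return (lower + upper) / 2


# Class intervals from questions 14, 15, and 16
for lower, upper in [(10, 12), (20, 25), (6, 9)]:
    print(f"{lower}-under {upper}: midpoint = {class_midpoint(lower, upper)}")
```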

C 17. Consider the following frequency distribution:


Class Interval Frequency
10-under 20 15
20-under 30 25
30-under 40 10
What is the midpoint of the first class?

E A. 10
Calc B. 20
C. 15
D. none of the above

B 18. Consider the following frequency distribution:


Class Interval Frequency
10-under 20 15
20-under 30 25
30-under 40 10
What is the relative frequency of the first class?

E A. 0.15
Calc B. 0.30
C. 0.10
D. none of the above
B 19. Consider the following frequency distribution:
Class Interval Frequency
10-under 20 15
20-under 30 25
30-under 40 10
What is the cumulative frequency of the second class interval?

E A. 25
Calc B. 40
C. 15
D. 50

C 20. Consider the following frequency distribution:


Class Interval Frequency
10-under 20 15
20-under 30 25
30-under 40 10
What is the approximate range of this data?

E A. 10
Calc B. 20
C. 30
D. 40
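
As a rough check on questions 17 through 20, the sketch below computes the midpoints, relative frequencies, cumulative frequencies, and approximate range for the three-class distribution. It assumes the approximate range is the upper endpoint of the last class minus the lower endpoint of the first class; the variable names are illustrative.

```python
from itertools import accumulate

# Class endpoints ("lower-under upper") and frequencies from the question
classes = [(10, 20), (20, 30), (30, 40)]
frequencies = [15, 25, 10]

total = sum(frequencies)
midpoints = [(lo + hi) / 2 for lo, hi in classes]
relative = [f / total for f in frequencies]          # class frequency / total frequency
cumulative = list(accumulate(frequencies))           # running total of frequencies
approx_range = classes[-1][1] - classes[0][0]        # assumed: last upper - first lower

print(midpoints, relative, cumulative, approx_range)
```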

D 21. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the midpoint of the last class?

E A. 80
Calc B. 100
C. 95
D. 90

C 22. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the relative frequency of the second class?

E A. 0.45
Calc B. 0.90
C. 0.225
D. 0.75

C 23. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the cumulative frequency of the third class?

E A. 80
Calc B. 0.40
C. 155
D. 75

A 24. The number of phone calls arriving at a switchboard each hour has been recorded,
and the following frequency distribution has been developed.
Class Interval Frequency
20-under 40 30
40-under 60 45
60-under 80 80
80-under 100 45
What is the approximate range of the number of phone calls arriving each hour?

E A. 80
Calc B. 200
C. 20
D. 100

C 25. Consider the following stem and leaf plot:


Stem Leaf
1 0, 2, 5, 7
2 2, 3, 4, 4
3 0, 4, 6, 6, 9
4 5, 8, 8, 9
5 2, 7, 8
Suppose that a frequency distribution was developed from this, and there were 5
classes (10-under 20, 20-under 30, etc.). What would the frequency be for class
30-under 40?

M A. 3
Calc B. 4
C. 5
D. 9

C 26. Consider the following stem and leaf plot:


Stem Leaf
1 0, 2, 5, 7
2 2, 3, 4, 8
3 0, 4, 6, 6, 9
4 5, 8, 8, 9
5 2, 7, 8
Suppose that a frequency distribution was developed from this, and there were 5
classes (10-under 20, 20-under 30, etc.). What would be the relative frequency of
the class 20-under 30?

M A. 0.4
Calc B. 0.25
C. 0.20
D. 4

B 27. Consider the following stem and leaf plot:


Stem Leaf
1 0, 2, 5, 7
2 2, 3, 4, 8
3 0, 4, 6, 6, 9
4 5, 8, 8, 9
5 2, 7, 8
Suppose that a frequency distribution was developed from this, and there were 5
classes (10-under 20, 20-under 30, etc.). What was the highest number in the data
set?

M A. 50
App B. 58
C. 59
D. 100

B 28. Consider the following stem and leaf plot:


Stem Leaf
1 0, 2, 5, 7
2 2, 3, 4, 8
3 0, 4, 6, 6, 9
4 5, 8, 8, 9
5 2, 7, 8
Suppose that a frequency distribution was developed from this, and there were 5
classes (10-under 20, 20-under 30, etc.). What was the lowest number in the data
set?

M A. 0
App B. 10
C. 7
D. 2

B 29. Consider the following stem and leaf plot:


Stem Leaf
1 0, 2, 5, 7
2 2, 3, 4, 8
3 0, 4, 6, 6, 9
4 5, 8, 8, 9
5 2, 7, 8
Suppose that a frequency distribution was developed from this, and there were 5
classes (10-under 20, 20-under 30, etc.). Most of the numbers in the 40-under 50
class are _______.

M A. close to 40
App B. close to 50
C. equal to 45
D. between 41 and 44

C 30. Consider the following stem and leaf plot:


Stem Leaf
1 0, 2, 5, 7
2 2, 3, 4, 8
3 0, 4, 6, 6, 9
4 5, 8, 8, 9
5 2, 7, 8
Suppose that a frequency distribution was developed from this, and there were 5
classes (10-under 20, 20-under 30, etc.). What is the cumulative frequency for the
30-under 40 class interval?

M A. 5
App B. 9
C. 13
D. 14
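
The conversion from a stem and leaf plot to a frequency distribution in questions 26 through 30 can be sketched as follows. This is only an illustration, assuming each stem is the tens digit and each leaf the units digit, so that each stem corresponds to one class (10-under 20, 20-under 30, and so on).

```python
from itertools import accumulate

# Stem and leaf plot from questions 26 through 30 (stem = tens, leaf = units)
stem_leaf = {
    1: [0, 2, 5, 7],
    2: [2, 3, 4, 8],
    3: [0, 4, 6, 6, 9],
    4: [5, 8, 8, 9],
    5: [2, 7, 8],
}

# Rebuild the original data values, which the plot retains
values = [10 * stem + leaf for stem, leaves in stem_leaf.items() for leaf in leaves]

# One class per stem: frequency is the number of leaves on that stem
frequencies = [len(leaves) for leaves in stem_leaf.values()]
total = sum(frequencies)
relative = [f / total for f in frequencies]
cumulative = list(accumulate(frequencies))

print(min(values), max(values))
print(frequencies, relative, cumulative)
```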

C 31. Cumulative frequencies are usually represented graphically by _______.

E A. histograms
Term B. pie charts
C. ogives
D. frequency polygons

B 32. An instructor has decided to graphically represent the grades on a test. The
instructor uses a plus/minus grading system (i.e. she gives grades of A-, B+, etc.).
Which of the following would provide the most information for the students?

M A. a histogram
App B. a stem and leaf plot
C. a cumulative frequency distribution
D. a frequency distribution

B 33. The following represent the ages of students in a class:


19, 23, 21, 19, 19, 20, 22, 31, 21, 20
If a stem and leaf plot were to be developed from this, how many stems would
there be?

M A. 2
App B. 3
C. 4
D. 10
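
A minimal sketch of how the stem and leaf plot in question 33 could be built, assuming the tens digit is used as the stem and the units digit as the leaf. The variable name plot is illustrative.

```python
from collections import defaultdict

ages = [19, 23, 21, 19, 19, 20, 22, 31, 21, 20]

# Split each age into a stem (tens digit) and a leaf (units digit)
plot = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)
    plot[stem].append(leaf)

for stem in sorted(plot):
    print(stem, plot[stem])
print("number of stems:", len(plot))
```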

B 34. The difference between the highest number and the lowest number in a set of data
is called the _______.

E A. difference
Term B. range
C. polygonal frequency
D. relative frequency

C 35. A person has decided to construct a frequency distribution for a set of data
containing 60 numbers. The lowest number is 23 and the highest number is 68. If
5 classes are used, the class width should be approximately _______.

E A. 4
Calc B. 12
C. 9
D. 5

B 36. A person has decided to construct a frequency distribution for a set of data
containing 60 numbers. The lowest number is 23 and the highest number is 68. If
7 classes are used, the class width should be approximately _______.

E A. 6
Calc B. 7
C. 9
D. 11
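
The class width rule used in questions 35 and 36 (range divided by the number of classes) can be sketched as below. Rounding up to the next whole number is an assumption made here so that the classes cover the whole range; the function name approx_class_width is illustrative.

```python
import math


def approx_class_width(lowest, highest, num_classes):
    """Range divided by the number of classes, rounded up (assumed convention)."""
    return math.ceil((highest - lowest) / num_classes)


# Data set from questions 35 and 36: lowest value 23, highest value 68
print(approx_class_width(23, 68, 5))
print(approx_class_width(23, 68, 7))
```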
D 37. A frequency distribution was developed. The lower endpoint of the first class is
9.30, and the midpoint is 9.35. What is the upper endpoint of this class?

E A. 9.50
Calc B. 9.60
C. 9.70
D. 9.40

C 38. The cumulative frequency for a class is 27. The cumulative frequency for the next
(non-empty) class will be _______.

E A. less than 27
App B. equal to 27
C. greater than 27
D. 27 minus the next class frequency

B 39. Which of the following would be most helpful if you wished to construct a pie
chart?

E A. a frequency distribution
App B. a relative frequency distribution
C. a cumulative frequency distribution
D. an ogive

B 40. A person has constructed a frequency distribution for the grades on a test. This
person is not sure how to do this, and thus only 7 classes were developed, and
each class width was set at 10 units. If the lowest possible score is 0 and the
highest possible score is 100, which of the following is true?

M A. all numbers between 0 and 100 belong to one of these classes
App B. some number might not fit into any of these classes
C. some of these classes would have to overlap
D. there are too many classes

A 41. In a histogram, the highest bar represents the class with _______.

E A. the highest frequency
App B. the lowest frequency
C. the highest cumulative frequency
D. the lowest relative frequency

C 42. The following class intervals for a frequency distribution were developed to
provide information regarding the starting salaries for students graduating from a
particular school:
Salary Number of Graduates
($1,000s)
18-under 21 -
21-under 25 -
24-under 27 -
29-under 30 -
Before data was collected, someone questioned the validity of this arrangement.
Which of the following represents a problem with this set of intervals?

M A. there are too many intervals
App B. the class widths are too small
C. some numbers between 18,000 and 30,000 would fall into two different
intervals
D. the first and the second interval overlap

C 43. The following class intervals for a frequency distribution were developed to
provide information regarding the starting salaries for students graduating from a
particular school:
Salary Number of Graduates
($1,000s)
18-under 21 -
21-under 25 -
24-under 27 -
29-under 30 -
Before data was collected, someone questioned the validity of this arrangement.
Which of the following represents a problem with this set of intervals?

M A. there are too many intervals
App B. the class widths are too small
C. some numbers between 18,000 and 30,000 would not fall into any of these
intervals
D. the first and the second interval overlap

D 44. The following class intervals for a frequency distribution were developed to
provide information regarding the starting salaries for students graduating from a
particular school:
Salary Number of Graduates
($1,000s)
18-under 21 -
21-under 25 -
24-under 27 -
29-under 30 -
Before data was collected, someone questioned the validity of this arrangement.
Which of the following represents a problem with this set of intervals?

M A. there are too many intervals
App B. the class widths are too small
C. the class widths are too large
D. the second and the third interval overlap
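
The two problems with the salary intervals in questions 42 through 44 (overlapping classes and uncovered values) can be detected mechanically. The sketch below is one possible check, not a method taken from the test bank; it treats each class as "lower-under upper" and compares consecutive classes. The function name check_intervals is hypothetical.

```python
def check_intervals(intervals):
    """Return (overlaps, gaps) found between consecutive 'lower-under upper' classes."""
    overlaps, gaps = [], []
    ordered = sorted(intervals)
    for (lo1, hi1), (lo2, hi2) in zip(ordered, ordered[1:]):
        if lo2 < hi1:
            overlaps.append((lo2, hi1))   # values here fall into two classes
        elif lo2 > hi1:
            gaps.append((hi1, lo2))       # values here fall into no class
    return overlaps, gaps


# Salary classes in $1,000s from questions 42 through 44
salary_classes = [(18, 21), (21, 25), (24, 27), (29, 30)]
overlaps, gaps = check_intervals(salary_classes)
print("overlaps:", overlaps)
print("gaps:", gaps)
```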

D 45. Abel Alonzo, Director of Human Resources, is exploring employee absenteeism
at the Harrison Haulers Plant during the last operating year. A review of all
personnel records indicated that absences ranged from zero to twenty-nine days
per employee. The following class intervals were proposed for a frequency
distribution of absences.
Absences Number of Employees
(days)
0-under 5 -
5-under 10 -
10-under 15 -
15-under 20 -
20-under 25 -
25-under 30 -
Which of the following represents a problem with this set of intervals?

M A. there are too few intervals
BApp B. some numbers between 0 and 29 would not fall into any interval
C. the first and second interval overlap
D. none of the above (These intervals are okay.)

B 46. Abel Alonzo, Director of Human Resources, is exploring employee absenteeism
at the Harrison Haulers Plant during the last operating year. A review of all
personnel records indicated that absences ranged from zero to twenty-nine days
per employee. The following class intervals were proposed for a frequency
distribution of absences.
Absences Number of Employees
(days)
0-under 5 -
5-under 10 -
10-under 15 -
20-under 25 -
25-under 30 -
Which of the following represents a problem with this set of intervals?

M A. there are too few intervals
BApp B. some numbers between 0 and 29, inclusively, would not fall into any interval
C. the first and second interval overlap
D. there are too many intervals

A 47. Abel Alonzo, Director of Human Resources, is exploring employee absenteeism
at the Harrison Haulers Plant during the last operating year. A review of all
personnel records indicated that absences ranged from zero to twenty-nine days
per employee. The following class intervals were proposed for a frequency
distribution of absences.
Absences Number of Employees
(days)
0-under 10 -
10-under 20 -
20-under 30 -
Which of the following represents a problem with this set of intervals?

E A. there are too few intervals
BApp B. some numbers between 0 and 29 would not fall into any interval
C. the first and second interval overlap
D. there are too many intervals

A 48. Consider the relative frequency distribution given below:


Class Interval Relative Frequency
20-under 40 0.2
40-under 60 0.3
60-under 80 0.4
80-under 100 0.1
There were 60 numbers in the data set. How many numbers were in the interval
20-under 40?

E A. 12
Calc B. 20
C. 40
D. 10

C 49. Consider the relative frequency distribution given below:


Class Interval Relative Frequency
20-under 40 0.2
40-under 60 0.3
60-under 80 0.4
80-under 100 0.1
There were 60 numbers in the data set. How many numbers were in the interval
40-under 60?

E A. 30
Calc B. 50
C. 18
D. 12

D 50. Consider the relative frequency distribution given below:


Class Interval Relative Frequency
20-under 40 0.2
40-under 60 0.3
60-under 80 0.4
80-under 100 0.1
There were 60 numbers in the data set. How many of the numbers were less than
80?

E A. 90
Calc B. 80
C. 0.9
D. 54
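
Questions 48 through 50 recover class counts from relative frequencies by multiplying each relative frequency by the size of the data set. A brief sketch, with illustrative variable names:

```python
# Relative frequencies for 20-under 40, 40-under 60, 60-under 80, 80-under 100
relative = [0.2, 0.3, 0.4, 0.1]
n = 60  # size of the data set, from the question

# Count in each class = relative frequency x total; round() removes float noise
counts = [round(r * n) for r in relative]
below_80 = sum(counts[:3])  # the first three classes lie below 80

print(counts, below_80)
```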

B 51. Consider the following frequency distribution:


Class Interval Frequency
100-under 200 25
200-under 300 45
300-under 400 30
What is the midpoint of the first class?

E A. 100
Calc B. 150
C. 25
D. 250

A 52. Consider the following frequency distribution:


Class Interval Frequency
100-under 200 25
200-under 300 45
300-under 400 30
What is the relative frequency of the second class interval?

E A. 0.45
Calc B. 0.70
C. 0.30
D. 0.33

C 53. Consider the following frequency distribution:


Class Interval Frequency
100-under 200 25
200-under 300 45
300-under 400 30
What is the cumulative frequency of the second class interval?

E A. 25
Calc B. 45
C. 70
D. 250

C 54. Consider the following frequency distribution:


Class Interval Frequency
100-under 200 25
200-under 300 45
300-under 400 30
What is the approximate range of the data?

E A. 100
Calc B. 25
C. 300
D. 400

B 55. Consider the following frequency distribution:


Class Interval Frequency
100-under 200 25
200-under 300 45
300-under 400 30
What is the midpoint of the last class interval?

E A. 15
Calc B. 350
C. 300
D. 200

B 56. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The relative frequency of the first class interval is _________.

E A. 0.50
BCalc B. 0.33
C. 0.40
D. 0.27

C 57. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The cumulative frequency of the second class interval is _________.

E A. 1,500
BCalc B. 500
C. 900
D. 1,000

D 58. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The approximate range of the data is _________.

E A. 1,500
BCalc B. 2
C. 400
D. 10

D 59. Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects
irregularities in the payroll system, and orders an inspection of "each and every
payroll voucher issued since January 1, 1993." Each payroll voucher was
inspected and the following frequency distribution was compiled.
Errors Per Voucher Number of Vouchers
0-under 2 500
2-under 4 400
4-under 6 300
6-under 8 200
8-under 10 100
The midpoint of the first class interval is _________.

E A. 500
BCalc B. 2
C. 1.5
D. 1

D 60. The staffs of the accounting and the quality control departments rated their
respective supervisor's leadership style as either (1) authoritarian or (2)
participatory. Sixty-eight percent of the accounting staff rated their supervisor
"authoritarian," and thirty-two percent rated him "participatory." Forty percent of
the quality control staff rated their supervisor "authoritarian," and sixty percent
rated her "participatory." The best graphic depiction of these data would be two
___________________.

E A. histograms
BApp B. frequency polygons
C. ogives
D. pie charts

B 61. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Accelerated life tests were
performed on a sample of 100 batteries, and the following relative frequency
distribution was compiled.
Battery Life Relative Frequency
(months)
40-under 50 0.05
50-under 60 0.10
60-under 70 0.25
70-under 80 0.50
80-under 100 0.10
The number of batteries in 40-under 50 interval was _________.

E A. 45
BCalc B. 5
C. 10
D. 15

A 62. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Accelerated life tests were
performed on a sample of 100 batteries, and the following relative frequency
distribution was compiled.
Battery Life Relative Frequency
(months)
40-under 50 0.05
50-under 60 0.10
60-under 70 0.25
70-under 80 0.50
80-under 100 0.10
The number of batteries in 60-under 70 interval was _________.

E A. 25
BCalc B. 65
C. 40
D. 60

D 63. Jessica Salas, president of Salas Products, is reviewing the warranty policy for her
company's new model of automobile batteries. Accelerated life tests were
performed on a sample of 100 batteries, and the following relative frequency
distribution was compiled.
Battery Life Relative Frequency
(months)
40-under 50 0.05
50-under 60 0.10
60-under 70 0.25
70-under 80 0.50
80-under 100 0.10
The number of batteries which lasted less than 60 months was _________.

M A. 10
BCalc B. 55
C. 5
D. 15

C 64. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings are drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter Number of Holes
(inches)
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
The percentage of holes under 1" in diameter was _____________.

E A. 33%
BCalc B. 60%
C. 67%
D. 50%

B 65. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings are drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter Number of Holes
(inches)
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
The number of holes under 1" in diameter was _____________.

M A. 20
BCalc B. 60
C. 25
D. 30

D 66. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings are drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter Number of Holes
(inches)
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
The midpoint of the third class interval is _____________.

E A. 1.025
BCalc B. 25
C. 35
D. 0.975
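
The hole-diameter calculations in questions 64 through 66 can be checked with a short sketch. It assumes "under 1 inch" means every class whose upper endpoint is at most 1.00; the variable names are illustrative.

```python
# Class endpoints in inches ("lower-under upper") and number of holes per class
classes = [(0.85, 0.90), (0.90, 0.95), (0.95, 1.00), (1.00, 1.05), (1.05, 1.10)]
counts = [10, 20, 30, 20, 10]

# Holes under 1": all classes whose upper endpoint does not exceed 1.00
under_one = sum(c for (lo, hi), c in zip(classes, counts) if hi <= 1.00)
percent_under_one = 100 * under_one / sum(counts)

# Midpoint of the third class interval
third_midpoint = (classes[2][0] + classes[2][1]) / 2

print(under_one, round(percent_under_one), round(third_midpoint, 3))
```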

C 67. The U.S. PC market is very competitive. In 1998 unit-shipment market shares
were: Dell 13.4%; Compaq 15.0%; Gateway 8.2%; Hewlett-Packard 8.4%; IBM
8.9%; and others 46.1%.
The best graphic depiction of these data would be ___________________.

E A. a histogram
BApp B. a frequency polygon
C. a pie chart
D. an ogive

B 68. The U.S. PC market is very competitive. In 1998 unit-shipment market shares
were: Dell 13.4%; Compaq 15.0%; Gateway 8.2%; Hewlett-Packard 8.4%; IBM
8.9%; and others 46.1%. In 1999 unit-shipment market shares were: Dell 17.1%;
Compaq 15.3%; Gateway 9.3%; Hewlett-Packard 8.2%; IBM 7.6%; and others
42.5%.
The best graphic depiction of these data would be ___________________.

E A. a pie chart
BApp B. two pie charts
C. a histogram
D. two histograms

A 69. The worldwide PC market is very competitive. In 1998 unit-shipment market
shares were: Dell 8.2%; Compaq 13.4%; Gateway 3.8%; Hewlett-Packard 5.9%;
IBM 8.4%; and others 60.3%.
The best graphic depiction of these data would be ___________________.

E A. a pie chart
BApp B. a histogram
C. a frequency polygon
D. an ogive

D 70. The worldwide PC market is very competitive. In 1998 unit-shipment market
shares were: Dell 8.2%; Compaq 13.4%; Gateway 3.8%; Hewlett-Packard 5.9%;
IBM 8.4%; and others 60.3%. In 1999 unit-shipment market shares were: Dell
10.8%; Compaq 12.8%; Gateway 4.3%; Hewlett-Packard 6.2%; IBM 7.6%; and
others 58.3%.
The best graphic depiction of these data would be ___________________.

E A. a histogram
BApp B. two histograms
C. a pie chart
D. two pie charts


B 71. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts.

[Pie charts of market share]
Company 1999 2000
A 33% 35%
B 45% 46%
C 22% 19%
Which of the following is true?

E A. Only company B gained market share.
BApp B. Only company C lost market share.
C. Company A lost market share.
D. Company B lost market share.

A 72. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts. Total sales for this
industry were $1.5 billion in 1999 and $1.8 billion in 2000.

[Pie charts of market share]
Company 1999 2000
A 33% 35%
B 45% 46%
C 22% 19%
Company C’s sales in 2000 were ___________.

E A. $342 million
BApp B. $630 million
C. $675 million
D. $828 million

C 73. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts. Total sales for this
industry were $1.5 billion in 1999 and $1.8 billion in 2000.

[Pie charts of market share]
Company 1999 2000
A 33% 35%
B 45% 46%
C 22% 19%
Company B’s sales in 1999 were ___________.

E A. $342 million
BApp B. $630 million
C. $675 million
D. $828 million
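
Questions 72 and 73 combine market share with total industry sales. The sketch below assumes the shares read from the pie-chart labels (A 33%, B 45%, C 22% in 1999; A 35%, B 46%, C 19% in 2000) and the industry totals stated in the question; the function name company_sales is illustrative.

```python
# Market shares in percent, as read from the pie-chart labels (an assumption)
shares = {
    1999: {"A": 33, "B": 45, "C": 22},
    2000: {"A": 35, "B": 46, "C": 19},
}
# Total industry sales in $ millions, from the question text
industry_sales = {1999: 1500, 2000: 1800}


def company_sales(company, year):
    """Company sales in $ millions = share (%) x industry sales / 100."""
    return shares[year][company] * industry_sales[year] / 100


for year in (1999, 2000):
    for company in "ABC":
        print(year, company, company_sales(company, year))
```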

A 74. The 1999 and 2000 market share data of the three competitors (A, B, and C) in an
oligopolistic industry are presented in the following pie charts.

[Pie charts of market share]
Company 1999 2000
A 33% 35%
B 45% 46%
C 22% 19%
Which of the following MAY BE a false statement?

M A. Sales revenues declined at company C.
BApp B. Only company C lost market share.
C. Company A gained market share.
D. Company B gained market share.


75. Liz Chapa manages a portfolio of 250 common stocks. Her staff compiled the
following frequency distribution of dividends received (in $/share) during the
previous year.
Dividends Number of Stocks
($/share)
$0-under $0.50 25
0.50-under 1.00 50
1.00-under 1.50 100
1.50-under 2.00 50
2.00-under 2.50 25
Construct a histogram of the dividend frequency distribution on the following
grid.

E
BCalc

76. Liz Chapa manages a portfolio of 250 common stocks. Her staff compiled the
following frequency distribution of dividends received (in $/share) during the
previous year.
Dividends Number of Stocks
($/share)
$0-under $0.50 25
0.50-under 1.00 50
1.00-under 1.50 100
1.50-under 2.00 50
2.00-under 2.50 25
Construct a frequency polygon of the dividend frequency distribution on the
following grid.

E
BCalc

77. Liz Chapa manages a portfolio of 250 common stocks. Her staff compiled the
following frequency distribution of dividends received (in $/share) during the
previous year.
Dividends ($/share) Number of Stocks
$0-under $0.50 25
0.50-under 1.00 50
1.00-under 1.50 100
1.50-under 2.00 50
2.00-under 2.50 25

Construct a cumulative frequency ogive on the following grid.

E
BCalc
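
For the ogive in question 77, the points to plot can be computed before drawing. This sketch assumes the common convention of plotting each cumulative frequency against the upper endpoint of its class, starting from zero at the lower endpoint of the first class; no plotting library is used and the variable names are illustrative.

```python
from itertools import accumulate

# Upper class endpoints ($/share) and frequencies from the dividend distribution
upper_endpoints = [0.50, 1.00, 1.50, 2.00, 2.50]
frequencies = [25, 50, 100, 50, 25]

cumulative = list(accumulate(frequencies))

# Ogive points: (class endpoint, cumulative frequency), starting at (0, 0)
ogive_points = [(0.0, 0)] + list(zip(upper_endpoints, cumulative))

for endpoint, cum_freq in ogive_points:
    print(f"{endpoint:.2f}  {cum_freq}")
```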

78. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings are drilled and inspected. Data collected from
measuring the ninety holes were compiled to form the following frequency
distribution.
Hole Diameter (inches) Number of Holes
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10

Construct a histogram on the following grid.

E
BCalc

79. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings are drilled and inspected. Measurements from the
ninety holes were compiled to form the following frequency distribution.
Hole Diameter (inches) Number of Holes
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10

Construct a frequency polygon on the following grid.

E
BCalc

80. Chili Robinson, Director of Quality Control, is concerned about the variability in
a drilling process. The process should produce 1" holes in aluminum castings. A
sample of ninety castings are drilled and inspected. Measurements from the
ninety holes were compiled to form the following frequency distribution.
Hole Diameter (inches) Number of Holes
0.85-under 0.90 10
0.90-under 0.95 20
0.95-under 1.00 30
1.00-under 1.05 20
1.05-under 1.10 10
Construct a cumulative frequency ogive on the following grid.

E
BCalc

B 81. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.

The total number of sales transactions on Saturday was _____________.

E A. 200
BCalc B. 500
C. 300
D. 100

D 82. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.

The percentage of sales transactions on Saturday that were under $100 each was
_____________.

M A. 100
BCalc B. 10
C. 80
D. 20

C 83. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.

The percentage of sales transactions on Saturday that were at least $100 each was
_____________.

M A. 100
BCalc B. 10
C. 80
D. 20

C 84. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and an ogive of sales transactions by dollar value of the transactions. Saturday's
cumulative frequency ogive follows.

The percentage of sales transactions on Saturday that were between $100 and
$150 was _____________.

M A. 20%
BCalc B. 40%
C. 60%
D. 80%

D 85. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and a histogram of sales transactions by dollar value of the transactions. Friday's
histogram follows.

On Friday, the approximate number of sales transactions in the 125-under 150
category was _____________.

E A. 50
BCalc B. 100
C. 150
D. 200

C 86. Each day, the office staff at Oasis Quick Shop prepares a frequency distribution
and a histogram of sales transactions by dollar value of the transactions. Friday's
histogram follows.

On Friday, the approximate number of sales transactions between $100 and
$150 was _____________.

E A. 100
BCalc B. 200
C. 300
D. 400
D 87. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.

The total number of walk-in customers included in the study was _________.

E A. 100
BCalc B. 250
C. 300
D. 450

A 88. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.

The percentage of walk-in customers waiting one minute or less was _________.

E A. 22%
BCalc B. 11%
C. 67%
D. 10%

B 89. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.

The percentage of walk-in customers waiting more than 6 minutes was ______.

E A. 22%
BCalc B. 11%
C. 67%
D. 10%

C 90. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a cumulative frequency ogive of waiting time for walk-in customers.

The percentage of walk-in customers waiting between 1 and 6 minutes was ___.

M A. 22%
BCalc B. 11%
C. 67%
D. 10%

D 91. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a frequency histogram of waiting time for walk-in customers.

Approximately _____ walk-in customers waited less than 2 minutes.

E A. 20
BCalc B. 30
C. 100
D. 180

B 92. The staff of Mr. Wayne Wertz, VP of Operations at Portland Peoples Bank,
prepared a frequency histogram of waiting time for walk-in customers.

Approximately ____ walk-in customers waited at least 7 minutes.

E A. 20
BCalc B. 30
C. 100
D. 180

B 93. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.

[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage (0% to 100%). Horizontal axis: Market Capitalization ($1,000,000).]

The median market capitalization of the corporations was _________ ($1,000,000).

E A. 200
BCalc B. 26
C. 12
D. 60

C 94. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.

[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage (0% to 100%). Horizontal axis: Market Capitalization ($1,000,000).]

The percentage of corporations with capitalization of $50,000,000 or less was
_________.

E A. 38
BCalc B. 26
C. 62
D. 43

A 95. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.

[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage (0% to 100%). Horizontal axis: Market Capitalization ($1,000,000).]

The percentage of corporations with capitalization exceeding $50,000,000 was
_________.

E A. 38
BCalc B. 26
C. 62
D. 43

B 96. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.

[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage (0% to 100%). Horizontal axis: Market Capitalization ($1,000,000).]

The percentage of corporations with capitalization of $150,000,000 or less was
_________.

E A. 38
BCalc B. 85
C. 62
D. 15

D 97. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a cumulative percentage ogive of market capitalization of the 937
corporations listed on the American Stock Exchange in January 2000.

[Figure: cumulative percentage ogive. Vertical axis: Cumulative Percentage (0% to 100%). Horizontal axis: Market Capitalization ($1,000,000).]

The percentage of corporations with capitalization exceeding $150,000,000 was
_________.

E A. 38
BCalc B. 85
C. 62
D. 15

B 98. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a frequency histogram of market capitalization of the 937 corporations
listed on the American Stock Exchange in January 2000.

[Figure: frequency histogram, "AMEX Listed Securities". Vertical axis: Number of Issues (0 to 800). Horizontal axis: Market Capitalization ($1,000,000), $100 to $500.]

Approximately ________ corporations had capitalization exceeding
$200,000,000.

E A. 50
BCalc B. 100
C. 700
D. 800

D 99. The staff of Ms. Tamara Hill, VP of Technical Analysis at Blue Sky Brokerage,
prepared a frequency histogram of market capitalization of the 937 corporations
listed on the American Stock Exchange in January 2000.

[Figure: frequency histogram, "AMEX Listed Securities". Vertical axis: Number of Issues (0 to 800). Horizontal axis: Market Capitalization ($1,000,000), $100 to $500.]

Approximately ________ corporations had capitalizations of $200,000,000 or
less.

E A. 50
BCalc B. 100
C. 700
D. 800

B 100. Destiny Houston needs to prepare several graphics for a report on competition
and growth of Internet advertising. Which Excel feature is most useful for this
task?

E A. Solver
Term B. Chart Wizard
C. Data Analysis
D. Pivot Table
Quantitative methods for
Management
Model paper
• A survey in which customers taste five
different brands of ice cream, and rank their
favorites from 1 to 5, would be an example of
which type of scale of measurement?
– Ordinal
– Nominal
– Interval
– Ratio
• State whether the following question provides
qualitative or quantitative data and indicate
the appropriate measurement scale:
What is your age?
– Qualitative, ratio
– Quantitative, ratio
– Qualitative, nominal
– Quantitative, ordinal
• Abel Alonzo, Director of Human Resources, is
exploring the causes of employee absenteeism at
Batesville Bottling during the last operating year
(January 1, 1999 through December 31, 1999). For
this study, the set of all employees who worked at
Batesville Bottling during the last operating year is
a ____________________.

(a) Population (b) Statistic (c) Parameter (d) Sample


• A student makes an 82 on the first test in a
statistics course. From this, she assumes that
her average at the end of the semester (after
other tests) will be about 82. This is an
example of ____________________.
(a) Descriptive statistics
(b) Inferential statistics
(c) Nonparametric statistics
(d) Wishful thinking
• A statistics instructor collects information about
the background of his students. About 30% have
taken economics and about 40% have taken
accounting. There are 23 male students and 27
female students in this class. This is an example
of ____________________.
(a) Descriptive statistics
(b) Inferential statistics
(c) Nominal data
(d) Nonparametric statistics
• Consider the following frequency distribution:-

What is the relative frequency of the first class?


(a) 0.15
(b) 0.30
(c) 0.10
(d) None of the above
• The 1999 and 2000 market share data of the three
competitors (A, B and C) in an oligopolistic industry are
presented in the following pie charts:-

• Which of the following is true?


(a) Only company B gained market share
(b) Company A lost market share
(c) Only company C lost market share
(d) Company B lost market share
• A statistics student made the following grades
on 5 tests: 84, 78, 88, 72, and 72. What is the
median grade?
(a) 78
(b) 80
(c) 88
(d) 72
• Let A be the event that a student is enrolled in an
accounting course and let S be the event that a
student is enrolled in a statistics course. It is
known that 30% of all students are enrolled in an
accounting course and 40% of all students are
enrolled in statistics. Included in these numbers
are 15% who are enrolled in both statistics and
accounting. Find P(S).
(a) 0.30
(b) 0.55
(c) 0.15
(d) 0.40
• Given P(A) = 0.40, P(B) = 0.50, P(A ∩ B) = 0.15.
Which of the following is true?
(a) A and B are collectively exhaustive
(b) A and B are mutually exclusive
(c) A and B are independent
(d) A and B are not independent
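The independence question above comes down to comparing P(A ∩ B) with P(A) × P(B). The following is a minimal sketch of that check; the function name `is_independent` and the tolerance are illustrative assumptions, not part of the question paper.

```python
def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B) within a small tolerance."""
    return abs(p_a_and_b - p_a * p_b) <= tol


# Values from the question: P(A) = 0.40, P(B) = 0.50, P(A and B) = 0.15.
p_a, p_b, p_a_and_b = 0.40, 0.50, 0.15

# Independence would require P(A and B) = 0.40 * 0.50.
print("P(A) * P(B) =", p_a * p_b)
print("Independent?", is_independent(p_a, p_b, p_a_and_b))

# Addition law: P(A or B) = P(A) + P(B) - P(A and B).
p_a_or_b = p_a + p_b - p_a_and_b
print("P(A or B) =", round(p_a_or_b, 2))
```

Because the joint probability is not zero, the events are not mutually exclusive either, and because P(A or B) is below 1 they are not collectively exhaustive.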
• Variables which take on values only at certain
points over a given interval are
called ____________________.
(a) Value variables
(b) Continuous random variables
(c) Point variables
(d) Discrete random variables
• The amount of time a patient waits in a
doctor's office is an example
of ____________________.
(a) A discrete random variable
(b) The binomial distribution
(c) The normal distribution
(d) A continuous random variable
• Twenty five items are sampled. Each of these
has the same probability of being defective.
The probability that exactly 2 of the 25 are
defective could best be found by using
the ____________________.
(a) Normal distribution
(b) Poisson distribution
(c) Exponential distribution
(d) Binomial distribution
• If x is a binomial random variable with n=8
and p=0.2, the variance of x is
____________________.
(a) 0.96 (b) 4 (c) 1.28 (d) 1.6
• The long-run average or mean of a Poisson
distribution is usually referred to as:-
(a) λ
(b) σ
(c) µ
(d) α
• The area to the left of the mean in any normal
distribution is ____________________.
(a) Equal to 0.5 (b) Equal to the mean (c) Equal to 1 (d) Equal to the variance
• Let z be a normal random variable with mean
0 and standard deviation 1. Use the normal
tables to find P(z > 2.4).
(a) 0.4793
(b) 0.9918
(c) 0.0082
(d) 0.4918
• Regression was used to develop a model to
predict sales based on advertising dollars
spent. The equation developed is Y = 1,000 +
20X, where X is advertising dollars and Y is
sales. If $300 is spent on advertising, what
would be the best prediction for sales?
(a) $1,600
(b) $7,000
(c) $6,000
(d) None of the above
• Which of the following is an example of
nonprobabilistic sampling?
a. simple random sampling

b. stratified simple random sampling

c. cluster sampling

d. judgment sampling
• Stratified random sampling is a method of
selecting a sample in which
a. the sample is first divided into strata, and then
random samples are taken from each stratum
b. various strata are selected from the sample
c. the population is first divided into strata, and
then random samples are drawn from each
stratum
d. None of these alternatives is correct.
• A tabular representation of the payoffs for a
decision problem is a
• a. decision tree
• b. payoff table
• c. matrix
• d. sequential matrix
• For a decision alternative, the weighted
average of the payoffs is known as
a. the expected value of perfect information
b. the expected value
c. the expected probability
d. perfect information
• The number of degrees of freedom for the
appropriate chi-square distribution in a test of
independence is
• a. n-1
• b. K-1
• c. number of rows minus 1 times number of
columns minus 1
• d. a chi-square distribution is not used
• In order to determine whether or not a particular medication was
effective in curing the common cold, one group of patients was given the
medication, while another group received sugar pills. The results of the
study are shown below. We are interested in determining whether or not
the medication was effective in curing the common cold. The test statistic
is

                         Patients Cured    Patients Not Cured
Received medication            70                  10
Received sugar pills           20                  50

• a. 10.08
• b. 54.02
• c. 1.96
• d. 1.645
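The chi-square test statistic for a contingency table such as the medication study can be sketched as follows. This is an illustrative calculation, assuming the usual Pearson statistic without a continuity correction; the function names are hypothetical.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Rows: received medication, received sugar pills.
# Columns: patients cured, patients not cured.
table = [[70, 10], [20, 50]]
print(round(chi_square_statistic(table), 2))
print(degrees_of_freedom(table))
```

The same `degrees_of_freedom` helper applies to any table size, for example a table with 6 rows and 3 columns.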
• An assumption made about the value of a
population parameter is called a
• a. hypothesis
• b. conclusion
• c. confidence
• d. significance
• In a regression and correlation analysis, if r² =
1, then
• a. SSE = SST
• b. SSE = 1
• c. SSR = SSE
• d. SSR = SST
• If a data set has SSR = 400 and SSE = 100, then
the coefficient of determination is
• a. 0.10
• b. 0.25
• c. 0.40
• d. 0.80
• If the coefficient of correlation is 0.4, the
percentage of variation in the dependent
variable explained by the variation in the
independent variable
• a. is 40%
• b. is 16%.
• c. is 4%
• d. can be any positive value
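The relationships behind the three regression questions above can be checked with a short sketch. It assumes the standard identities SST = SSR + SSE and r² = SSR / SST; the function names are illustrative.

```python
def coefficient_of_determination(ssr, sse):
    """r^2 = SSR / SST, where SST = SSR + SSE."""
    return ssr / (ssr + sse)


def explained_variation_percent(r):
    """Percentage of variation in y explained by x, given the correlation r."""
    return 100 * r ** 2


# SSR = 400 and SSE = 100, as in the question above.
print(coefficient_of_determination(400, 100))

# Coefficient of correlation of 0.4.
print(round(explained_variation_percent(0.4), 2))

# When r^2 = 1 there is no unexplained variation, so SSE = 0 and SSR = SST.
print(coefficient_of_determination(500, 0))
```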
As the sample size increases, the margin of error
• a. increases
• b. decreases
• c. stays the same
• d. increases or decreases depending on the
size of the mean
MODEL QUESTION PAPER- QUANTITATIVE METHODS FOR MANAGEMENT

1. State whether the following question provides qualitative or quantitative data and
indicate the appropriate measurement scale: Are you male or female?
a. Qualitative, ratio
b. Quantitative, ratio
c. Qualitative, nominal
d. Quantitative, ordinal

2. State whether the following question provides qualitative or quantitative data and
indicate the appropriate measurement scale: How long have you been in your present
job or position?
a. Qualitative, ratio
b. Quantitative, ratio
c. Qualitative, nominal
d. Quantitative, ordinal

3. What is the median of the following set of numbers? {12,8,13,4,7,6,3,3,15}

a) 12 b) 8 c) 7 d) 7.5
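A median is found by sorting the values and taking the middle one. The sketch below uses Python's standard `statistics` module on the two data sets that appear in the median questions of this paper; the variable names are illustrative.

```python
import statistics

# Data taken from the median questions in this paper.
numbers = [12, 8, 13, 4, 7, 6, 3, 3, 15]
grades = [84, 78, 88, 72, 72]

# statistics.median sorts the values and returns the middle one
# (or the mean of the two middle values when the count is even).
print(sorted(numbers))
print(statistics.median(numbers))
print(sorted(grades))
print(statistics.median(grades))
```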

4. The hourly wages of a sample of 130 system analysts are given below.

mean = 60      range = 20
mode = 73      variance = 324
median = 74

The coefficient of variation equals

a. 0.30%

b. 30%

c. 5.4%

d. 54%
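The coefficient of variation expresses the standard deviation as a percentage of the mean. A minimal sketch, assuming the variance and mean are given as in questions 4 and 5 (the function name is illustrative):

```python
import math


def coefficient_of_variation(variance, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * math.sqrt(variance) / mean


# Question 4: variance = 324, mean = 60.
print(coefficient_of_variation(324, 60))

# Question 5: the standard deviation is the square root of the variance.
print(math.sqrt(576))
```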

5. The variance of a sample of 169 observations equals 576. The standard deviation of the
sample equals

a. 13
b. 24

c. 576

d. 28,461

6. The median of a sample will always equal the

a. mode

b. mean

c. 50th percentile

d. all of the above answers are correct

7. A sample selected in such a manner that each sample of size n has the same probability of
being selected is

a. a convenience sample

b. a judgment sample

c. nonprobabilistic sampling

d. a simple random sample

8. Which of the following sampling methods is a probabilistic sampling method?

a. judgment sampling

b. convenience sampling

c. cluster sampling

d. None of these alternatives is correct.

9. An uncertain future event affecting the consequence, or payoff, associated with a decision is
known as

a. unconditional probability

b. unknown probability

c. chance event
d. uncertain probability

10. Nodes indicating points where a decision is made are known as

a. decision nodes

b. chance nodes

c. marginal nodes

d. conditional nodes

11. In computing an expected value (EV), the weights are

a. decision alternative probabilities

b. in pounds or some unit of weight

c. in dollars or some units of currency

d. the state-of-nature probabilities

12. Below you are given a payoff table involving three states of nature and two decision
alternatives.

Decision States of Nature

Alternative S1 S2 S3

A 80 45 -20

B 40 50 15

The probability that S1 will occur is 0.1; the probability that S2 will occur is 0.6. The
recommended decision based on the expected value criterion is

a. A

b. B

c. Both alternatives are the same.

d. None of these alternatives is correct.
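The expected value criterion weights each payoff by its state-of-nature probability. The sketch below applies it to the payoff table in question 12, assuming the three states are the only possible outcomes so that P(S3) = 1 - 0.1 - 0.6; the names used are illustrative.

```python
# Payoffs for each decision alternative under states S1, S2, S3.
payoffs = {"A": [80, 45, -20], "B": [40, 50, 15]}

# P(S1) and P(S2) are given; P(S3) is assumed to be the remainder.
p_s1, p_s2 = 0.1, 0.6
probabilities = [p_s1, p_s2, 1 - p_s1 - p_s2]


def expected_value(payoff_row, probs):
    """Probability-weighted average of the payoffs."""
    return sum(p * v for p, v in zip(probs, payoff_row))


expected_values = {alt: expected_value(row, probabilities) for alt, row in payoffs.items()}
best = max(expected_values, key=expected_values.get)

for alt, ev in expected_values.items():
    print(alt, round(ev, 2))
print("Recommended:", best)
```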

13. The degrees of freedom for a contingency table with 6 rows and 3 columns is

a. 18

b. 15
c. 6

d. 10

14. If the coefficient of determination is a positive value, then the coefficient of correlation

a. must also be positive

b. must be zero

c. can be either negative or positive

d. must be larger than 1

15. Regression analysis is a statistical procedure for developing a mathematical equation that
describes how

a. one independent and one or more dependent variables are related

b. several independent and several dependent variables are related

c. one dependent and one or more independent variables are related

d. None of these alternatives is correct.

16. Given below are five observations collected in a regression study on two variables x
(independent variable) and y (dependent variable). Develop the least squares estimated
regression equation

x y

10 7

20 5

30 4

40 2

50 1

a. Y = 8.3 - 0.15x
b. Y = 9 + 0.15x
c. Y = 8.3 + 0.15x
d. Y = 9 - 0.15x
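The least squares slope and intercept for question 16 can be computed directly from the data. This is a minimal sketch using the textbook formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄; the function name `least_squares` is illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Observations from question 16.
x = [10, 20, 30, 40, 50]
y = [7, 5, 4, 2, 1]

b0, b1 = least_squares(x, y)
print(round(b0, 2), round(b1, 2))
```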
17. As the sample size increases, the margin of error

a. increases

b. decreases
c. stays the same

d. increases or decreases depending on the size of the mean

18. In computing the standard error of the mean, the finite population correction factor is used
when

a. N/n > 0.05

b. N/n ≤ 0.05

c. n/N > 0.05

d. n/N ≤ 30

19. Z is a standard normal random variable. What is the value of Z if the area to the right of Z is
0.9803?

a. -2.06

b. 0.4803

c. 0.0997

d. 3.06

20. For a standard normal distribution, the probability of obtaining a z value between -2.4 to -2.0
is

a. 0.4000

b. 0.0146

c. 0.0400

d. 0.5000

21. For a standard normal distribution, the probability of obtaining a z value of less than 1.6 is

a. 0.1600

b. 0.0160

c. 0.0016

d. 0.9452
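Standard normal probabilities such as those in questions 19 to 21 are normally read from a table. As a cross-check, the sketch below uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later); results may differ from table values in the fourth decimal place because tables are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# P(z < 1.6): area to the left of 1.6.
p_less_than_1_6 = std_normal.cdf(1.6)

# P(-2.4 < z < -2.0): difference of two cumulative areas.
p_between = std_normal.cdf(-2.0) - std_normal.cdf(-2.4)

# P(z > 2.4): area to the right of 2.4.
p_greater_than_2_4 = 1 - std_normal.cdf(2.4)

# z with area 0.9803 to its right, so area 1 - 0.9803 to its left.
z_value = std_normal.inv_cdf(1 - 0.9803)

print(round(p_less_than_1_6, 4))
print(round(p_between, 4))
print(round(p_greater_than_2_4, 4))
print(round(z_value, 2))
```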
22. The number of electrical outages in a city varies from day to day. Assume that the number of
electrical outages (x) in the city has the following probability distribution.

x f(x)

0 0.80

1 0.15

2 0.04

3 0.01

The mean and the standard deviation for the number of electrical outages (respectively) are

a. 2.6 and 5.77

b. 0.26 and 0.577

c. 3 and 0.01

d. 0 and 0.8
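For a discrete probability distribution, the mean is Σ x·f(x) and the variance is Σ (x - μ)²·f(x). A minimal sketch for the outage distribution in question 22 (variable names are illustrative):

```python
import math

# Probability distribution of the number of electrical outages.
distribution = {0: 0.80, 1: 0.15, 2: 0.04, 3: 0.01}

# Mean: sum of each value times its probability.
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
std_dev = math.sqrt(variance)

print(round(mean, 2), round(std_dev, 3))
```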

23. Assume that you have a binomial experiment with p = 0.4 and a sample size of 50. The
variance of this distribution is

a. 20

b. 12

c. 3.46

d. 144

24. In a binomial experiment the probability of success is 0.06. What is the probability of two
successes in seven trials?

a. 0.0036

b. 0.0600

c. 0.0555

d. 0.2800
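The binomial questions above use two formulas: P(x) = C(n, x)·pˣ·(1 - p)ⁿ⁻ˣ for the probability of x successes and n·p·(1 - p) for the variance. A short sketch with illustrative function names, using `math.comb` (Python 3.8 or later):

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_variance(n, p):
    """Variance of a binomial random variable."""
    return n * p * (1 - p)


# Question 24: two successes in seven trials with p = 0.06.
print(round(binomial_pmf(2, 7, 0.06), 4))

# Question 23: n = 50, p = 0.4.
print(round(binomial_variance(50, 0.4), 2))
```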

25. If P(A) = 0.62, P(B) = 0.47, and P(A ∪ B) = 0.88, then P(A ∩ B) =

a. 0.2914
b. 1.9700

c. 0.6700

d. 0.2100
Quantitative Methods in
Management
Term II
4 credits
MGT 408
Business Statistics: A First Course
David M. Levine
Kathryn A. Szabat
David F. Stephan
P.K. Viswanathan
Pearson Publications, 7e
Additional Readings
• Statistics for Business and Economics- Anderson, Sweeney , Williams

• Business statistics- Ken Black

• Business statistics – J.K. Sharma


Assessment
• Assignments – ONE 40 marks

• End term Objective questions 60

( concepts and simple problems)


Subject Outline
• Introduction ch-1
• Data collection, classification and presentation ch-2
• Measures of central tendencies and dispersion ch-3
• Correlation and Regression analysis ch-12
• Probability concepts ch-04
• Probability distributions – Binomial and Poisson ch-05
• Probability distribution – Normal ch-06
• Sampling techniques ch-07
• Estimation and Inference statistics ch-08
• Testing of Hypothesis – Non Parametric (Chi square) ch-9, 11
• Bayesian Analysis and decision theory ch-15
Quantitative Methods in Management
• Introduction
• Definition
• Importance and limitations
• Applications
• Terminologies
• Scale of measurement
• Type of variables
• Qualitative, quantitative
• Time series and cross sectional
• Types of statistics
• Sources of data
• Classification of data
• Statistical software
INTRODUCTION
DAY 1
PAGES : 1-32
Introduction….
• Of the 18000 foodmakers, the largest 20 now account for
nearly 54% of checkout sales.
• The Consumer Price Index (CPI) declined 0.3% in April.
• The average compensation package for CEOs across 50
large corporations was Rs. 1 million.
• E-commerce sites spend an average of Rs. 5000 to
acquire each customer.
• Stocks account for 75% of the average investor’s
portfolio.
• The Hindu reaches 46% of the region’s households
during weekdays and 61% on Sundays
• CFOs were asked which initiative they would
put on hold in an uncertain economy:
• 32% Expansion
• 23% M&A
• 10% New Product Launch
• 18% Technology upgrade
• 9% None
• 8% Any other
Are These Numbers Useful In Making Decisions
• A survey of 1,179 adults 18 and over reported that 54% thought that 15
seconds was an acceptable online ad length before seeing free content.

• A survey reported that women were more likely than men to cite seeing photos
or videos, sharing with many people at once, seeing entertaining or funny
posts, learning about ways to help others, and receiving support from
people in your network as reasons to use Facebook.

• A study found that the number of times a specific product was mentioned in
comments in the Twitter social messaging service could be used to make
accurate predictions of sales trends for that product.
Without Statistics You Can’t

• Determine if the numbers in these studies are useful information

• Validate claims of predictability or causality

• See patterns that large amounts of data sometimes reveal


In Today’s Business World we
Cannot Escape From Data
• In today’s digital world ever increasing amounts of data are gathered,
stored, reported on, and available for further study.

• Data are facts about the world and are constantly reported as
numbers by an ever increasing number of sources.

• Information-based decision making using statistical analysis is
absolutely essential in the present environment, characterized by
intense competition, an onslaught of new products and services,
globalization, and the revolution in information technology.
Each Business Person Faces A Choice Of How To Deal
With This Explosion Of Data

• They can ignore it and hope for the best.

• They can count on other people’s summaries of data and hope they
are correct.

• They can develop their own capability and insight into data by
learning about statistics and its application to business.
Statistics Is Evolving So Businesses Can Use The
Vast Amount Of Data Available

The emerging field of Business Analytics makes
“extensive use of:
• Data
• Statistical and quantitative analysis
• Explanatory & predictive models
• Fact based management
to drive decisions and actions.”
DATA
Data: facts about the world (a value associated with
something or, collectively, a list of values associated with
something).

DATA → Information → Knowledge → Decision making
What is Statistics?
“Statistics is a way to get information from data”

Data → Statistics → Information

The word “statistics” is derived from the Latin word ‘status’,
meaning a state.
Statistics is a tool for creating new understanding from a set of
numbers.
Statistics – A way of thinking
Methods that allow you to work with data effectively
Methods that help you make better decisions
DEFINITION
STATISTICS
COLLECTION
COMPILATION
CLASSIFICATION
PRESENTATION
ANALYSIS &
INTERPRETATION OF DATA
Statistics
• Art and Science of Collecting and Understanding DATA:
• DATA = Recorded Information
• e.g., Sales, Productivity, Quality, Costs, Return, …
• Why? Because you want:
• Best use of imperfect information:
• e.g., 50,000 customers, 1,600 workers, 386,000 transactions,…
• Good decisions in uncertain conditions:
• e.g., new product launch: Fail? OK? Make you rich?
• Competitive Edge
• e.g., for you and your business!
To Properly Apply Statistics Follow A Framework To Minimize
Possible Errors
DCOVA

• Define the data you want to study in order to solve a
problem or meet an objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data collected to reach conclusions and
present results
Using The DCOVA Framework Helps To
Apply Statistics To:
• Summarize & visualize business data

• Reach conclusions from those data

• Make reliable predictions about business activities

• Improve business processes


Changing face of statistics
• Business analytics
• Use statistical methods to analyze and explore data to uncover unforeseen relationships.
• Use management science methods to develop optimization models that impact an organization’s
strategy, planning, and operations

• Big data
• Collections of data that cannot be easily browsed or analyzed using traditional
methods.
• Use information systems’ methods to collect and process data sets of all sizes,
including very large data sets that would otherwise be hard to examine efficiently

• Integral role of software in statistics


The Growth Of “Big Data” Spurs The
Use Of Business Analytics
• “Big Data” is still a fuzzy concept.

• Very large data sets are arising because of the automatic
collection of high volumes of data at very fast rates.

• Attributes that distinguish “Big Data” from well structured
large data sets are “volume” of data, “velocity” of the data
collection, and “variety” of the data.
Business Analytics Has Already Been Applied In Many
Business Decision-Making Contexts

• Human resource (HR) managers understanding relationships
between HR drivers, key business outcomes, employee skills,
capabilities, and motivation.

• Financial analysts determining why certain trends occur to
predict future financial environments.

• Marketers driving loyalty programs and customer marketing
decisions to drive sales.

• Supply chain managers planning and forecasting based on
product distribution and optimizing sales distribution based
on key inventory measures.
Statistics: An Important Part of Your
Business Education
• You need analytical skills for the increasingly data-
driven environment of business.

• Studies show an increase in productivity, innovation,
and competitiveness for organizations that embrace
business analytics.

• To quote Hal Varian, the chief economist at Google
Inc., “the sexy job in the next 10 years will be
statisticians. And I’m not kidding.”
Activities of Statistics
1. Designing the study:
• First step
• Plan for data-gathering
• Random sample (control bias and error)

2. Exploring the data:
• First step (once you have data)
• Look at, describe, summarize the data
• Are you on the right track?
Activities of Statistics (continued)
3. Modeling the data
• A framework of assumptions and equations
• Parameters represent important aspects of the data
• Helps with estimation and hypothesis testing
4. Estimating an unknown:
• Best “guess” based on data
• Wrong - but by how much?
• Confidence interval - “we’re 95% sure that the unknown is between …”
Activities of Statistics (continued)
5. Hypothesis testing:
• Data decide between two possibilities
• Does “it” really work? [or is “it” just randomly better?]
• Is financial statement correct? [or is error material?]
• Whiter, brighter wash?
• Is the difference statistically significant?
Why a Manager Needs to Know about
Statistics
• To know how to properly present information
• To know how to draw conclusions about populations
based on sample information
• To know how to improve processes
• To know how to obtain reliable forecasts
Why Learn Statistics?
to make better sense of the ubiquitous use of numbers:
• Business memos
• Business research
• Technical reports
• Technical journals
• Newspaper articles
• Magazine articles
Statistical View of the World
• Data are imperfect
• We do the best we can -- Statistics helps!
• Events are random
• Can’t be right 100% of the time
• Use statistical methods
• Along with common sense and good judgment
• Be skeptical!
• Statistics can be used to support contradictory conclusions
• Look at who funded the study.
Applications in
Business and Economics
• Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.
Applications in
Business and Economics
Marketing
Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of
marketing research applications.
Production
A variety of statistical quality control charts are used
to monitor the output of a production process.
Statistics in Business: Examples
• Advertising
• Effective? Which commercial? Which markets?
• Quality control
• Defect rate? Cost? Are improvements working?
• Finance
• Risk - How high? How to control? At what cost?
• Accounting
• Audit to check financial statements. Is error material?
• Other
• Economic forecasting, background info, measuring and controlling
productivity (human and machine), …
IMPORTANCE OF STATISTICS
• It simplifies complexity
• It measures periodic changes
• Facts are properly presented
• Formulation of policies
• Enlarge human experience and knowledge
• Helps in comparison
• Forecasting
• Testing a hypothesis

LIMITATIONS OF STATISTICS
• Only quantitative data
• Does not study individual events
• Results are true only on averages
• Does not give importance to all items
• Can be misused
• Single purpose only
Defining and collecting data
Chapter 1
TERMS AND
TERMINOLOGIES
Define - the variables that you want to study to solve a problem or meet an
objective
Collect - the data for those variables from appropriate sources.
* Data are the facts and figures collected, analyzed, and summarized for
presentation and interpretation.
• Data Set:
• Measurements of items
• e.g., Yearly sales volume for your 23 salespeople
• e.g., Cost and number produced, daily, for the past month
• All the data collected in a particular study are referred to as the data set for the study
• Elementary Units:
• The items being measured
• e.g., Salespeople, Days, Companies, Catalogs, …
• Elements are the entities on which data are collected
• A Variable:
• The type of measurement being done
• e.g., Sales volume, Cost, Productivity, Number of defects, …
• A variable is a characteristic of interest for the elements.
• The set of measurements obtained for a particular element is called an
observation
• A data set with n elements contains n observations

* The total number of data values in a complete data set is the number of elements
multiplied by the number of variables.
Data, Data Sets,
Elements, Variables, and Observations
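Company (element names)   Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram                   NQ                73.10              0.86
EnergySouth               N                 74.00              1.67
Keystone                  N                365.70              0.86
LandCare                  NQ               111.40              0.33
Psychemedics              N                 17.60              0.13

Elements: the five companies (rows). Variables: Stock Exchange, Annual Sales, and Earn/Share (columns). Data set: the full table.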
How Many Variables?
• Univariate data set: One variable measured for each
elementary unit
• e.g., Sales for the top 30 computer companies.
• Can do: Typical summary, diversity, special features
• Bivariate data set: Two variables
• e.g., Sales and # Employees for top 30 computer firms
• Can also do: relationship, prediction
• Multivariate data set: Three or more variables
• e.g., Sales, # Employees, Inventories, Profits, …
• Can also do: predict one from all other variables
Types of Variables
Categorical (qualitative) variables have values that can only be placed
into categories, such as “yes” and “no.”

Numerical (quantitative) variables have values that represent
quantities.

Time series or cross sectional data


LEVELS OF MEASUREMENTS
Scales of Measurement
Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio

The scale determines the amount of information
contained in the data.

The scale indicates the data summarization and
statistical analyses that are most appropriate.
Levels of Data Measurement

• Nominal — Lowest level of measurement


• Ordinal
• Interval
• Ratio — Highest level of measurement
Nominal Level Data
• Numbers are used to classify or categorize
• Data are labels or names used to identify an attribute of the element.
• A nonnumeric label or numeric code may be used.

• Students of a university are classified by the school in which they are enrolled
using a nonnumeric label such as Business, Humanities, Education, and so on.

• Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes
Business, 2 denotes Humanities, 3 denotes Education, and so on).

Example: Employment Classification


• 1 for Educator
• 2 for Construction Worker
• 3 for Manufacturing Worker
Example: Ethnicity
• 1 for African-American
• 2 for Anglo-American
• 3 for Hispanic-American
Ordinal Level Data
• Numbers are used to indicate rank or order
• Relative magnitude of numbers is meaningful
• Differences between numbers are not comparable
• The data have the properties of nominal data and the order or rank of the data is meaningful.
• A nonnumeric label or numeric code may be used.

Example: Ranking productivity of employees


Example: Taste test ranking of three brands of soft drink
Example: Position within an organization
• 1 for President
• 2 for Vice President
• 3 for Plant Manager
• 4 for Department Supervisor
• 5 for Employee

• Students of a university are classified by their class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.

• Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Example of Ordinal Measurement

[Figure: finishing order of six numbered runners at a finish line]
Ordinal Data

Faculty and staff should receive preferential
treatment for parking space.

Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
      1            2        3         4              5
Numbers or Categories?
• Quantitative Variable: Meaningful numbers
• e.g., Sales, # Employees
• Can add, rank, count
• Qualitative Variable: Categories
• Ordinal Variable: Categories with meaningful ordering
• e.g., Bond rating (AA, A, B, …), Diamonds (VSI, SI, …)
• Can rank, count
• Nominal Variable: categories without meaningful ordering
• e.g., State, Type of business, Field of study
• Can count
Interval Level Data
• Distances between consecutive integers are equal
• The data have the properties of ordinal data, and the interval between
observations is expressed in terms of a fixed unit of measure.
• Interval data are always numeric.

• Relative magnitude of numbers is meaningful


• Differences between numbers are comparable
• Location of origin, zero, is arbitrary
• Vertical intercept of unit of measure transform function is not zero
• Example:
Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored
115 points more than Kevin.

Example: Fahrenheit Temperature


Example: Calendar Time
Example: Monetary Utility
Ratio Level Data
• Highest level of measurement
• Relative magnitude of numbers is meaningful
• Differences between numbers are comparable
• Location of origin, zero, is absolute (natural)
• Vertical intercept of unit of measure transform function is
zero

Examples: Height, Weight, and Volume


Example: Monetary Variables, such as Profit and Loss,
Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover,
and Quick Ratio.
• The data have all the properties of interval data and the ratio of
two values is meaningful.
• Variables such as distance, height, weight, and time use the
ratio scale.
• This scale must contain a zero value that indicates that nothing
exists for the variable at the zero point.
Categorical and Quantitative Data

Data can be further classified as being categorical or quantitative.

The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.

In general, there are more alternatives for statistical analysis when the data are quantitative.
Categorical Data
Labels or names used to identify an attribute of each element

Often referred to as qualitative data

Use either the nominal or ordinal scale of measurement

Can be either numeric or nonnumeric

Appropriate statistical analyses are rather limited


Quantitative Data

Quantitative data indicate how many or how much:
• discrete, if measuring how many
• continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for quantitative data.
Scales of Measurement

Data
• Categorical
  • Numeric: Nominal, Ordinal
  • Non-numeric: Nominal, Ordinal
• Quantitative
  • Numeric: Interval, Ratio


Types of Data
Data
• Categorical (defined categories)
  Examples: Marital Status, Political Party, Eye Color
• Numerical
  • Discrete (counted items)
    Examples: Number of Children, Defects per hour
  • Continuous (measured characteristics)
    Examples: Weight, Voltage
Example

Firm Sales Industry Group S&P Rating


IBM 66,346 Office Equipment A
Exxon 59,023 Fuel A-
GE 40,482 Conglomerates A+
AT&T 34,357 Telecommunications A-
Example (continued)
Multivariate Data (3 variables)

Firm     Sales     Industry Group        S&P Rating
IBM      66,346    Office Equipment      A
Exxon    59,023    Fuel                  A-
GE       40,482    Conglomerates         A+
AT&T     34,357    Telecommunications    A-

• Firm: elementary units
• Sales: quantitative variable
• Industry Group: nominal qualitative variable
• S&P Rating: ordinal qualitative variable
Usage Potential of Various
Levels of Data
Ratio
Interval
Ordinal

Nominal
Data Level, Operations, and Statistical Methods

Data Level   Meaningful Operations                                                       Statistical Methods
Nominal      Classifying and Counting                                                    Nonparametric
Ordinal      All of the above plus Ranking                                               Nonparametric
Interval     All of the above plus Addition, Subtraction, Multiplication, and Division   Parametric
Ratio        All of the above                                                            Parametric


Cross-Sectional Data

Cross-sectional data are collected at the same or approximately the same point in time.

Example: data detailing the number of building permits issued in February 2010 in each of the counties of Ohio
Time Series Data

Time series data are collected over several time periods.

Example: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months
Time-Series or Cross-Sectional?
• Time-Series Data: Data values recorded in meaningful sequence
• Elementary units might be days or quarters or years
• e.g., Daily Dow-Jones stock market average close for the past 90 days
• e.g., Your firm’s quarterly sales over the past 5 years
• Cross-Sectional Data: No meaningful sequence
• e.g., Sales of 30 companies
• e.g., Productivity of each sales division
• Easier than time series!
Example
Year Unemployment Rate
2003 5.7%
2004 5.4%
2005 4.9%
2006 4.4%
2007 5.0%
2008 7.3%
2009 9.9%
2010 9.4%
In the table above, the elementary unit is defined by “year” and the unemployment rate is quantitative data.
Stock Market – Time Series
• Dow Jones Stock Index, monthly since 1928

[Figure: Dow Jones Industrial Stock Market Index, monthly from 1928 to early 2011; horizontal axis: Year (1920 to 2010); vertical axis: index value (0 to 16,000)]
Basic Vocabulary of Statistics

POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a numerical measure that describes a characteristic
of a population.

STATISTIC
A statistic is a numerical measure that describes a characteristic of
a sample.
Population vs. Sample
• Population: measures used to describe the population are called parameters
• Sample: measures computed from sample data are called statistics
Population − the set of all elements of interest in a
particular study
Sample − a subset of the population

Statistical inference − the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census − collecting data for the entire population

Sample survey − collecting data for a sample


[Diagram: a sample is a subset of the population; a parameter describes the population and a statistic describes the sample]

Populations have parameters: descriptive measures of a population.
Samples have statistics: descriptive measures of a sample.

A census is a complete enumeration of every item in a population.


Symbols for
Population Parameters
$\mu$ denotes population mean
$\sigma^2$ denotes population variance
$\sigma$ denotes population standard deviation
Symbols for
Sample Statistics
$\bar{x}$ denotes sample mean
$S^2$ denotes sample variance
$S$ denotes sample standard deviation
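To make the notation concrete, the following minimal Python sketch computes each symbol with the standard `statistics` module. The five data values are invented for illustration, and the sketch assumes the usual convention (not stated on the slide) that the population variance divides by N while the sample variance divides by n - 1.

```python
import statistics

# Hypothetical data, invented for illustration only.
values = [4, 8, 6, 5, 7]

# Treating the values as an entire population: parameters.
mu = statistics.mean(values)             # population mean
sigma_sq = statistics.pvariance(values)  # population variance (divides by N)
sigma = statistics.pstdev(values)        # population standard deviation

# Treating the same values as a sample: statistics.
x_bar = statistics.mean(values)          # sample mean
s_sq = statistics.variance(values)       # sample variance (divides by n - 1)
s = statistics.stdev(values)             # sample standard deviation

print("population:", mu, sigma_sq, sigma)
print("sample:    ", x_bar, s_sq, s)
```

The mean is the same either way; only the variance and standard deviation change with the divisor.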
Types of Statistics
• Descriptive
• Inferential
Types of Statistics
• Statistics
• The branch of mathematics that transforms data into useful
information for decision makers.

• Descriptive Statistics: Collecting, summarizing, and describing data
• Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics

• Collect data
• e.g., Survey

• Present data
• e.g., Tables and graphs

• Characterize data
• e.g., Sample mean = $\frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
• e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
• e.g., Test the claim that the
population mean weight is 120
pounds
Drawing conclusions about a large group of individuals based on a subset of the
large group.
Descriptive Statistics
Most of the statistical information in newspapers,
magazines, company reports, and other
publications consists of data that are summarized
and presented in a form that is easy to
understand.

Such summaries of data, which may be tabular,


graphical, or numerical, are referred to as descriptive
statistics.
Probability
• “Inverse” of statistics
[Diagram: probability reasons from the world to what you see; statistics reasons from what you see back to the world]
• Statistics: generalizes from data to the world
• Probability: “What if …” Assuming you know how the world works, what data are you likely to see?
• Examples of probability:
• Flip coin, stock market, future sales, IRS audit, …
• Foundation for statistical inference
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or
decision about a population based on a sample.

[Diagram: a sample is drawn from the population; inference runs from the sample statistic to the population parameter]

What can we infer about a population’s parameters based on a sample’s statistics?
Process of Inferential Statistics
• Select a random sample from the population.
• Calculate the sample mean $\bar{x}$ (statistic).
• Use $\bar{x}$ to estimate the population mean $\mu$ (parameter).
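The process above can be sketched in a few lines of Python. The population here is simulated (1,000 hypothetical weights centered near 120 pounds) and the sample size of 50 is an arbitrary choice; in practice the population mean would be unknown, which is the reason for sampling.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 weights in pounds (simulated, not real data).
population = [random.gauss(120, 15) for _ in range(1000)]
mu = statistics.mean(population)   # parameter, normally unknown

# Select a random sample, then calculate the statistic.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)    # statistic used to estimate mu

print(f"population mean = {mu:.1f}, sample mean = {x_bar:.1f}")
```

Rerunning with a different seed gives a different sample mean, which is why estimates from samples carry uncertainty.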
Sources of data collection
Collecting Data Correctly Is A Critical Task
DCOVA
Need to avoid data flawed by biases, ambiguities, or other types of errors.

Results from flawed data will be suspect or in error.

Even the most sophisticated statistical methods are not very useful when the data are flawed.
Developing Operational Definitions Is Crucial To Avoid Confusion / Errors
DCOVA
• An operational definition is a clear and precise statement that provides a common understanding of meaning.
• In the absence of an operational definition, miscommunications and errors are likely to occur.
• Arriving at operational definition(s) is a key part of the Define step of DCOVA.
Why Collect Data?
• A marketing research analyst needs to assess the effectiveness of a new television advertisement.
• A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
• An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured is conforming to company standards.
• An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.
Sources of Data
Primary Sources: The data collector is the one using the data for analysis
• Data from a political survey
• Data collected from an experiment
• Observed data
• Production data from your factory
• Your firm’s marketing studies

Secondary Sources: The person performing data analysis is not the data collector
• Analyzing census data
• Examining data from print journals or data published on the internet
• Government data: economics and demographics
• Media reports – TV, newspapers, Internet
• Companies that specialize in gathering data
Sources of data fall into five
categories DCOVA
• Data distributed by an organization or an individual

• The outcomes of a designed experiment

• The responses from a survey

• The results of conducting an observational study

• Data collected by ongoing business activities


Examples Of Data Distributed By Organizations or Individuals
DCOVA
• Financial data on a company provided by investment services.
• Industry or market data from market research firms and trade associations.
• Stock prices, weather conditions, and sports statistics in daily newspapers.
Examples of Data From A Designed Experiment
DCOVA
• Consumer testing of different versions of a product to help determine which product should be pursued further.
• Material testing to determine which supplier’s material should be used in a product.
• Market testing on alternative product promotions to determine which promotion to use more broadly.
Data Sources
• Statistical Studies - Experimental
In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.
Examples of Survey Data
DCOVA
• A survey asking people which laundry detergent has the best stain-removing abilities.
• Political polls of registered voters during political campaigns.
• People being surveyed to determine their satisfaction with a recent product or service experience.
Examples of Data Collected From Observational Studies
DCOVA
• Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.
• Measuring the time it takes for customers to be served in a fast food establishment.
• Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.
Data Sources
• Statistical Studies - Observational
In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.
Examples of Data Collected From Ongoing Business Activities
DCOVA
• A bank studies years of financial transactions to help them identify patterns of fraud.
• Economists utilize data on searches done via Google to help forecast future economic conditions.
• Marketing companies use tracking data to evaluate the effectiveness of a web site.
Structured Data Follows An Organizing Principle &
Unstructured Data Does Not
DCOVA
• A Stock Ticker Provides Structured Data:
• The stock ticker repeatedly reports a company name, the number of shares last
traded, the bid price, and the percent change in the stock price.
• Due to their inherent structure, data from tables and forms are
structured data.
• E-mails from five people concerning stock trades are an example of
unstructured data.
• In these e-mails you cannot count on the information being shared in a specific
order or format.
• This book deals exclusively with structured data
All Of The Methods In Our Study Deal With Structured Data
DCOVA
• To use the techniques in this book on unstructured data you need to convert the unstructured data into structured data.
• For many of the questions you might want to answer, the starting point can / will be tabular data.
Data Can Be Formatted and / or Encoded In More Than One Way
DCOVA
• Some electronic formats are more readily usable than others.
• Different encodings can impact the precision of numerical variables and can also impact data compatibility.
• As you identify and choose sources of data you need to consider / deal with these issues.
Data Cleaning Is Often A Necessary Activity
When Collecting Data
DCOVA
• Often find “irregularities” in the data
• Typographical or data entry errors
• Values that are impossible or undefined
• Missing values
• Outliers
• When found these irregularities should be reviewed
/ addressed
• Both Excel & Minitab can be used to address
irregularities
After Collection It Is Often Helpful To Recode
Some Variables
DCOVA
• Recoding a variable can either supplement or replace the
original variable.
• Recoding a categorical variable involves redefining categories.
• Recoding a quantitative variable involves changing this
variable into a categorical variable.
• When recoding be sure that the new categories are mutually
exclusive (categories do not overlap) and collectively
exhaustive (categories cover all possible values).
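As an illustration of recoding a quantitative variable into a categorical one, the sketch below maps ages to age groups. The function name `recode_age` and the cutoffs at 30 and 50 are hypothetical choices; what matters is that the categories do not overlap (mutually exclusive) and that every valid age falls into one of them (collectively exhaustive).

```python
def recode_age(age):
    """Recode a quantitative age (in years) into a categorical age group.

    The cutoffs are hypothetical and chosen only for illustration.
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 30:
        return "Under 30"
    if age < 50:
        return "30 to under 50"
    return "50 or over"


ages = [23, 30, 49, 50, 74]
age_groups = [recode_age(a) for a in ages]  # supplements, not replaces, the original variable
print(age_groups)
```

Keeping `ages` alongside `age_groups` corresponds to supplementing the original variable rather than replacing it.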
Data Acquisition Considerations
Time Requirement
• Searching for information can be time consuming.
• Information may no longer be useful by the time it
is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
Examples of Types of Variables
DCOVA
• Question: Do you have a Facebook profile?
  Responses: Yes or No
  Variable type: Categorical (Qualitative)
• Question: How many text messages have you sent in the past three days?
  Responses: ---------------
  Variable type: Numerical (discrete)
• Question: How long did the mobile app update take to download?
  Responses: ---------------
  Variable type: Numerical (continuous)
• For each of the following variables, determine whether the variable is
categorical or numerical. If the variable is numerical, determine whether
the variable is discrete or continuous.
• Number of cellphones in the household.
• Monthly data usage ( in MB)
• Number of text messages exchanged per month
• Voice usage per month ( in minutes)
• Whether the cellphone is used for email.
• Name of the internet service provider
• Time, in hours, spent surfing the internet per week
• Whether the individual uses a mobile phone to connect to the internet
• Number of online purchases made in a month
• Organizing the data … Editing/ Coding/
Data Mining
• Search for patterns in large data sets
• Business data: marketing, finance, production ...
• Collected for some purpose, often useful for others
• From government or private companies
• Makes use of
• Statistics – all the basic activities, and
• Prediction, classification, clustering
• Computer science – efficient algorithms (instructions) for
• Collecting, maintaining, organizing, analyzing data
• Optimization – calculations to achieve a goal
• Maximize or minimize (e.g. sales or costs)
Computers and Statistical Analysis

Statisticians often use computer software to perform the statistical computations required with large amounts of data.
To facilitate computer usage, many of the data sets
in this book are available on the website that
accompanies the text.
The data files may be downloaded in either Minitab
or Excel formats.
Also, the Excel add-in StatTools can be downloaded
from the website.
Chapter ending appendices cover the step-by-step
procedures for using Minitab, Excel, and StatTools.
Statistical software
• MS- EXCEL
• Minitab
• SAS
• SPSS
• StatTools

( chapter 1 pages : 1-32)


Quantitative Methods in
Management
Term II
4 credits
MGT 408
Recap..
• Introduction
• Definition
• Terms and terminologies
• Types of statistics
• Types of data
• Levels of measurements
• Application of statistics in
business
• A survey in which customers taste five
different brands of ice cream, and rank their
favorites from 1 to 5, would be an example of
which type of scale of measurement?
– Ordinal
– Nominal
– Interval
– Ratio
• State whether the data provided by the following question are qualitative or quantitative, and indicate the appropriate measurement scale: What is your age?
– Qualitative, ratio
– Quantitative, ratio
– Qualitative, nominal
– Quantitative, ordinal
• Abel Alonzo, Director of Human Resources, is exploring the causes of employee absenteeism at Batesville Bottling during the last operating year (January 1, 1999 through December 31, 1999). For this study, the set of all employees who worked at Batesville Bottling during the last operating year is a ____________________.
– Population
– Statistic
– Parameter
– Sample
• A student makes an 82 on the first test in a statistics course. From this, she assumes that her average at the end of the semester (after other tests) will be about 82. This is an example of ____________________.
– Descriptive statistics
– Inferential statistics
– Nonparametric statistics
– Wishful thinking
• A statistics instructor collects information about the background of his students. About 30% have taken economics and about 40% have taken accounting. There are 23 male students and 27 female students in this class. This is an example of ____________________.
– Descriptive statistics
– Inferential statistics
– Nominal data
– Nonparametric statistics
• Abel Alonzo, Director of Human Resources, is exploring the causes of employee absenteeism at Batesville Bottling during the last operating year (January 1, 1999 through December 31, 1999). The average number of absences per employee, computed from the personnel data of all employees, is a ____________________.
– Parameter
– Population
– Sample
– Statistic
• Pinky Bauer, Chief Financial Officer of Harrison Haulers, Inc., suspects irregularities in the payroll system, and orders an inspection of "each and every payroll voucher issued since January 1, 1991". Five percent of the payroll vouchers contained material errors. This is an example of ____________________.
– Nonparametric statistics
– Nominal data
– Inferential statistics
– Descriptive statistics
Organizing and visualizing
variables
Chapter 2
Pg. 33-98
CLASSIFICATION OF DATA
• Qualitative
• Quantitative
• Geographical
• Chronological
– Time series (is a set of observations collected at usually
discrete and equally spaced time intervals- Eg. Daily closing
stock price of a certain stock recorded over the last six weeks
)
– Cross sectional (observations from different individuals or
groups at a single point in time – inventory of all ice creams
in stock at a particular store)
PRESENTATION OF DATA

TABULAR
DIAGRAMS
GRAPHS
• TABULATION

SPECIMEN OF A TABLE

Stub Caption Total


Stub Body of the table
Entries
Stub entries

Total Grand
Total

Foot Note
Sources
DESCRIPTIVE STATISTICS:
ORGANIZING AND VISUALIZING
VARIABLES
CHAPTER 2
Descriptive Statistics:
Tabular and Graphical
Presentations
• Summarizing Categorical Data
• Summarizing Quantitative Data

Categorical data use labels or names to identify categories of like items.

Quantitative data are numerical values that indicate how much or how many.
Categorical Data Are
Organized By Utilizing Tables
DCOVA
Categorical Data: Tallying Data
• One categorical variable: Summary Table
• Two categorical variables: Contingency Table
Organizing Categorical Data: Summary Table
DCOVA
A summary table tallies the frequencies or percentages of items in a set
of categories so that you can see differences between categories.

Main Reason Young Adults Shop Online

Reason For Shopping Online? Percent


Better Prices 37%
Avoiding holiday crowds or hassles 29%
Convenience 18%
Better selection 13%
Ships directly 3%
Source: Data extracted and adapted from “Main Reason Young Adults Shop Online?”
USA Today, December 5, 2012, p. 1A.
Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.

The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution

The percent frequency of a class is the relative frequency multiplied by 100.

A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Frequency Distribution…
Example – 4 soft drinks – 15 households
Coke      Pepsi    7 Up    Coke      Mirinda
Coke      7 Up     7 Up    Coke      Coke
Mirinda   7 Up     Coke    Mirinda   Coke

Drink Frequency
Coke 7
Pepsi 1
Mirinda 3
7 Up 4
Total 15
Frequency Distribution…
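Soft Drink   Frequency   Relative frequency   Percent frequency
Coke             7             0.47                 47
Pepsi            1             0.07                  7
Mirinda          3             0.20                 20
7 Up             4             0.27                 27
Total           15             1.00                100
Note: 7/15 = 0.4667 rounds to 0.47, so the rounded entries add to 1.01 (101) rather than exactly 1.00 (100).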
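The tally for the soft drink example can be reproduced with a short Python sketch using `collections.Counter`. The list retypes the 15 household responses from the example; the variable names are illustrative.

```python
from collections import Counter

# The 15 household responses from the soft drink example.
purchases = [
    "Coke", "Pepsi", "7 Up", "Coke", "Mirinda",
    "Coke", "7 Up", "7 Up", "Coke", "Coke",
    "Mirinda", "7 Up", "Coke", "Mirinda", "Coke",
]

frequency = Counter(purchases)       # frequency of each class
total = sum(frequency.values())      # number of observations

print("Drink    Freq  Relative  Percent")
for drink, count in frequency.most_common():
    relative = count / total         # relative frequency
    percent = relative * 100         # percent frequency
    print(f"{drink:8s} {count:4d}  {relative:8.2f}  {percent:7.0f}")
```

Because each relative frequency is rounded separately for display, the printed column may not add to exactly 1.00.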
Frequency Distribution
Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below
average, or poor. The ratings provided by a sample of 20 guests are:

Below Average Average Above Average


Above Average Above Average Above Average
Above Average Below Average Below Average
Average Poor Poor
Above Average Excellent Above Average
Average Above Average Average
Above Average Average
Frequency Distribution

Example: Marada Inn

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn

Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100

Relative frequency, e.g. Excellent: 1/20 = .05
Percent frequency, e.g. Poor: .10(100) = 10
A Contingency Table Helps Organize Two or More
Categorical Variables
DCOVA
• Used to study patterns that may exist between the responses of two or more categorical variables
• Cross tabulates or tallies jointly the responses of the categorical variables
• For two variables the tallies for one variable are located in the rows and the tallies for the second variable are located in the columns
Contingency Table - Example
DCOVA
• A random sample of 400 invoices is drawn.
• Each invoice is categorized as a small, medium, or large amount.
• Each invoice is also examined to identify if there are any errors.
• These data are then organized in the contingency table below.

Contingency Table Showing Frequency of Invoices Categorized By Size and The Presence Of Errors

                 No Errors   Errors   Total
Small Amount        170        20      190
Medium Amount       100        40      140
Large Amount         65         5       70
Total               335        65      400
Contingency Table Based On Percentage Of Overall Total
DCOVA
                 No Errors   Errors   Total
Small Amount       42.50%     5.00%   47.50%
Medium Amount      25.00%    10.00%   35.00%
Large Amount       16.25%     1.25%   17.50%
Total              83.75%    16.25%   100.0%

42.50% = 170 / 400
25.00% = 100 / 400
16.25% = 65 / 400

83.75% of sampled invoices have no errors and 47.50% of sampled invoices are for small amounts.
Contingency Table Based On Percentage of Row Totals
DCOVA
                 No Errors   Errors   Total
Small Amount       89.47%    10.53%   100.0%
Medium Amount      71.43%    28.57%   100.0%
Large Amount       92.86%     7.14%   100.0%
Total              83.75%    16.25%   100.0%

89.47% = 170 / 190
71.43% = 100 / 140
92.86% = 65 / 70

Medium invoices have a larger chance (28.57%) of having errors than small (10.53%) or large (7.14%) invoices.
Contingency Table Based On Percentage Of Column Totals
DCOVA
                 No Errors   Errors   Total
Small Amount       50.75%    30.77%   47.50%
Medium Amount      29.85%    61.54%   35.00%
Large Amount       19.40%     7.69%   17.50%
Total              100.0%    100.0%   100.0%

50.75% = 170 / 335
30.77% = 20 / 65

There is a 61.54% chance that invoices with errors are of medium size.
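The three percentage views of the invoice contingency table (overall total, row totals, column totals) can be reproduced with the following Python sketch. The counts are taken from the example; the dictionary layout and the helper name `pct` are illustrative assumptions.

```python
# Cell counts from the invoice example (rows: invoice size, columns: error status).
counts = {
    "Small":  {"No Errors": 170, "Errors": 20},
    "Medium": {"No Errors": 100, "Errors": 40},
    "Large":  {"No Errors": 65,  "Errors": 5},
}

row_totals = {size: sum(cells.values()) for size, cells in counts.items()}
col_totals = {
    col: sum(cells[col] for cells in counts.values())
    for col in ("No Errors", "Errors")
}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Each cell as a percentage of the overall total, its row total, and its column total.
overall = {s: {c: pct(n, grand_total) for c, n in cells.items()} for s, cells in counts.items()}
by_row = {s: {c: pct(n, row_totals[s]) for c, n in cells.items()} for s, cells in counts.items()}
by_col = {s: {c: pct(n, col_totals[c]) for c, n in cells.items()} for s, cells in counts.items()}

print(overall["Small"]["No Errors"], by_row["Medium"]["Errors"], by_col["Medium"]["Errors"])
```

The choice of denominator determines the question answered: row percentages compare error rates across invoice sizes, while column percentages describe the size mix among invoices with (or without) errors.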
Ungrouped Versus
Grouped Data
• Ungrouped data
• have not been summarized in any way
• are also called raw data
• Grouped data
• have been organized into a frequency
distribution
Tables Used For Organizing Numerical Data
DCOVA
Numerical Data
• Ordered Array
• Frequency Distributions
• Cumulative Distributions
Organizing Numerical Data:
Ordered Array DCOVA
An ordered array is a sequence of data, in rank order, from the
smallest value to the largest value.
Shows range (minimum value to maximum value)
May help identify outliers (unusual observations)

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45
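A minimal Python sketch of building an ordered array follows. It uses the night student ages from the slide, but the unsorted starting order is made up for illustration, since the raw collection order is not given.

```python
# Night student ages in a hypothetical collection order (order invented for illustration).
night_students_raw = [33, 18, 45, 19, 23, 18, 41, 20, 28, 19, 32, 21]

ordered_array = sorted(night_students_raw)   # smallest value to largest value
smallest, largest = ordered_array[0], ordered_array[-1]

print(ordered_array)
print(f"range: {smallest} to {largest}")
```

Once the values are in rank order, the minimum, maximum, and any unusually large or small observations are visible at a glance.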
Organizing Numerical Data:
Frequency Distribution
DCOVA
The frequency distribution is a summary table in which the data are arranged into
numerically ordered classes.

You must give attention to selecting the appropriate number of class groupings for the
table, determining a suitable width of a class grouping, and establishing the
boundaries of each class grouping to avoid overlapping.

The number of classes depends on the number of values in the data. With a larger
number of values, typically there are more classes. In general, a frequency
distribution should have at least 5 but no more than 15 classes.

To determine the width of a class interval, you divide the range (Highest value–
Lowest value) of the data by the number of class groupings desired.
Organizing Numerical Data:
Frequency Distribution Example
DCOVA

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Organizing Numerical Data:
Frequency Distribution Example
DCOVA
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits):
Class 1: 10 but less than 20
Class 2: 20 but less than 30
Class 3: 30 but less than 40
Class 4: 40 but less than 50
Class 5: 50 but less than 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
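The steps above can be sketched in Python as follows. The data are the 20 temperatures from the example; rounding the width up to the next multiple of 10 and starting the first class at a multiple of the width are assumptions chosen to match the classes on the slide, not a general rule.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)                    # 58 - 12 = 46
# Assumption: "round up" means up to the next multiple of 10.
width = math.ceil(data_range / num_classes / 10) * 10   # 9.2 becomes 10
# Assumption: the first class starts at a multiple of the width.
start = (min(temps) // width) * width                   # 10

classes = []
for k in range(num_classes):
    lower = start + k * width
    upper = lower + width
    midpoint = (lower + upper) / 2
    frequency = sum(lower <= t < upper for t in temps)  # "lower but less than upper"
    classes.append((lower, upper, midpoint, frequency))

for lower, upper, midpoint, frequency in classes:
    print(f"{lower} but less than {upper}: midpoint {midpoint:.0f}, frequency {frequency}")
```

Using a half-open test (`lower <= t < upper`) guarantees that each observation is counted in exactly one class.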
Organizing Numerical Data: Frequency Distribution Example
DCOVA
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Midpoints   Frequency
10 but less than 20       15           3
20 but less than 30       25           6
30 but less than 40       35           5
40 but less than 50       45           4
50 but less than 60       55           2
Total                                 20
Organizing Numerical Data: Relative & Percent
Frequency Distribution Example
DCOVA
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20 3 .15 15%
20 but less than 30 6 .30 30%
30 but less than 40 5 .25 25%
40 but less than 50 4 .20 20%
50 but less than 60 2 .10 10%
Total 20 1.00 100%

Relative Frequency = Frequency / Total, e.g. 0.10 = 2 / 20


Organizing Numerical Data: Cumulative Frequency
Distribution Example
DCOVA
Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3          15%                3                     15%
20 but less than 30        6          30%                9                     45%
30 but less than 40        5          25%               14                     70%
40 but less than 50        4          20%               18                     90%
50 but less than 60        2          10%               20                    100%
Total                     20         100%

Cumulative Percentage = Cumulative Frequency / Total * 100, e.g. 45% = 100*9/20
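The cumulative columns can be computed as running totals. The sketch below uses `itertools.accumulate` on the class frequencies from the temperature example; the variable names are illustrative.

```python
from itertools import accumulate

class_labels = [
    "10 but less than 20",
    "20 but less than 30",
    "30 but less than 40",
    "40 but less than 50",
    "50 but less than 60",
]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Running total of frequencies, then each running total as a percentage of the total.
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / total for c in cumulative]

for label, f, c, p in zip(class_labels, frequencies, cumulative, cumulative_pct):
    print(f"{label}: frequency {f}, cumulative {c}, cumulative percentage {p:.0f}%")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the tally.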


• LESS THAN CUMULATIVE FREQUENCY SERIES

HOURS            NO. OF WORKERS

LESS THAN 10 5
LESS THAN 30 15
LESS THAN 60 30
LESS THAN 90 50
• MORE THAN CUMULATIVE FREQUENCY SERIES

PROFITS (RS. IN LAKHS) NO. OF COMPANIES

MORE THAN 100 150


MORE THAN 150 90
MORE THAN 200 40
MORE THAN 250 5
• INCLUSIVE CLASS INTERVAL

CLASS INTERVAL FREQUENCY

10 – 19 17
20 – 29 15
30 – 39 12
40 – 49 10
• EXCLUSIVE CLASS INTERVAL

REVENUE (RS.)    NO. OF PRODUCTS
100 – 200 15
200 – 300 20
300 – 400 10
400 – 500 5
TOTAL 50
• OPEN END CLASS INTERVAL

SALARY (RS.) NO. OF CLERKS


LESS THAN 1500 10
1500 – 1700 25
1700 – 1900 45
1900 – 2100 11
MORE THAN 2100 9
TOTAL 100
Why Use a Frequency
Distribution? DCOVA
• It condenses the raw data into a more
useful form
• It allows for a quick visual interpretation
of the data
• It enables the determination of the
major characteristics of the data set
including where the data are
concentrated / clustered
Frequency Distributions:
Some Tips
DCOVA
• Different class boundaries may provide different pictures for the same data (especially for smaller data sets)
• Shifts in data concentration may show up when different class boundaries are chosen
• As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced
• When comparing two or more groups with different sample sizes, you must use either a relative frequency or a percentage distribution
FEW MORE EXAMPLES
Frequency Distribution

Example: Hudson Auto Repair


Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Relative Frequency and Percent Frequency Distributions
Example: Hudson Auto Repair

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100

Relative frequency, e.g. the 50-59 class: 2/50 = .04
Percent frequency is the relative frequency multiplied by 100, e.g. .04(100) = 4
Relative Frequency and
Percent Frequency Distributions
Example: Hudson Auto Repair
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.
Example of Ungrouped
Data
Ages of a Sample of Managers from Urban Child Care Centers in the United States

42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution
of Child Care Manager’s
Ages
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
Data Range
Range = Largest - Smallest = 74 - 23 = 51

42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54

Smallest = 23, Largest = 74
Number of Classes and Class
Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive
summarization.
• More than 15 classes leave too much detail.
• Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
$$\text{Approximate Class Width} = \frac{51}{6} = 8.5$$
Class Width = 10
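The class-width rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names `approximate_class_width` and `round_up_to` are hypothetical, and rounding up to a multiple of 10 is an assumed reading of "a convenient number".

```python
import math

# Ages of the 50 child care managers, copied from the slide data.
ages = [
    42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
    53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
    52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
    55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
    61, 31, 30, 40, 60, 74, 37, 29, 43, 54,
]


def approximate_class_width(values, num_classes):
    """Range (largest minus smallest) divided by the number of classes."""
    data_range = max(values) - min(values)
    return data_range / num_classes


def round_up_to(value, step):
    """Round value up to the next multiple of step (assumed 'convenient' unit)."""
    return math.ceil(value / step) * step


approx_width = approximate_class_width(ages, 6)  # six classes, as on the slide
class_width = round_up_to(approx_width, 10)
print(approx_width, class_width)
```

Choosing a different step (for example 5) would give a different but equally defensible width; the slides settle on 10.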
Relative Frequency
Relative
Class Interval Frequency Frequency
20-under 30 6 .12
30-under 40 18 .36
40-under 50 11 .22
50-under 60 11 .22
60-under 70 3 .06
70-under 80 1 .02
Total 50 1.00
Cumulative Frequency
Cumulative
Class Interval Frequency Frequency
20-under 30 6 6
30-under 40 18 24
40-under 50 11 35
50-under 60 11 46
60-under 70 3 49
70-under 80 1 50
Total 50
Class Midpoints, Relative Frequencies, and
Cumulative Frequencies

Relative Cumulative
Class Interval Frequency Midpoint Frequency Frequency
20-under 30 6 25 .12 6
30-under 40 18 35 .36 24
40-under 50 11 45 .22 35
50-under 60 11 55 .22 46
60-under 70 3 65 .06 49
70-under 80 1 75 .02 50
Total 50 1.00
Cumulative Relative
Frequencies
Cumulative
Relative Cumulative Relative
Class Interval Frequency Frequency Frequency Frequency
20-under 30 6 .12 6 .12
30-under 40 18 .36 24 .48
40-under 50 11 .22 35 .70
50-under 60 11 .22 46 .92
60-under 70 3 .06 49 .98
70-under 80 1 .02 50 1.00
Total 50 1.00
Cumulative Distributions

The last entry in a cumulative frequency distribution


always equals the total number of observations.
The last entry in a cumulative relative frequency
distribution always equals 1.00.
The last entry in a cumulative percent frequency
distribution always equals 100.
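The frequency, relative frequency, and cumulative columns shown in the preceding tables can be rebuilt from the raw ages. The sketch below is an illustration only; the dictionary keys and variable names are assumptions chosen for readability.

```python
# Ages of the 50 child care managers, copied from the slide data.
ages = [
    42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
    53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
    52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
    55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
    61, 31, 30, 40, 60, 74, 37, 29, 43, 54,
]

width = 10
rows = []
cumulative = 0
for start in range(20, 80, width):  # classes 20-under 30, ..., 70-under 80
    # "under" means the upper limit is excluded from the class.
    freq = sum(1 for a in ages if start <= a < start + width)
    cumulative += freq
    rows.append({
        "class": f"{start}-under {start + width}",
        "frequency": freq,
        "relative": freq / len(ages),
        "cumulative": cumulative,
        "cumulative_relative": cumulative / len(ages),
    })

for r in rows:
    print(f'{r["class"]:<12} {r["frequency"]:>3} {r["relative"]:.2f} '
          f'{r["cumulative"]:>3} {r["cumulative_relative"]:.2f}')
```

The last row illustrates the two rules above: the cumulative frequency ends at the number of observations and the cumulative relative frequency ends at 1.00.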
PRACTICE QUESTIONS
• The response to a question has three
alternatives: A, B and C. A sample of 120
responses provides 60 A, 24 B and 36 C. Show
the frequency and relative frequency
distributions.
• A partial relative frequency distribution is given.

Class   Relative frequency
A       0.22
B       0.18
C       0.40
D

• What is the relative frequency of class D?


• The total sample size is 200. What is the frequency of class D?
• Show the frequency distribution.
• Show the percent frequency distribution.
• A ________________is a tabular summary of
data showing the number of items in each of
several non overlapping classes.
– Frequency distribution
– Relative frequency
– Probability distribution
– Cumulative distribution
• When studying the simultaneous responses to
two categorical questions, you should set up a
• a) contingency table.
• b) frequency distribution table.
• c) cumulative percentage distribution table.
• d) histogram.
• In a cumulative relative frequency
distribution, the last class will have a
cumulative relative frequency equal to
• a. one
• b. zero
• c. the total number of elements in the data set
• d. None of these alternatives is correct.
Visualizing Categorical Data Through
Graphical Displays
DCOVA
Categorical
Data
Visualizing Data

Summary Contingency
Table For One Table For Two
Variable Variables

Bar Pareto Side By Side


Chart Chart Bar Chart
Pie Chart
Visualizing Categorical Data:
The Bar Chart
DCOVA
The bar chart visualizes a categorical variable as a series of bars. The
length of each bar represents either the frequency or percentage of values
for each category. Each bar is separated by a space called a gap.

Reason For Percent


Shopping Online?
Better Prices 37%
Avoiding holiday 29%
crowds or hassles
Convenience 18%
Better selection 13%
Ships directly 3%
Visualizing Categorical Data:
The Pie Chart
DCOVA
The pie chart is a circle broken up into slices that represent categories.
The size of each slice of the pie varies according to the percentage in
each category.

Reason For Shopping Percent


Online?
Better Prices 37%
Avoiding holiday crowds or 29%
hassles
Convenience 18%
Better selection 13%
Ships directly 3%
Visualizing Categorical Data:
The Pareto Chart
DCOVA
• Used to portray categorical data
• A vertical bar chart, where categories are shown in
descending order of frequency
• A cumulative polygon is shown in the same graph
• Used to separate the “vital few” from the “trivial
many”
Visualizing Categorical Data:
The Pareto Chart (con’t) DCOVA
Ordered Summary Table For Causes
Of Incomplete ATM Transactions
Cumulative
Cause Frequency Percent Percent
Warped card jammed 365 50.41% 50.41%
Card unreadable 234 32.32% 82.73%
ATM malfunctions 32 4.42% 87.15%
ATM out of cash 28 3.87% 91.02%
Invalid amount requested 23 3.18% 94.20%
Wrong keystroke 23 3.18% 97.38%
Lack of funds in account 19 2.62% 100.00%
Total 724 100.00%
Source: Data extracted from A. Bhalla, “Don’t Misuse the Pareto Principle,” Six Sigma Forum
Magazine, May 2009, pp. 15–18.
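The percent and cumulative percent columns of the ordered summary table can be reproduced as follows. This is a sketch under the assumption that the frequencies in the table are the only inputs; the variable names are illustrative.

```python
# Frequencies of causes of incomplete ATM transactions, from the table above.
causes = {
    "Warped card jammed": 365,
    "Card unreadable": 234,
    "ATM malfunctions": 32,
    "ATM out of cash": 28,
    "Invalid amount requested": 23,
    "Wrong keystroke": 23,
    "Lack of funds in account": 19,
}

total = sum(causes.values())

# A Pareto chart orders categories by descending frequency.
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

table = []
running = 0
for cause, count in ordered:
    running += count
    table.append((cause, count, 100 * count / total, 100 * running / total))

for cause, count, pct, cum_pct in table:
    print(f"{cause:<26} {count:>4} {pct:6.2f}% {cum_pct:7.2f}%")
```

The first two rows already account for more than 80% of the incomplete transactions, which is the "vital few" reading of the chart.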
Visualizing Categorical Data:
The Pareto Chart (con’t) DCOVA

The “Vital
Few”
Visualizing Categorical Data:
Side By Side Bar Charts DCOVA
The side by side bar chart represents the data from a contingency table.

Invoice Size     No Errors   Errors    Total
Small Amount     50.75%      30.77%    47.50%
Medium Amount    29.85%      61.54%    35.00%
Large Amount     19.40%      7.69%     17.50%
Total            100.0%      100.0%    100.0%

[Figure: side by side bar chart, "Invoice Size Split Out By Errors & No Errors", with bars for Large, Medium, and Small invoices and a horizontal axis from 0.0% to 70.0%]

Invoices with errors are much more likely to be of


medium size (61.54% vs 30.77% and 7.69%)
Visualizing Numerical Data By
Using Graphical Displays
DCOVA
Numerical Data

Frequency Distributions
Ordered Array and
Cumulative Distributions

Stem-and-Leaf
Histogram Polygon Ogive
Display
Stem-and-Leaf Display
DCOVA

• A simple way to see how the data are distributed and


where concentrations of data exist

METHOD: Separate the sorted data series


into leading digits (the stems) and
the trailing digits (the leaves)
Organizing Numerical Data:
Stem and Leaf Display
DCOVA
A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves) branch
out to the right on each row.

Age of Surveyed College Students

Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45

Day Students        Night Students
Stem  Leaf          Stem  Leaf
1     67788899      1     8899
2     0012257       2     0138
3     28            3     23
4     2             4     15
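The stem-and-leaf method (leading digit as stem, trailing digit as leaf) can be sketched as below. The function name `stem_and_leaf` is hypothetical, and the assignment of the 18-value list to day students and the 12-value list to night students is an assumption based on matching the data to the displays.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted two-digit values by tens digit (stem); leaves are units digits."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(stems.items())}


# Assumed grouping of the ages listed on the slide.
day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for label, ages in (("Day", day), ("Night", night)):
    print(label)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```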
Visualizing Numerical Data:
The Histogram
DCOVA
A vertical bar chart of the data in a frequency distribution is
called a histogram.

In a histogram there are no gaps between adjacent bars.

The class boundaries (or class midpoints) are shown on the


horizontal axis.

The vertical axis is either frequency, relative frequency, or


percentage.

The height of the bars represent the frequency, relative


frequency, or percentage.
Visualizing Numerical Data:
The Histogram
DCOVA
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100

[Figure: frequency histogram titled "Daily High Temperature"; values 5 to 55 on the horizontal axis, frequency 0 to 8 on the vertical axis]

(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class)
Visualizing Numerical Data:
The Polygon
DCOVA

A percentage polygon is formed by having the midpoint of


each class represent the data in that class and then connecting
the sequence of midpoints at their respective class
percentages.

The cumulative percentage polygon, or ogive, displays the


variable of interest along the X axis, and the cumulative
percentages along the Y axis.

Useful when there are two or more groups to compare.


Visualizing Numerical Data:
The Percentage Polygon DCOVA
Useful When Comparing Two or More Groups
Visualizing Numerical Data:
The Percentage Polygon
DCOVA
Visualizing Two Numerical Variables By Using
Graphical Displays
DCOVA

Two Numerical
Variables

Scatter Time-
Plot Series
Plot
Visualizing Two Numerical
Variables: The Scatter Plot
DCOVA
Scatter plots are used for numerical data consisting of paired
observations taken from two numerical variables

One variable is measured on the vertical axis and the other


variable is measured on the horizontal axis

Scatter plots are used to examine possible relationships


between two numerical variables
Scatter Plot Example
DCOVA

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter plot titled "Cost per Day vs. Production Volume"; volume per day (20 to 70) on the horizontal axis, cost per day (0 to 250) on the vertical axis]
Visualizing Two Numerical
Variables: The Time Series Plot
DCOVA
• A Time-Series Plot is used to study patterns
in the values of a numeric variable over time

• The Time-Series Plot:


• Numeric variable is measured on the vertical
axis and the time period is measured on the
horizontal axis
Time Series Plot Example
DCOVA
Year   Number of Franchises
1996   43
1997   54
1998   60
1999   73
2000   82
2001   95
2002   107
2003   99
2004   95

[Figure: time-series plot titled "Number of Franchises, 1996 - 2004"; year (1994 to 2006) on the horizontal axis, number of franchises (0 to 150) on the vertical axis]
Organizing Many Categorical Variables: The
Multidimensional Contingency Table
DCOVA
• A multidimensional contingency table is constructed by tallying
the responses of three or more categorical variables.

• In Excel, creating a PivotTable yields an interactive display of
this type.

• While Minitab will not create an interactive table, it has many


specialized statistical & graphical procedures (not covered in this
book) to analyze & visualize multidimensional data.
Using Excel Pivot Tables To Organize & Visualize
Many Variables
DCOVA
A pivot table:
• Summarizes variables as a multidimensional summary table
• Allows interactive changing of the level of summarization and
formatting of the variables
• Allows you to interactively “slice” your data to summarize
subsets of data that meet specified criteria
• Can be used to discover possible patterns and relationships in
multidimensional data that simpler tables and charts would
fail to make apparent.
A Multidimensional Contingency Table Tallies
Responses Of Three or More Categorical Variables
DCOVA

Two Dimensional Table Showing Three Dimensional Table


The Mean 10 Year Return % Showing The Mean 10 Year
Broken Out By Type Of Fund & Return % Broken Out By Type
Risk Level Of Fund, Market Cap, &Risk
Level
Data Discovery Methods Can Yield
Initial Insights Into Data
DCOVA
• Data discovery methods enable the performance
of preliminary analyses by manipulating interactive
summarizations
• Are used to:
• Take a closer look at historical or status data
• Review data for unusual values
• Uncover new patterns in data
• Drill-down is perhaps the simplest form of data
discovery
Drill-Down Reveals The Data Underlying A
Higher-Level Summary
DCOVA
Results of drilling down to
the details about small
market cap value funds with
low risk.
Some Data Discovery Methods Are
Primarily Visual DCOVA
• A treemap is such a method

• A treemap visualizes the comparison of two or more


variables using the size and color of rectangles to
represent values

• When used with one or more categorical variables it


forms a multilevel hierarchy or tree that can uncover
patterns among numerical variables.
An Example Of A Treemap
DCOVA
A treemap of the numerical variables assets (size) and 10-year
return percentage (color) for growth and value funds that have
small market capitalizations and low risk
The Challenges in Organizing and
Visualizing Variables DCOVA
• When organizing and visualizing data, you need to be
mindful of:
• The limits of others' ability to perceive and comprehend
• Presentation issues that can undercut the usefulness of
methods from this chapter.
• It is easy to create summaries that
• Obscure the data or
• Create false impressions
An Example Of Obscuring Data, Information Overload
DCOVA
False Impressions Can Be Created In Many Ways
DCOVA
• Selective summarization
• Presenting only part of the data collected

• Improperly constructed charts


• Potential pie chart issues
• Improperly scaled axes
• A Y axis that does not begin at the origin or is a broken axis
missing intermediate values

• Chartjunk
An Example of Selective Summarization, These Two
Summarizations Tell Totally Different Stories

DCOVA

Company   Change from Prior Year
A         +7.2%
B         +24.4%
C         +24.9%
D         +24.8%
E         +12.5%
F         +35.1%
G         +29.7%

Company   Year 1    Year 2    Year 3
A         -22.6%    -33.2%    +7.2%
B         -4.5%     -41.9%    +24.4%
C         -18.5%    -31.5%    +24.9%
D         -29.4%    -48.1%    +24.8%
E         -1.9%     -25.3%    +12.5%
F         -1.6%     -37.8%    +35.1%
G         +7.4%     -13.6%    +29.7%
How Obvious Is It That Both Pie Charts Summarize The Same Data?

DCOVA

Why is it hard to tell? What would you do to improve?


Graphical Errors:
No Relative Basis DCOVA

[Figure: two bar charts of A's received by students, by class year (FR, SO, JR, SR). Bad presentation: the vertical axis shows frequency (0 to 300). Good presentation: the vertical axis shows percentage (0% to 30%).]

FR = Freshman, SO = Sophomore, JR = Junior, SR = Senior


Graphical Errors:
Compressing the Vertical Axis
DCOVA

[Figure: two charts of quarterly sales ($) for Q1 to Q4. Bad presentation: the vertical axis runs from 0 to 200, compressing the data. Good presentation: the vertical axis runs from 0 to 50.]
Graphical Errors: No Zero Point on the
Vertical Axis
DCOVA

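[Figure: two charts of monthly sales ($) for the first six months (J, F, M, A, M, J). Bad presentation: the vertical axis runs from 36 to 45 with no zero point. Good presentation: the vertical axis shows 0 followed by 36 to 45.]

Graphing the first six months of sales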

Graphical Errors: Chart Junk, Can
You Identify The Junk?
DCOVA
Bad Presentation Good Presentation
Graphical Errors: Chart Junk,
Can You Identify The Junk?
DCOVA

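[Figure: minimum wage shown two ways. Bad presentation: a decorated graphic listing 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Good presentation: a plain chart of the same values, with $0 to $4 on the vertical axis and 1960 to 1990 on the horizontal axis.]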
In Excel It Is Easy To Inadvertently
Create Distortions
• Excel often will create a graph where the vertical axis
does not start at 0

• Excel offers the opportunity to turn simple charts


into 3-D charts and in the process can create
distorted images

• Unusual charts offered as choices by Excel will most


often create distorted images
Best Practices for Constructing
Visualizations DCOVA

Use the simplest possible visualization


Include a title
Label all axes
Include a scale for each axis if the chart contains axes
Begin the scale for a vertical axis at zero
Use a constant scale
Avoid 3D effects
Avoid chartjunk
Crosstabulations and Scatter
Diagrams
Thus far we have focused on methods that are used
to summarize the data for one variable at a time.
Often a manager is interested in tabular and
graphical methods that will help understand the
relationship between two variables.
Crosstabulation and a scatter diagram are two
methods for summarizing the data for two variables
simultaneously.
Crosstabulation

A crosstabulation is a tabular summary of data for


two variables.
Crosstabulation can be used when:
• one variable is qualitative and the other is
quantitative,
• both variables are qualitative, or
• both variables are quantitative.
The left and top margin labels define the classes for
the two variables.
Crosstabulation

Example: Finger Lakes Homes


The number of Finger Lakes homes sold for each
style and price for the past two years is shown below.
Price Range is the quantitative variable; Home Style is the categorical variable.

Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000     18         6     19      12        55
≥ $200,000     12         14    16      3         45
Total          30         20    35      15        100
Crosstabulation
Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample
are a split-level style and priced at less than
$200,000.
• Only three homes in the sample are an A-Frame
style and priced at $200,000 or more.
Crosstabulation
Example: Finger Lakes Homes

Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000     18         6     19      12        55
≥ $200,000     12         14    16      3         45
Total          30         20    35      15        100

The Total column is the frequency distribution for the price range variable.
The Total row is the frequency distribution for the home style variable.
Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or
column percentages can provide additional insight about the
relationship between the two variables.
Crosstabulation: Row Percentages

Example: Finger Lakes Homes

Price Range    Colonial   Log     Split   A-Frame   Total
< $200,000     32.73      10.91   34.55   21.82     100
≥ $200,000     26.67      31.11   35.56   6.67      100

Note: row totals are actually 100.01 due to rounding.

(Colonial and ≥ $200K)/(All ≥ $200K) x 100 = (12/45) x 100 = 26.67


Crosstabulation: Column Percentages

Example: Finger Lakes Homes

Price Range    Colonial   Log     Split   A-Frame
< $200,000     60.00      30.00   54.29   80.00
≥ $200,000     40.00      70.00   45.71   20.00
Total          100        100     100     100

(Colonial and ≥ $200K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
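Row and column percentages follow mechanically from the counts in the crosstabulation. The sketch below is illustrative; the keys `under_200k` and `200k_or_more` are assumed labels for the two price ranges.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Counts from the Finger Lakes Homes crosstabulation.
counts = {
    "under_200k": [18, 6, 19, 12],
    "200k_or_more": [12, 14, 16, 3],
}

# Row percentages: each cell divided by its row total.
row_pct = {
    label: [100 * c / sum(row) for c in row]
    for label, row in counts.items()
}

# Column percentages: each cell divided by its column total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {
    label: [100 * c / t for c, t in zip(row, col_totals)]
    for label, row in counts.items()
}

for label in counts:
    print(label, [round(p, 2) for p in row_pct[label]],
          [round(p, 2) for p in col_pct[label]])
```

Row percentages answer "within a price range, how are styles distributed?", while column percentages answer "within a style, how are prices distributed?".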


Cross Tabulation
• Understanding relationship between 2 variables
• Example – Quality rating of meals of various prices at 12
restaurants

# Rating Price # Rating Price


1 Good 18 7 Excellent 19
2 Very Good 22 8 Very Good 11
3 Good 28 9 Good 23
4 Excellent 38 10 Very Good 13
5 Good 33 11 Excellent 18
6 Very Good 28 12 Excellent 33
Cross Tabulation…
• One variable is qualitative (Rating) and the other quantitative( Price) –
Row % included

Price
Rating 10 - 19 20 - 29 30 - 39 Total
Good 1 2 1 4
25% 50% 25% 100%

Very Good 2 2 0 4
50% 50% 100%

Excellent 2 0 2 4
50% 50% 100%

Total 5 4 3 12
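The crosstabulation above can be tallied directly from the twelve (rating, price) records. This is a minimal sketch; `price_class` is a hypothetical helper that assumes classes of width 10 starting at 10.

```python
from collections import Counter

# (rating, meal price) for the 12 restaurants listed above.
meals = [
    ("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
    ("Good", 33), ("Very Good", 28), ("Excellent", 19), ("Very Good", 11),
    ("Good", 23), ("Very Good", 13), ("Excellent", 18), ("Excellent", 33),
]


def price_class(price):
    """Map a price to its class label, e.g. 18 -> '10 - 19'."""
    low = (price // 10) * 10
    return f"{low} - {low + 9}"


# Counter returns 0 for combinations that never occur.
counts = Counter((rating, price_class(price)) for rating, price in meals)

classes = ["10 - 19", "20 - 29", "30 - 39"]
for rating in ["Good", "Very Good", "Excellent"]:
    row = [counts[(rating, c)] for c in classes]
    print(f"{rating:<10} {row} total={sum(row)}")
```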
Cross Tabulation …
Problem - In a study of job satisfaction for 4 occupations,
higher scores indicate higher satisfaction. Provide a
cross tab of occupation & satisfaction score.

Lawyer 44 Comp Analyst 54 Lawyer 53

Doctor 80 Lawyer 42 Physiatrist 48

Lawyer 62 Physiatrist 59 Doctor 62

Physiatrist 55 Doctor 79 Lawyer 86

Lawyer 64 Physiatrist 76 Comp Analyst 79

Comp Analyst 73 Doctor 50 Physiatrist 60

Physiatrist 86 Comp Analyst 86 Doctor 52

Lawyer 71 Comp Analyst 50 Lawyer 79

Doctor 78 Physiatrist 76 Comp Analyst 69


• TWO WAY FREQUENCY SERIES/BIVARIATE SERIES

CLASS
0–5 5 – 10 10 – 15 15 – 20
INTERVAL
0 – 10 1 - 2 -
10 – 20 4 3 - -
20 – 30 - - 1 -
30 – 40 2 - 1 -
Scatter Diagram and Trendline

A scatter diagram is a graphical presentation of the


relationship between two quantitative variables.
One variable is shown on the horizontal axis and
the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests
the overall relationship between the variables.
A trendline provides an approximation of the
relationship.
Scatter Diagram
A Positive Relationship
y

x
Scatter Diagram
A Negative Relationship
y

x
Scatter Diagram
No Apparent Relationship
y

x
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of y = Number of
Interceptions Points Scored
1 14
3 24
2 18
1 17
3 30
Scatter Diagram
[Figure: scatter diagram of number of points scored (y, 0 to 35) against number of interceptions (x, 0 to 4)]
Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram


• The scatter diagram indicates a positive relationship
between the number of interceptions and the
number of points scored.
• Higher points scored are associated with a higher
number of interceptions.
• The relationship is not perfect; not all plotted points in
the scatter diagram lie on a straight line.
Scatter Diagram and Trendline

Scatter Diagram for the Panthers

[Figure: scatter diagram with trendline; number of interceptions (0 to 4) on the horizontal axis, number of points scored (0 to 35) on the vertical axis]
Tabular and Graphical Methods

Categorical Data
  Tabular Methods
  • Frequency Distribution
  • Rel. Freq. Dist.
  • Percent Freq. Distribution
  • Crosstabulation
  Graphical Methods
  • Bar Chart
  • Pie Chart

Quantitative Data
  Tabular Methods
  • Frequency Distribution
  • Rel. Freq. Dist.
  • % Freq. Dist.
  • Cum. Freq. Dist.
  • Cum. Rel. Freq. Distribution
  • Cum. % Freq. Distribution
  • Crosstabulation
  Graphical Methods
  • Dot Plot
  • Histogram
  • Ogive
  • Stem-and-Leaf Display
  • Scatter Diagram
Methods of Summarizing Data
Tabular Graphical
Presentation Presentation
Frequency Distribution Dot Plot, Line Chart
Relative Frequency Distribution Histogram
Percent Frequency Distribution Bar diagram, Pie
Cumulative Frequency Distribution Ogive, Freq polygon
Cum Relative Frequency Distribution Freq Curve
Cum Percent Frequency Distribution Stem & Leaf Display
Cross tabulation Scatter Diagram
Ogive

An ogive is a graph of a cumulative distribution.


The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies, or
• cumulative percent frequencies
The frequency (one of the above) of each class is
plotted as a point.
The plotted points are connected by straight lines.
Ogive

Hudson Auto Repair


• Because the class limits for the parts-cost data are
50-59, 60-69, and so on, there appear to be one-unit
gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points
halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used
for the 60-69 class, and so on.
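The plotting points of the ogive can be computed from the class limits and frequencies. In this sketch the class frequencies (2, 13, 16, 7, 7, 5) are derived from the relative frequencies given earlier multiplied by the 50 tune-ups, and starting the curve at (49.5, 0) is an assumed convention.

```python
# (lower limit, upper limit, frequency) for the Hudson Auto Repair parts costs.
classes = [
    (50, 59, 2),
    (60, 69, 13),
    (70, 79, 16),
    (80, 89, 7),
    (90, 99, 7),
    (100, 109, 5),
]

total = sum(freq for _, _, freq in classes)

# Start the ogive half a unit below the first lower limit with 0% accumulated.
points = [(classes[0][0] - 0.5, 0.0)]
cumulative = 0
for _, upper, freq in classes:
    cumulative += freq
    # Plot halfway between this upper limit and the next lower limit.
    points.append((upper + 0.5, 100 * cumulative / total))

for x, y in points:
    print(f"({x}, {y:.0f})")
```

The point (89.5, 76) says that 76% of the parts costs are $89 or less.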
Ogive with Cumulative Percent Frequencies

Example: Hudson Auto Repair

[Figure: ogive titled "Tune-up Parts Cost"; parts cost ($) from 50 to 110 on the horizontal axis, cumulative percent frequency from 0 to 100 on the vertical axis, with the point (89.5, 76) labelled]
Histogram – more insight
[Figure: histogram titled "Tune-up Parts Cost"; parts cost ($) classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109 on the horizontal axis, frequency from 2 to 18 on the vertical axis]
Histograms Showing Skewness

Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: relative frequency histogram, vertical axis from 0 to .35]
Histograms Showing Skewness

Moderately Skewed Left


• A longer tail to the left
• Example: exam scores
[Figure: relative frequency histogram, vertical axis from 0 to .35]
Histograms Showing Skewness

Moderately Skewed Right

• A longer tail to the right
• Example: housing values
[Figure: relative frequency histogram, vertical axis from 0 to .35]
Histograms Showing Skewness

Highly Skewed Right


• A very long tail to the right
• Example: executive salaries
[Figure: relative frequency histogram, vertical axis from 0 to .35]
Frequency Distribution…
Example – BMW manufactures racing cars and
has gathered the following information on the number of
models of engines in different size categories used
in the racing market it serves.

Engine Size (cu inches)   # of models
101 – 150                 1
151 – 200                 7
201 – 250                 7
251 – 300                 8
301 – 350                 17
351 – 400                 16
401 – 450                 15
451 – 500                 7
Frequency Distribution…
- Construct a cumulative relative frequency distribution.

- 70% of engine models are larger than what size?

- What is the approximate middle value in the original data set?

- If BMW designs a fuel injection system that can be used
on engines up to 400 cu inches, about what % of engine
models will not be able to use the system?
• END OF CHAPTER 2

Descriptive statistics: Tabular and Graphical Displays

(Page :33-98)
• Categorical data can be graphically represented by using a(n)
• a. histogram
• b. frequency polygon
• c. ogive
• d. bar chart
• A questionnaire provides 58 yes, 42 no and 20 no-opinion answers.
• In the construction of a pie chart , how many degrees would be in the section
of the pie showing the Yes answers?
• How many degrees would be in the section of the pie showing the No
answers.
Quantitative Methods in
Management
Day-4
Recap..
• Introduction
• Definition
• Terms and terminologies
– Population, sample, parameter, statistic, element, data,
datasets, variable,
• Types of statistics
• Types of data
• Types of variables – qualitative, quantitative and
time series
• Levels of measurements
• Application of statistics in business
• Sources of data
Organizing and visualizing variables
• Tables
– Frequency distribution
– Relative frequency distribution
– Relative percent frequency distribution
– Cumulative frequency distribution
– Univariate
– Bivariate / cross tabulation
• Diagrams
– Bar charts
– Pie charts
• Graphs
– Histogram
– Frequency polygon
– Frequency curve
– Cumulative frequency curve ( Ogive)
• EDA
– Stem and leaf plot
– Scatter diagram
– Dot plots
– Pareto chart
Numerical descriptive statistics

Day 3
Pg. 99-148
[Figure: a cloud of data points shown in four steps: the raw points; the same points with X marking the centre (MCT); the points annotated with MCT and MD; and the points annotated with MCT, MD, and positive or negative skew (SKEW)]
Objectives

In this chapter, you learn to:


• Describe the properties of central tendency,
variation, and shape in numerical data
• Construct and interpret a boxplot
• Compute descriptive summary measures for a
population
• Calculate the covariance and the coefficient of
correlation
Summary Definitions
DCOVA
The central tendency is the extent to which the
values of a numerical variable group around a
typical or central value.

The variation is the amount of dispersion or


scattering away from a central value that the
values of a numerical variable show.

The shape is the pattern of the distribution of


values from the lowest value to the highest
value.
Summarization of data
• Measures of central tendencies
– AM, WM, GM
– Positional averages – median, percentiles, quartiles
– Mode
– Empirical formula
• Measures of dispersion
– Range
– Quartile deviation
– Mean deviation
– Standard deviation
– Variance
– Coefficient of variation RAW DATA
Arithmetic Mean
• Commonly called ‘the mean’
• is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including
extreme values
• Computed by summing all values in the data set and
dividing the sum by the number of values in the data
set
• The mean can be found from the total (aggregate) and the
number of items alone; the individual values need not be
known
Measures of Central Tendency:
The Mean DCOVA

• The arithmetic mean (often just called the


“mean”) is the most common measure of
central tendency
– For a sample of size n:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $\bar{X}$ is pronounced "x-bar", $X_i$ is the ith observed value, and $n$ is the sample size.
Population Mean

$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
Measures of Central Tendency:
The Mean (con’t) DCOVA

• The most common measure of central tendency


• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

Values 11, 12, 13, 14, 15:   Mean = (11 + 12 + 13 + 14 + 15)/5 = 65/5 = 13
Values 11, 12, 13, 14, 20:   Mean = (11 + 12 + 13 + 14 + 20)/5 = 70/5 = 14
Properties of AM
• Sum of deviations from AM is ZERO
• Sum of squares of deviation taken from AM
will be minimum
• Combined mean
• It is affected by change of scale and change of
origin
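The four properties listed above can be checked numerically. The sketch below reuses the five values from the population mean example; the split into two groups and the transformation 2x + 5 are hypothetical choices made only for illustration.

```python
values = [24, 13, 19, 26, 11]
mean = sum(values) / len(values)  # 18.6

# Property 1: deviations from the mean sum to zero (up to rounding error).
sum_of_deviations = sum(v - mean for v in values)


# Property 2: the sum of squared deviations is smallest about the mean.
def sum_of_squares_about(center):
    return sum((v - center) ** 2 for v in values)


# Property 3: combined mean of two groups, weighted by group sizes.
def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


group_a, group_b = values[:2], values[2:]  # hypothetical split
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
overall = combined_mean(len(group_a), mean_a, len(group_b), mean_b)

# Property 4: changing origin (+5) and scale (x2) changes the mean the same way.
transformed = [2 * v + 5 for v in values]
transformed_mean = sum(transformed) / len(transformed)

print(sum_of_deviations, overall, transformed_mean)
```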
Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.
Weighted Mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

where:
xi = value of observation i
wi = weight for observation i
Weighted mean

Purchase   Cost per Pound ($)   Number of Pounds
1          3.00                 1200
2          3.40                 500
3          2.80                 2750
4          2.90                 1000
5          3.25                 800

• WM = 18,500/6,250 = $2.96; AM = 15.35/5 = $3.07
(the mean cost per pound for the raw material is $2.96)
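The weighted mean for the five purchases can be computed directly from the formula, with the number of pounds as the weights. This is a small sketch with illustrative variable names.

```python
# Cost per pound ($) and pounds purchased for the five purchases.
costs = [3.00, 3.40, 2.80, 2.90, 3.25]
pounds = [1200, 500, 2750, 1000, 800]

# Weighted mean: sum of weight * value divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(pounds, costs)) / sum(pounds)

# Unweighted arithmetic mean of the five prices, for comparison.
arithmetic_mean = sum(costs) / len(costs)

print(round(weighted_mean, 2), round(arithmetic_mean, 2))
```

The weighted mean is lower than the simple mean because the largest purchase (2750 pounds) was made at the lowest price.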
Geometric mean
• Used in analyzing growth rates in financial
data.
• nth root of the product of n values.
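A geometric mean can be sketched as the nth root of the product of n positive values. The growth factors below are hypothetical numbers chosen only to illustrate the calculation; they do not come from the slides.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth factors: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]
average_factor = geometric_mean(growth_factors)
print(f"average growth per year: {100 * (average_factor - 1):.2f}%")
```

Growth rates compound multiplicatively, which is why the geometric mean rather than the arithmetic mean gives the constant yearly rate that reproduces the same overall growth.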
Median
• Middle value in an ordered array of numbers.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely
small values.
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median is
the middle term of the ordered array.
– If there is an even number of terms, the median is
the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is given
by (n+1)/2.
Measures of Central Tendency:
The Median DCOVA

• In an ordered array, the median is the “middle”


number (50% above, 50% below)

Values 11, 12, 13, 14, 15:   Median = 13
Values 11, 12, 13, 14, 20:   Median = 13

• Less sensitive than the mean to extreme values


Measures of Central Tendency:
Locating the Median
DCOVA
• The location of the median when the values are in numerical order
(smallest to largest):

$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$

• If the number of values is odd, the median is the middle number

• If the number of values is even, the median is the average of the two
middle numbers

Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of
the median in the ranked data
Percentiles
• Measures of central tendency that divide a group
of data into 100 parts
• At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data lie
above the nth percentile

• Example: 90th percentile indicates that at least


90% of the data lie below it, and at most 10% of
the data lie above it
• The median and the 50th percentile have the same
value.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
Percentiles: Computational Procedure
• Organize the data into an ascending ordered
array.
• Calculate the percentile location: $i = \frac{P}{100}(n)$
• Determine the percentile’s location and its value.

• If i is a whole number, the percentile is the


average of the values at the i and (i+1) positions.

• If i is not a whole number, the percentile is at


the (i+1) position in the ordered array.
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of 30th percentile: $i = \frac{30}{100}(8) = 2.4$
• The location index, i, is not a whole number; i+1 =
2.4+1=3.4; the whole number portion is 3; the
30th percentile is at the 3rd location of the array;
the 30th percentile is 13.
Quartiles
• Measures of central tendency that divide a group of
data into four subgroups

• Q1: 25% of the data set is below the first quartile


• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile

• Q1 is equal to the 25th percentile


• Q2 is located at 50th percentile and equals the
median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the
data set
Quartiles

Q1 Q2 Q3

25% 25% 25% 25%


Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122,
125, 129
Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
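The percentile procedure described earlier (location i = (P/100)n, then average two positions when i is whole, otherwise take the next position) can be sketched as a single function. The function name `percentile` is hypothetical, and this follows the slides' rule rather than the interpolation used by most statistical software, so results can differ from library functions.

```python
def percentile(values, p):
    """Percentile by the location rule i = (p / 100) * n, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    # whole is the integer part of i; remainder == 0 means i is a whole number.
    whole, remainder = divmod(p * len(ordered), 100)
    whole = int(whole)
    if remainder == 0:
        # Average of the values at positions i and i + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise the value at position whole + 1 (1-based), i.e. index whole.
    return ordered[whole]


raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))

array = [106, 109, 114, 116, 121, 122, 125, 129]
print([percentile(array, q) for q in (25, 50, 75)])
```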
Measures of Central Tendency:
The Mode
DCOVA
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical
data
• There may be no mode
• There may be several modes

[Figure: two dot plots. Left, on a scale from 0 to 14: Mode = 9. Right, on a scale from 0 to 6: No Mode.]
Mode
• The most frequently occurring value in a data
set
• Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)

• Bimodal -- Data sets that have two modes


• Multimodal -- Data sets that contain more
than two modes
Mode -- Example
• The mode is 44.
• There are more 44s than any other value.

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Measures of Central Tendency:
Review Example
DCOVA
House Prices:
$2,000,000
$ 500,000
$ 300,000
$ 100,000
$ 100,000
Sum $ 3,000,000

Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
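The three measures for the house price data can be checked with Python's standard `statistics` module. This is only a quick sketch; the variable names are illustrative.

```python
import statistics

# House prices from the review example
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of the values divided by n
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(mean_price, median_price, mode_price)  # 600000 300000 100000
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is often preferred for data with outliers.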
Measures of Central Tendency:
Which Measure to Choose?
DCOVA
The mean is generally used, unless extreme
values (outliers) exist.
The median is often used, since the median is
not sensitive to extreme values. For example,
median home prices may be reported for a
region; it is less sensitive to outliers.
In some situations it makes sense to report
both the mean and the median.
Measures of Central Tendency:
Summary
DCOVA
Central Tendency
• Arithmetic Mean: X̄ = (∑ Xi) / n, summed over i = 1, …, n
• Median: middle value in the ordered array
• Mode: most frequently observed value
Empirical formula

MODE = 3 MEDIAN – 2 MEAN


Problem
• The costs of consumer purchases such as single-family
housing, gasoline, internet services, tax preparation,
and hospitalization were reported in The Wall Street
Journal. Sample data typical of the cost of tax return
preparation by services such as H&R Block are shown
below.
120 230 110 115 160 130 150
105 195 155 105 360 120 120
140 100 115 180 235 255
- Compute the mean, median and mode
- Compute the first and third quartiles
- Compute and interpret the 90th percentile
Measures of Variability
It is often desirable to consider measures of variability
(dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we


might consider not only the average delivery time for
each, but also the variability in delivery time for each.
Variability
[Figure: two cash-flow series with the same mean. One shows no variability around the mean; the other shows variability around the mean.]
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread or the
dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
Measures of Variation
DCOVA
Variation
• Range
• Variance
• Standard Deviation
• Coefficient of Variation

Measures of variation give information on the spread or variability or
dispersion of the data values.

[Figure: two distributions with the same center but different variation.]
Measures of Variation:
The Range
DCOVA
Simplest measure of variation
Difference between the largest and the smallest values:

Range = Xlargest – Xsmallest

Example:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 13 - 1 = 12
Measures of Variation:
Why The Range Can Be Misleading
DCOVA
• Does not account for how the data are distributed
[Figure: two dot plots on a 7 to 12 scale with different distributions; in both, Range = 12 - 7 = 5.]

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119


Range
• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Interquartile Range

• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes

Interquartile Range = Q3 − Q1
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean: µ = ∑X / N = 65 / 5 = 13
• Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 marking µ = 13 and the deviations -8, -4, +3, +4, +5.]
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
X      X − µ    |X − µ|
5       -8        +8
9       -4        +4
16      +3        +3
17      +4        +4
18      +5        +5
Sum      0        24

M.A.D. = ∑|X − µ| / N = 24 / 5 = 4.8
Population Variance
• Average of the squared deviations from the
arithmetic mean

X      X − µ    (X − µ)²
5       -8        64
9       -4        16
16      +3         9
17      +4        16
18      +5        25
Sum      0       130

σ² = ∑(X − µ)² / N = 130 / 5 = 26.0
Population Standard Deviation
• Square root of the variance

X      X − µ    (X − µ)²
5       -8        64
9       -4        16
16      +3         9
17      +4        16
18      +5        25
Sum      0       130

σ² = ∑(X − µ)² / N = 130 / 5 = 26.0
σ = √σ² = √26.0 = 5.1
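The deviation-based measures for the data set 5, 9, 16, 17, 18 can be reproduced with a short Python sketch. The function name `population_spread` is hypothetical; the arithmetic follows the population formulas (division by N) used in these slides.

```python
import math


def population_spread(values):
    """Return (mean, MAD, population variance, population standard deviation)."""
    n = len(values)
    mu = sum(values) / n
    deviations = [x - mu for x in values]  # these always sum to zero
    mad = sum(abs(d) for d in deviations) / n
    variance = sum(d * d for d in deviations) / n
    return mu, mad, variance, math.sqrt(variance)


mu, mad, variance, sigma = population_spread([5, 9, 16, 17, 18])
print(mu, mad, variance, round(sigma, 1))  # 13.0 4.8 26.0 5.1
```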
Measures of Variation:
The Sample Variance
DCOVA
• Average (approximately) of squared
deviations of values from the mean

– Sample variance: S² = ∑(Xi − X̄)² / (n − 1), summed over i = 1, …, n

Where X̄ = arithmetic mean
n = sample size
Xi = ith value of the variable X
Measures of Variation:
The Sample Standard Deviation
DCOVA
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data

– Sample standard deviation: S = √[ ∑(Xi − X̄)² / (n − 1) ], summed over i = 1, …, n
Measures of Variation:
The Standard Deviation
DCOVA
Steps for Computing Standard Deviation

1. Compute the difference between each value


and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample
variance.
5. Take the square root of the sample variance to
get the sample standard deviation.
Measures of Variation:
Sample Standard Deviation:
Calculation Example
DCOVA
Sample Data (Xi): 10 12 14 15 17 18 18 24

n = 8, Mean = X̄ = 16

S = √{ [(10 − X̄)² + (12 − X̄)² + (14 − X̄)² + … + (24 − X̄)²] / (n − 1) }
  = √{ [(10 − 16)² + (12 − 16)² + (14 − 16)² + … + (24 − 16)²] / (8 − 1) }
  = √(130 / 7) = 4.3095

A measure of the “average” scatter around the mean
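The five computation steps listed earlier can be written out one per line in Python. This is a minimal sketch with a hypothetical function name, `sample_std`; in practice `statistics.stdev` from the standard library gives the same result.

```python
import math


def sample_std(values):
    """Sample standard deviation, following the five steps one at a time."""
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]  # step 1: value minus mean
    squared = [d ** 2 for d in differences]   # step 2: square each difference
    total = sum(squared)                      # step 3: add the squared differences
    variance = total / (n - 1)                # step 4: divide by n - 1
    return math.sqrt(variance)                # step 5: take the square root


sample = [10, 12, 14, 15, 17, 18, 18, 24]
print(round(sample_std(sample), 4))  # 4.3095
```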
Measures of Variation:
Comparing Standard Deviations
DCOVA
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
[Figure: three dot plots on a common 11 to 21 scale, one for each data set.]
Measures of Variation:
Comparing Standard Deviations
DCOVA

[Figure: two distribution curves, one with a smaller standard deviation and one with a larger standard deviation.]


Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants
Measures of Variation:
Summary Characteristics
DCOVA
The more the data are spread out, the greater the
range, variance, and standard deviation.

The more the data are concentrated, the smaller


the range, variance, and standard deviation.

If the values are all the same (no variation), all


these measures will be zero.

None of these measures are ever negative.


Standard Deviation as an
Indicator of Financial Risk
Annualized Rate of Return

Financial Security     µ      σ
A                     15%     3%
B                     15%     7%
Measures of Variation:
The Coefficient of Variation
DCOVA
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare the variability of two or
more sets of data measured in different units

CV = (S / X̄) ⋅ 100%
Measures of Variation:
Comparing Coefficients of Variation
DCOVA
• Stock A:
– Average price last year = $50
– Standard deviation = $5
CVA = (S / X̄) ⋅ 100% = ($5 / $50) ⋅ 100% = 10%
• Stock B:
– Average price last year = $100
– Standard deviation = $5
CVB = (S / X̄) ⋅ 100% = ($5 / $100) ⋅ 100% = 5%

Both stocks have the same standard deviation, but stock B is less variable
relative to its price.
Measures of Variation:
Comparing Coefficients of Variation (con’t)

DCOVA
• Stock A:
– Average price last year = $50
– Standard deviation = $5
CVA = (S / X̄) ⋅ 100% = ($5 / $50) ⋅ 100% = 10%
• Stock C:
– Average price last year = $8
– Standard deviation = $2
CVC = (S / X̄) ⋅ 100% = ($2 / $8) ⋅ 100% = 25%

Stock C has a much smaller standard deviation but a much higher coefficient
of variation.
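The stock comparisons reduce to one line of arithmetic each. A minimal Python sketch, using a hypothetical helper named `coefficient_of_variation`:

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (S / mean) * 100."""
    return 100 * std_dev / mean


# (standard deviation, average price) for the three stocks in the examples
stocks = {"A": (5, 50), "B": (5, 100), "C": (2, 8)}
for name, (s, mean) in stocks.items():
    print(name, coefficient_of_variation(s, mean))  # A 10.0, B 5.0, C 25.0
```

Because the units cancel, the same function can compare data sets measured in different units.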
Coefficient of Variation
µ1 = 29, σ1 = 4.6
µ2 = 84, σ2 = 10

C.V.1 = (σ1 / µ1)(100) = (4.6 / 29)(100) = 15.86
C.V.2 = (σ2 / µ2)(100) = (10 / 84)(100) = 11.90
Measures of shapes

skewness
Shape of a Distribution
DCOVA

• Describes how data are distributed


• Two useful shape related statistics are:
– Skewness
• Measures the extent to which data values are not
symmetrical
– Kurtosis
• Kurtosis affects the peakedness of the curve of the
distribution—that is, how sharply the curve rises
approaching the center of the distribution
Shape of a Distribution (Skewness)
DCOVA
• Measures the extent to which data is not
symmetrical

Left-Skewed: Mean < Median; skewness statistic < 0
Symmetric: Mean = Median; skewness statistic = 0
Right-Skewed: Median < Mean; skewness statistic > 0
Skewness
[Figure: three distribution curves. Negatively skewed: Mean < Median < Mode. Symmetric (not skewed): Mean = Median = Mode. Positively skewed: Mode < Median < Mean.]
Coefficient of Skewness
• Summary measure for skewness
S = 3(µ − Md) / σ, where Md is the median
• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
• S > 1 or S < -1: high degree of skewness
• S between 0.5 and 1, or between -1 and -0.5: moderate skewness
• S between -0.5 and 0.5: relative symmetry
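The coefficient and its rule-of-thumb thresholds can be sketched in Python as below. The function names are hypothetical, the population standard deviation (`statistics.pstdev`) is used to match the µ and σ notation, and the house price data from the earlier review example serves only as an illustration.

```python
import statistics


def pearson_skewness(values):
    """Coefficient of skewness S = 3 * (mean - median) / sigma."""
    mu = statistics.mean(values)
    md = statistics.median(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return 3 * (mu - md) / sigma


def describe_skewness(s):
    """Map S onto the rule-of-thumb categories listed above."""
    if abs(s) > 1:
        return "high"
    if abs(s) >= 0.5:
        return "moderate"
    return "relatively symmetric"


prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
s = pearson_skewness(prices)
print(round(s, 2), describe_skewness(s))  # 1.26 high
```

Note that this coefficient is not the same statistic as the skewness value reported by Excel's SKEW function, so the two numbers should not be expected to match.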
Distribution Shape: Skewness ( FOR PRACTICE)

Example: Apartment Rents

Seventy efficiency apartments were randomly sampled


in a college town. The monthly rent prices for the
apartments are listed below in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness

Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents (vertical axis: Relative Frequency, 0 to .35); Skewness = .92.]
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out

[Figure: three curves illustrating leptokurtic, mesokurtic, and platykurtic shapes.]
Shape of a Distribution -- Kurtosis measures how sharply the
curve rises approaching the center of the distribution

DCOVA
Sharper Peak
Than Bell-Shaped
(Kurtosis > 0)

Bell-Shaped
(Kurtosis = 0)

Flatter Than
Bell-Shaped
(Kurtosis < 0)
RELATIVE LOCATION

Z score
Chebyshev's inequality
Empirical rule
Relative location – Z score
* In addition to measures of location, variability, and
shape, we are also interested in the relative location of
values within a data set.

* Measures of relative location help us determine how


far a particular value is from the mean.

* By using both the mean and standard deviation, we


can determine the relative location of any
observation.
z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data


value xi is from the mean.

zi = (xi − x̄) / s

Excel’s STANDARDIZE function can be used to


compute the z-score.
Locating Extreme Outliers:
Z-Score DCOVA
To compute the Z-score of a data value, subtract the mean
and divide by the standard deviation.

The Z-score is the number of standard deviations a data


value is from the mean.

A data value is considered an extreme outlier if its Z-score


is less than -3.0 or greater than +3.0.

The larger the absolute value of the Z-score, the farther the
data value is from the mean.
Locating Extreme Outliers:
Z-Score DCOVA
Z = (X − X̄) / S

where X represents the data value
X̄ is the sample mean
S is the sample standard deviation
Locating Extreme Outliers:
Z-Score DCOVA
Suppose the mean math SAT score is 490, with a
standard deviation of 100.
Compute the Z-score for a test score of 620.

Z = (X − X̄) / S = (620 − 490) / 100 = 130 / 100 = 1.3

A score of 620 is 1.3 standard deviations above the mean and would not
be considered an outlier.
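The SAT calculation and the ±3.0 outlier rule can be expressed as a small Python sketch. The function names `z_score` and `is_extreme_outlier` are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def is_extreme_outlier(z, cutoff=3.0):
    """Apply the rule of thumb: Z below -3.0 or above +3.0."""
    return abs(z) > cutoff


z = z_score(620, mean=490, std_dev=100)
print(z, is_extreme_outlier(z))  # 1.3 False
```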
z-Scores

An observation’s z-score is a measure of the relative


location of the observation in a data set.
A data value less than the sample mean will have a
z-score less than zero.
A data value greater than the sample mean will have
a z-score greater than zero.
A data value equal to the sample mean will have a
z-score of zero.
Distribution Shape: Skewness ( FOR PRACTICE)

Example: Apartment Rents

Seventy efficiency apartments were randomly sampled


in a college town. The monthly rent prices for the
apartments are listed below in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
z-Scores
Example: Apartment Rents
• z-Score of Smallest Value (425)
z = (xi − x̄) / s = (425 − 490.80) / 54.74 = −1.20

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Detecting Outliers
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.
It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the
data set
• a correctly recorded data value that belongs in
the data set
Detecting Outliers
FOR PRACTICE
Example: Apartment Rents
• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Empirical Rule

When the data are believed to approximate a


bell-shaped distribution …

The empirical rule can be used to determine the


percentage of data values that must be within a
specified number of standard deviations of the
mean.

The empirical rule is based on the normal
distribution, which is covered in a later chapter.
Empirical Rule
For data having a bell-shaped distribution:

68.27% of the values of a normal random variable
are within +/- 1 standard deviation of its mean.

95.45% of the values of a normal random variable
are within +/- 2 standard deviations of its mean.

99.73% of the values of a normal random variable
are within +/- 3 standard deviations of its mean.
Empirical Rule
[Figure: normal curve centered at µ, with 68.27% of the area between µ – 1σ and µ + 1σ, 95.45% between µ – 2σ and µ + 2σ, and 99.73% between µ – 3σ and µ + 3σ.]
The Empirical Rule

• The empirical rule approximates the variation
of data in a bell-shaped distribution
• Approximately 68% of the data in a bell-shaped
distribution is within 1 standard deviation of the
mean, or µ ± 1σ
[Figure: bell curve with 68% of the area within µ ± 1σ.]
The Empirical Rule
• Approximately 95% of the data in a bell-
shaped distribution lies within two standard
deviations of the mean, or µ ± 2σ
• Approximately 99.7% of the data in a bell-
shaped distribution lies within three standard
deviations of the mean, or µ ± 3σ

[Figure: bell curves showing 95% of the data within µ ± 2σ and 99.7% within µ ± 3σ.]
Using the Empirical Rule
Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a
standard deviation of 90. Then,

68% of all test takers scored between 410 and 590


(500 ± 90).

95% of all test takers scored between 320 and 680


(500 ± 180).

99.7% of all test takers scored between 230 and


770 (500 ± 270).
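The three intervals follow directly from the mean and standard deviation, as this short Python sketch shows. The function name `empirical_rule_intervals` is hypothetical, and the result is only meaningful when the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Map each approximate percentage to its (lower, upper) interval."""
    coverage = ((1, 68), (2, 95), (3, 99.7))  # (number of std devs, approx. %)
    return {pct: (mean - k * std_dev, mean + k * std_dev) for k, pct in coverage}


for pct, (low, high) in empirical_rule_intervals(500, 90).items():
    print(f"{pct}% of scores between {low} and {high}")
```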
Chebyshev’s
Theorem
At least (1 − 1/z²) of the items in any data set will be
within z standard deviations of the mean, where z is
any value greater than 1.

Chebyshev’s theorem requires z > 1, but z need not
be an integer.
Chebyshev’s Theorem

At least 75% of the data values must be
within z = 2 standard deviations of the mean.

At least 89% of the data values must be
within z = 3 standard deviations of the mean.

At least 94% of the data values must be
within z = 4 standard deviations of the mean.
Chebyshev’s
Theorem
Example: Apartment Rents
Let z = 1.5 with x̄ = 490.80 and s = 54.74

At least (1 − 1/(1.5)²) = 1 − 0.44 = 0.56 or 56%
of the rent values must be between
x̄ − z(s) = 490.80 − 1.5(54.74) = 409
and
x̄ + z(s) = 490.80 + 1.5(54.74) = 573

(Actually, 86% of the rent values
are between 409 and 573.)
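The bound and the interval in the apartment rent example can be reproduced with a minimal Python sketch. Both function names are hypothetical.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def chebyshev_interval(mean, std_dev, z):
    """Interval mean - z*s to mean + z*s covered by the bound."""
    return mean - z * std_dev, mean + z * std_dev


bound = chebyshev_lower_bound(1.5)
low, high = chebyshev_interval(490.80, 54.74, 1.5)
print(f"at least {bound:.0%} between {low:.0f} and {high:.0f}")
# at least 56% between 409 and 573
```

The theorem gives a guaranteed minimum for any distribution, which is why the actual share in the rent data (86%) is well above the 56% bound.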
Chebyshev Rule

• Regardless of how the data are distributed, at least
(1 − 1/k²) × 100% of the values will fall within k standard
deviations of the mean (for k > 1)

– Examples:
At least (1 − 1/2²) × 100% = 75% of the values fall within k = 2 standard deviations (µ ± 2σ)
At least (1 − 1/3²) × 100% = 89% of the values fall within k = 3 standard deviations (µ ± 3σ)
EXPLORATORY DATA ANALYSIS

FIVE NUMBER SUMMARY


BOX PLOT
Exploratory Data
Analysis
Exploratory data analysis procedures enable us to use
simple arithmetic and easy-to-draw pictures to
summarize data.

We simply sort the data values into ascending order


and identify the five-number summary and then
construct a box plot.
FIVE NUMBER SUMMARY
1. MINIMUM
2. QUARTILE 1
3. MEDIAN
4. QUARTILE 3
5. MAXIMUM
Computing the five number summary
• Data: 80, 100, 100, 110, 130, 190, 200

• Min = 80, Q1 = 100, Median = 110, Q3 = 190, Max = 200

(For a small sample size, conflicting results may occur and the shape
cannot be clearly determined.)
• The monthly starting salaries for a sample of 12 business school
graduates are given below ( in ascending order)
3310 3355 3450 3480 3480
3490 3520 3540 3550 3650
3730 3925

THE FIVE-NUMBER SUMMARY IS

Minimum = 3310
Q1 = 3465
Median = 3505
Q3 = 3600
Maximum = 3925
• The data shows a smallest
value of 3310 and a largest
value of 3925.

• Approximately one-fourth, or
25%, of the observations are
between adjacent numbers
in a five-number summary.
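The starting-salary summary can be reproduced with the location-index method for quartiles. The sketch below is illustrative: both function names are hypothetical, and statistical packages that interpolate between values may report slightly different quartiles.

```python
def percentile_by_location(ordered, p):
    """Percentile of an already sorted list via the index i = (p/100) * n."""
    n = len(ordered)
    scaled = p * n  # i = scaled / 100
    if scaled % 100 == 0:
        i = int(scaled // 100)  # whole number: average positions i and i + 1
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[int(scaled // 100)]  # otherwise: the next position up


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = (percentile_by_location(ordered, p) for p in (25, 50, 75))
    return ordered[0], q1, q2, q3, ordered[-1]


salaries = [3310, 3355, 3450, 3480, 3480, 3490,
            3520, 3540, 3550, 3650, 3730, 3925]
print(five_number_summary(salaries))  # (3310, 3465.0, 3505.0, 3600.0, 3925)
```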
Box Plot

A box plot is a graphical summary of data that is


based on a five-number summary.

A key to the development of a box plot is the


computation of the median and the quartiles Q1 and
Q3.

Box plots provide another way to identify outliers.


Box and Whisker Plot

• Five specific values are used:


– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
Box and Whisker Plot

Minimum Q1 Q2 Q3 Maximum
Five-Number
Summary
Example: Apartment Rents
Lowest Value = 425 First Quartile = 445
Median = 475
Third Quartile = 525 Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Box Plot

Example: Apartment Rents


• A box is drawn with its ends located at the first and
third quartiles.
• A vertical line is drawn in the box at the location of
the median (second quartile).

400 425 450 475 500 525 550 575 600 625

Q1 = 445 Q3 = 525
Q2 = 475
Box Plot

Limits are located (not drawn) using
the interquartile range (IQR).
Data outside these limits are
considered outliers.
The location of each outlier is shown
with the symbol * .
• Lower limit: LL = Q1 − 1.5(IQR)
• Upper limit: UL = Q3 + 1.5(IQR)

• If x < LL or x > UL, x is an outlier

Box Plot

Example: Apartment Rents


• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.


Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• There are no outliers (values less than 325 or


greater than 645) in the apartment rent data.
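The limit calculation can be sketched in Python as follows. The helper names are hypothetical, and the quartiles are passed in directly (Q1 = 445, Q3 = 525 from the five-number summary) rather than recomputed.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Values falling outside the limits."""
    lower, upper = outlier_limits(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


print(outlier_limits(445, 525))             # (325.0, 645.0)
print(find_outliers([425, 615], 445, 525))  # [] : smallest and largest rents are inside
```

Setting `k=3.0` gives the outer fences described on the box-and-whisker slide.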
General Descriptive Stats
Using Microsoft Excel
DCOVA
Functions
House Prices (cells A2:A6): $2,000,000; $500,000; $300,000; $100,000; $100,000

Descriptive Statistics    Value               Excel Function
Mean                      $ 600,000           =AVERAGE(A2:A6)
Standard Error            $ 357,770.88        =D6/SQRT(D14)
Median                    $ 300,000           =MEDIAN(A2:A6)
Mode                      $ 100,000.00        =MODE(A2:A6)
Standard Deviation        $ 800,000           =STDEV(A2:A6)
Sample Variance           640,000,000,000     =VAR(A2:A6)
Kurtosis                  4.1301              =KURT(A2:A6)
Skewness                  2.0068              =SKEW(A2:A6)
Range                     $ 1,900,000         =D12 - D11
Minimum                   $ 100,000           =MIN(A2:A6)
Maximum                   $ 2,000,000         =MAX(A2:A6)
Sum                       $ 3,000,000         =SUM(A2:A6)
Count                     5                   =COUNT(A2:A6)
salary (data): 3310 3355 3450 3480 3480 3490 3520 3540 3550 3650 3730 3925

salary (descriptive statistics)
Mean                  3540
Standard Error        47.81989569
Median                3505
Mode                  3480
Standard Deviation    165.6529779
Sample Variance       27440.90909
Kurtosis              1.718883645
Skewness              1.091108688
Range                 615
Minimum               3310
Maximum               3925
Sum                   42480
Count                 12
General Descriptive Stats
Using Microsoft Excel Data
Analysis Tool
DCOVA
1. Select Data.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
General Descriptive Stats
Using Microsoft Excel DCOVA
4. Enter the cell range.
5. Check the Summary
Statistics box.
6. Click OK
Excel output DCOVA
Microsoft Excel descriptive statistics output, using the house price data:

House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000

House Prices
Mean                  600000
Standard Error        357770.8764
Median                300000
Mode                  100000
Standard Deviation    800000
Sample Variance       640,000,000,000
Kurtosis              4.1301
Skewness              2.0068
Range                 1900000
Minimum               100000
Maximum               2000000
Sum                   3000000
Count                 5
Minitab Output DCOVA
Minitab descriptive statistics output using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Descriptive Statistics: House Price

Total
Variable Count Mean SE Mean StDev Variance Sum Minimum
House Price 5 600000 357771 800000 6.40000E+11 3000000 100000

N for
Variable Median Maximum Range Mode Mode Skewness Kurtosis
House Price 300000 2000000 1900000 100000 2 2.01 4.13
Distribution Shape and
The Boxplot DCOVA

Left-Skewed Symmetric Right-Skewed

Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3
Boxplot Example
DCOVA

• Below is a boxplot for the following data:
0 2 2 2 3 3 4 5 5 9 27
[Figure: boxplot marking Xsmallest = 0, Q1 = 2, Q2 / Median = 3, Q3 = 5, Xlargest = 27.]
Sample statistics versus population parameters
DCOVA
Measure               Population Parameter    Sample Statistic
Mean                  µ                       X̄
Variance              σ²                      S²
Standard Deviation    σ                       S
Measuring two variables

Covariance
Correlation
We Discuss Two Measures Of The Relationship Between
Two Numerical Variables

Scatter plots allow you to visually


examine the relationship between
two numerical variables and now
we will discuss two quantitative
measures of such relationships.
The Covariance
The Coefficient of Correlation
The Covariance
DCOVA
• The covariance measures the strength of the linear
relationship between two numerical variables (X & Y)

• The sample covariance:

cov(X, Y) = ∑(Xi − X̄)(Yi − Ȳ) / (n − 1), summed over i = 1, …, n
• Only concerned with the strength of the relationship
• No causal effect is implied
Interpreting Covariance
DCOVA
• Covariance between two variables:
cov(X,Y) > 0 X and Y tend to move in the same direction
cov(X,Y) < 0 X and Y tend to move in opposite directions
cov(X,Y) = 0 X and Y have no linear relationship (this alone does not imply independence)

• The covariance has a major flaw:

– It is not possible to determine the relative


strength of the relationship from the size of
the covariance
Coefficient of Correlation
DCOVA
• Measures the relative strength of
the linear relationship between
two numerical variables
• Sample coefficient of correlation:

r = cov(X, Y) / (SX SY)

where
cov(X, Y) = ∑(Xi − X̄)(Yi − Ȳ) / (n − 1)
SX = √[ ∑(Xi − X̄)² / (n − 1) ]
SY = √[ ∑(Yi − Ȳ)² / (n − 1) ]
(all sums over i = 1, …, n)
Features of the
Coefficient of Correlation
DCOVA
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Both ρ and r have the following features:
– Unit free
– Range between –1 and 1
– The closer to –1, the stronger the negative linear relationship
– The closer to 1, the stronger the positive linear relationship
– The closer to 0, the weaker the linear relationship
Scatter Plots of Sample Data with Various
Coefficients of Correlation
DCOVA
[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = +1, r = +.3, and r = 0.]
The Coefficient of Correlation Using Microsoft Excel
Function
DCOVA
Test #1 Score Test #2 Score Correlation Coefficient
78 82 0.7332 =CORREL(A2:A11,B2:B11)
92 88
86 91
83 90
95 92
85 85
91 89
76 81
88 96
79 77
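The value returned by CORREL can be reproduced from the covariance and standard deviation formulas. The Python sketch below uses hypothetical function names and the ten pairs of test scores from the table.

```python
import math


def sample_covariance(x, y):
    """cov(X, Y) = sum((Xi - Xbar) * (Yi - Ybar)) / (n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation; equals the square root of cov(X, X)."""
    return math.sqrt(sample_covariance(values, values))


def correlation(x, y):
    """r = cov(X, Y) / (SX * SY)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


test1 = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]
test2 = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]
print(round(correlation(test1, test2), 4))  # 0.7332
```

Unlike the covariance, r does not change if either variable is rescaled, which is what makes it a measure of relative strength.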
The Coefficient of Correlation Using Microsoft Excel
Data Analysis Tool

DCOVA
1. Select Data
2. Choose Data Analysis
3. Choose Correlation and click OK
The Coefficient of
Correlation
Using Microsoft Excel DCOVA

4. Input data range and select


appropriate options
5. Click OK to get output
Interpreting the Coefficient of Correlation
Using Microsoft Excel
DCOVA

r = .733

There is a relatively strong positive linear relationship between
test score #1 and test score #2.

Students who scored high on the first test tended to score high on the
second test.

[Figure: Scatter Plot of Test Scores, with Test #1 Score on the x-axis (70 to 100) and Test #2 Score on the y-axis (70 to 100).]
Pitfalls in Numerical
Descriptive Measures
DCOVA
• Data analysis is objective
– Should report the summary measures
that best describe and communicate the
important aspects of the data set

• Data interpretation is subjective


– Should be done in fair, neutral and clear
manner
Ethical Considerations
DCOVA
Numerical descriptive measures:
• Should document both good and
bad results
• Should be presented in a fair,
objective and neutral manner
• Should not use inappropriate
summary measures to distort facts
Chapter Summary
In this chapter we have discussed:
• Describing the properties of central
tendency, variation, and shape in numerical
data
• Constructing and interpreting a boxplot
• Computing descriptive summary measures
for a population
• Calculating the covariance and the
coefficient of correlation
• A home theatre in a box is the easiest and cheapest way to provide surround
sound for a home entertainment centre. A sample of prices is shown here
(Consumer Reports Buying Guide, 2013). The prices are for models with a
DVD player and for models without a DVD player.

Models with DVD Player Price Models without DVD Player Price
Sony HT-1800DP $450 Pioneer HTP-230 $300
Pioneer HTD-330DV 300 Sony HT-DDW750 300
Sony HT-C800DP 400 Kenwood HTB-306 360
Panasonic SC-HT900 500 RCA RT-2600 290
Panasonic SC-MTI 400 Kenwood HTB-206 300

• Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theatre unit?
• Compute the range, variance, and standard deviation for the two samples. What does
this information tell you about the prices for models with and without a DVD player?
Price with DVD player Price without DVD player

Mean 410 Mean 310

Standard Error 33.1662479 Standard Error 12.64911064

Median 400 Median 300

Mode 400 Mode 300

Standard Deviation 74.16198487 Standard Deviation 28.28427125

Sample Variance 5500 Sample Variance 800

Kurtosis 0.867768595 Kurtosis 4.578125

Skewness -0.551618069 Skewness 2.099223257

Range 200 Range 70

Minimum 300 Minimum 290

Maximum 500 Maximum 360

Sum 2050 Sum 1550

Count 5 Count 5
• The following data were used to construct the histograms of the number
of days required to fill orders for Dawson Supply, Inc., and J.C. Clark
Distributors

Dawson Supply Days for Delivery :11 10 9 10 11 11 10 11 10 10


Clark Distributors Days for Delivery : 8 10 13 7 10 11 10 7 15 12

• Use the range and standard deviation to support that Dawson Supply
provides the more consistent and reliable delivery times.
dawson clark

Mean 10.3 Mean 10.3


Standard Error 0.213437475 Standard Error 0.817176711
Median 10 Median 10
Mode 10 Mode 10
Standard Deviation 0.674948558 Standard Deviation 2.584139659

Sample Variance 0.455555556 Sample Variance 6.677777778

Kurtosis -0.282994816 Kurtosis -0.350865189

Skewness -0.433637384 Skewness 0.359288855

Range 2 Range 8

Minimum 9 Minimum 7

Maximum 11 Maximum 15

Sum 103 Sum 103

Count 10 Count 10

Coefficient of variation (%): Dawson 6.552898619, Clark 25.08873455
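The range, standard deviation, and coefficient of variation for the two suppliers can be recomputed with the standard `statistics` module. This is a sketch with a hypothetical helper name, `spread_summary`.

```python
import statistics


def spread_summary(values):
    """Range, sample standard deviation, and coefficient of variation (%)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return {
        "range": max(values) - min(values),
        "std_dev": s,
        "cv_percent": 100 * s / mean,
    }


dawson = [11, 10, 9, 10, 11, 11, 10, 11, 10, 10]
clark = [8, 10, 13, 7, 10, 11, 10, 7, 15, 12]
print(spread_summary(dawson))
print(spread_summary(clark))
```

Both suppliers average 10.3 days, but every measure of spread is several times larger for Clark, which supports the claim that Dawson's delivery times are more consistent.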
Practice

• The following times were recorded by the quarter-mile and mile runners
of a university track team (times are in minutes).
Quarter-Mile Times: .92 .98 1.04 .90 .99
Mile Times: 4.52 4.35 4.60 4.70 4.50
After viewing this sample of running times, one of the coaches commented
that the quarter milers turned in the more consistent times. Use the standard
deviation and the coefficient of variation to summarize the variability in the
data. Does the use of the coefficient of variation indicate that the coach’s
statement should be qualified?
•A statistics student made the
following grades on 5 tests: 84, 78,
88, 72, and 72.
What is the median grade?
(a) 78
(b) 80
(c) 88
(d) 72
Quantitative Methods in
Management
Day-5

Simple Regression
Page: 430-445
Recap..
• Introduction
• Definition
• Terms and terminologies
• Types of statistics
• Types of data
• Levels of measurements
• Application of statistics in business
• Sources of data
Organizing and visualizing variables
• Tables
• Frequency distribution
• Relative frequency distribution
• Relative percent frequency distribution
• Cumulative frequency distribution
• Univariate
• Bivariate / cross tabulation
• Diagrams
• Bar charts
• Pie charts
• Graphs
• Histogram
• Frequency polygon
• Frequency curve
• Cumulative frequency curve ( Ogive)
• EDA
• Stem and leaf plot
• Scatter diagram
• Dot plots
• Pareto chart
Numerical descriptive statistics
Measures of location
Measures of dispersion
Measures of shapes
Kurtosis
Relative location
- Z score
- Chebyshev's inequality
- Empirical rule
Exploratory data analysis
- Five number summary
- Box plot
Relationship between two variables
- Co variance
- correlation
Simple linear regression
Chapter 12
Learning Objectives
• How to use regression analysis to predict the
value of a dependent variable based on an
independent variable
• The meaning of the regression coefficients b0
and b1
• Measures of variation ( SSE, SSR, SST)
• Coefficient of determination
Steps
• Plot the scatter diagram
• Identify the independent and dependent variables
• Fit a regression line by estimating b0 and b1
• Estimate the value ( predict Y^)
• Measures of variation
• SSR
• SSE
• SST
• Coefficient of determination
• Correlation: r = (sign of b1) × √(r²)
Correlation vs. Regression
• A scatter plot can be used to show the relationship between
two variables
• Correlation analysis is used to measure the strength of the
association (linear relationship) between two variables
• Correlation is only concerned with strength of the relationship
• No causal effect is implied with correlation
Introduction to
Regression Analysis
• Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least
one independent variable
• Explain the impact of changes in an independent variable on the
dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the
dependent variable
Regression Analysis
• Regression analysis is a tool for building
mathematical and statistical models that characterize
relationships between a dependent (ratio) variable
and one or more independent, or explanatory
variables (ratio or categorical), all of which are
numerical.
• Simple linear regression involves a single
independent variable.
• Multiple regression involves two or more
independent variables.
Simple Linear Regression Model

• Only one independent variable, X


• Relationship between X and Y is described
by a linear function
• Changes in Y are assumed to be related to
changes in X
Using Statistics
• Regression refers to the statistical technique of modeling the
relationship between variables.
• In simple linear regression, we model the relationship
between two variables.
• One of the variables, denoted by Y, is called the dependent
variable and the other, denoted by X, is called the
independent variable.
• The model we will use to depict the relationship between X and
Y will be a straight-line relationship.
• A graphical sketch of the pairs (X, Y) is called a scatter plot.
Using Statistics
This scatterplot locates pairs of observations of advertising expenditures on
the x-axis and sales on the y-axis. We notice that:

Larger (smaller) values of sales tend to be associated with larger (smaller)
values of advertising.

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y), with Advertising on the x-axis (0 to 50) and Sales on the y-axis (0 to 140).]

The scatter of points tends to be distributed around a positively sloped straight line.

The pairs of values of advertising expenditures and sales are not located exactly on a
straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear
relationship.
The line represents the nature of the relationship on average.
Types of Relationships
[Figure: scatter plots of Y against X showing linear relationships (left) and curvilinear relationships (right)]
Types of Relationships
(continued)
[Figure: scatter plots of Y against X showing strong relationships (left) and weak relationships (right)]
Types of Relationships
(continued)
[Figure: scatter plot of Y against X showing no relationship]
Simple Linear Regression Model

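$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where:
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term

$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.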
Simple Linear Regression Model
(continued)

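[Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ on X-Y axes, with intercept $\beta_0$ and slope $\beta_1$. For a given $X_i$, the random error $\varepsilon_i$ is the vertical gap between the observed value of Y and the predicted value of Y on the line.]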
Simple Linear Regression
Equation (Prediction Line)
The simple linear regression equation provides an estimate of the
population regression line

$\hat{Y}_i = b_0 + b_1 X_i$

where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared
differences between Y and Ŷ:

$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum (Y_i - (b_0 + b_1 X_i))^2$
Simple Linear Regression Model
The equation that describes how y is related to x and
an error term is called the regression model.

The simple linear regression model is:

y = β 0 + β 1x + ε

where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.
Simple Linear Regression Equation
Positive Linear Relationship

[Figure: regression line of E(y) against x with intercept β0; slope β1 is positive]
Simple Linear Regression Equation

Negative Linear Relationship

[Figure: regression line of E(y) against x with intercept β0; slope β1 is negative]
Simple Linear Regression Equation

No Relationship

[Figure: horizontal regression line of E(y) against x with intercept β0; slope β1 is 0]
Estimated Simple Linear Regression Equation

The estimated simple linear regression equation

ŷ = b0 + b1 x

• The graph is called the estimated regression line.


• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Estimation Process
1. Regression Model: y = β0 + β1x + ε
   Regression Equation: E(y) = β0 + β1x
   Unknown Parameters: β0, β1
2. Sample Data: pairs (x1, y1), ..., (xn, yn)
3. Estimated Regression Equation: ŷ = b0 + b1x
   Sample Statistics: b0, b1
4. b0 and b1 provide estimates of β0 and β1
Least Squares Method
• Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$
where:
$y_i$ = observed value of the dependent variable
for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable
for the ith observation
Least Squares Method
• Slope for the Estimated Regression Equation

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
Least Squares Method

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$
Simple Linear Regression
Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale.


As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 5 previous
sales are shown on the next slide.
Simple Linear Regression

Example: Reed Auto Sales

Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
Σx = 10                 Σy = 100
x̄ = 2                   ȳ = 20
Columns required to calculate b1 and b0
X    Y    XY    X²
Estimated Regression Equation
Slope for the Estimated Regression Equation
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{20}{4} = 5$

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$
Estimated Regression Equation
$\hat{y} = 10 + 5x$
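The slope and intercept formulas can be checked numerically. The following is a minimal sketch in plain Python using the Reed Auto Sales data from the slides; the function name `least_squares` is an illustrative choice, not something the course material defines.

```python
# Reed Auto Sales data from the slide: TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:g} + {b1:g}x")
```

With these five observations the sketch reproduces the hand calculation: a slope of 5 and an intercept of 10.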
Using Excel’s Chart Tools for
Scatter Diagram & Estimated Regression Equation

[Figure: Reed Auto Sales Estimated Regression Line; scatter of Cars Sold (y-axis) against TV Ads (x-axis) with the fitted line y = 5x + 10]
Columns required to calculate measures of variation
X    Y    XY    X²    Ŷ = … + … X    Y − Ŷ    (Y − Ŷ)²    (Y − Ȳ)    (Y − Ȳ)²

The total of the (Y − Ŷ)² column is SSE; the total of the (Y − Ȳ)² column is SST.
Coefficient of Determination
• Relationship Among SST, SSR, SSE
SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Coefficient of Determination

The coefficient of determination is:

r2 = SSR/SST

where:
SSR = sum of squares due to regression
SST = total sum of squares

• Goodness of fit
• Perfect fit: SSE = 0, so SSR = SST and SSR/SST = 1
• Poorer fits result in larger values of SSE; the poorest fit occurs when SSR = 0 and SSE = SST, giving r2 = 0
Coefficient of Determination

r2 = SSR/SST = 100/114 = .8772


The regression relationship is very strong; 87.72%
of the variability in the number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

where:
b1 = the slope of the estimated regression
equation
yˆ = b0 + b1 x
Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

The sign of b1 in the equation $\hat{y} = 10 + 5x$ is “+”.

$r_{xy} = +\sqrt{.8772}$

$r_{xy} = +.9366$
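The sums of squares, r2, and the sample correlation coefficient for the Reed Auto data can be reproduced in a few lines. This is a sketch that takes the estimated equation ŷ = 10 + 5x as given; the variable names are illustrative assumptions.

```python
import math

# Reed Auto Sales data and the estimated regression equation from the slides.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of squares due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squares due to error

r2 = ssr / sst
# The correlation coefficient takes the sign of the slope b1.
r_xy = math.copysign(math.sqrt(r2), b1)

print(sst, ssr, sse, round(r2, 4), round(r_xy, 4))
```

The output matches the slides: SST = 114, SSR = 100, SSE = 14, r2 = .8772 and rxy = +.9366, and SST equals SSR + SSE.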
Simple Linear Regression Example

• A real estate agent wishes to examine the


relationship between the selling price of a home and
its size (measured in square feet)

• A random sample of 10 houses is selected


• Dependent variable (Y) = house price in $1000s
• Independent variable (X) = square feet
Simple Linear Regression
Example: Data
House Price in $1000s Square Feet
(Y) (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Simple Linear Regression Example: Scatter Plot
[Figure: House price model scatter plot; House Price ($1000s) on the y-axis against Square Feet on the x-axis]
Simple Linear Regression Example: Using Excel
Simple Linear Regression Example: Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
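The key numbers in this output can be reproduced directly from the ten observations. The sketch below uses plain Python rather than Excel; the variable names are illustrative, and the standard error is computed as the square root of SSE/(n − 2), which matches the Residual MS row of the output.

```python
import math

# House price ($1000s) and size (square feet) for the 10 sampled houses.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)

b1 = s_xy / s_xx            # slope
b0 = y_bar - b1 * x_bar     # intercept

fitted = [b0 + b1 * x for x in sqft]
sst = sum((y - y_bar) ** 2 for y in price)
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))
ssr = sst - sse
r2 = ssr / sst
std_error = math.sqrt(sse / (n - 2))   # standard error of the estimate

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R Square = {r2:.5f}, Standard Error = {std_error:.5f}")
```

Running the sketch should agree with the Coefficients, SS, R Square and Standard Error entries above to the displayed precision.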
Simple Linear Regression Example: Minitab Output
The regression equation is

Price = 98.2 + 0.110 Square Feet

That is: house price = 98.24833 + 0.10977 (square feet)

Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010

S = 41.3303    R-Sq = 58.1%    R-Sq(adj) = 52.8%

Analysis of Variance

Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600
Simple Linear Regression Example: Graphical
Representation

[Figure: House price model scatter plot and prediction line; House Price ($1000s) against Square Feet, with intercept = 98.248 and slope = 0.10977]

house price = 98.24833 + 0.10977 (square feet)


Simple Linear Regression Example:
Interpretation of b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the
range of observed X values)
• Because a house cannot have a square footage of 0, b0 has no practical application
Simple Linear Regression Example:
Interpreting b1

house price = 98.24833 + 0.10977 (square feet)

• b1 estimates the change in the mean value of Y as a result of a one-unit increase in X
• Here, b1 = 0.10977 tells us that the mean value of a house
increases by 0.10977($1000) = $109.77, on average, for
each additional one square foot of size
Simple Linear Regression
Example: Making Predictions
Predict the price for a house
with 2000 square feet:

house price = 98.25 + 0.1098 (sq.ft.)

= 98.25 + 0.1098(2000)

= 317.85
The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850
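A prediction helper can also enforce the rule about staying within the observed range of X. This is a sketch, not part of the course material: the function name `predict_price` and the decision to raise an error outside the sample range (1100 to 2450 square feet in the data) are assumptions. It uses the unrounded coefficients from the regression output, so it returns about 317.79 rather than the 317.85 obtained above from the rounded values 98.25 and 0.1098.

```python
# Coefficients from the regression output (house price in $1000s).
B0, B1 = 98.24833, 0.10977
# Smallest and largest square footage in the sample of 10 houses.
X_MIN, X_MAX = 1100, 2450


def predict_price(sqft):
    """Predict house price in $1000s, refusing to extrapolate."""
    if not X_MIN <= sqft <= X_MAX:
        raise ValueError(
            f"{sqft} sq. ft. is outside the observed range {X_MIN}-{X_MAX}"
        )
    return B0 + B1 * sqft


print(round(predict_price(2000), 2))
```

Calling `predict_price(3000)` raises a `ValueError`, since 3000 square feet lies beyond the largest house in the sample.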
Simple Linear Regression Example:
Making Predictions
• When using a regression model for prediction, only
predict within the relevant range of data
[Figure: House Price ($1000s) against Square Feet, with the relevant range for interpolation marked]

Do not try to extrapolate beyond the range of observed X’s
Measures of Variation

• Total variation is made up of two parts:

SST = SSR + SSE


SST = Total Sum of Squares:       $SST = \sum (Y_i - \bar{Y})^2$
SSR = Regression Sum of Squares:  $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
SSE = Error Sum of Squares:       $SSE = \sum (Y_i - \hat{Y}_i)^2$

where:
$\bar{Y}$ = Mean value of the dependent variable
$Y_i$ = Observed value of the dependent variable
$\hat{Y}_i$ = Predicted value of Y for the given $X_i$ value
Measures of Variation
(continued)

• SST = total sum of squares (Total Variation)
• Measures the variation of the Yi values around their mean Ȳ
• SSR = regression sum of squares (Explained Variation)
• Variation attributable to the relationship between X and Y
• SSE = error sum of squares (Unexplained Variation)
• Variation in Y attributable to factors other than X
Measures of Variation
(continued)
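[Figure: regression line on X-Y axes. At a given Xi, the vertical distance from Yi to the line corresponds to SSE = ∑(Yi − Ŷi)², the distance from the line to Ȳ corresponds to SSR = ∑(Ŷi − Ȳ)², and the distance from Yi to Ȳ corresponds to SST = ∑(Yi − Ȳ)².]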
Coefficient of Determination, r2
• The coefficient of determination is the portion of
the total variation in the dependent variable that is
explained by variation in the independent variable
• The coefficient of determination is also called r-squared and is denoted as r2

$r^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$

note: $0 \le r^2 \le 1$
R2

• R2 (R-squared) is a measure of the “fit” of the line to the data.


• The value of R2 will be between 0 and 1.
• A value of 1.0 indicates a perfect fit and all data points would lie on the line;
the larger the value of R2 the better the fit.
Examples of r2 Values

r2 = 1

Perfect linear relationship between X and Y:
100% of the variation in Y is explained by variation in X

[Figure: two scatter plots of Y against X, each with r2 = 1]
Examples of r2 Values

0 < r2 < 1

Weaker linear relationships between X and Y:
Some but not all of the variation in Y is explained by variation in X

[Figure: two scatter plots of Y against X, each with 0 < r2 < 1]
Examples of r2 Values

r2 = 0

No linear relationship between X and Y:
The value of Y does not depend on X.
(None of the variation in Y is explained by variation in X)

[Figure: scatter plot of Y against X with r2 = 0]
Simple Linear Regression Example:
Coefficient of Determination, r2 in Excel
$r^2 = \dfrac{SSR}{SST} = \dfrac{18934.9348}{32600.5000} = 0.58082$

58.08% of the variation in house prices is explained by variation in square feet

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
QUANTITATIVE METHODS IN
MANAGEMENT
Chapter 4, 5 (part) : 149-188

DAY 6
Course content
Chapter Page number content
1 11-32 Introduction, variables, levels of measurement, types of statistics
2 33-98 Organizing and visualizing variables
3 99-148 Numerical descriptive measures – categorical / numerical – measures of central
tendency, measures of dispersion, skewness, kurtosis, measures of relationship –
covariance and correlation
12 430-446 Simple linear regression, estimating b0 and b1, measures of variation (SST, SSR
and SSE), coefficient of determination, coefficient of correlation
4 149-182 Basic probability
5,6 183-232 Discrete probability distributions – binomial and Poisson
Continuous probability distribution – Normal
7 234-257 Sampling distribution
8 258-293 Confidence interval – mean, proportion and determining sample size
9 294-304 Fundamentals of testing of hypothesis
11 402-415 Chi square test
15 15-1 to 15-17 Decision analysis
RECAP
• Introduction – definition, types of statistics, levels of
measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and
correlation
• Regression analysis – simple, measures of variations ( SSE,
SSR, SST, coefficient of determination and coefficient of
correlation)
INFERENTIAL STATISTICS
• Preliminary concepts of probability and random
variables, theoretical distributions
• Sampling distribution
• Estimation and
• Testing of hypothesis
Probability
• Concepts
• Definition - different ways of assigning probability.
• Understand and apply marginal, union, joint, and
conditional probabilities.
• Solve problems using the laws of probability including
the laws of addition, multiplication and conditional
probability
• Revise probabilities using Bayes’ rule.
CERTAIN / REAL              UNCERTAIN / ABSTRACT
• Survey                    • Experiment
• Data                      • Events
• Descriptive statistics    • Probability, inferential statistics


Introduction
• A student considers enrolling in a program and
would like to know how difficult it is. She obtains
the marks distribution of students who have
appeared for the most recent exam

Marks % # of students
0 – 25 45
25 – 50 280
50 – 75 205
75 –100 30
Introduction…
• Assuming the next exam is equally difficult and the mix of
weaker and stronger students is the same, she can conclude
that the % of students in each of the 4 classes of marks would be

Marks %     # of students    % of students
0 – 25      45               8
25 – 50     280              50
50 – 75     205              37
75 – 100    30               5
Introduction…
• The first distribution is related to past data and is
a frequency distribution
• The second relates to the future and is called a
probability distribution
• Inference
• 8% of the students score 0 – 25 marks, 50% score between 25
and 50, and so on
• If the student considers herself in the top 5%, she can
expect to score 75 – 100 marks
• She has an 8% chance of scoring 0 – 25 marks, etc.
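Turning the frequency distribution into a probability distribution is a matter of dividing each class count by the total. A minimal sketch using the marks data above (the dictionary layout and names are illustrative assumptions):

```python
# Number of students in each marks class (most recent exam).
counts = {"0-25": 45, "25-50": 280, "50-75": 205, "75-100": 30}

total = sum(counts.values())
# Relative frequency of each class, used as its probability.
probs = {marks: n / total for marks, n in counts.items()}

for marks, p in probs.items():
    print(f"{marks:>7}: {p:.0%}")
```

The rounded percentages (8, 50, 37 and 5) match the table, and the unrounded probabilities sum to 1.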
Probability…
• Business decisions are often based on an analysis
of uncertainties such as
• What are the chances that sales will decrease if we
increase prices?
• What is the likelihood a new assembly method will
increase productivity?
• How likely is it that the project will be finished on
time?
• What are the odds in favour of a new investment being
profitable?
• What are the VP’s prospects of turning around a department?
Probability
• This chance is “PROBABILITY”
• Some of the earliest works on probability
originated in a series of letters between Pierre de
Fermat and Blaise Pascal in the 1650s
• The first general theory of probability, initially applied
to gambling, appeared in the 19th century in Pierre-Simon
Laplace’s book “Théorie analytique des probabilités” (1812)
Probability is
A quantitative measure of uncertainty
A measure of the strength of belief in the
occurrence of an uncertain event
A measure of the degree of chance or likelihood of
occurrence of an uncertain event
Measured by a number between 0 and 1 (or
between 0% and 100%)
Basis for inferential statistics
Probability as a Numerical Measure
of the Likelihood of Occurrence

Increasing Likelihood of Occurrence

Probability near 0: The event is very unlikely to occur.
Probability of .5: The occurrence of the event is just as likely as it is unlikely.
Probability near 1: The event is almost certain to occur.
Basic Probability Concepts

• Probability – the chance that an uncertain event will occur (always between 0 and 1)

• Impossible Event – an event that has no chance of occurring (probability = 0)

• Certain Event – an event that is sure to occur (probability = 1)
Events

Each possible outcome of a variable is an event (a collection of sample points).

• Simple event
• An event described by a single characteristic
• e.g., A red card from a deck of cards
• Joint event
• An event described by two or more characteristics
• e.g., An ace that is also red from a deck of cards
• Complement of an event A (denoted A’)
• All outcomes that are not part of event A
• e.g., if A is the diamonds, A’ is all cards that are not diamonds
Sample Space
The Sample Space is the collection of all possible events
e.g. All 6 faces of a die:

e.g. All 52 cards of a bridge deck:


Visualizing Events
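• Contingency Tables
         Ace    Not Ace    Total
Black    2      24         26
Red      2      24         26
Total    4      48         52

• Decision Trees
[Figure: decision tree; the sample space (full deck of 52 cards) splits into black cards and red cards, and each of these splits into Ace (2) and Not Ace (24)]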
Definition
• Classical method of assigning probability (rules and laws)

• Relative frequency of occurrence (cumulated historical data)

• Subjective Probability (personal intuition or reasoning)

• Axiomatic
Assessing Probability
There are three approaches to assessing
the probability of an uncertain event:
1. a priori -- based on prior knowledge of the process

$\text{probability of occurrence} = \dfrac{X}{T} = \dfrac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$

2. empirical probability -- based on observed data

$\text{probability of occurrence} = \dfrac{\text{number of ways the event can occur}}{\text{total number of elementary outcomes}}$

Both formulas assume all outcomes are equally likely.

3. subjective probability
based on a combination of an individual’s past experience,
personal opinion, and analysis of a particular situation
Example of a priori probability

Find the probability of selecting a face card (Jack, Queen,
or King) from a standard deck of 52 cards.

$\text{Probability of Face Card} = \dfrac{X}{T} = \dfrac{\text{number of face cards}}{\text{total number of cards}} = \dfrac{12 \text{ face cards}}{52 \text{ total cards}} = \dfrac{3}{13}$
Empirical probability
• In the past, 383 of 751 business graduates were employed in
their major area. The probability that a particular graduate
will be employed in his or her major area is 383/751 = 0.51
or 51%.

• The probability that your income tax return will be
audited if there are two million mailed to your district
office and 2,400 are to be audited is 2,400/2,000,000
= 0.0012 or 0.12%.
Example of empirical probability
Find the probability of selecting a male taking statistics from the
population described in the following table:

          Taking Stats    Not Taking Stats    Total
Male      84              145                 229
Female    76              134                 210
Total     160             279                 439

$\text{Probability of male taking stats} = \dfrac{\text{number of males taking stats}}{\text{total number of people}} = \dfrac{84}{439} = 0.191$
Relative Frequency Approach
• Probability is defined as the proportion of
times A occurs, if the experiment is
repeated several times under the same or
similar conditions
• Example - An insurance company knows
from past data that of all men, 60 yrs old,
about 50 out of every 100,000 will die
within a year
P = 50 / 100000 = 0.0005
• As the number of trials increases, the
estimate approaches the true probability
Relative Frequency Method
Example: Lucas Tool Rental
Lucas Tool Rental would like to assign probabilities
to the number of car polishers it rents each day.
Office records show the following frequencies of daily
rentals for the last 40 days.
Number of Number
Polishers Rented of Days
0 4
1 6
2 18
3 10
4 2
Relative Frequency Method
Example: Lucas Tool Rental
Each probability assignment is given by dividing
the frequency (number of days) by the total frequency
(total number of days).
Number of Polishers Rented    Number of Days    Probability
0                             4                 .10  (= 4/40)
1                             6                 .15
2                             18                .45
3                             10                .25
4                             2                 .05
Total                         40                1.00
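The relative frequency assignment can be written as a short calculation. This is a sketch using the Lucas Tool Rental counts above; the names `days` and `prob` are illustrative.

```python
# Number of days (out of the last 40) on which k polishers were rented.
days = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}

total_days = sum(days.values())
# Relative frequency method: frequency divided by total frequency.
prob = {k: d / total_days for k, d in days.items()}

for k, p in prob.items():
    print(f"P({k} polishers rented) = {p:.2f}")
```

The five probabilities are .10, .15, .45, .25 and .05, and they sum to 1.00 as required of a valid assignment.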
Subjective Probability
• Comes from a person’s intuition or reasoning
• Subjective -- different individuals may (correctly) assign different numeric
probabilities to the same event
• Degree of belief
• Useful for unique (single-trial) experiments
• New product introduction
• Initial public offering of common stock
• Site selection decisions
• Sporting events

Example – a share price analyst may judge that the price of a Reliance share has a 20%
probability of increasing by Rs.500 in the next 2 months
• Estimating the probability that a person wins a jackpot lottery.
• Estimating the probability that GM will lose its first ranking in car sales.
Axiomatic ( basic rules)
• Probability lies between 0 and 1
• P(sure event) = 1
• P( impossible event) = 0
• P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Definitions
Simple vs. Joint Probability
• Simple Probability refers to the probability of a
simple event.
• ex. P(King)
• ex. P(Spade)

• Joint Probability refers to the probability of an


occurrence of two or more events (joint event).
• ex. P(King and Spade)
SIMPLE PROBLEMS
Problem
• An experiment with three outcomes has been repeated
50 times, and it was learned that E1 occurred 20 times,
E2 occurred 13 times, and E3 occurred 17 times.
Assign probabilities to the outcomes. What method
did you use?

• A decision maker subjectively assigned the following
probabilities to the four outcomes of an experiment:
P(E1) = .10, P(E2) = .15, P(E3) = .40, and P(E4) = .20.
Are these probability assignments valid? Explain.
• The following data are the results of a random
poll of machinists and foremen regarding a
wage package

Opinion Machinists Foremen


Strongly Support 9 10
Mildly Support 11 3
Undecided 2 2
Mildly oppose 4 8
Strongly oppose 4 7
• What is the probability that
• A machinist mildly supports.
• A foreman is undecided.

• What types of probability estimates are


these?
Problem sol…
• What is the probability that a randomly selected respondent (out of all 60) is
• A machinist who mildly supports (11/60)
• A foreman who is undecided (2/60)

• What types of probability estimates are these? (Relative frequency)
Problem
• Data on the functioning of photocopiers in the
office
Copier Days functioning Days out of service
---------------------------------------------------------------------
1 209 51
2 217 43
3 258 02
4 229 31
5 247 13
---------------------------------------------------------------------
What is the probability of a copier being out of
service?
Problem 4
• Availability of VC funding has provided a big boost to companies. In
2017, 2374 VC disbursements were made; of these, 1434 were made
to companies in Bangalore, 390 in Chennai, 217 in Hyderabad and 112
in Mumbai. 22% of companies were in the early stages of development and
55% in an expansion stage. If you were to randomly choose a company,
what is the probability that the company will
• Be from Bangalore?
• Not be from one of the four places mentioned?
• Not be in the early stages of development?
• The total funding was 32.4 crores. Estimate the amount that went to Mumbai.
• How many companies in Chennai were in the early stage of development if they
were evenly distributed across the cities?
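Solution
a. 1434/2374 = 0.604
b. (2374 – 1434 – 390 – 217 – 112)/2374 = 221/2374 = 0.093
c. 1 – 0.22 = 78%
d. 112/2374 × 32.4 crores = 1.53 crores (assuming funds were spread evenly across disbursements)
e. 22% of 390 = 85.8, about 86 companies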
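The five answers can be verified with a few lines of arithmetic. This sketch assumes, as the solution does, that funding is spread evenly across disbursements and that the 22% early-stage share applies equally to every city; the variable names are illustrative.

```python
total = 2374  # VC disbursements made in 2017
by_city = {"Bangalore": 1434, "Chennai": 390, "Hyderabad": 217, "Mumbai": 112}

p_bangalore = by_city["Bangalore"] / total            # (a)
elsewhere = total - sum(by_city.values())
p_elsewhere = elsewhere / total                       # (b)
p_not_early = 1 - 0.22                                # (c) complement rule
mumbai_funds = by_city["Mumbai"] / total * 32.4       # (d) in crores
chennai_early = 0.22 * by_city["Chennai"]             # (e) expected count

print(f"(a) {p_bangalore:.3f}")
print(f"(b) {elsewhere}/{total} = {p_elsewhere:.3f}")
print(f"(c) {p_not_early:.2f}")
print(f"(d) {mumbai_funds:.2f} crores")
print(f"(e) {chennai_early:.1f} companies")
```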
Exercise
• 6 states in the US have the largest number
of Fortune 500 companies
• New York 56 - California 53
• Texas 43 - Illinois 37
• Ohio 28 - Pennsylvania 28
If a Fortune 500 company is chosen for a
survey, what are the probabilities of the
following?
• P(New York)
• P(Texas)
Mutually Exclusive Events
• Mutually exclusive events
• Events that cannot occur simultaneously

Example: Drawing one card from a deck of cards

A = queen of diamonds; B = queen of clubs

• Events A and B are mutually exclusive


Organizing & Visualizing Events
• Venn Diagram For All Days In 2015
[Figure: Venn diagram; the sample space is all days in 2015, with one circle for January days and one for Wednesdays. The overlap contains the days that are in January and are Wednesdays.]
Organizing & Visualizing Events
(continued)

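• Contingency Tables -- For All Days in 2015

            Jan.    Not Jan.    Total
Wed.        4       48          52
Not Wed.    27      286         313
Total       31      334         365

• Decision Trees
[Figure: decision tree; the sample space (all days in 2015) splits into January and not January, and each branch splits into Wednesday and not Wednesday, giving the sample space outcome counts 4, 27, 48 and 286]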
Definition: Simple Probability
• Simple Probability refers to the probability of a
simple event.
• ex. P(Jan.)
• ex. P(Wed.)

            Jan.    Not Jan.    Total
Wed.        4       48          52
Not Wed.    27      286         313
Total       31      334         365

P(Wed.) = 52 / 365
P(Jan.) = 31 / 365
Definition: Joint Probability
• Joint Probability refers to the probability of an
occurrence of two or more events (joint event).
• ex. P(Jan. and Wed.)
• ex. P(Not Jan. and Not Wed.)

            Jan.    Not Jan.    Total
Wed.        4       48          52
Not Wed.    27      286         313
Total       31      334         365

P(Jan. and Wed.) = 4 / 365
P(Not Jan. and Not Wed.) = 286 / 365


Collectively Exhaustive Events
• Collectively exhaustive events
• One of the events must occur
• The set of events covers the entire sample space

Example: Randomly choose a day from 2015

A = Weekday; B = Weekend;
C = January; D = Spring;

• Events A, B, C and D are collectively exhaustive (but
not mutually exclusive – a weekday can be in January
or in Spring)
• Events A and B are collectively exhaustive and also
mutually exclusive
Collectively Exhaustive Events
• Collectively exhaustive events
• One of the events must occur
• The set of events covers the entire sample space

example:
A = aces; B = black cards;
C = diamonds; D = hearts

• Events A, B, C and D are collectively exhaustive (but
not mutually exclusive – an ace may also be a heart)
• Events B, C and D are collectively exhaustive and also
mutually exclusive
Addition theorem

If X and Y are mutually exclusive,

$P(X \cup Y) = P(X) + P(Y)$

[Figure: Venn diagram of two non-overlapping events X and Y]
Law of Addition

$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$

[Figure: Venn diagram of two overlapping events X and Y]
Rules of Probability- Addition
theorem
• The probability of the entire sample space is 1
• 0 ≤ p(A) ≤ 1

• For mutually exclusive events, p(A or B) = p(A) + p(B)
• For non mutually exclusive events, p(A or B) =
p(A) + p(B) – p(A and B)

Example – 80% of tourists visit Delhi, 70% visit
Mumbai and 60% visit both
• P(D or M) = 0.8 + 0.7 – 0.6 = 0.9
• P(neither D nor M) = 1 – 0.9 = 0.1
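The tourist example follows directly from the addition rule and the complement rule. The helper below is a sketch; the function name `p_union` is an illustrative assumption.

```python
def p_union(p_a, p_b, p_a_and_b=0.0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    Leave p_a_and_b at 0 for mutually exclusive events.
    """
    return p_a + p_b - p_a_and_b


# 80% visit Delhi, 70% visit Mumbai, 60% visit both.
p_d_or_m = p_union(0.8, 0.7, 0.6)
# Complement rule: visiting neither city is the complement of visiting at least one.
p_neither = 1 - p_d_or_m

print(round(p_d_or_m, 2), round(p_neither, 2))
```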
Computing Joint and
Marginal Probabilities

• The probability of a joint event, A and B:

$P(A \text{ and } B) = \dfrac{\text{number of outcomes satisfying A and B}}{\text{total number of elementary outcomes}}$

• Computing a marginal (or simple) probability:

$P(A) = P(A \text{ and } B_1) + P(A \text{ and } B_2) + \cdots + P(A \text{ and } B_k)$

• Where B1, B2, …, Bk are k mutually exclusive and collectively
exhaustive events
Joint Probability Example
$P(\text{Red and Ace}) = \dfrac{\text{number of cards that are red and ace}}{\text{total number of cards}} = \dfrac{2}{52}$

Color
Type Red Black Total
Ace 2 2 4
Non-Ace 24 24 48
Total 26 26 52
Marginal Probability Example
$P(\text{Ace}) = P(\text{Ace and Red}) + P(\text{Ace and Black}) = \dfrac{2}{52} + \dfrac{2}{52} = \dfrac{4}{52}$

Color
Type Red Black Total
Ace 2 2 4
Non-Ace 24 24 48
Total 26 26 52
Marginal & Joint Probabilities In
A Contingency Table

                  Event
Event    B1              B2              Total
A1       P(A1 and B1)    P(A1 and B2)    P(A1)
A2       P(A2 and B1)    P(A2 and B2)    P(A2)
Total    P(B1)           P(B2)           1

The interior cells are joint probabilities; the row and column totals are marginal (simple) probabilities.


Joint Probability Example

$P(\text{Jan. and Wed.}) = \dfrac{\text{number of days that are in Jan. and are Wed.}}{\text{total number of days in 2015}} = \dfrac{4}{365}$

Jan. Not Jan. Total

Wed. 4 48 52
Not Wed. 27 286 313

Total 31 334 365


Marginal Probability Example

$P(\text{Wed.}) = P(\text{Jan. and Wed.}) + P(\text{Not Jan. and Wed.}) = \dfrac{4}{365} + \dfrac{48}{365} = \dfrac{52}{365}$

Jan. Not Jan. Total

Wed. 4 48 52
Not Wed. 27 286 313

Total 31 334 365


Probability Summary So Far
• Probability is the numerical measure of the
likelihood that an event will occur

• The probability of any event must be
between 0 and 1, inclusively
0 ≤ P(A) ≤ 1 for any event A

• The sum of the probabilities of all mutually
exclusive and collectively exhaustive events is 1
P(A) + P(B) + P(C) = 1
if A, B, and C are mutually exclusive and
collectively exhaustive

[Figure: probability scale running from 0 (Impossible) through 0.5 to 1 (Certain)]
General Addition Rule

General Addition Rule:

P(A or B) = P(A) + P(B) - P(A and B)

If A and B are mutually exclusive, then
P(A and B) = 0, so the rule can be simplified:

P(A or B) = P(A) + P(B)
for mutually exclusive events A and B
General Addition Rule Example

P(Red or Ace) = P(Red) + P(Ace) - P(Red and Ace)

= 26/52 + 4/52 - 2/52 = 28/52

Don’t count the two red aces twice!

                 Color
Type       Red    Black    Total
Ace        2      2        4
Non-Ace    24     24       48
Total      26     26       52
Computing Conditional Probabilities
• A conditional probability is the probability of one event,
given that another event has occurred:

The conditional probability of A given that B has occurred:

$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$

The conditional probability of B given that A has occurred:

$P(B \mid A) = \dfrac{P(A \text{ and } B)}{P(A)}$

Where P(A and B) = joint probability of A and B
P(A) = marginal or simple probability of A
P(B) = marginal or simple probability of B
Conditional Probability Example

Of the cars on a used car lot, 70% have air
conditioning (AC) and 40% have a CD player (CD).
20% of the cars have both.

• What is the probability that a car has a CD player,
given that it has AC?

i.e., we want to find P(CD | AC)

Conditional Probability Example
(continued)
Of the cars on a used car lot, 70% have air conditioning (AC) and 40% have a CD
player (CD).
20% of the cars have both.

CD No CD Total

AC 0.2 0.5 0.7


No AC 0.2 0.1 0.3
Total 0.4 0.6 1.0

$P(CD \mid AC) = \dfrac{P(CD \text{ and } AC)}{P(AC)} = \dfrac{0.2}{0.7} = 0.2857$
Conditional Probability Example
(continued)
Given AC, we only consider the top row (70% of the cars). Cars with both AC and a CD
player make up 20% of all cars, so among cars with AC the share with a CD player is
0.2/0.7, about 28.57%.

CD No CD Total

AC 0.2 0.5 0.7


No AC 0.2 0.1 0.3
Total 0.4 0.6 1.0

$P(CD \mid AC) = \dfrac{P(CD \text{ and } AC)}{P(AC)} = \dfrac{0.2}{0.7} = 0.2857$
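The conditional probability can be computed straight from the joint probability table. This is a sketch; the dictionary layout and the helper name `marginal_ac` are illustrative assumptions.

```python
# Joint probabilities from the used-car table: (AC status, CD status) -> probability.
joint = {
    ("AC", "CD"): 0.2,
    ("AC", "No CD"): 0.5,
    ("No AC", "CD"): 0.2,
    ("No AC", "No CD"): 0.1,
}


def marginal_ac(ac_status):
    """Marginal probability of an AC status: sum the joint probabilities in its row."""
    return sum(p for (ac, _cd), p in joint.items() if ac == ac_status)


p_ac = marginal_ac("AC")
# P(CD | AC) = P(CD and AC) / P(AC)
p_cd_given_ac = joint[("AC", "CD")] / p_ac

print(round(p_ac, 2), round(p_cd_given_ac, 4))
```

The result, about 0.2857, is lower than the unconditional P(CD) = 0.4, so knowing that a car has AC changes the probability that it has a CD player.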
Using Decision Trees
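Given AC or no AC:

[Figure: decision tree starting from all cars, with conditional probabilities on the second-level branches]
• Has AC, P(AC) = .7
  • Has CD: conditional probability .2/.7, P(AC and CD) = 0.2
  • Does not have CD: conditional probability .5/.7, P(AC and CD’) = 0.5
• Does not have AC, P(AC’) = .3
  • Has CD: conditional probability .2/.3, P(AC’ and CD) = 0.2
  • Does not have CD: conditional probability .1/.3, P(AC’ and CD’) = 0.1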
Using Decision Trees
(continued)
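Given CD or no CD:

[Figure: decision tree starting from all cars, with conditional probabilities on the second-level branches]
• Has CD, P(CD) = .4
  • Has AC: conditional probability .2/.4, P(CD and AC) = 0.2
  • Does not have AC: conditional probability .2/.4, P(CD and AC’) = 0.2
• Does not have CD, P(CD’) = .6
  • Has AC: conditional probability .5/.6, P(CD’ and AC) = 0.5
  • Does not have AC: conditional probability .1/.6, P(CD’ and AC’) = 0.1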
Independence
• Two events are independent if and only if:

P(A | B) = P(A)
• Events A and B are independent when the probability of
one event is not affected by the fact that the other event
has occurred
Multiplication Rules

• Multiplication rule for two events A and B:

P(A and B) = P(A | B) P(B)

Note: If A and B are independent, then P(A | B) = P(A)
and the multiplication rule simplifies to

P(A and B) = P(A) P(B)
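The simplified multiplication rule doubles as a test for independence: A and B are independent exactly when P(A and B) = P(A) P(B). The sketch below applies it to two examples from earlier slides; the function name `is_independent` and the tolerance are illustrative assumptions.

```python
import math


def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B) within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Used-car lot: P(AC) = 0.7, P(CD) = 0.4, P(AC and CD) = 0.2.
cars_independent = is_independent(0.7, 0.4, 0.2)

# Deck of cards: P(Red) = 26/52, P(Ace) = 4/52, P(Red and Ace) = 2/52.
cards_independent = is_independent(26 / 52, 4 / 52, 2 / 52)

print(cars_independent, cards_independent)
```

For the cars, 0.7 × 0.4 = 0.28 differs from 0.2, so AC and CD are not independent. For the cards, (26/52)(4/52) = 2/52, so colour and being an ace are independent.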


Marginal Probability
• Marginal probability for event A:

$P(A) = P(A \mid B_1) P(B_1) + P(A \mid B_2) P(B_2) + \cdots + P(A \mid B_k) P(B_k)$

• Where B1, B2, …, Bk are k mutually exclusive and collectively exhaustive events
Problem Contingency Table

Counts
                     AT&T    IBM    Total
Telecommunication    40      10     50
Computers            20      30     50
Total                60      40     100

Probabilities
                     AT&T    IBM    Total
Telecommunication    .40     .10    .50
Computers            .20     .30    .50
Total                .60     .40    1.00

Probability that a project is undertaken by IBM given it is a
telecommunications project:

$P(IBM \mid T) = \dfrac{P(IBM \cap T)}{P(T)} = \dfrac{0.10}{0.50} = 0.2$


Problem
• Job Satisfaction scores (0-100)

                  <50    50-60    60-70    70-80    80-90
Carpenter         0      2        4        3        1
Lawyer            6      2        1        1        0
Therapist         0      5        2        1        2
System Analyst    2      1        4        3        0
Problem ….
• Develop a joint probability table
• What is the probability that one of the participants had a
score in the 80s?
• What is the probability of a score in the 80s given that the
participant was a therapist?
• What is the probability that one of the participants was a
lawyer?
• What is the probability that one of the participants was a
lawyer and received a score under 50?
• What is the probability of a score under 50 given that the
participant is a lawyer?
• What is the probability of being a lawyer given that the
score is under 50?
• What is the probability of a score of 70 or higher?
Problem
• Joint Probability table

<50 50-60 60-70 70-80 80-90 Total

Carpenter .000 .050 .100 .075 .025 .250

Lawyer .150 .050 .025 .025 .000 .250

Therapist .000 .125 .050 .025 .050 .250

System Analyst .050 .025 .100 .075 .000 .250

TOTAL .200 .250 .275 .200 .075 1.000
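The joint probability table and the remaining questions can be worked from the raw counts. The slides give only the table, so the answers produced here are this sketch's own calculations, not results stated in the source; exact fractions are used to avoid rounding, and the helper names are illustrative.

```python
from fractions import Fraction

bands = ["<50", "50-60", "60-70", "70-80", "80-90"]
counts = {
    "Carpenter": [0, 2, 4, 3, 1],
    "Lawyer": [6, 2, 1, 1, 0],
    "Therapist": [0, 5, 2, 1, 2],
    "System Analyst": [2, 1, 4, 3, 0],
}
n = sum(sum(row) for row in counts.values())  # total participants


def p_job(job):
    """Marginal probability of an occupation (row total / n)."""
    return Fraction(sum(counts[job]), n)


def p_band(band):
    """Marginal probability of a score band (column total / n)."""
    i = bands.index(band)
    return Fraction(sum(row[i] for row in counts.values()), n)


def p_joint(job, band):
    """Joint probability of an occupation and a score band (cell / n)."""
    return Fraction(counts[job][bands.index(band)], n)


p_80s = p_band("80-90")
p_80s_given_therapist = p_joint("Therapist", "80-90") / p_job("Therapist")
p_lawyer = p_job("Lawyer")
p_lawyer_and_under_50 = p_joint("Lawyer", "<50")
p_under_50_given_lawyer = p_lawyer_and_under_50 / p_lawyer
p_lawyer_given_under_50 = p_lawyer_and_under_50 / p_band("<50")
p_70_or_higher = p_band("70-80") + p_band("80-90")

for name, value in [
    ("P(80s)", p_80s),
    ("P(80s | therapist)", p_80s_given_therapist),
    ("P(lawyer)", p_lawyer),
    ("P(lawyer and under 50)", p_lawyer_and_under_50),
    ("P(under 50 | lawyer)", p_under_50_given_lawyer),
    ("P(lawyer | under 50)", p_lawyer_given_under_50),
    ("P(70 or higher)", p_70_or_higher),
]:
    print(f"{name} = {float(value):.3f}")
```

The two conditional probabilities involving lawyers differ (6/10 versus 6/8) because they divide the same joint probability by different marginals.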


Problem
• Preference of a brand of soap in 4 cities

              Delhi    Kolkata    Chennai    Mumbai
Yes           45       55         60         50
No            35       45         35         45
No opinion    5        5          5          5
Problem …
What is the probability that a consumer selected at
random
• Preferred the brand = 210/390
• Preferred the brand and was from Chennai = 60/390
• Preferred the brand given that he was from Chennai =
60/100
• Given that a consumer preferred the brand, what
is the probability that he was from Mumbai = 50/210
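These four answers come from dividing one cell of the table by the appropriate total. A minimal sketch with exact fractions (the data layout and names are illustrative assumptions):

```python
from fractions import Fraction

cities = ["Delhi", "Kolkata", "Chennai", "Mumbai"]
survey = {
    "Yes": [45, 55, 60, 50],
    "No": [35, 45, 35, 45],
    "No opinion": [5, 5, 5, 5],
}
n = sum(sum(row) for row in survey.values())   # all consumers surveyed

yes_total = sum(survey["Yes"])
chennai = cities.index("Chennai")
mumbai = cities.index("Mumbai")
chennai_total = sum(row[chennai] for row in survey.values())

p_yes = Fraction(yes_total, n)                                      # marginal
p_yes_and_chennai = Fraction(survey["Yes"][chennai], n)             # joint
p_yes_given_chennai = Fraction(survey["Yes"][chennai], chennai_total)  # conditional
p_mumbai_given_yes = Fraction(survey["Yes"][mumbai], yes_total)        # conditional

print(p_yes, p_yes_and_chennai, p_yes_given_chennai, p_mumbai_given_yes)
```

`Fraction` reduces each ratio to lowest terms, so 210/390 prints as 7/13, 60/390 as 2/13, 60/100 as 3/5 and 50/210 as 5/21.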
Bayes’ Theorem
• Bayes’ Theorem is used to revise previously
calculated probabilities based on new information.

• Developed by Thomas Bayes in the 18th Century.

• It is an extension of conditional probability.


Bayes’ Theorem

$P(B_i \mid A) = \dfrac{P(A \mid B_i) P(B_i)}{P(A \mid B_1) P(B_1) + P(A \mid B_2) P(B_2) + \cdots + P(A \mid B_k) P(B_k)}$

• where:
$B_i$ = ith event of k mutually exclusive and collectively
exhaustive events
A = new event that might impact $P(B_i)$
Bayes’ Theorem …
Prior Probabilities → New Information → Bayes’ Theorem → Posterior Probabilities
Bayes’ Theorem Example
• A drilling company has estimated a 40% chance of
striking oil for their new well.
• A detailed test has been scheduled for more
information. Historically, 60% of successful wells
have had detailed tests, and 20% of unsuccessful
wells have had detailed tests.
• Given that this well has been scheduled for a
detailed test, what is the probability
that the well will be successful?
Bayes’ Theorem Example
(continued)

• Let S = successful well


U = unsuccessful well
• P(S) = 0.4 , P(U) = 0.6 (prior probabilities)
• Define the detailed test event as D
• Conditional probabilities:
P(D|S) = 0.6 P(D|U) = 0.2
• Goal is to find P(S|D)
Bayes’ Theorem Example
(continued)
Apply Bayes’ Theorem:

$$P(S \mid D) = \frac{P(D \mid S)P(S)}{P(D \mid S)P(S) + P(D \mid U)P(U)} = \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.2)(0.6)} = \frac{0.24}{0.24 + 0.12} = 0.667$$

So the revised probability of success, given that this well has been scheduled for a detailed test, is 0.667
Bayes’ Theorem Example
(continued)

• Given the detailed test, the revised probability of a successful well has risen to 0.667 from the original estimate of 0.4

Event              Prior Prob.   Conditional Prob.   Joint Prob.          Revised Prob.
S (successful)     0.4           0.6                 (0.4)(0.6) = 0.24    0.24/0.36 = 0.667
U (unsuccessful)   0.6           0.2                 (0.6)(0.2) = 0.12    0.12/0.36 = 0.333
                                                     Sum = 0.36
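The prior, conditional, joint and revised columns of the table map directly onto a few lines of code. The sketch below assumes a helper named `bayes_posteriors` (a hypothetical name, not from the slides) and applies it to the drilling example.

```python
def bayes_posteriors(priors, likelihoods):
    """Return (posteriors, evidence) for mutually exclusive, exhaustive events.

    priors[i]      = P(B_i)
    likelihoods[i] = P(A | B_i)
    """
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(A and B_i)
    evidence = sum(joints)                                  # P(A)
    return [j / evidence for j in joints], evidence


# Drilling example: S = successful, U = unsuccessful, D = detailed test scheduled.
posteriors, p_d = bayes_posteriors([0.4, 0.6], [0.6, 0.2])
p_s_given_d, p_u_given_d = posteriors
```

The same helper should work for any number of events, provided the priors cover all cases and sum to 1.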
Problem 13
The probabilities of three events A, B and C occurring are
p(A) = .35 p(B) = .45 p(C) = 0.2
Given that A, B or C has occurred, the probabilities of another event, X, occurring are
p(X/A) = .8 p(X/B) = .65 p(X/C) = 0.3

Find p(A/X), p(B/X) and p(C/X)


Problem
Prior          Conditional      Joint                          Posterior
P(A) = .35     P(X/A) = .80     P(XA) = .80 * .35 = .28        P(A/X) = .28/.6325 = .4427
P(B) = .45     P(X/B) = .65     P(XB) = .65 * .45 = .2925      P(B/X) = .2925/.6325 = .4624
P(C) = .20     P(X/C) = .30     P(XC) = .30 * .20 = .06        P(C/X) = .06/.6325 = .0949
                                P(X) = p(XA) + p(XB) + p(XC) = .6325      Total = 1.0000
Problem
Prior    Conditional    Joint               Posterior
.35      .8             .8*.35 = .28        .28/.6325 = .4427
.45      .65            .65*.45 = .2925     .2925/.6325 = .4624
.20      .3             .3*.2 = .06         .06/.6325 = .0949


Problem 14
When a machine is set correctly, it produces 25% defectives; otherwise it produces 60% defectives. From past experience, the manufacturer knows that the machine is equally likely (50/50) to be set correctly or wrongly. The machine was set and, before production, one piece was inspected and found to be defective. What is the probability that the machine set-up is correct?
Problem …
Prior    Condl    Joint             Posterior
.5       .25      .5*.25 = .125     .125/.425 = .29
.5       .60      .5*.6 = .30       .30/.425 = .71


Problem 15
An item is manufactured by 3 machines, M1, M2 and M3. Out of the total manufactured during a specific production period, 50% are manufactured on M1, 30% on M2 and 20% on M3. 2% of the items produced by M1 and by M2 are defective, while 3% of those produced by M3 are defective. All items are put into one bin. From the bin, one item is drawn at random and is found to be defective. What is the probability that it was made on M1, on M2 or on M3?
Problem …
Machine   Prior    Condl    Joint    Posterior
M1        .5       .02      .010     .010/.022 = .4545
M2        .3       .02      .006     .006/.022 = .2727
M3        .2       .03      .006     .006/.022 = .2727


Problem 16
A doctor has decided to prescribe 2 new drugs to 200 heart patients as follows: 50 get Drug A, 50 get Drug B and 100 get both. The 200 patients were chosen so that each had an 80% chance of having a heart attack if given neither drug. Drug A reduces the probability of a heart attack by 35%, Drug B by 20%, and the 2 drugs taken together work independently. If a randomly selected patient in the program has a heart attack, what is the probability that the patient was given both drugs?
Problem 16…
Prior           Conditional                         Joint                        Posterior
P(A) = .25      P(H/A) = .8 * .65 = .520            P(HA) = .52 * .25 = .13      P(A/H) = .13/.498 = .261
P(B) = .25      P(H/B) = .8 * .8 = .640             P(HB) = .64 * .25 = .16      P(B/H) = .16/.498 = .321
P(A&B) = .50    P(H/A & B) = .8 * .65 * .8 = .416   P(HAB) = .416 * .5 = .208    P(A&B/H) = .208/.498 = .418
                                                    P(H) = .498                  Total = 1.0000
Counting Rules
• Rules for counting the number of possible outcomes

• Counting Rule 1:
• If any one of k different mutually exclusive and collectively exhaustive events can occur on each of n trials, the number of possible outcomes is equal to

$$k^n$$

• Example
• If you roll a fair die 3 times then there are $6^3 = 216$ possible outcomes
Counting Rules
(continued)
• Counting Rule 2:
• If there are k1 events on the first trial, k2 events on the second trial, … and kn events on the nth trial, the number of possible outcomes is

$$(k_1)(k_2)\cdots(k_n)$$
• Example:
• You want to go to a park, eat at a restaurant, and see a movie.
There are 3 parks, 4 restaurants, and 6 movie choices. How
many different possible combinations are there?
• Answer: (3)(4)(6) = 72 different possibilities
Counting Rules
(continued)

• Counting Rule 3:
• The number of ways that n items can be arranged in order
is

$$n! = (n)(n - 1)\cdots(1)$$
• Example:
• You have five books to put on a bookshelf. How many different
ways can these books be placed on the shelf?

• Answer: 5! = (5)(4)(3)(2)(1) = 120 different possibilities


Counting Rules (continued)

• Counting Rule 4:
• Permutations: The number of ways of arranging X objects selected from n objects in order is

$${}_nP_X = \frac{n!}{(n - X)!}$$

• Example:
• You have five books and are going to put three on a bookshelf. How many different ways can the books be ordered on the bookshelf?

• Answer:

$${}_nP_X = \frac{n!}{(n - X)!} = \frac{5!}{(5 - 3)!} = \frac{120}{2} = 60 \text{ different possibilities}$$
Counting Rules
(continued)
• Counting Rule 5:
• Combinations: The number of ways of selecting X objects from n objects, irrespective of order, is

$${}_nC_X = \frac{n!}{X!\,(n - X)!}$$

• Example:
• You have five books and are going to select three to read. How many different combinations are there, ignoring the order in which they are selected?

• Answer:

$${}_nC_X = \frac{n!}{X!\,(n - X)!} = \frac{5!}{3!\,(5 - 3)!} = \frac{120}{(6)(2)} = 10 \text{ different possibilities}$$
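The five counting rules can be reproduced with Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The variable names below are illustrative only.

```python
from math import comb, factorial, perm

# Rule 1: k outcomes on each of n trials gives k**n outcomes (die rolled 3 times).
die_outcomes = 6 ** 3

# Rule 2: multiply the number of choices at each stage (parks, restaurants, movies).
outing_plans = 3 * 4 * 6

# Rule 3: n items can be arranged in n! orders (five books on a shelf).
shelf_orders = factorial(5)

# Rule 4: permutations, ordered selections of X = 3 from n = 5.
ordered_picks = perm(5, 3)

# Rule 5: combinations, unordered selections of X = 3 from n = 5.
reading_sets = comb(5, 3)
```

Rules 4 and 5 differ only by the factor X!, which is the number of orderings of each selected group.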
Chapter Summary
• Discussed basic probability concepts
• Sample spaces and events, contingency tables, Venn diagrams, simple
probability, and joint probability

• Examined basic probability rules


• General addition rule, addition rule for mutually exclusive events, rule
for collectively exhaustive events

• Defined conditional probability


• Statistical independence, marginal probability, decision trees, and the
multiplication rule

• Discussed Bayes’ theorem


QUANTITATIVE METHODS IN
MANAGEMENT
Class 7
Course content
Chapter Page number content
1 11-32 Introduction, variables, levels of measurement, types of statistics
2 33-98 Organizing and visualizing variables
3 99-148 Numerical descriptive measures – categorical / numerical – Measures of central
tendencies, measures of dispersion, skewness, kurtosis, measures of relations –
co variance and correlation
12 430-446 Simple linear regression, estimating bo and b1, measures of variations, SST, SSR
and SSE, coefficient of determination, coefficient of correlation.
4 149-182 Basic probability
5,6 183-232 Discrete probability distributions – binomial and poisson
Continuous probability distribution – Normal
7 234-257 Sampling distribution
8 258-293 Confidence interval – mean, proportion and determining sample size
9 294-304 Fundamentals of testing of hypothesis
11 402-415 Chi square test
15 15-1 to 15-17 Decision analysis
RECAP
• Introduction – definition, types of statistics, levels of
measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and correlation
• Regression analysis – simple, measures of variations ( SSE, SSR,
SST, coefficient of determination and coefficient of correlation)
• Introduction to Probability – Addition and multiplication
theorems, Bayes’ theorem
Business Statistics:
A First Course
5th Edition

Chapter 5

RANDOM VARIABLES 183-188


Learning Objectives
In this chapter, you learn:
• The properties of a probability distribution
• To calculate the expected value and variance of
a probability distribution
• Discrete and continuous distribution
Random Variable
• Numerical description of the outcome of the experiment

• A variable that assumes different numerical values as a


result of random experiments or occurrences

• The values assumed by these variables are random and


cannot be predicted

• Example - Rainfall measured in cms, temp in Celsius,


share prices
Definitions
Random Variables
• A random variable represents a possible numerical
value from an uncertain event.

• Discrete random variables produce outcomes that


come from a counting process (e.g. number of
courses you are taking this semester).

• Continuous random variables produce outcomes


that come from a measurement (e.g. your annual
salary, or your weight).
Definitions
Random Variables

Random
Variables

Ch. 5 Discrete Continuous Ch. 6


Random Variable Random Variable
Discrete Random Variables
• Can only assume a countable number of values
Examples:

• Roll a die twice


Let X be the number of times 4 occurs
(then X could be 0, 1, or 2 times)

• Toss a coin 5 times.


Let X be the number of heads
(then X = 0, 1, 2, 3, 4, or 5)
Probability Distribution For A
Discrete Random Variable
A listing of all the outcomes of an experiment and the probability associated with each outcome.

It is related to a frequency distribution: the actual numbers (frequencies) are simply replaced with the proportion of the total at each level of frequency.

• A probability distribution for a discrete random variable is a mutually exclusive listing of all
possible numerical outcomes for that variable and a probability of occurrence associated with each
outcome.

Number of Classes Taken Probability


2 0.2
3 0.4
4 0.24
5 0.16
Example of a Discrete Random Variable
Probability Distribution

Experiment: Toss 2 Coins. Let X = # heads.

4 possible outcomes: TT, TH, HT, HH

Probability Distribution
X Value    Probability
0          1/4 = 0.25
1          2/4 = 0.50
2          1/4 = 0.25

(Figure: bar chart of Probability against X = 0, 1, 2, with heights 0.25, 0.50 and 0.25.)
Discrete Random Variable
• A random variable that assumes a finite number of
values or an infinite sequence of values such as 0, 1,
2…. is a discrete random variable

• The values can be counted (listed one by one), even when the list is infinite

• Generated from experiments in which things are


‘counted’ not ‘measured’
Discrete Random Variable
• Example
• Number of people who visit a doctor
• Customers who place an order
• Number of defective radios in a shipment
• Gender of the customer
• Number of new subscribers to a magazine
• Number of bad checks received by a restaurant
• Number of absent employees on a given day
Continuous Random Variable

• A random variable that may assume any numerical value in

an interval or collection of intervals is a continuous random

variable

• Outcomes based on time, weight, distance or temperature

• Generated from experiments in which things are ‘measured’

not ‘counted’
Continuous Random Variable
• Example
• Temperature between 29°C and 30°C can be 29.1, 29.5
or 29.9
• Time between customer arrivals at a bank
• Current Ratio of a motorcycle distributorship
• Elapsed time between arrivals of bank customers
• Percent of the labor force that is unemployed
Discrete random variable              Continuous random variable
• (x, p(x))                           • (x, f(x))
• PMF (probability mass function)     • PDF (probability density function)
• Σ p(x) = 1                          • ∫ f(x) dx = 1
• E(X), V(X)                          • E(X), V(X)
• Decide which of the following distributions are probability
distributions:
a. The distribution takes the values -2,-1 ,0,1 and P(-2) =-0.5, P(-1) =
0.7, P(0) = 0.2 and P(1) = 0.6
b. The distribution takes the values 1,2,3,4 and corresponding
probabilities are 0.1,0.2,0.25,0.3
c. The distribution takes the values 20,30,40,50 with corresponding
probabilities as 0.1,0.2,0.3,0.4
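A quick way to check the three cases above is to test the two defining conditions: every probability lies between 0 and 1, and the probabilities sum to 1. The function name below is an assumption made for illustration.

```python
import math


def is_probability_distribution(probs, tol=1e-9):
    """True if every probability lies in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one


result_a = is_probability_distribution([-0.5, 0.7, 0.2, 0.6])   # case a
result_b = is_probability_distribution([0.1, 0.2, 0.25, 0.3])   # case b
result_c = is_probability_distribution([0.1, 0.2, 0.3, 0.4])    # case c
```

Case a fails because one probability is negative (even though the values sum to 1), case b fails because the probabilities sum to 0.85, and case c satisfies both conditions.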
Discrete Random Variables
Expected Value (Measuring Center)
• Expected Value (or mean) of a discrete random variable (Weighted Average)

$$\mu = E(X) = \sum_{i=1}^{N} X_i \, P(X_i)$$

• Example: Toss 2 coins, X = # of heads, compute expected value of X:

X    P(X)
0    0.25
1    0.50
2    0.25

E(X) = ((0)(0.25) + (1)(0.50) + (2)(0.25)) = 1.0
Discrete Random Variables
Measuring Dispersion
• Variance of a discrete random variable

$$\sigma^2 = \sum_{i=1}^{N} [X_i - E(X)]^2 \, P(X_i)$$

• Standard Deviation of a discrete random variable

$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2 \, P(X_i)}$$

where:
E(X) = Expected value of the discrete random variable X
Xi = the ith outcome of X
P(Xi) = Probability of the ith occurrence of X
Discrete Random Variables
Measuring Dispersion
(continued)

• Example: Toss 2 coins, X = # heads, compute standard deviation (recall E(X) = 1)

$$\sigma = \sqrt{\sum [X_i - E(X)]^2 \, P(X_i)}$$

$$\sigma = \sqrt{(0 - 1)^2(0.25) + (1 - 1)^2(0.50) + (2 - 1)^2(0.25)} = \sqrt{0.50} = 0.707$$

Possible number of heads = 0, 1, or 2
X        P(X)    X·P(X)    X²    X²·P(X)
Total            E(X)            E(X²)

• V(X) = E(X²) – [E(X)]²

• SD(X) = SQRT(V(X))
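The definitional formula and the shortcut V(X) = E(X²) – [E(X)]² should give the same variance. The sketch below checks this on the two-coin example; the variable names are illustrative.

```python
from math import sqrt

# Two-coin example: X = number of heads.
xs = [0, 1, 2]
ps = [0.25, 0.50, 0.25]

# Expected value: probability-weighted average of the outcomes.
mean = sum(x * p for x, p in zip(xs, ps))

# Variance from the definition: weighted squared deviations from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))

# Variance from the shortcut: E(X^2) minus the square of E(X).
e_x2 = sum(x ** 2 * p for x, p in zip(xs, ps))
var_shortcut = e_x2 - mean ** 2

sd = sqrt(var_definition)
```

The shortcut is usually quicker by hand because it avoids computing each deviation from the mean.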


Probability Distribution
• Since the value of a RV cannot be predicted
accurately, probabilities are assigned to all the
likely values the variable might take
Example
Price of Share Probability
X P(X)
15 0.12
20 0.20
23 0.08
25 0.10
30 0.50
Total 1.00
Expected Value
• The Expected Value is the Mean of the probability
distribution
• It is the weighted average of the value that the RV
can assume
• The probabilities assigned are used as weights
• The mean price of the share is Rs.25.14
Discrete Distribution - Example

Distribution of Daily Crises

Number of Crises    Probability
0                   0.37
1                   0.31
2                   0.18
3                   0.09
4                   0.04
5                   0.01

(Figure: bar chart of Probability against Number of Crises, 0 to 5.)
Mean of the Crises Data Example

$$\mu = E(X) = \sum X \cdot P(X) = 1.15$$

X    P(X)    X•P(X)
0    .37     .00
1    .31     .31
2    .18     .36
3    .09     .27
4    .04     .16
5    .01     .05
             1.15
Variance & SD - Crises Data

$$\sigma^2 = \sum (X - \mu)^2 \cdot P(X) = 1.41 \qquad \sigma = \sqrt{\sigma^2} = \sqrt{1.41} = 1.19$$

X    P(X)    (X - µ)    (X - µ)²    (X - µ)² • P(X)
0    .37     -1.15      1.32        .49
1    .31     -0.15      0.02        .01
2    .18     0.85       0.72        .13
3    .09     1.85       3.42        .31
4    .04     2.85       8.12        .32
5    .01     3.85       14.82       .15
                                    1.41
Discrete Variables Expected Value (Measuring
Center)
• Expected Value (or mean) of a discrete variable (Weighted Average)

$$\mu = E(X) = \sum_{i=1}^{N} x_i \, P(X = x_i)$$

xi = Interruptions Per Day In Computer Network

xi    Probability P(X = xi)    xiP(X = xi)
0     0.35                     (0)(0.35) = 0.00
1     0.25                     (1)(0.25) = 0.25
2     0.20                     (2)(0.20) = 0.40
3     0.10                     (3)(0.10) = 0.30
4     0.05                     (4)(0.05) = 0.20
5     0.05                     (5)(0.05) = 0.25
      1.00                     μ = E(X) = 1.40
Discrete Variables:
Measuring Dispersion
• Variance of a discrete variable

$$\sigma^2 = \sum_{i=1}^{N} [x_i - E(X)]^2 \, P(X = x_i)$$

• Standard Deviation of a discrete variable

$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} [x_i - E(X)]^2 \, P(X = x_i)}$$

where:
E(X) = Expected value of the discrete variable X
xi = the ith outcome of X
P(X=xi) = Probability of the ith occurrence of X
Discrete Variables:
Measuring Dispersion (continued)
$$\sigma = \sqrt{\sum_{i=1}^{N} [x_i - E(X)]^2 \, P(X = x_i)}$$

xi = Interruptions Per Day In Computer Network

xi    Probability P(X = xi)    [xi – E(X)]²            [xi – E(X)]²P(X = xi)
0     0.35                     (0 – 1.4)² = 1.96       (1.96)(0.35) = 0.686
1     0.25                     (1 – 1.4)² = 0.16       (0.16)(0.25) = 0.040
2     0.20                     (2 – 1.4)² = 0.36       (0.36)(0.20) = 0.072
3     0.10                     (3 – 1.4)² = 2.56       (2.56)(0.10) = 0.256
4     0.05                     (4 – 1.4)² = 6.76       (6.76)(0.05) = 0.338
5     0.05                     (5 – 1.4)² = 12.96      (12.96)(0.05) = 0.648
                                                       σ² = 2.04, σ = 1.4283
Problem
• An auto dealer determines the demand he can
expect for autos during a 1-month period. The
probability of demand for 50, 55, 60 & 65 cars
sold per month is 0.15, 0.2, 0.3 and 0.35. Find
the expected value

(Expected # of cars sold during a 1- month


period is 59.25)
Problem
• A person expects a gain of Rs.80, Rs.120, Rs.160
and Rs.20 with associated probabilities of 0.2,
0.4, 0.3 & 0.1 respectively. If he wishes to
compare this with another security whose gains
are 150, 80 & 20 with prob of 0.1, 0.8 & 0.1
respectively, where would he invest?
Problem
Share 1                                     Share 2
Gain (X)   P     PX     P(X − X̄)²           Gain (Y)   P     PY    P(Y − Ȳ)²
80         .2    16     231.2               150        .1    15    476.1
120        .4    48     14.4                80         .8    64    0.8
160        .3    48     634.8               20         .1    2     372.1
20         .1    2      883.6
∑          1.0   114    1764                ∑          1     81    849
Problem
            Share 1    Share 2
Mean        114        81
Variance    1764       849
SD          42         29.14
CV          36.84%     35.97%

Higher expected return: Share 1
Lower risk (smaller SD and CV): Share 2
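The comparison of the two shares can be reproduced in a few lines. This is a sketch with a hypothetical helper `summarize`; it computes the mean, variance, standard deviation and coefficient of variation (SD divided by mean) for each share.

```python
from math import sqrt


def summarize(gains, probs):
    """Mean, variance, SD and coefficient of variation of a discrete distribution."""
    mean = sum(g * p for g, p in zip(gains, probs))
    variance = sum(p * (g - mean) ** 2 for g, p in zip(gains, probs))
    sd = sqrt(variance)
    return {"mean": mean, "variance": variance, "sd": sd, "cv": sd / mean}


share1 = summarize([80, 120, 160, 20], [0.2, 0.4, 0.3, 0.1])
share2 = summarize([150, 80, 20], [0.1, 0.8, 0.1])
```

Share 1 has the higher expected gain, while Share 2 has the smaller standard deviation and a slightly smaller coefficient of variation, so the choice depends on how the investor trades return against risk.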
Problem
The probability distribution for the # of TV sets per household is

X       0     1     2     3    4    5
P(X)    .01   .23   .41   .2   .1   .05

• Compute the expected value of # of TVs per household and compare it with the average. What are the variance and SD of the # of TV sets per household?

(EV = 2.3 / Variance = 1.23 / SD = 1.11)


• A software company conducts a survey among its programmers and project
leaders regarding their job satisfaction. The data obtained regarding job
satisfaction of 50 programmers and 15 project leaders is given below.
Job satisfaction score Programmers Project leaders
1 5 1
2 10 3
3 20 3
4 10 6
5 5 2

• Develop probability distribution for the job satisfaction of programmers and


project leaders.
• Find the mean and variance
• Who is more satisfied with his job, a randomly selected programmer or a
randomly selected project leader?
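One way to approach this exercise is to turn each frequency column into a probability distribution by dividing by the group size, then apply the mean and variance formulas. The sketch below uses exact fractions to avoid rounding; the function names are illustrative.

```python
from fractions import Fraction

scores = [1, 2, 3, 4, 5]
programmers = [5, 10, 20, 10, 5]     # frequencies for the 50 programmers
project_leaders = [1, 3, 3, 6, 2]    # frequencies for the 15 project leaders


def distribution(freqs):
    """Convert frequencies to probabilities (relative frequencies)."""
    n = sum(freqs)
    return [Fraction(f, n) for f in freqs]


def mean_and_variance(values, probs):
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, var


prog_mean, prog_var = mean_and_variance(scores, distribution(programmers))
lead_mean, lead_var = mean_and_variance(scores, distribution(project_leaders))
```

On these figures the project leaders have the higher mean score (10/3, about 3.33, against 3 for programmers), so a randomly selected project leader is, on average, more satisfied.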
RANDOM VARIABLE
Bob Walters, who frequently invests in the stock market, carefully studies any potential investment.
He is currently examining the possibility of investing in the Trinity Power Company. Through
studying past performance, Walters has broken the potential results of the investment into five
possible outcomes with accompanying probabilities. The outcomes are annual rates of return on a
single share of stock that currently costs $150. Find the expected value of the return for investing in
a single share of Trinity Power
Return on investment (%)    0.00    10.00    15.00    25.00    50.00
Probability                 0.20    0.25     0.30     0.15     0.10

If Walters purchases stock whenever the expected rate of return exceeds 10 per cent, will he
purchase the stock, according to these data? What is your suggestion to Walters?

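• Answer: Yes. The expected rate of return is (0)(0.20) + (10)(0.25) + (15)(0.30) + (25)(0.15) + (50)(0.10) = 15.75 per cent, which exceeds his 10 per cent threshold, so he will purchase the stock.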
Exercises
• 5.5
• 5.6
• 5.7
• 5.8
page no. 187,188
Probability Distributions

Probability
Distributions

Ch. 5 Discrete Continuous Ch. 6


Probability Probability
Distributions Distributions

Binomial Normal

Poisson
For any distribution
• When to apply
• Prob mass function/ density function
• Range
• Parameter
• Constants/ characteristics
• Simple problem
• Given parameter calculate probability
• Given probability, parameters get the random variable ( INVERSE)
• Expected frequency: N × P(X = x), where N is the number of times the experiment is repeated
Binomial Probability Distribution
• A fixed number of observations, n
  • e.g., 15 tosses of a coin; ten light bulbs taken from a warehouse
• Each observation is categorized as to whether or not the “event of interest” occurred
  • e.g., head or tail in each toss of a coin; defective or not defective light bulb
  • These two categories are mutually exclusive and collectively exhaustive
  • When the probability of the event of interest is represented as π, then the probability of the event of interest not occurring is 1 - π
• Constant probability for the event of interest occurring (π) for each observation
  • e.g., the probability of getting a tail is the same each time we toss the coin
Binomial Probability Distribution
(continued)

Observations are independent


The outcome of one observation does not affect the
outcome of the other
Two sampling methods deliver independence
Infinite population without replacement
Finite population with replacement
Possible Applications for the Binomial
Distribution

• A manufacturing plant labels items as either


defective or acceptable
• A firm bidding for contracts will either get a
contract or not
• A marketing research firm receives survey
responses of “yes I will buy” or “no I will not”
• New job applicants either accept the offer or
reject it
The Binomial Distribution
Counting Techniques
• Suppose the event of interest is obtaining heads on the toss
of a fair coin. You are to toss the coin three times. In how
many ways can you get two heads?

• Possible ways: HHT, HTH, THH, so there are three ways you
can get two heads.

• This situation is fairly simple. We need to be able to count


the number of ways for more complicated situations.
Counting Techniques
Rule of Combinations

• The number of combinations of selecting X objects out of n objects is

$${}_nC_X = \frac{n!}{X!\,(n - X)!}$$

where:
n! = (n)(n - 1)(n - 2) . . . (2)(1)
X! = (X)(X - 1)(X - 2) . . . (2)(1)
0! = 1 (by definition)
Counting Techniques
Rule of Combinations
• How many possible 3 scoop combinations could you create at
an ice cream parlor if you have 31 flavors to select from?
• The total choices is n = 31, and we select X = 3.

$${}_{31}C_3 = \frac{31!}{3!\,(31 - 3)!} = \frac{31!}{3!\,28!} = \frac{31 \cdot 30 \cdot 29 \cdot 28!}{3 \cdot 2 \cdot 1 \cdot 28!} = 31 \cdot 5 \cdot 29 = 4495$$
Binomial Distribution Formula
$$P(X) = \frac{n!}{X!\,(n - X)!}\,\pi^{X}(1 - \pi)^{n - X}$$

P(X) = probability of X events of interest in n trials, with the probability of an “event of interest” being π for each trial
X = number of “events of interest” in sample, (X = 0, 1, 2, ..., n)
n = sample size (number of trials or observations)
π = probability of “event of interest”

Example: Flip a coin four times, let x = # heads:
n = 4
π = 0.5
1 - π = (1 - 0.5) = 0.5
X = 0, 1, 2, 3, 4
Characteristics
- Experiment involves n identical trials
- Each trial has exactly two possible outcomes: success and failure
- Each trial is independent of the previous trials
p is the probability of a success on any one trial
q = (1-p) is the probability of a failure on any one trial
p and q are constant throughout the experiment
X is the number of successes in the n trials

Main parameters are n and p: X ~ B(n, p)

Expected frequency of x successes in N repetitions of the experiment = N × P(X = x)
Binomial Distribution

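• Probability function

$$P(X) = \frac{n!}{X!\,(n - X)!}\, p^{X} q^{n - X} \quad \text{for } 0 \le X \le n$$

• Mean value

$$\mu = n \cdot p$$

• Variance and standard deviation

$$\sigma^2 = n \cdot p \cdot q \qquad \sigma = \sqrt{\sigma^2} = \sqrt{n \cdot p \cdot q}$$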
Example:
Calculating a Binomial Probability
What is the probability of one success in five observations if the probability of an event of interest is .1?
X = 1, n = 5, and π = 0.1

$$P(X = 1) = \frac{n!}{X!\,(n - X)!}\,\pi^{X}(1 - \pi)^{n - X} = \frac{5!}{1!\,(5 - 1)!}(0.1)^{1}(1 - 0.1)^{5 - 1} = (5)(0.1)(0.9)^{4} = 0.32805$$
The Binomial Distribution
Example
Suppose the probability of purchasing a defective computer is 0.02. What is the probability of purchasing 2 defective computers in a group of 10?
X = 2, n = 10, and π = .02

$$P(X = 2) = \frac{n!}{X!\,(n - X)!}\,\pi^{X}(1 - \pi)^{n - X} = \frac{10!}{2!\,(10 - 2)!}(.02)^{2}(1 - .02)^{10 - 2} = (45)(.0004)(.8508) = .01531$$
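The two worked examples above can be checked with a direct implementation of the binomial formula. The function name `binomial_pmf` is an assumption made for illustration; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial distribution with n trials and success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


# One event of interest in five observations, pi = 0.1.
p_one_in_five = binomial_pmf(1, 5, 0.1)

# Two defective computers in a group of ten, pi = 0.02.
p_two_defective = binomial_pmf(2, 10, 0.02)

# The probabilities over x = 0..n should sum to 1.
total_probability = sum(binomial_pmf(x, 10, 0.02) for x in range(11))
```

Cumulative questions such as "at least" or "no more than" can be answered by summing `binomial_pmf` over the relevant values of x.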
The Binomial Distribution
Shape
• The shape of the binomial distribution depends on the values of π and n

(Figure: two bar charts of P(X) against X = 0 to 5, one for n = 5 and π = .1, the other for n = 5 and π = .5.)
The Binomial Distribution
Using Binomial Tables
n = 10
x     π=.20    π=.25    π=.30    π=.35    π=.40    π=.45    π=.50
0     0.1074   0.0563   0.0282   0.0135   0.0060   0.0025   0.0010   10
1     0.2684   0.1877   0.1211   0.0725   0.0403   0.0207   0.0098   9
2     0.3020   0.2816   0.2335   0.1757   0.1209   0.0763   0.0439   8
3     0.2013   0.2503   0.2668   0.2522   0.2150   0.1665   0.1172   7
4     0.0881   0.1460   0.2001   0.2377   0.2508   0.2384   0.2051   6
5     0.0264   0.0584   0.1029   0.1536   0.2007   0.2340   0.2461   5
6     0.0055   0.0162   0.0368   0.0689   0.1115   0.1596   0.2051   4
7     0.0008   0.0031   0.0090   0.0212   0.0425   0.0746   0.1172   3
8     0.0001   0.0004   0.0014   0.0043   0.0106   0.0229   0.0439   2
9     0.0000   0.0000   0.0001   0.0005   0.0016   0.0042   0.0098   1
10    0.0000   0.0000   0.0000   0.0000   0.0001   0.0003   0.0010   0
      π=.80    π=.75    π=.70    π=.65    π=.60    π=.55    π=.50    x

(For π > .50, read π from the bottom row and x from the right-hand column.)

Examples:
n = 10, π = .35, x = 3: P(x = 3|n =10, π = .35) = .2522
n = 10, π = .75, x = 2: P(x = 2|n =10, π = .75) = .0004
Binomial Distribution Characteristics

• Mean

$$\mu = E(X) = n\pi$$

• Variance and Standard Deviation

$$\sigma^2 = n\pi(1 - \pi) \qquad \sigma = \sqrt{n\pi(1 - \pi)}$$

Where n = sample size
π = probability of the event of interest for any trial
(1 – π) = probability of no event of interest for any trial
The Binomial Distribution
Characteristics
Examples
n = 5, π = 0.1:

$$\mu = n\pi = (5)(.1) = 0.5 \qquad \sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{(5)(.1)(1 - .1)} = 0.6708$$

n = 5, π = 0.5:

$$\mu = n\pi = (5)(.5) = 2.5 \qquad \sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{(5)(.5)(1 - .5)} = 1.118$$

(Figure: bar charts of P(X) against X = 0 to 5 for each case.)
Using Excel For The
Binomial Distribution
Problem
• Find the probability of getting
i) exactly 3 heads in 4 tosses of a biased coin, where p(H) = ¾ and p(T) = ¼
P(X = 3) = 4C3 (¾)³ (¼)¹ = 0.421875
ii) at least 3 heads: p(X ≥ 3) = 0.421875 + (¾)⁴ = .738
iii) no more than 2 heads: p(X ≤ 2) = 1 – .738 = .262
Problem
• Assume that on an average, 1 telephone line out of
5 is busy. What is the probability that if 3
randomly selected telephone numbers are called

I) not more than 2 will be busy


II) At least 2 will be busy

P = 1/5 = 0.2 q = 0.8 n=3


i) p(X ≤ 2)=p(0)+p(1)+p(2)=.512+.384+.096=.992
ii) p(X ≥ 2) = p(2) + p(3) = .096 + .008 = .104
Problem
• Consider a binomial experiment with 2 trials
and p = .4. Compute
• p(1)
• p(0)
• p(2)
• The probability of at least one success
• The expected value, variance and SD
( 0.48, 0.36, 0.16, 0.64)
(.8 / .48 / .6928)
• A branch of a nationalized bank is giving educational loans to students. The persons in charge of disbursement of loans claim that 40% of the students do not repay the loan. The manager is not convinced and takes a random sample of 10 students. If the persons in charge of sanctioning the loans are correct, find the probability that
• Three of the 10 students do not repay
• None of the 10 students fails to repay (all 10 repay)
Practice
• 5.13 page no. 194
• 5.14 page no. 195
• 5.15
• 5.16
• 5.17
Poisson distribution
Poisson Distribution
It describes the number of times some event occurs during a specified interval
It is a discrete probability distribution because it is formed by counting
Based on two assumptions:
The probability is proportional to the length of the interval
The intervals are independent.
(That means the longer the interval the larger the probability, and the number of occurrences in one interval does not affect the other intervals)

It applies when ‘n’ is very large and ‘p’ is very small

The probability of the occurrence of events in one interval of time is independent of occurrences in other intervals

The expected number of occurrences must hold constant throughout the experiment
In 1837, Siméon-Denis Poisson published a work entitled “Researches on the probability of criminal & civil verdicts”, which includes a discussion of what later became known as the Poisson Distribution
Poisson Distribution:
Applications
• Arrivals at queuing systems

• airports -- people, airplanes, automobiles, baggage

• banks -- people, automobiles, loan applications

• computer file servers -- read and write operations

• Defects in manufactured goods

• number of defects per 1,000 feet of extruded copper wire

• number of blemishes per square foot of painted surface

• number of errors per typed page


The Poisson Distribution
Definitions
• You use the Poisson distribution when you are
interested in the number of times an event occurs in
a given area of opportunity.
• An area of opportunity is a continuous unit or
interval of time, volume, or such area in which more
than one occurrence of an event can occur.
• The number of scratches in a car’s paint
• The number of mosquito bites on a person
• The number of computer crashes in a day
The Poisson Distribution

• Apply the Poisson Distribution when:


• You wish to count the number of times an event occurs in a given area
of opportunity
• The probability that an event occurs in one area of opportunity is the
same for all areas of opportunity
• The number of events that occur in one area of opportunity is
independent of the number of events that occur in the other areas of
opportunity
• The probability that two or more events occur in an area of
opportunity approaches zero as the area of opportunity becomes
smaller
• The average number of events per unit is λ (lambda)
Poisson Distribution Formula

$$P(X) = \frac{e^{-\lambda}\lambda^{X}}{X!}$$

where:
X = number of events in an area of opportunity
λ = expected number of events
e = base of the natural logarithm system (2.71828...)
Poisson Distribution Characteristics

• Mean

$$\mu = \lambda$$

• Variance and Standard Deviation

$$\sigma^2 = \lambda \qquad \sigma = \sqrt{\lambda}$$

where λ = expected number of events
Using Poisson Tables
λ

X 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90

0 0.9048 0.8187 0.7408 0.6703 0.6065 0.5488 0.4966 0.4493 0.4066


1 0.0905 0.1637 0.2222 0.2681 0.3033 0.3293 0.3476 0.3595 0.3659
2 0.0045 0.0164 0.0333 0.0536 0.0758 0.0988 0.1217 0.1438 0.1647
3 0.0002 0.0011 0.0033 0.0072 0.0126 0.0198 0.0284 0.0383 0.0494
4 0.0000 0.0001 0.0003 0.0007 0.0016 0.0030 0.0050 0.0077 0.0111
5 0.0000 0.0000 0.0000 0.0001 0.0002 0.0004 0.0007 0.0012 0.0020
6 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0002 0.0003
7 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Example: Find P(X = 2) if λ = 0.50

$$P(X = 2) = \frac{e^{-\lambda}\lambda^{X}}{X!} = \frac{e^{-0.50}(0.50)^{2}}{2!} = 0.0758$$
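Table entries like the one above can be reproduced directly from the Poisson formula. The function name `poisson_pmf` is an assumption made for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with expected number of events lam."""
    return exp(-lam) * lam ** x / factorial(x)


# Table look-up example: lambda = 0.50, X = 2.
p_two = poisson_pmf(2, 0.50)

# Rebuild the lambda = 0.50 column of the table for X = 0..7, rounded to 4 places.
column = [round(poisson_pmf(x, 0.50), 4) for x in range(8)]
```

Rounding to four decimal places should match the printed table, which is why probabilities for large X appear as 0.0000 even though they are not exactly zero.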
Using Excel For The
Poisson Distribution
Graph of Poisson Probabilities

λ = 0.50

X    P(X)
0    0.6065
1    0.3033
2    0.0758
3    0.0126
4    0.0016
5    0.0002
6    0.0000
7    0.0000

P(X = 2) = 0.0758

(Figure: bar chart of P(x) against x = 0 to 7 for λ = 0.50.)
Poisson Distribution Shape
• The shape of the Poisson Distribution depends on the parameter λ:

(Figure: two bar charts of P(x) against x, one for λ = 0.50 and one for λ = 3.00.)
Types of problem in Poisson distribution
• Given mean, find the probability of X=x
• Given: n and p , find the probability of X=x
• Given : probability and mean, find X ( inverse)
• Given, N, mean and find expected value
Problem
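On average, 1 in 400 items is defective. Out of 100 items chosen, what is the probability that there are more than 3 defectives?
p = 1/400, n = 100, λ = np = 0.25

$$P(X > 3) = 1 - [p(0) + p(1) + p(2) + p(3)] = 1 - e^{-0.25}\left(\frac{0.25^{0}}{0!} + \frac{0.25^{1}}{1!} + \frac{0.25^{2}}{2!} + \frac{0.25^{3}}{3!}\right)$$

$$= 1 - 0.778801\,(1 + 0.25 + 0.03125 + 0.002604) = 1 - (0.778801)(1.283854) = 0.000133$$

(about 1.3 in 10,000)
(From four-place tables, e^(-.25) = e^(-.20) × e^(-.05) = .8187 × .9512 ≈ .7787. Because the answer is a small difference between two numbers close to 1, carry at least six decimal places, e^(-.25) = 0.778801; the four-place value gives a noticeably different result.)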
Problem
A factory produces an item in packets of 10. The probability of an item being defective is .2%. Find the number of packets having 2 defective items in a consignment of 10000 packets.

p = .2/100 = .002, n = 10, λ = np = 0.02

$$P(X = 2) = \frac{e^{-0.02}(0.02)^{2}}{2!} = \frac{(0.9802)(0.0004)}{2} = 0.00019604$$

# of packets = 10000 × 0.00019604 = 1.9604 ≈ 2
Poisson Approximation
of the Binomial Distribution
• Binomial probabilities are difficult to
calculate when n is large.
• Under certain conditions binomial
probabilities may be approximated by
Poisson probabilities.

If n > 20 and n ⋅ p ≤ 7, the approximation is acceptable .


• Poisson approximation

Use λ = n ⋅ p.
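To see how close the approximation is, the sketch below compares exact binomial probabilities with Poisson probabilities for n = 100 and p = 1/400 (so λ = 0.25), values taken from the defectives problem earlier in this section. The function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)


n, p = 100, 1 / 400      # satisfies n > 20 and n * p <= 7
lam = n * p              # Poisson parameter used for the approximation

# (x, exact binomial probability, Poisson approximation) for x = 0..3.
rows = [(x, binomial_pmf(x, n, p), poisson_pmf(x, lam)) for x in range(4)]

# Largest absolute difference between the two over these values of x.
max_gap = max(abs(b - q) for _, b, q in rows)
```

For these values the two sets of probabilities differ by less than 0.001, which illustrates why the rule of thumb treats the approximation as acceptable.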
Customer Dissatisfaction Survey – Airline Passengers

Airline        Complaints per 100,000
Southwest      0.25
Alaska Air     0.54
Delta          0.79
US Airways     0.84
Continental    1.02
Tower Air      1.91
Northwest      2.21

If 100,000 boarded passengers were contacted, what is the probability that exactly 3 of them logged a complaint? λ = 1.08 (the mean of the seven rates above)

$$P(X = 3) = \frac{(1.08)^{3} e^{-1.08}}{3!} = 0.0713$$

7.13% of the time, 3 would have logged complaints
• On the average, six people per hour use a self-service
banking facility during the prime shopping hours in a
department store. What is the probability that
a. Exactly six people will use the facility during a randomly
selected hour?
b. Fewer than five people will use the facility during a
randomly selected hour.
c. No one will use the facility during a 10 minutes interval?
d. No one will use the facility during a 5 – minutes interval
• X ~ P(λ) such that P(X = 1) = P(X = 2). Find the mean and P(X = 0). (Answer: 2, 0.1353)

• The variance of the Poisson distribution is 0.5. Find P(X = 3). (Answer: 0.0126)
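The self-service banking exercise above can be worked as follows. The sketch assumes, as stated in the Poisson assumptions, that the expected count is proportional to the length of the interval, so a rate of 6 per hour becomes 1 per 10 minutes and 0.5 per 5 minutes.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)


rate_per_hour = 6

# a. Exactly six people in one hour.
p_exactly_six = poisson_pmf(6, rate_per_hour)

# b. Fewer than five people in one hour: X = 0, 1, 2, 3 or 4.
p_fewer_than_five = sum(poisson_pmf(x, rate_per_hour) for x in range(5))

# c. No one in a 10-minute interval: rescale lambda to 6 * 10/60 = 1.
p_none_in_10_min = poisson_pmf(0, rate_per_hour * 10 / 60)

# d. No one in a 5-minute interval: rescale lambda to 6 * 5/60 = 0.5.
p_none_in_5_min = poisson_pmf(0, rate_per_hour * 5 / 60)
```

The key step in parts c and d is rescaling λ to the new interval before applying the formula.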


QUANTITATIVE METHODS
FOR MANAGEMENT
RECAP
• Introduction – definition, types of statistics, levels of measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and correlation
• Regression analysis – simple, measures of variations ( SSE, SSR, SST,
coefficient of determination and coefficient of correlation)
• Basic probability – terms, definition, simple, joint, addition,
multiplication theorem, Bayes’ theorem. Random variables, Expected
value and variance.
• Theoretical distributions – binomial, Poisson
Normal distribution
Chapter 6
207-233
Learning Objectives
In this chapter, you learn:
• Normal distribution and its properties
• Standard normal variate
• To compute probabilities from the normal distribution
• Inverse normal probability
Continuous Probability Distributions

• A continuous random variable is a variable that can assume any value on a continuum (can assume an uncountable number of values)
• thickness of an item
• time required to complete a task
• temperature of a solution
• height, in inches
• These can potentially take on any value depending only on the ability to precisely and accurately measure
The Normal Distribution
• ‘Bell Shaped’
• Symmetrical
• Mean, Median and Mode are Equal
• Location is determined by the mean, μ
• Spread is determined by the standard deviation, σ
• The random variable has an infinite theoretical range: − ∞ to + ∞
[Figure: bell-shaped curve of f(X) against X, centred at μ with spread σ; Mean = Median = Mode]
Properties of Normal distribution
• It is symmetrical, thus the mean, median and mode are equal
• It is bell shaped, thus the empirical rule applies
• The interquartile range equals 1.33 standard deviations
• The range is approximately equal to 6 standard deviations.
The Normal Distribution
Density Function
The formula for the normal probability density function is

$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$
Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
µ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
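As a quick check of the density formula, the sketch below evaluates it directly and compares it with `statistics.NormalDist.pdf` from the Python standard library. The function name `normal_pdf` and the test points are our own choices.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(X) written exactly as in the formula above."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)

# Compare the hand-written formula with the library implementation
# at a few arbitrary points of a normal distribution (mu = 10, sigma = 5).
reference = NormalDist(mu=10, sigma=5)
for x in (0.0, 5.0, 10.0, 17.5):
    print(x, round(normal_pdf(x, 10, 5), 6), round(reference.pdf(x), 6))
```

The density is highest at X = μ and is symmetric around it, which matches the properties listed earlier.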
Many Normal Distributions

By varying the parameters μ and σ, we obtain different normal distributions
The Normal Distribution Shape

• Changing μ shifts the distribution left or right.
• Changing σ increases or decreases the spread.
[Figure: normal curve f(X) against X, centred at μ with spread σ]
The Standardized Normal

• Any normal distribution (with any mean and standard deviation combination) can be transformed into the standardized normal distribution (Z)
• Need to transform X units into Z units
• The standardized normal distribution (Z) has a mean of 0 and a standard deviation of 1
Translation to the Standardized
Normal Distribution

• Translate from X to the standardized normal (the “Z” distribution) by subtracting the mean of X and dividing by its standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

The Z distribution always has mean = 0 and standard deviation = 1
The Standardized Normal
Probability Density Function
• The formula for the standardized normal probability density function is

$$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} Z^{2}}$$

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
Z = any value of the standardized normal distribution
The Standardized
Normal Distribution

• Also known as the “Z” distribution
• Mean is 0
• Standard Deviation is 1
• Values above the mean have positive Z-values, values below the mean have negative Z-values
[Figure: standardized normal curve f(Z) against Z, centred at 0]
Problem
The mean length of time spent on a training program
is 500 hrs and this normally distributed random
variable has a SD of 100 hrs
What is the probability that a participant will take
• More than 500 hrs (0.50)
• Between 500 & 600 hrs (.3413)
• Between 550 & 650 hrs (.2417)
• Between 420 & 570 hrs (.5461)
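The training-program answers can be reproduced by standardizing each limit and using the standard normal cumulative distribution. A minimal sketch, assuming Python's `statistics.NormalDist`; the helper name `prob_between` is ours.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def prob_between(lo: float, hi: float, mu: float, sigma: float) -> float:
    """P(lo < X < hi) for X ~ Normal(mu, sigma), via Z = (X - mu) / sigma."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return STANDARD_NORMAL.cdf(z_hi) - STANDARD_NORMAL.cdf(z_lo)

mu, sigma = 500, 100  # hours, from the problem statement

p_more_than_500 = 1 - STANDARD_NORMAL.cdf((500 - mu) / sigma)
p_500_to_600 = prob_between(500, 600, mu, sigma)
p_550_to_650 = prob_between(550, 650, mu, sigma)
p_420_to_570 = prob_between(420, 570, mu, sigma)

print(round(p_more_than_500, 4), round(p_500_to_600, 4),
      round(p_550_to_650, 4), round(p_420_to_570, 4))
```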
2) A project yields an average cash flow of Rs. 500 lakhs with a standard deviation of Rs. 60 lakhs. Calculate the following probabilities.
(i) Cash flow will be more than Rs. 560 lakhs
(ii) Cash flow will be less than Rs. 420 lakhs
(iii) Cash flow will be between Rs. 460 and Rs. 540 lakhs
(iv) Cash flow will be more than Rs. 680 lakhs
Solution:
Let X = cash flow in Rs. lakhs
μ = Rs. 500 lakhs
σ = Rs. 60 lakhs

(i) Cash flow will be more than Rs. 560 lakhs

$$P(X \ge 560) = P\left(Z \ge \frac{X - \mu}{\sigma}\right) = P\left(Z \ge \frac{560 - 500}{60}\right) = P(Z \ge 1)$$

= (Area from 0 to ∞) − (Area from 0 to 1)

= 0.5 – 0.3413

= 0.1587
(ii) Cash flow will be less than Rs. 420 lakhs

$$P(X \le 420) = P\left(Z \le \frac{X - \mu}{\sigma}\right) = P\left(Z \le \frac{420 - 500}{60}\right) = P(Z \le -1.33)$$

= (Area from −∞ to 0) − (Area from −1.33 to 0)

= 0.5 – 0.4082

= 0.0918
(iii) Cash flow will be between Rs. 460 and Rs. 540 lakhs

$$P(460 \le X \le 540) = P\left(\frac{460 - 500}{60} \le Z \le \frac{540 - 500}{60}\right) = P(-0.67 \le Z \le 0.67)$$

= (Area from −0.67 to 0) + (Area from 0 to 0.67)

= 0.2486 + 0.2486

= 0.4972
(iv) Cash flow will be more than Rs. 680 lakhs

$$P(X \ge 680) = P\left(Z \ge \frac{X - \mu}{\sigma}\right) = P\left(Z \ge \frac{680 - 500}{60}\right) = P(Z \ge 3)$$

= (Area from 0 to ∞) − (Area from 0 to 3)

= 0.5 – 0.4987

= 0.0013
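Table-based answers depend on rounding Z to two decimals before the lookup. The sketch below recomputes the four cash-flow probabilities twice: once with Z rounded as in the table method, and once without rounding. The function names and the dictionary keys are ours.

```python
from statistics import NormalDist

MU, SIGMA = 500, 60              # Rs. lakhs, from the problem statement
STANDARD_NORMAL = NormalDist()

def z_score(x, decimals=None):
    """Standardize x; optionally round Z as a printed table would require."""
    z = (x - MU) / SIGMA
    return z if decimals is None else round(z, decimals)

def cash_flow_probabilities(decimals=None):
    z560, z420, z460, z540, z680 = (
        z_score(x, decimals) for x in (560, 420, 460, 540, 680)
    )
    cdf = STANDARD_NORMAL.cdf
    return {
        "more than 560": 1 - cdf(z560),
        "less than 420": cdf(z420),
        "between 460 and 540": cdf(z540) - cdf(z460),
        "more than 680": 1 - cdf(z680),
    }

table_style = cash_flow_probabilities(decimals=2)  # mimics the table lookup
unrounded = cash_flow_probabilities()              # no rounding of Z

for key in table_style:
    print(f"{key}: table-style {table_style[key]:.4f}, unrounded {unrounded[key]:.4f}")
```

The table-style figures should match the worked solution; the unrounded figures differ slightly in parts (ii) and (iii) because 80/60 and 40/60 are not exactly 1.33 and 0.67.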
Problem
A normal variable has a mean of 10 and SD 5.
What is the probability that the normal
variable will take a value in the interval 0.2 to
19.8?
P(0.2 < X < 19.8)
= p[((0.2 – 10)/5) < Z < ((19.8 – 10)/5)]
= p(-1.96 < Z < 1.96)
= 2 * .4750
= .9500
Finding Probabilities of the
Standard Normal Distribution:
P(0 < Z < 1.56)
Standard Normal Probabilities (area under the standard normal curve between 0 and z)
z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Look in the row labeled 1.5 and the column labeled .06 to find P(0 ≤ Z ≤ 1.56) = 0.4406
[Figure: standard normal curve f(z) against Z from −5 to 5, with the area between 0 and 1.56 marked]
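Each table entry is the area between 0 and z under the standard normal curve, which can be written with the error function as 0.5 · erf(z/√2). A small sketch (the function name is ours) that regenerates individual entries:

```python
from math import erf, sqrt

def area_zero_to_z(z: float) -> float:
    """P(0 <= Z <= z) for a standard normal Z, the quantity the table lists."""
    return 0.5 * erf(z / sqrt(2))

print(round(area_zero_to_z(1.56), 4))  # row 1.5, column .06
print(round(area_zero_to_z(1.00), 4))  # row 1.0, column .00
print(round(area_zero_to_z(1.96), 4))  # row 1.9, column .06
```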
Given a Normal Probability
Find the X Value

• Steps to find the X value for a known probability:
1. Find the Z value for the known probability
2. Convert to X units using the formula:

$$X = \mu + Z\sigma$$
Finding the X value for a Known
Probability
Example:
• Let X represent the time it takes (in seconds) to download an
image file from the internet.
• Suppose X is normal with mean 8.0 and standard deviation
5.0
• Find X such that 20% of download times are less than X.

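[Figure: normal curve with mean 8.0 on the X axis (0 on the Z axis); the lower-tail area of 0.2000 lies to the left of the unknown value X]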
Find the Z value for
20% in the Lower Tail
1. Find the Z value for the known probability
• 20% area in the lower tail is consistent with a Z value of -0.84

Standardized Normal Probability Table (Portion)
Z     …   .03    .04    .05
-0.9  …  .1762  .1736  .1711
-0.8  …  .2033  .2005  .1977
-0.7  …  .2327  .2296  .2266

[Figure: lower-tail area 0.2000 to the left of Z = -0.84; mean 8.0 on the X axis corresponds to 0 on the Z axis]
Finding the X value

2. Convert to X units using the formula:

$$X = \mu + Z\sigma = 8.0 + (-0.84)(5.0) = 3.80$$

So 20% of the values from a distribution with mean 8.0 and standard deviation 5.0 are less than 3.80
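The two steps (find Z, then convert to X) can be sketched with the inverse cumulative distribution function `inv_cdf` of `statistics.NormalDist`. The variable names are ours; the mean, standard deviation and 20% tail come from the example.

```python
from statistics import NormalDist

mu, sigma = 8.0, 5.0        # download time in seconds, from the example
lower_tail = 0.20           # 20% of download times fall below X

# Step 1: Z value whose lower-tail area is 0.20 (the table gives -0.84).
z = NormalDist().inv_cdf(lower_tail)

# Step 2: convert to X units with X = mu + Z * sigma.
x = mu + z * sigma

# The same result in one call on the unstandardized distribution.
x_direct = NormalDist(mu, sigma).inv_cdf(lower_tail)

print(round(z, 4), round(x, 2), round(x_direct, 2))
```

The result is about 3.79 rather than 3.80 because the table method rounds Z to two decimals.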
Problem
The average time a person spends in reading the
Economic Times is 49 minutes. Assume SD is 16 min
and that they are normally distributed.
What is the probability that a person will spend
a) At least 1 hour reading it – p(X ≥ 60) = .2451
b) No more than 30 min reading the ET – p(X ≤ 30) = .1170
c) For the 10% who spend the most time reading ET,
how much time do they spend? 69.48 min
Problem
A person must score in the upper 2% of the population on
an IQ test to qualify for membership in MENSA, the
international high IQ society. If IQ scores are normally
distributed with a mean of 100 and SD of 15, what score
must a person get to qualify for MENSA? (130.75)
• Data indicates that the time to download check$mart’s home page is normally distributed with mean 7 seconds and standard deviation of 2 seconds.
• What is the probability that the download time will be more than 9 seconds?
• What is the probability that the download time will be between 7 and 9 seconds?
• What is the probability that the download time is under 7 seconds or over 9 seconds?
• What is the probability that the download time will be between 5 and 9 seconds?
• What is the probability that the download time is less than 3.5 seconds?
• How much time (in seconds) will elapse before 10% of the downloads are complete?
• What are the upper limit and lower limit of X, located symmetrically around the mean, which include 95% of the download times?
Business Statistics:
A First Course
5th Edition

Chapter 7
Chapter 7 : 234-257
Sampling and Sampling Distributions
Learning Objectives

In this chapter, you learn:


• To distinguish between different sampling methods
• The concept of the sampling distribution
• To compute probabilities related to the sample mean
and the sample proportion
• The importance of the Central Limit Theorem
Unit 3: Sampling Theory
Introduction to Sampling

We taste two to three grapes before purchasing a whole bunch.
A chemist takes a sample of alcohol to determine whether it is proof or not.

Does the selected sample represent the characteristics of the whole bunch?

Statisticians, however, recommend a scientific approach to sampling, which helps to get an accurate sample that in most cases represents the characteristics of the whole bunch.
Know more about sampling, as recommended by statisticians, in this unit.
Population Vs. Sample
• Population
• Set of all elements of interest in a study
• Complete enumeration – Census

• Sample
• Subset of the population
• Sampling

• The purpose of statistical inference is to develop estimates & test hypotheses about the characteristics of a population using the info obtained in the sample
Unit 3: Sampling Theory

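Finite Population:
If the population consists of a finite number of
individuals, then it is called a Finite Population.
For example, in a statistical survey aimed at determining average
per capita income of the people in a city, all earning
individuals in the city form the population.

Infinite Population:
If the number of individuals in the population is not finite
(it cannot be counted, even in principle), then it is called
an Infinite Population.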
Why Sample?
• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
A Sampling Process Begins With A Sampling Frame

• The sampling frame is a listing of items that make up the population


• Frames are data sources such as population lists, directories, or maps
• Inaccurate or biased results can result if a frame excludes certain
portions of the population
• Using different frames to generate data can lead to dissimilar
conclusions
Sampling techniques
Random / Probability sampling
• Simple random sample
  • With replacement
  • Without replacement
• Stratified sample
  • Proportionate
  • Optimal
• Systematic sample
• Cluster sample
• Multi stage sample
Non probability / Non random sampling
• Purposive
• Convenience
• Quota
• Judgment
• Snowball
Types of Samples
Samples
• Non-Probability Samples: Judgment, Convenience
• Probability Samples: Simple Random, Systematic, Stratified, Cluster
Types of Samples:
Nonprobability Sample
• In a nonprobability sample, items included are
chosen without regard to their probability of
occurrence.
• In convenience sampling, items are selected based only on
the fact that they are easy, inexpensive, or convenient to
sample.
• In a judgment sample, you get the opinions of pre-selected
experts in the subject matter.
Types of Samples:
Probability Sample
• In a probability sample, items in the sample
are chosen on the basis of known probabilities.

Probability Samples: Simple Random, Systematic, Stratified, Cluster
Probability Sample:
Simple Random Sample
• Every individual or item from the frame has an equal chance of being selected
• Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame).
• Samples are obtained from a table of random numbers or computer random number generators.
Selecting a Simple Random Sample Using A Random Number Table

Sampling Frame For Population With 850 Items
Item Name   Item #
Bev R.      001
Ulan X.     002
.           .
.           .
.           .
Joann P.    849
Paul F.     850

Portion Of A Random Number Table
49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The First 5 Items in a simple random sample (reading the table three digits at a time)
Item # 492
Item # 808
Item # 892 -- does not exist so ignore
Item # 435
Item # 779
Item # 002
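The selection above reads the random number table three digits at a time and ignores numbers that do not match an item in the frame. A minimal sketch of that procedure; the function name is ours, and skipping repeats is an assumption that corresponds to sampling without replacement.

```python
def select_from_random_digits(rows, frame_size, sample_size, width=3):
    """Read `width` digits at a time and keep valid, unrepeated item numbers."""
    digits = "".join(rows).replace(" ", "")
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        item = int(digits[start:start + width])
        # Ignore numbers outside the frame (such as 892) and repeats.
        if 1 <= item <= frame_size and item not in chosen:
            chosen.append(item)
        if len(chosen) == sample_size:
            break
    return chosen

random_number_table = [
    "49280 88924 35779 00283 81163 07275",
    "11100 02340 12860 74697 96644 89439",
    "09893 23997 20048 49420 88872 08401",
]

print(select_from_random_digits(random_number_table, frame_size=850, sample_size=5))
# [492, 808, 435, 779, 2]
```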
Simple Random Sampling…
• Lottery Method
• Random Number Tables
• Tippett's Table (digits taken from the census report and
combined by 4s to give 10,400 four-figure numbers)
• Fisher & Yates Table
• Kendall & Babington Smith’s Tables
Problem 1
• Assume a finite population of 350. Using the
last 3 digits of the following 5 digit numbers
(601, 022….) determine the first 4 elements
that will be selected for the simple random
sample
98601 73022 83448 02147 34229
27553 84147 93289 14209
Ans
# 022, 147, 229, 289 (601, 448 and 553 exceed 350 and are skipped; the second 147 is a repeat)
Simple Random Sampling…
Random Number Table

9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8
5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6
8 0 8 8 0 6 3 1 7 1 4 2 8 7 7 6 6 8 3 5 6 0 5 1 5 7 0 2 9 6 5 0 0 2 6 4 5 5 8 7
8 6 4 2 0 4 0 8 5 3 5 3 7 9 8 8 9 4 5 4 6 8 1 3 0 9 1 2 5 3 8 8 1 0 4 7 4 3 1 9
6 0 0 9 7 8 6 4 3 6 0 1 8 6 9 4 7 7 5 8 8 9 5 3 5 9 9 4 0 0 4 8 2 6 8 3 0 6 0 6
5 2 5 8 7 7 1 9 6 5 8 5 4 5 3 4 6 8 3 4 0 0 9 9 1 9 9 7 2 9 7 6 9 4 8 1 5 9 4 1
8 9 1 5 5 9 0 5 5 3 9 0 6 8 9 4 8 6 3 7 0 7 9 5 5 4 7 0 6 2 7 1 1 8 2 6 4 4 9 3
Probability Sample:
Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k
individuals: k=N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter

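Example: N = 40, n = 4, so k = N/n = 10
[Figure: frame of 40 individuals split into 4 groups of 10; one individual is picked at random from the first group, then every 10th individual after it]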
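The steps for a systematic sample can be sketched as follows. This is an illustration under the assumption that N is a multiple of n (as in the N = 40, n = 4 example); the function name and the use of `random.Random` are ours.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Pick a random start in the first group of k items, then every kth item."""
    rng = rng or random.Random()
    k = len(frame) // n          # group size, k = N / n
    start = rng.randrange(k)     # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 41))       # individuals numbered 1..40, so N = 40
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)                    # four individuals spaced k = 10 apart
```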
Probability Sample:
Stratified Sample
• Divide population into two or more subgroups (called strata) according to
some common characteristic
• A simple random sample is selected from each subgroup, with sample sizes
proportional to strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling population of voters, stratifying
across racial or socio-economic lines.

[Figure: population divided into 4 strata]
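A sketch of proportionate stratified sampling: each stratum contributes a simple random sample whose size is proportional to the stratum size. The four strata and their sizes below are hypothetical, and the simple rounding used here may not add up to exactly n for other stratum sizes.

```python
import random

def proportional_stratified_sample(strata, n, rng=None):
    """Draw a simple random sample from each stratum, sized in proportion to it."""
    rng = rng or random.Random()
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)    # proportional allocation
        sample[name] = rng.sample(members, size)  # without replacement
    return sample

# Hypothetical population of 1,000 individuals in 4 strata.
strata = {
    "A": [f"A{i}" for i in range(400)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(200)],
    "D": [f"D{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, n=50, rng=random.Random(1))
print({name: len(chosen) for name, chosen in sample.items()})
# {'A': 20, 'B': 15, 'C': 10, 'D': 5}
```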
Probability Sample
Cluster Sample
• Population is divided into several “clusters,” each representative of the
population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a
cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where
certain election districts are selected and sampled.

[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
Multi Stage Sampling
• Sampling carried out in stages
• Material regarded as being made up of a number
of I stage sampling units, each of which is made
up of a number of II Stage units and so on
• Example – sample of 5000 households in
Karnataka
• I Stage – State – divided into District
• II Stage – Districts – divided into villages
• III Stage – Villages – divided into households
Judgement Sampling
• Choice of a sample depends exclusively on the
judgement of the investigator
• Quality of the sample depends exclusively on the
judgement of the person selecting the sample
Convenience Sampling
• Elements are included in the sample without pre
specified or known probability of being selected –
convenience of researcher
• A convenient chunk or slice of the population is
taken
• Example
• From telephone directories
• Professor conducting research may use student
volunteers
Quota Sampling
• Quotas are set based on given criteria, but
within the quotas, the sample is judgmental
• Example
• Out of 100 people to be interviewed, 60 are to be
housewives, 25 farmers and 15 children less than 15 years old
Biased Sampling
• Picking a sample by choosing people who would
have very strong feelings on the issue
Snowball Sampling
• Survey subjects are selected based on referral
from other survey respondents
Probability Sample:
Comparing Sampling Methods
• Simple random sample and Systematic sample
• Simple to use
• May not be a good representation of the population’s
underlying characteristics
• Stratified sample
• Ensures representation of individuals across the entire
population
• Cluster sample
• More cost effective
• Less efficient (need larger sample to acquire the same level
of precision)
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit good responses
• Sampling error – always exists
Sampling Vs. Non Sampling Error

• Sampling Error
• As sample results are based on partial or
incomplete analysis of the population features,
any statistical inference based on the sample may
not always be correct
• Non sampling error
• Incorrect enumeration of population
• Non random selection of samples
• Use of faulty questionnaire
• Wrong editing, coding or analysis
Types of Survey Errors
• Coverage error or selection bias
• Exists if some groups are excluded from the frame and have no
chance of being selected

• Non response error or bias
• People who do not respond may be different from those who do
respond

• Sampling error
• Variation from sample to sample will always exist

• Measurement error
• Due to weaknesses in question design, respondent error, and
interviewer’s effects on the respondent (“Hawthorne effect”)
Types of Survey Errors
(continued)

• Coverage error: excluded from frame
• Non response error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
QUANTITATIVE METHODS
FOR MANAGEMENT
Chapter 8 : 258-289
RECAP
• Introduction – definition, types of statistics, levels of measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and correlation
• Regression analysis – simple, measures of variations ( SSE, SSR, SST,
coefficient of determination and coefficient of correlation)
• Basic probability – terms, definition, simple, joint, addition, multiplication
theorem, Bayes' theorem. Random variables, Expected value and variance.
• Theoretical distributions – binomial, Poisson and Normal distribution
• Types of sampling – errors in sampling
SAMPLING DISTRIBUTION
Sampling Distributions
• A sampling distribution is a distribution of all of the possible values of a
sample statistic for a given size sample selected from a population.

• For example, suppose you sample 50 students from your college to estimate
their mean GPA. If you obtained many different samples of 50, you would
compute a different mean for each sample. We are interested in the
distribution of all the potential mean GPAs we might calculate for any given
sample of 50 students.
Developing a
Sampling Distribution

• Assume there is a population …
• Population size N = 4 (individuals A, B, C, D)
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
Developing a
Sampling Distribution
(continued)

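Summary Measures for the Population Distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

[Figure: P(x) against x for A = 18, B = 20, C = 22, D = 24; every value is equally likely (Uniform Distribution)]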
Developing a
Sampling Distribution
(continued)
Now consider all possible samples of size n = 2

16 possible samples (sampling with replacement)
1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

16 Sample Means
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
Developing a
Sampling Distribution
(continued)
Sampling Distribution of All Sample Means

16 Sample Means
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

Sample Means Distribution (no longer uniform)
X̄       18     19     20     21     22     23     24
P(X̄)   1/16   2/16   3/16   4/16   3/16   2/16   1/16
Developing a
Sampling Distribution
(continued)

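Summary Measures of this Sampling Distribution:

$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \dots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \dots + (24 - 21)^2}{16}} = 1.58$$

(here N = 16 is the number of sample means)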
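The 16 sample means and their summary measures can be enumerated directly. A short sketch using `itertools.product` for sampling with replacement; the variable names are ours.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [18, 20, 22, 24]   # ages of individuals A, B, C, D
n = 2                           # sample size

# All ordered samples of size n drawn with replacement: 4**2 = 16 samples.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)                 # population mean
sigma = pstdev(population)            # population standard deviation
mu_xbar = mean(sample_means)          # mean of the sample means
sigma_xbar = pstdev(sample_means)     # standard deviation of the sample means

print(len(samples), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
print(round(sigma / sqrt(n), 2))      # sigma / sqrt(n) matches sigma_xbar
```

The mean of the sample means equals the population mean, and their standard deviation equals σ/√n, which is the standard error of the mean.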
Comparing the Population Distribution
to the Sample Means Distribution

Population (N = 4): µ = 21, σ = 2.236
Sample Means Distribution (n = 2): µ_X̄ = 21, σ_X̄ = 1.58
[Figure: P(X) for the population values 18, 20, 22, 24 (A, B, C, D) beside P(X̄) for the sample means 18 to 24]
Sampling Distribution…
• The function used – Mean or SD – is the Sample
Statistic
• The SD of the distribution of the sample statistic
is the Standard Error of the Statistic
• The expected value is regarded as the true value
and any deviation is regarded as error of
estimation due to sampling effects
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same
population will yield different sample means
• A measure of the variability in the mean from sample to
sample is given by the Standard Error of the Mean
(this assumes that sampling is with replacement or
sampling is without replacement from an infinite population):

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• Note that the standard error of the mean decreases as the
sample size increases
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normally distributed with mean μ
and standard deviation σ, the sampling distribution
of X̄ is also normally distributed with

$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Z-value for Sampling Distribution
of the Mean
• Z-value for the sampling distribution of X̄:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where: X̄ = sample mean
µ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties

• µ_x̄ = µ (i.e. x̄ is unbiased)
[Figure: normal population distribution centred at µ, above a normal sampling distribution of x̄ that has the same mean]
Sampling Distribution Properties
(continued)

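• As n increases, σ_x̄ decreases
[Figure: sampling distributions of x̄ centred at µ; the curve for the larger sample size is narrower than the curve for the smaller sample size]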
Sample Mean Sampling Distribution:
If the Population is not Normal

• We can apply the Central Limit Theorem:
• Even if the population is not normal,
• …sample means from the population will be approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:

$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of the shape of the population.
[Figure: sampling distribution of x̄ approaching a bell shape as n increases]
Sample Mean Sampling Distribution:
If the Population is not Normal
(continued)

Sampling distribution properties:
• Central Tendency: µ_x̄ = µ
• Variation: σ_x̄ = σ / √n
[Figure: a non-normal population distribution with mean µ, above the sampling distribution of x̄, which becomes normal as n increases and is narrower for the larger sample size than for the smaller sample size]
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal
• For fairly symmetric distributions, n > 15 will
usually give a sampling distribution that is almost
normal
• For normal population distributions, the sampling
distribution of the mean is always normally
distributed
Example
• Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?
Example
(continued)

Solution:
• Even if the population is not normally distributed,
the central limit theorem can be used (n > 30)
• … so the sampling distribution of x̄ is approximately
normal
• … with mean µ_x̄ = 8
• … and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
Example
(continued)
Solution (continued):

$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$

[Figure: the population distribution (unknown shape, µ = 8) is sampled to give the sampling distribution of x̄ (µ_x̄ = 8, interval 7.8 to 8.2), which is standardized to the standard normal distribution (µ_z = 0, interval −0.4 to 0.4, area .1554 + .1554)]
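The example can be checked by treating the sample mean as a normal variable with mean µ and standard deviation σ/√n. A minimal sketch, assuming the normal approximation holds (n > 30); the function name is ours.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(mu, sigma, n, lo, hi):
    """P(lo < X-bar < hi) using the normal sampling distribution of the mean."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)

# mu = 8, sigma = 3, n = 36, so the standard error is 0.5.
p = prob_sample_mean_between(mu=8, sigma=3, n=36, lo=7.8, hi=8.2)
print(round(p, 4))  # 0.3108
```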
Problem
• A bank calculates that its individual
savings accounts are normally distributed
with a mean of 2000 and SD 600. If a
random sample of 100 accounts is taken,
what is the probability that the sample
mean will lie between 1900 and 2050?
Standard error = 600 / √100 = 60
SNV for 1900 = (1900 − 2000) / 60 = −1.67
SNV for 2050 = (2050 − 2000) / 60 = 0.83
P = .4525 + .2967 = .7492
Finite Population Multiplier
• When the population is finite, a finite
population multiplier is used:

$$\sqrt{\frac{N - n}{N - 1}}$$

• Find the Sampling Fraction n/N; if it is < .05, the
Finite multiplier is NOT to be used
• Also called Finite Correction Factor
Standard Error of Mean
• Infinite Population

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

• Finite Population

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
Problem 4
• In a sample of 25 observations from a normal
distribution with mean 98.6 and SD 17.2, what
is P(92 < x̄ < 102)?
• σ = 17.2, µ = 98.6, n = 25
• σ / √n = 3.44
• P(92 < x̄ < 102)
= P(−1.92 < Z < 0.99)
= .4726 + .3389
= .8115
Problem 5
• The auditor of a credit card company knows
that on an average, the daily balance of any
given customer is 112 and the SD 56. From 50
randomly selected accounts what is the
probability that the sample average daily
balance is
• < 100 ( .0643)
• Between 100 and 130 ( .9241)
Problem 6
• From a population of 125 items, with mean of
105 and SD 17, 64 items are chosen, what is
the standard error of the mean?
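N = 125, n = 64
µ = 105, σ = 17
n/N = 0.512 ≥ .05, so the finite population multiplier applies
SE = (17 / √64) × √((125 − 64) / (125 − 1)) = 2.125 × 0.701 ≈ 1.49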
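A sketch of the standard error of the mean with the finite population multiplier applied only when the sampling fraction n/N is at least .05, following the rule stated earlier. The function name and the optional `population_size` parameter are ours.

```python
from math import sqrt

def standard_error_of_mean(sigma, n, population_size=None):
    """sigma / sqrt(n), times sqrt((N - n) / (N - 1)) when n/N >= 0.05."""
    se = sigma / sqrt(n)
    if population_size is not None and n / population_size >= 0.05:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se

# Problem 6: N = 125, n = 64, sigma = 17.
print(round(standard_error_of_mean(17, 64, population_size=125), 2))  # 1.49

# Problem 4: no population size given, so no multiplier is applied.
print(round(standard_error_of_mean(17.2, 25), 2))  # 3.44
```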
Population Proportions
π = the proportion of the population having
some characteristic
• Sample proportion (p) provides an estimate of π:

$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$

• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
(assuming sampling with replacement from a finite population or without
replacement from an infinite population)
Sampling Distribution of p

• Approximated by a normal distribution if:
nπ ≥ 5 and n(1 − π) ≥ 5
where

$$\mu_p = \pi \quad\text{and}\quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

(where π = population proportion)
[Figure: sampling distribution of p, P(p) plotted against p from 0 to 1]
Z-Value for Proportions
Standardize p to a Z value with the formula:

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
Example

• If the true proportion of voters who support Proposition A is π = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?

i.e.: if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
Example
(continued)

• if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Find σ_p:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standardized normal:

$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$$

Use standardized normal table: P(0 ≤ Z ≤ 1.44) = 0.4251

[Figure: sampling distribution of p with the area between 0.40 and 0.45 shaded, standardized to the area between 0 and 1.44 (0.4251) under the standardized normal distribution]
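The proportion example can be sketched in the same way as the sample-mean case, using σ_p = √(π(1 − π)/n). The function name is ours, and the check nπ ≥ 5 and n(1 − π) ≥ 5 mirrors the condition stated for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_between(pi, n, lo, hi):
    """P(lo <= p <= hi) using the normal approximation for a sample proportion."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("normal approximation needs n*pi >= 5 and n*(1 - pi) >= 5")
    sigma_p = sqrt(pi * (1 - pi) / n)      # standard error of the proportion
    sampling_dist = NormalDist(pi, sigma_p)
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)

prob = prob_sample_proportion_between(pi=0.4, n=200, lo=0.40, hi=0.45)
print(round(prob, 3))
```

The result is about 0.4255; the table answer of 0.4251 is slightly lower because Z is rounded to 1.44 before the lookup.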
Standard Error of Proportion
• Infinite Population

$$\sqrt{\frac{pq}{n}}$$

• Finite Population

$$\sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}}$$

(where q = 1 − p)
Problem 7
• The President of a company believes that 30%
of the firm’s orders come from first time
customers. A simple random sample of 100
orders is used to estimate the proportion of first
time users
• What is the standard error of proportion (.0458)
Problem 7…
• What is the probability that the sample
proportion will be
• Between .20 & .40
= P(.20 < p < .40)
= P[(.20 − .30)/.0458 < Z < (.40 − .30)/.0458]
= P(−2.18 < Z < +2.18)
= .4854 × 2 = .9708
• Between .25 & .35
= P(−1.09 < Z < +1.09)
= .3621 × 2 = .7242
Chapter Summary
• Discussed probability and nonprobability samples
• Described four common probability samples
• Examined survey worthiness and types of survey errors
• Introduced sampling distributions
• Described the sampling distribution of the mean
• For normal populations
• Using the Central Limit Theorem
• Described the sampling distribution of a proportion
• Calculated probabilities using sampling distributions
Business Statistics:
A First Course
5th Edition

Chapter 8

Confidence interval estimation


Learning Objectives
In this chapter, you learn:
• To construct and interpret confidence interval estimates for
the mean and the proportion
• How to determine the sample size necessary to develop a
confidence interval for the mean or proportion
Chapter Outline

Content of this chapter


• Confidence Intervals for the Population Mean, μ
• when Population Standard Deviation σ is Known
• when Population Standard Deviation σ is Unknown
• Confidence Intervals for the Population
Proportion, π
• Determining the Required Sample Size
Process of Statistical Inference

1. Population with mean µ = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample mean x̄.
4. The value of x̄ is used to make inferences about the value of µ.
Types of Inference
1) Estimation: We estimate the value of a population parameter.
2) Testing: We formulate a decision about a population parameter.
3) Regression: We make predictions about the value of a statistical variable.
• To evaluate the reliability of our inference, we need to know about the probability distribution of the statistic we are using.
• Typically, we are interested in the sampling distributions for sample means and sample proportions.
[Figure: a population with mean µ and standard deviation σ, from which samples A, B, C, D and E, each of size n, are drawn; each sample has its own mean (X̄_A to X̄_E) and standard deviation (s_a to s_e)]
In reality, the sample mean is just one of many possible sample
means drawn from the population, and is rarely equal to µ.
Estimation
Point Estimation
In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.

We refer to x̄ as the point estimator of the population mean µ.

s is the point estimator of the population standard deviation σ.

The sample proportion p is the point estimator of the population proportion π.


Terms, Statistics & Parameters
Introduction…
• Use of a sample statistic to estimate a population parameter
• Estimator: the sample statistic used to estimate the population parameter
• Estimate: the specific observed value of the statistic
Estimator
An estimator of a population parameter is a sample statistic used to estimate the parameter

Any systematic deviation of the estimator from the population parameter of interest is called a bias
Point and Interval Estimates

• A point estimate is a single number
• A confidence interval provides additional information about the variability of the estimate
[Figure: the point estimate lies between the lower confidence limit and the upper confidence limit; the distance between the two limits is the width of the confidence interval]
Point Estimates

We can estimate a population parameter with a sample statistic (a point estimate):

Population Parameter      Sample Statistic (Point Estimate)
Mean          µ           X̄
Proportion    π           p
Estimator
• The sample mean, X̄, is the most common estimator of the population mean
• The sample variance, s², is the most common estimator of the population variance
• The sample standard deviation, s, is the most common estimator of the population standard deviation
• The sample proportion, p, is the most common estimator of the population proportion
Types
• Point Estimate – A single number (a single-valued estimate) used to estimate the population parameter.
A single element chosen from a sampling distribution.
Conveys little information about how close it is to the actual value of the population parameter, i.e. about the accuracy of the estimate
Types
• Interval Estimate – Range of values
An interval or range of values believed to include
the unknown population parameter.
Associated with the interval is a measure of the
confidence we have that the interval does indeed
contain the parameter of interest.
Properties of a good estimator
• Property of unbiasedness
• Expected value of the estimator is equal to the parameter
being estimated.
• Property of efficiency
• Smallest variance
• Property of sufficiency
• Use as much information as possible from the sample
• Property of consistency
• As the sample size increases, the estimate tends toward the parameter value
Point Estimate
• The sample mean is the best estimator of the
population mean
• The sample SD is the best estimator of the
population SD
• Sample proportion is the best estimator of the
population proportion
Problem 1
From the following data find the point estimates of
the population mean and the population SD

5 8 10 7 10 14
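As a rough check on Problem 1, the two point estimates can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and it assumes the sample SD uses the usual n − 1 divisor.

```python
import statistics

# Sample data from Problem 1
data = [5, 8, 10, 7, 10, 14]

# Point estimate of the population mean: the sample mean
mean_estimate = statistics.mean(data)

# Point estimate of the population SD: the sample SD (divisor n - 1)
sd_estimate = statistics.stdev(data)

print(f"mean = {mean_estimate}, SD = {sd_estimate:.3f}")
```

The same two calls apply unchanged to any other list of observations, such as the bank data in Problem 3.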
Problem 2
A survey question for a sample of 150 individuals
yielded 75 YES responses, 55 NO responses and
20 NO OPINIONS
What is the point estimate of the proportion in the
population who respond
(i) Yes (ii) No (iii) No Opinion
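For Problem 2, each point estimate is simply the sample proportion for that answer. A minimal sketch (the dictionary layout is an assumption made for illustration):

```python
# Survey counts from Problem 2
responses = {"Yes": 75, "No": 55, "No Opinion": 20}

n = sum(responses.values())  # sample size, 150

# Sample proportion for each answer = count / n
estimates = {answer: count / n for answer, count in responses.items()}

for answer, p in estimates.items():
    print(f"{answer}: {p:.4f}")
```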
Problem 3
A bank wants to determine the number of tellers
during lunch rush on Fridays. Data on the
number of people who entered the bank
between 11 am and 1 pm on Friday over the
last 3 months is:
242 275 289 306
342 385 279 245
269 305 394 328
Find point estimates of mean & SD of population
from which the sample was drawn
Problem 4
In a sample of 400 textile workers, 184 expressed extreme dissatisfaction regarding a prospective plan to modify working conditions. Because this dissatisfaction was strong enough to allow management to interpret plan reaction as being highly negative, they were curious about the proportion of total workers harboring this sentiment. Give a point estimate of this proportion
Confidence Intervals

• How much uncertainty is associated with a point


estimate of a population parameter?

• An interval estimate provides more information


about a population characteristic than does a
point estimate

• Such interval estimates are called confidence


intervals
Confidence Interval Estimate
• An interval gives a range of values:
• Takes into consideration variation in sample
statistics from sample to sample
• Based on observations from 1 sample
• Gives information about closeness to unknown
population parameters
• Stated in terms of level of confidence
• e.g. 95% confident, 99% confident
• Can never be 100% confident
Confidence Interval Example

Cereal fill example


• Population has µ = 368 and σ = 15.
• If you take a sample of size n = 25 you know
• $368 \pm 1.96 \times 15/\sqrt{25} = (362.12,\ 373.88)$ contains 95% of the sample means
• When you don’t know µ, you use X̄ to estimate µ
• If X̄ = 362.3 the interval is $362.3 \pm 1.96 \times 15/\sqrt{25} = (356.42,\ 368.18)$
• Since 356.42 ≤ µ ≤ 368.18, the interval based on this sample makes a correct statement about µ.

But what about the intervals from other possible samples of size 25?
Confidence Interval Example
(continued)
Sample #   X̄        Lower Limit   Upper Limit   Contain µ?
1          362.30   356.42        368.18        Yes
2          369.50   363.62        375.38        Yes
3          360.00   354.12        365.88        No
4          362.12   356.24        368.00        Yes
5          373.88   368.00        379.76        Yes
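The table can be reproduced with a short script. It is a sketch under the example's assumptions (µ = 368, σ = 15, n = 25, Z = 1.96); µ is treated as known here only so that coverage can be checked, which is not possible in practice.

```python
import math

# Values from the cereal fill example
mu, sigma, n, z = 368, 15, 25, 1.96
margin = z * sigma / math.sqrt(n)  # about 5.88

sample_means = [362.30, 369.50, 360.00, 362.12, 373.88]

rows = []
for x_bar in sample_means:
    # Round the limits to 2 decimals, as in the table, before checking
    # coverage; samples 4 and 5 sit exactly on the boundary.
    lower = round(x_bar - margin, 2)
    upper = round(x_bar + margin, 2)
    rows.append((x_bar, lower, upper, lower <= mu <= upper))

for x_bar, lower, upper, contains in rows:
    print(f"{x_bar:7.2f} {lower:7.2f} {upper:7.2f} {'Yes' if contains else 'No'}")
```

Samples 4 and 5 are the two extreme cases in which the interval only just reaches µ, which is why the limits are rounded before the comparison.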


Confidence Interval Example
(continued)
• In practice you only take one sample of size n
• In practice you do not know µ so you do not know if the interval
actually contains µ
• However you do know that 95% of the intervals formed in this
manner will contain µ
• Thus, based on the one sample you actually selected, you can be 95% confident your interval will contain µ (this is a 95% confidence interval)

Note: 95% confidence is based on the fact that we used Z = 1.96.


Estimation Process

[Figure: a random sample with mean X̄ = 50 is drawn from a population whose mean µ is unknown, leading to the statement "I am 95% confident that µ is between 40 & 60."]
General Formula
• The general formula for all confidence
intervals is:
Point Estimate ± (Critical Value)(Standard Error)
Where:
• Point Estimate is the sample statistic estimating the population
parameter of interest

• Critical Value is a table value based on the sampling


distribution of the point estimate and the desired confidence
level

• Standard Error is the standard deviation of the point estimate


Confidence Level

• Confidence Level
• The confidence that the interval will
contain the unknown population
parameter
• A percentage (less than 100%)
Confidence Level, (1-α)
(continued)
• Suppose confidence level = 95%
• Also written (1 - α) = 0.95, (so α = 0.05)
• A relative frequency interpretation:
• 95% of all the confidence intervals that can be
constructed will contain the unknown true parameter
• A specific interval either will contain or will not
contain the true parameter
• No probability involved in a specific interval
Confidence Intervals
• Population Mean
  • σ Known
  • σ Unknown
• Population Proportion
Confidence Interval for μ
(σ Known)
• Assumptions
• Population standard deviation σ is known
• Population is normally distributed
• If population is not normal, use large sample

• Confidence interval estimate:

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

where X̄ is the point estimate
Zα/2 is the normal distribution critical value for a probability of α/2 in each tail
σ/√n is the standard error
Finding the Critical Value, Zα/2
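• Consider a 95% confidence interval:
1 − α = 0.95 so α = 0.05, leaving α/2 = 0.025 in each tail
Zα/2 = ±1.96

[Figure: standard normal curve with an area of 0.025 in each tail. In Z units the limits are −1.96 and 1.96 around 0; in X units they are the lower confidence limit and the upper confidence limit around the point estimate.]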
Common Levels of Confidence
• Commonly used confidence levels are 90%, 95%,
and 99%

Confidence Level   Confidence Coefficient, 1 − α   Zα/2 value
80%                0.80                            1.28
90%                0.90                            1.645
95%                0.95                            1.96
98%                0.98                            2.33
99%                0.99                            2.58
99.8%              0.998                           3.08
99.9%              0.999                           3.27
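Critical values like these can also be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical. Computed values may differ in the last digit or two from a printed table, which is read off at coarser resolution.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Area alpha/2 in the upper tail means cumulative area 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```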
Intervals and Level of Confidence
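[Figure: sampling distribution of the mean, centered at µx̄ = µ, with area 1 − α in the middle and α/2 in each tail; intervals built around sample means x̄1, x̄2, ... are drawn beneath it.]

Intervals extend from $\bar{X} - Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.

(1−α)×100% of intervals constructed contain µ; (α)×100% do not.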
Confidence Intervals
Example

• A sample of 11 circuits from a large normal


population has a mean resistance of 2.20 ohms.
We know from past testing that the population
standard deviation is 0.35 ohms.

• Determine a 95% confidence interval for the


true mean resistance of the population.
Example
(continued)
• A sample of 11 circuits from a large normal
population has a mean resistance of 2.20 ohms.
We know from past testing that the population
standard deviation is 0.35 ohms.

• Solution:

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$

$1.9932 \le \mu \le 2.4068$
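The circuit example can be checked numerically. This is a minimal sketch; `z_interval` is a hypothetical helper name, and the critical value 1.96 is passed in rather than derived.

```python
import math

def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)  # critical value times standard error
    return x_bar - margin, x_bar + margin

# Circuit example: n = 11, mean 2.20 ohms, sigma = 0.35 ohms, 95% confidence
lower, upper = z_interval(x_bar=2.20, sigma=0.35, n=11, z=1.96)
print(f"{lower:.4f} <= mu <= {upper:.4f}")
```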
Interpretation

• We are 95% confident that the true mean


resistance is between 1.9932 and 2.4068
ohms
• Although the true mean may or may not be in
this interval, 95% of intervals formed in this
manner will contain the true mean
Do You Ever Truly Know σ?
• Probably not!

• In virtually all real world business situations, σ is not known.

• If there is a situation where σ is known then µ is also known (since to calculate σ


you need to know µ.)

• If you truly know µ there would be no need to gather a sample to estimate it.
Confidence Interval for μ
(σ Unknown)

• If the population standard deviation σ is


unknown, we can substitute the sample
standard deviation, S
• This introduces extra uncertainty, since S is
variable from sample to sample
• So we use the t distribution instead of the
normal distribution
Confidence Interval for μ
(σ Unknown)
(continued)
• Assumptions
• Population standard deviation is unknown
• Population is normally distributed
• If population is not normal, use large sample
• Use Student’s t Distribution
• Confidence Interval Estimate:

$\bar{X} \pm t_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$

(where tα/2 is the critical value of the t distribution with n -1 degrees of freedom and an area of α/2 in each tail)
Student’s t Distribution
• The t is a family of distributions
• The tα/2 value depends on degrees of freedom
(d.f.)
• Number of observations that are free to vary after sample
mean has been calculated

d.f. = n - 1
Degrees of Freedom (df)
Idea: Number of observations that are free to vary
after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0

Let X1 = 7
Let X2 = 8
What is X3?

If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary)

Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a given mean)
Student’s t Distribution
Note: t → Z as n increases

t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal

[Figure: density curves centered at 0 for t (df = 5), t (df = 13) and the standard normal (t with df = ∞); the lower the df, the heavier the tails.]
Student’s t Table
Upper Tail Area
df    .25     .10     .05
1     1.000   3.078   6.314
2     0.817   1.886   2.920
3     0.765   1.638   2.353

The body of the table contains t values, not probabilities

Let: n = 3, so df = n - 1 = 2; α = 0.10, so α/2 = 0.05

[Figure: t curve with α/2 = 0.05 in each tail; the upper critical value is t = 2.920.]
Selected t distribution values
With comparison to the Z value

Confidence Level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z (∞ d.f.)
0.80               1.372         1.325         1.310         1.28
0.90               1.812         1.725         1.697         1.645
0.95               2.228         2.086         2.042         1.96
0.99               3.169         2.845         2.750         2.58

Note: t → Z as n increases
Example of t distribution confidence interval
A random sample of n = 25 has X = 50 and
S = 8. Form a 95% confidence interval for μ

• d.f. = n – 1 = 24, so tα/2 = t0.025 = 2.0639

The confidence interval is

$\bar{X} \pm t_{\alpha/2}\,\dfrac{S}{\sqrt{n}} = 50 \pm (2.0639)\,\dfrac{8}{\sqrt{25}}$

$46.698 \le \mu \le 53.302$
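The arithmetic of this example can be sketched as follows. The t critical value 2.0639 is taken from the slide, because the Python standard library has no t distribution; the comparison with Z = 1.96 is added only to illustrate how much wider the t interval is.

```python
import math

n, x_bar, s = 25, 50, 8
t_crit = 2.0639  # t(0.025, 24 d.f.), table value quoted in the example
z_crit = 1.96    # normal critical value, for comparison only

se = s / math.sqrt(n)  # estimated standard error of the mean

t_lower, t_upper = x_bar - t_crit * se, x_bar + t_crit * se
z_lower, z_upper = x_bar - z_crit * se, x_bar + z_crit * se

print(f"t interval: {t_lower:.3f} to {t_upper:.3f}")
print(f"z interval: {z_lower:.3f} to {z_upper:.3f} (narrower)")
```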
Problem
From a population with SD 1.65, a sample of 32
items resulted in 34.8 as an estimate of the
mean. Find the SE of the mean. Compute an
interval estimate that should include the
population mean 99.7% of the time
SE of the mean: σx̄ = 1.65/√32 = .292
Interval estimate: 34.8 ± .867 (= 2.97 × .292)
33.93 to 35.67
Problem
Estimate the mean life of windshield wiper
blades under typical driving conditions for CL
of 95%
SD 6 months
Sample size = 100
Mean = 21 months

(19.824 - 22.176 months)


Problem
Estimate the mean annual earnings of 700 families at 90% CL, given n = 50, x̄ = 11800, s = 950

σx̄ = 129.57 (using the finite population multiplier, FPM)
x̄ ± 1.64 σx̄

11587.5 to 12012.5
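The finite population multiplier step is easy to miss, so here is a sketch of the calculation. It assumes the usual form of the multiplier, √((N − n)/(N − 1)), and uses the slide's critical value of 1.64; small differences from the quoted 129.57 come from rounding.

```python
import math

N, n = 700, 50          # population size and sample size
x_bar, s = 11800, 950   # sample mean and sample SD
z = 1.64                # critical value used on the slide for 90% confidence

# Finite population multiplier, applied because n is not small relative to N
fpm = math.sqrt((N - n) / (N - 1))

# Standard error of the mean with the finite population correction
se = s / math.sqrt(n) * fpm

lower, upper = x_bar - z * se, x_bar + z * se
print(f"SE = {se:.2f}, interval = {lower:.1f} to {upper:.1f}")
```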
Problem
Nielsen Media Research reports that household
mean TV viewing time during 8 pm to 11 pm is
7.75 hrs per week. Assuming a sample size of
180 households and a sample SD of 3.45 hrs,
what is the 95% Confidence Interval estimate of
the mean TV viewing time per week during the
8pm to 11 pm time period? (7.25 to 8.25)
Problem
Average tyre pressure in a sample of 62 tyres was 24 psi and SD 2.1 psi.

What is the standard error of the mean? (.267 psi)

Estimate a 95% Confidence Interval

(23.48 to 24.52)
Confidence Intervals for the
Population Proportion, π

• An interval estimate for the population


proportion ( π ) can be calculated by
adding an allowance for uncertainty to the
sample proportion ( p )
Confidence Intervals for the
Population Proportion, π
(continued)
• Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation

$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$

• We will estimate this with sample data:

$\sqrt{\dfrac{p(1-p)}{n}}$
Confidence Interval Endpoints
• Upper and lower confidence limits for the population proportion are calculated with the formula

$p \pm Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$
• where
• Zα/2 is the standard normal value for the level of confidence desired
• p is the sample proportion
• n is the sample size
• Note: must have np > 5 and n(1-p) > 5
Example

• A random sample of 100 people shows


that 25 are left-handed.
• Form a 95% confidence interval for the
true proportion of left-handers
Example
(continued)
• A random sample of 100 people shows that 25
are left-handed. Form a 95% confidence
interval for the true proportion of left-handers.

$p \pm Z_{\alpha/2}\sqrt{p(1-p)/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96\,(0.0433)$

$0.1651 \le \pi \le 0.3349$
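A sketch of the left-handers calculation, including the np > 5 and n(1 − p) > 5 check noted earlier. The function name `proportion_interval` is hypothetical.

```python
import math

def proportion_interval(successes, n, z):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    # Large-sample condition from the slides
    if not (n * p > 5 and n * (1 - p) > 5):
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error of p
    return p - z * se, p + z * se

# 25 left-handers in a random sample of 100, 95% confidence
lower, upper = proportion_interval(successes=25, n=100, z=1.96)
print(f"{lower:.4f} <= pi <= {upper:.4f}")
```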
Interpretation

• We are 95% confident that the true percentage


of left-handers in the population is between
16.51% and 33.49%.

• Although the interval from 0.1651 to 0.3349


may or may not contain the true proportion,
95% of intervals formed from samples of size
100 in this manner will contain the true
proportion.
Determining Sample Size

Determining
Sample Size

For the For the


Mean Proportion
Sampling Error
• The required sample size can be found to reach a
desired margin of error (e) with a specified level of
confidence (1 - α)

• The margin of error is also called sampling error


• the amount of imprecision in the estimate of the
population parameter
• the amount added and subtracted to the point estimate
to form the confidence interval
Determining Sample Size
For the Mean

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Sampling error (margin of error):

$e = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
Determining Sample Size
(continued)

For the Mean

$e = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Now solve for n to get

$n = \dfrac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}}$
Determining Sample Size
(continued)

• To determine the required sample size for the


mean, you must know:

• The desired level of confidence (1 - α), which


determines the critical value, Zα/2
• The acceptable sampling error, e
• The standard deviation, σ
Required Sample Size Example

If σ = 45, what sample size is needed to


estimate the mean within ± 5 with 90%
confidence?

$n = \dfrac{Z^{2}\sigma^{2}}{e^{2}} = \dfrac{(1.645)^{2}(45)^{2}}{5^{2}} = 219.19$

So the required sample size is n = 220

(Always round up)
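The "always round up" rule maps directly onto `math.ceil`. A minimal sketch, with a hypothetical helper name:

```python
import math

def sample_size_for_mean(sigma, e, z):
    """Smallest n giving a margin of error of at most e for the mean."""
    n_exact = (z * sigma / e) ** 2
    return math.ceil(n_exact)  # always round up

# sigma = 45, margin of error 5, 90% confidence (Z = 1.645)
n_required = sample_size_for_mean(sigma=45, e=5, z=1.645)
print(n_required)
```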
If σ is unknown
• If unknown, σ can be estimated when
using the required sample size formula
• Use a value for σ that is expected to be at
least as large as the true σ

• Select a pilot sample and estimate σ with the


sample standard deviation, S
Determining Sample Size
(continued)

For the Proportion

$e = Z\sqrt{\dfrac{\pi(1-\pi)}{n}}$

Now solve for n to get

$n = \dfrac{Z^{2}\,\pi(1-\pi)}{e^{2}}$
Determining Sample Size
(continued)

• To determine the required sample size for the proportion,


you must know:

• The desired level of confidence (1 - α), which determines the


critical value, Zα/2
• The acceptable sampling error, e
• The true proportion of events of interest, π
• π can be estimated with a pilot sample if necessary (or
conservatively use 0.5 as an estimate of π)
Required Sample Size Example

How large a sample would be necessary to


estimate the true proportion defective in a
large population within ±3%, with 95%
confidence?
(Assume a pilot sample yields p = 0.12)
Required Sample Size Example
(continued)

Solution:
For 95% confidence, use Zα/2 = 1.96
e = 0.03
p = 0.12, so use this to estimate π

$n = \dfrac{Z_{\alpha/2}^{2}\,\pi(1-\pi)}{e^{2}} = \dfrac{(1.96)^{2}(0.12)(1-0.12)}{(0.03)^{2}} = 450.74$
So use n = 451
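The same calculation in code, as a sketch. The helper name is hypothetical; the second call shows the conservative choice π = 0.5 mentioned earlier, which gives the largest possible sample size for the same margin of error.

```python
import math

def sample_size_for_proportion(pi, e, z):
    """Smallest n giving a margin of error of at most e for a proportion."""
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)

# Pilot estimate p = 0.12, margin of error 0.03, 95% confidence
n_pilot = sample_size_for_proportion(pi=0.12, e=0.03, z=1.96)

# Conservative choice when no pilot estimate is available
n_conservative = sample_size_for_proportion(pi=0.5, e=0.03, z=1.96)

print(n_pilot, n_conservative)
```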
Ethical Issues
• A confidence interval estimate (reflecting sampling error) should
always be included when reporting a point estimate
• The level of confidence should always be reported
• The sample size should be reported
• An interpretation of the confidence interval estimate should also be
provided
Chapter Summary
• Introduced the concept of confidence intervals
• Discussed point estimates
• Developed confidence interval estimates
• Created confidence interval estimates for the mean (σ
known)
• Determined confidence interval estimates for the mean (σ
unknown)
• Created confidence interval estimates for the proportion
• Determined required sample size for mean and
proportion settings
• Addressed confidence interval estimation and ethical
issues
Determining An Interval Including A Fixed Proportion of the
Sample Means
Find a symmetrically distributed interval around µ that
will include 95% of the sample means when µ = 368, σ
= 15, and n = 25.

• Since the interval contains 95% of the sample means 5%


of the sample means will be outside the interval
• Since the interval is symmetric 2.5% will be above the
upper limit and 2.5% will be below the lower limit.
• From the standardized normal table, the Z score with
2.5% (0.0250) below it is -1.96 and the Z score with 2.5%
(0.0250) above it is 1.96.
Determining An Interval Including A Fixed Proportion of the
Sample Means (continued)

• Calculating the lower limit of the interval

$\bar{X}_L = \mu + Z\,\dfrac{\sigma}{\sqrt{n}} = 368 + (-1.96)\,\dfrac{15}{\sqrt{25}} = 362.12$

• Calculating the upper limit of the interval

$\bar{X}_U = \mu + Z\,\dfrac{\sigma}{\sqrt{n}} = 368 + (1.96)\,\dfrac{15}{\sqrt{25}} = 373.88$

• 95% of all sample means of sample size 25 are between 362.12 and 373.88
QUANTITATIVE METHODS
FOR MANAGEMENT
Session -10
Chapter :
RECAP
• Introduction – definition, types of statistics, levels of measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and correlation
• Probability – concepts, laws and Bayes’ theorem
• Random variable- discrete and continuous
• Theoretical distributions – Binomial, Poisson and Normal
distributions
• Sampling techniques , sampling distribution, central limit theorem
and estimation theory- types ( point and interval estimate)
Learning objectives
• Hypothesis testing
• Errors in TOH
• Level of significance
• Types – parametric and non parametric
• Steps involved in TOH
• Chi square
• Independence
• Goodness of fit (uniform)
TESTING
OF
HYPOTHESIS
Hypothesis
• An assumption to be tested
• If the sample statistic differs from the population
parameter, a decision must be made as to
whether or not this difference is significant
• If it is, the hypothesis is rejected. If not, it is
accepted
Set up a Hypothesis
• H0:Null Hypothesis
• No significant difference between the sample statistic
and the population parameter
• Any difference found is accidental, arising out of
sampling fluctuations
• H1: Alternate Hypothesis
• A hypothesis that is different from the null hypothesis
• If sample info leads us to reject H0, then accept H1
In hypothesis testing, we must state the assumed or hypothesized value of the population parameter before we begin sampling. This assumption is called the null hypothesis.

The null hypothesis (H0) usually assumes there is no difference between the observed and believed values.

If our sample results fail to support the null hypothesis, then the conclusion that we do accept is called the alternative hypothesis, H1.
Set up a Hypothesis…
• If the difference is due to chance
• Accept H0
• If the difference has statistical significance
• Reject H0
One-tailed Test
Is a significance test in which the null hypothesis can be rejected by sample values well above or well below the hypothesized mean, but not both (i.e. there is one rejection region).
H1 : µ < µo or H1 : µ > µo

Two-tailed test
Is a significance test that rejects the null hypothesis if the sample mean is significantly higher or lower than the hypothesized population mean (i.e. there are two rejection regions).
H1 : µ ≠ µo
Terminologies
Significance level
A concept complementary to the confidence level.
The probability of committing a TYPE 1 error, namely rejecting the null hypothesis when in reality it is true.
There is no single standard or universal level of significance for testing hypotheses.
The higher the significance level, the higher the probability of rejecting a null hypothesis when it is true.
Set up a Significance level
• The confidence with which an experimenter
rejects or retains H0 depends on the level of
significance involved
• α = 5%
• 5% chance that H0 is rejected when it should be
accepted
• 95% confident that we have made the right decision
• Willing to accept a 5% chance of being wrong to reject
H0
Suitable Test Statistic

$\text{Test statistic} = \dfrac{\text{Sample Statistic} - \text{Hypothesized Parameter}}{\text{Standard Error of Statistic}}$

• Calculation: use the appropriate formula to get the calculated value

• Inference: p-value approach or classical approach
P-value
• The p-value is a measure of the likelihood of the
sample results when the null hypothesis is
assumed to be true
• The smaller the p-value, the less likely it is that
the sample results came from a situation where
the null hypothesis is true
Errors

                 CONDITION
DECISION         H0 : True                    H0 : False
Accept H0        Correct Decision             TYPE 2 Error (β)
                 Confidence Level (1 - α)
Reject H0        TYPE 1 Error (α)             Correct Decision
                                              Power of Test (1 - β)
Type 1 error, α
Is the error of rejecting a null hypothesis when it is true.

Type 2 error, β
Is the error of accepting a null hypothesis when it is actually false.

In order for any tests of hypothesis or rules of decision to be good, they must be designed so as to minimise errors of decision. For a fixed sample size, reducing one type of error tends to increase the other, so the only way to reduce both types of errors is to increase the sample size.
Steps involved in TOH
Step 1: set up the null and alternative hypothesis ( one tail
v/s two tail)

Step 2 : define the level of significance

Step 3 : Test statistic

Step 4 : calculation

Step 5 : inference
• Classical approach : if table value > calculated value – accept Ho
• P value approach : if p> α accept Ho.
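The two decision rules in Step 5 can be written as two small functions. This is only a sketch with hypothetical function names; it compares the magnitude of the calculated value with the table value, which is an assumption that suits two-tailed and chi-square tests. The chi-square figures in the usage lines are the ones worked out in the examples of this session, and the p-value 0.03 is made up for illustration.

```python
def classical_decision(calculated, table_value):
    """Classical approach: accept H0 if the table value exceeds the calculated value."""
    return "accept H0" if abs(calculated) < table_value else "reject H0"

def p_value_decision(p_value, alpha):
    """P-value approach: accept H0 if p > alpha."""
    return "accept H0" if p_value > alpha else "reject H0"

# Chi-square values from the worked examples (calculated value, table value)
accidents = classical_decision(4.17, 12.59)
voting = classical_decision(10.09, 3.841)

# Hypothetical p-value, for illustration only
p_example = p_value_decision(0.03, alpha=0.05)

print(accidents, voting, p_example)
```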
FLOW CHART FOR HYPOTHESIS TESTING

State H0 as well as H1

Specify the level of significance α

Decide the correct sampling distribution

Select a random sample(s) and work out an appropriate value from the sample data

Calculate the probability that the sample result would diverge as widely as it has from expectations, if H0 were true

Is this probability equal to or smaller than the α value in case of a one-tailed test and α/2 in case of a two-tailed test?
YES: Reject H0
NO: Accept H0
• Parametric
  • based on the assumption, or presence, of the normal distribution
  • concerned with the parameters of the distribution e.g. mean, proportion.
• Non parametric
  • used when, on occasions, the data are not normal, or contain extreme values, or not enough is known to be able to make any assumption about the type of distribution (distribution free)
Advantages of non-parametric tests
No assumptions need to be made about the
underlying distribution
They can be used on data ranked in some order.
Mathematical concepts are simpler than for parametric tests
Disadvantages of non-parametric tests
They are less discriminating than parametric tests, i.e. they are more prone to error and less powerful
Although simple, the arithmetic may take a long time
Chi-square (χ2) Distribution
Used to compare an actual, observed distribution with a hypothesized, or expected, distribution.

Often used as a ‘goodness of fit’ test and as a test for independence

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

where O = the observed frequency of any value
E = the expected frequency of any value
The obtained value from the formula is compared with the value from the χ2 table for a given significance level and the number of degrees of freedom.

Degrees of freedom = (Rows-1)(Columns –1) for a contingency table

If χ2 calculated is > χ2 from table, the null hypothesis is rejected.
Use broadly for
Test of goodness of fit (for one way classification or for
one variable only)
Can also be used to determine how well empirical distributions, i.e. those obtained from sample data, fit theoretical distributions such as the Normal, Poisson and Binomial
Test of independence (for more than one row or
column in the form of a contingency table covering
several attributes.)
Note that :
If, when calculating the expected cell values, an expected frequency is less than 5, the χ2 test becomes inaccurate. In such circumstances the cell whose expected frequency is less than 5 is merged with an adjoining cell so that the expected frequencies in all resulting cells are at least 5.
Degrees of freedom: the number of degrees of freedom is equal to the number of frequency classes minus the number of independent constraints.
If there are 10 frequency classes and there is one independent constraint, then there are (10-1)=9 degrees of freedom. Thus if ‘n’ is the number of groups and one constraint is placed by making the totals of observed and expected frequencies equal, the degrees of freedom would be equal to (n-1).
In the case of an r×c contingency table, the degrees of freedom are worked out as
d.f.=(r-1)(c-1)
c: No. of columns
r: No. of rows.
Conditions for the application of χ2-Test:
The following conditions should be satisfied before χ2 –test can
be applied:
(i) Observations recorded and used are collected on a random
basis,
(ii) All the items in the sample must be independent.
Chi-square Goodness-of-fit Test

Does sample data conform to a hypothesized distribution?


Examples:
Are technical support calls equal across all days of
the week? (i.e., do calls follow a uniform
distribution?)
Do measurements from a production process follow
a normal distribution?
Quick Review Question

Example:
Are technical support calls equal across all days of the week? (i.e., do calls follow a
uniform distribution?)
Sample data for 10 days per day of week:

Sum of calls for this day:


Monday 290
Tuesday 250
Wednesday 238
Thursday 257
Friday 265
Saturday 230
Sunday 192
Σ = 1722
Logic of Goodness-of-Fit Test
If calls are uniformly distributed, the 1722 calls would be expected
to be equally divided across the 7 days:
$\dfrac{1722}{7} = 246$ expected calls per day if uniform
Chi-Square Goodness-of-Fit Test: test to see if the sample results
are consistent with the expected results
Observed & Expected Frequencies
Observed Expected
oi ei
Monday 290 246
Tuesday 250 246
Wednesday 238 246
Thursday 257 246
Friday 265 246
Saturday 230 246
Sunday 192 246

TOTAL 1722 1722


Chi-Square Test Statistic
H0: The distribution of calls is uniform over days of the week
HA: The distribution of calls is not uniform

The test statistic is

$\chi^2 = \sum \dfrac{(o_i - e_i)^2}{e_i}$ (where df = k − 1)
where:
k = number of categories
oi = observed cell frequency for category i
ei = expected cell frequency for category i
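Applying this statistic to the technical support call data gives a concrete number. The sketch below assumes a 5% level of significance, which is a choice made here for illustration, and takes the critical value 12.59 for 6 degrees of freedom from a standard chi-square table.

```python
# Observed call totals per day of week (from the example above)
observed = {
    "Monday": 290, "Tuesday": 250, "Wednesday": 238, "Thursday": 257,
    "Friday": 265, "Saturday": 230, "Sunday": 192,
}

total = sum(observed.values())  # 1722 calls
k = len(observed)               # 7 categories
expected = total / k            # 246 calls per day if uniform

# Chi-square statistic: sum of (o - e)^2 / e over all categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

df = k - 1
critical_value = 12.59  # table value for 6 d.f. at the 5% level (assumed level)

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

Under these assumptions the calculated value is well above the table value, so the uniform hypothesis would be rejected for this sample.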
The Rejection Region
H0: The distribution of calls is uniform over days of the week
HA: The distribution of calls is not uniform

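$\chi^2 = \sum \dfrac{(o_i - e_i)^2}{e_i}$

• Reject H0 if $\chi^2 > \chi^2_{\alpha}$ (with k – 1 degrees of freedom)

[Figure: chi-square density starting at 0; the area α to the right of the critical value χ2α is the "Reject H0" region, and the area to its left is the "Do not reject H0" region.]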
Chi-Square Test Statistic
Contingency Tables
Situations involving multiple population
proportions
Used to classify sample observations according
to two or more characteristics
Also called a cross-tabulation table.
Example:
The following data concerning industrial accidents and absentees
classified according to the types of employee.

Is there any evidence to suggest that the severity of accident is


associated with type of employee ?
Logic of the test
H0: Severity of accident is independent of type of employees
HA: Severity of accident is not independent of type of employees

If H0 is true, then the proportion of severity of accidents


should be the same as the proportion of type of
employees
Observed vs. expected Frequencies
The Chi-square contingency test statistic is:

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$ with d.f. = (r − 1)(c − 1)
where:
O = observed frequency
E = expected frequency
r = number of rows
c = number of columns
Contingency Analysis
Example:
Left-Handed vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female

H0: Hand preference is independent of gender


HA: Hand preference is not independent of gender
Example:
Logic of the test

H0: Hand preference is independent of gender


HA: Hand preference is not independent of gender

If H0 is true, then the proportion of left-handed females


should be the same as the proportion of left-handed
males
The two proportions above should be the same as the
proportion of left-handed people overall
Observed vs. expected Frequencies
Contingency Analysis
Example 1- The following table gives the number of aircraft accidents that
occur during the various days of the week. Find whether the accidents are
uniformly distributed over the week.
Days: Sun. Mon. Tues. Wed. Thurs. Fri. Sat.
No. of accidents: 14 16 8 12 11 9 14

Given table value of Chi-square at 5% level of significance for 6 d.f. is


12.59 (χ20.05=12.59).
Solution: Here we set the null hypothesis, the expected frequencies of the
accidents on each of the days would be:
Days: Sun. Mon. Tues. Wed. Thurs. Fri. Sat.
No. of accidents: 12 12 12 12 12 12 12
Thus we have
Observed frequency O   Expected frequency E   (O-E)   (O-E)2   (O-E)2/E
14                     12                     2       4        4/12
16                     12                     4       16       16/12
8                      12                     -4      16       16/12
12                     12                     0       0        0
11                     12                     -1      1        1/12
9                      12                     -3      9        9/12
14                     12                     2       4        4/12
Total                                                          50/12=4.17

No. of d.f. = 7-1 = 6
Table value χ26,0.05 = 12.59
χ2cal. = 4.17
We see that χ2cal. < χ2tab ⇒ Ho is accepted, i.e. the accidents are uniformly distributed over the week.

Example 2- Two research workers classified some people in income groups on the basis of sampling studies. Their results are as follows:

Investigator   Income Groups             Total
               Poor    Middle   Rich
A              160     30       10       200
B              140     120      40       300
Total          300     150      50       500

Show that the sampling technique of at least one research worker is defective.
Solution: Let us take the hypothesis that the sampling techniques adopted by research workers are similar (i.e. there is no difference between the techniques adopted by research workers). This being so, the expectation of A investigator classifying the people in
(i) Poor income group = (200×300)/500 = 120
(ii) Middle income group = (200×150)/500 = 60
(iii) Rich income group = (200×50)/500 = 20
Similarly the expectation of B investigator classifying the people in
(i) Poor income group = (300×300)/500 = 180
(ii) Middle income group = (300×150)/500 = 90
(iii) Rich income group = (300×50)/500 = 30
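We can now calculate the value of χ2 as follows:
Groups                         Observed frequency (O)   Expected frequency (E)   (O-E)   (O-E)2/E
Investigator A
Classified people as poor      160                      120                      40      1600/120=13.33
Classified people as middle    30                       60                       -30     900/60=15
Classified people as rich      10                       20                       -10     100/20=5
Investigator B
Classified people as poor      140                      180                      -40     1600/180=8.89
Classified people as middle    120                      90                       30      900/90=10
Classified people as rich      40                       30                       10      100/30=3.33
Total                                                                                    55.55

With (2-1)(3-1)=2 d.f., the table value of χ2 at 5% level of significance is 5.991. Since χ2cal (55.55) > χ2tab (5.991), the hypothesis is rejected, i.e. the sampling technique of at least one research worker is defective.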
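The expected frequencies and the χ2 total above follow a fixed recipe (row total × column total ÷ grand total for each cell), which can be sketched for any r×c table. The function name is hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

# Example 2: rows are investigators A and B; columns are poor, middle, rich
chi_square, df = chi_square_independence([[160, 30, 10], [140, 120, 40]])
print(f"chi-square = {chi_square:.2f}, df = {df}")
```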
Think: Suppose we find that a person’s gender affects his or her attitude toward abortion. What are the two variables involved in this explanatory finding? Which variable is the independent variable? Which is the dependent variable?
A WORD OF CAUTION
The fact that two variables “go together” does not mean that
change in one variable causes changes in another variable.
A social researcher in his study shows that violent crime
rates (a dependent variable) are lower in metropolitan areas
where people tend to watch violent TV programs than in
areas where they don’t. Does this mean that watching
violent TV programs “causes” less violent crime? Probably
not.
Example: Two sample polls of votes for two candidates A and B for a public office are taken, one from among the residents of rural areas and one from among the residents of urban areas. The results are given in the table. Examine whether the nature of the area is related to voting preference in this election.

Area      Votes for A    Votes for B    Total
Rural     620 (a)        380 (b)        1000 (a+b)
Urban     550 (c)        450 (d)        1000 (c+d)
Total     1170 (a+c)     830 (b+d)      2000
Under the null hypothesis that the nature of the area is independent of the voting preference in the election, we get the expected frequencies as follows:
E(620) = (1170×1000)/2000 = 585,
E(380) = (830×1000)/2000 = 415,
E(550) = (1170×1000)/2000 = 585,
E(450) = (830×1000)/2000 = 415,
$\chi^2 = \sum \dfrac{(O-E)^2}{E} = \dfrac{(620-585)^2}{585} + \dfrac{(380-415)^2}{415} + \dfrac{(550-585)^2}{585} + \dfrac{(450-415)^2}{415}$

$= \dfrac{(35)^2}{585} + \dfrac{(-35)^2}{415} + \dfrac{(-35)^2}{585} + \dfrac{(35)^2}{415} = (35)^2\left[\dfrac{1}{585} + \dfrac{1}{415} + \dfrac{1}{585} + \dfrac{1}{415}\right]$

$= (1225)(0.001709 + 0.002409 + 0.001709 + 0.002409)$

i.e. χ2cal =10.0891
Tabulated χ2 for (2-1)(2-1)=1 d.f. at 5% level of significance is 3.841 i.e. χ2tab=3.841.
Here we see that χ2cal>χ2tab (10.0891>3.841)⇒Ho is rejected, i.e. the result is significant at the 5% level of significance. Thus we conclude that the nature of the area is related to voting preference in the election.
Alternative procedure: To calculate the value χ2, we can use the following
formula:
                       Total
         a      b      a+b
         c      d      c+d
Total    a+c    b+d    N = a+b+c+d
N = 620+380+550+450 = 2000
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)} = \frac{2000\,(620 \times 450 - 380 \times 550)^2}{1000 \times 1170 \times 830 \times 1000} = 10.09165$$
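The two procedures can be cross-checked numerically. The following is a minimal Python sketch, not part of the original notes; the variable names are illustrative, and the critical value 3.841 is simply the tabulated figure quoted in the example.

```python
# Chi-square test of independence for the 2x2 rural/urban voting table.
observed = [[620, 380],   # Rural: votes for A, votes for B
            [550, 450]]   # Urban: votes for A, votes for B

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of a cell = (row total x column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Long form: sum of (O - E)^2 / E over all four cells.
chi2_long = sum((o - e) ** 2 / e
                for o_row, e_row in zip(observed, expected)
                for o, e in zip(o_row, e_row))

# Shortcut form for a 2x2 table: N(ad - bc)^2 / ((a+b)(a+c)(b+d)(c+d)).
(a, b), (c, d) = observed
chi2_short = n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

CRITICAL_5PCT_1DF = 3.841  # tabulated value for 1 d.f. at the 5% level
reject_h0 = chi2_long > CRITICAL_5PCT_1DF

print(round(chi2_long, 4), round(chi2_short, 4), reject_h0)
```

Both forms give the same statistic up to floating-point rounding, so the small difference between the two hand calculations comes only from rounding the reciprocals.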
What is a Hypothesis in a Test?
• A hypothesis is an assumption about the population parameter.
• A parameter is a characteristic of the population, like its
mean or variance.
• The parameter must be identified before analysis.
Example: "I assume the mean AGE of this class is 50! Am I correct? TEST IT!"



The Null Hypothesis, H0
• States the assumption (numerical) to be tested
e.g. Our class mean age is 50 (H0: µ = 50)
• Begin with the assumption that the null hypothesis is TRUE.
(Similar to the notion of innocent until proven guilty)
• The null hypothesis may or may not be rejected, but our aim is to REJECT the null
hypothesis!
The Alternative Hypothesis, H1

• Is the opposite of the null hypothesis
e.g. The average age of our class is different from 50 (H1: µ ≠ 50)
• Is generally the hypothesis that is believed to be true by the researcher!
Identify the Problem

• Steps:
• State the Null Hypothesis
• State its opposite, the Alternative Hypothesis
• Hypotheses are mutually exclusive &
exhaustive
• Sometimes it is easier to form the
alternative hypothesis first.
Hypothesis Testing Process
1. Assume the population mean age is 50 (null hypothesis).
2. Draw a sample from the population. The sample mean is 20.
3. Is X̄ = 20 ≅ µ = 50? No, not likely!
4. REJECT the null hypothesis.
Reason for Rejecting H0
[Figure: sampling distribution of the sample mean, centered at the hypothesized
population mean µ = 50, with the observed sample mean 20 in the tail.]
Our sample mean (20) falls in the tails. That is not likely if µ = 50, so we reject the
null hypothesis H0 that µ = 50.


Level of Significance, α
• Defines the rejection region
• Typical value of α is 0.05. It provides the critical value(s) of the test
[Figure: distribution centered at 0. The rejection regions lie beyond the critical
value, and α is the “area” of the rejection region.]
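Level of Significance, α and
the Rejection Region
• One tail (left) test: H0: µ = 0, H1: µ < 0. Rejection region of area α in the left tail.
• One tail (right) test: H0: µ = 0, H1: µ > 0. Rejection region of area α in the right tail.
• Two tails test: H0: µ = 0, H1: µ ≠ 0. Rejection region of area α/2 in each tail.
The critical value(s) mark the boundary of the rejection region(s).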
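As an illustration of how α fixes the critical value(s), here is a minimal Python sketch. It assumes the test statistic follows a standard normal distribution (a z test), which the slides do not state; the function name `critical_values` is hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the critical z value(s) for tail in {'left', 'right', 'two'}."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        # Area alpha lies to the left of the critical value.
        return (z.inv_cdf(alpha),)
    if tail == "right":
        # Area alpha lies to the right of the critical value.
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        # Area alpha/2 lies in each tail.
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("left", "right", "two"):
    print(tail, [round(v, 3) for v in critical_values(0.05, tail)])
```

At α = 0.05 this gives roughly −1.645 and +1.645 for the one-tail tests and ±1.96 for the two-tail test, the familiar tabulated z values.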
Errors in Making Decisions
• Type I Error
• Reject Null Hypothesis when it is True (“False Positive”)
• Has Serious Consequences
• Probability of Type I Error Is α, Called Level of Significance

• Type II Error
• Do Not Reject Null Hypothesis when it is False (“False Negative”)
• Probability of Type II Error Is β (the Power of the test is 1 − β)
α & β Have an Inverse Relationship
For a fixed sample size, reduce the probability of one error and the
other one goes up.

One possibility to reduce both: Increase the sample size!
What is the p Value and how to use it
in a Test?

• The p-value is the probability, computed under H0, of obtaining a test statistic
at least as extreme (≤ or ≥) as the observed sample value
[Figure: one tail test. The p-value is the tail area beyond the observed sample value.]
• Used to Make Rejection Decision

• If p value < α <=> Reject H0 <=> SUCCESS

• If p value ≥ α <=> Do Not Reject H0 <=> FAILURE
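The decision rule can be sketched in Python as follows. This is an illustrative sketch only: it assumes a z test (standard normal test statistic), and the names `p_value` and `decide` are hypothetical rather than taken from the notes.

```python
from statistics import NormalDist

def p_value(z_obs, tail):
    """p-value of an observed z statistic for tail in {'left', 'right', 'two'}."""
    z = NormalDist()  # distribution of the test statistic under H0 (assumed)
    if tail == "left":
        return z.cdf(z_obs)
    if tail == "right":
        return 1 - z.cdf(z_obs)
    if tail == "two":
        # Both tails count as "at least as extreme".
        return 2 * (1 - z.cdf(abs(z_obs)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p, alpha=0.05):
    """Apply the rule: reject H0 only when p < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"

p = p_value(2.5, "two")   # 2.5 is a made-up observed statistic
print(round(p, 4), decide(p))
```

Comparing the p-value with α leads to the same decision as comparing the test statistic with the critical value; the p-value additionally shows how strong the evidence against H0 is.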


DECISION THEORY
Introduction
• Statistical Decision Theory provides an analytical
and systematic approach to the study of decision
making
• Use of statistical techniques to solve problems for
which information is incomplete, uncertain or
completely lacking
• Data concerning occurrence of different outcomes
are evaluated to enable the decision maker to
identify suitable decision alternatives or courses
of action
• Decide among alternatives by taking into account
the monetary repercussions of actions (payoff)
Decision theory
• Provides a formal analytic framework for decision making under conditions of
uncertainty

• Also called decision analysis

• Used to determine optimal strategies where a decision maker is faced with several
decision alternatives and an uncertain, or risky, pattern of future events.

• Decision – Definition

Defined as the selection by the decision maker of an act, considered to be best
according to some predefined standard, from among the available options
Decision making process

Choosing from
alternatives
Determination of
payoff

Identification of all
courses of action
(Strategies)

Identification of various possible outcomes
(States of nature or events Ei)
Introduction…
• Objective is to maximize gains or minimize
losses
• Several courses of action – choice among
alternatives
• Calculate the measure of benefit of various
alternatives
• Events beyond the decision maker’s control
(acts of God)
• Uncertainty concerning which outcome will
actually happen
Types of Decision Making Environments
• Decision Making under Certainty – perfect
knowledge – only one possible future state of
nature exists
• Decision Making under Risk – less than
complete knowledge of the consequence of
every decision choice – several states of
nature are possible, with associated probabilities
• Decision Making under Uncertainty – unable
to specify probabilities
Problem Formulation

The first step in the decision analysis process is problem formulation.
We begin with a verbal statement of the problem.
Then we identify:
• the decision alternatives
• the states of nature (uncertain future events)
• the payoffs (consequences) associated with each
specific combination of:
• decision alternative
• state of nature
Problem Formulation
A decision problem is characterized by decision
alternatives, states of nature, and resulting payoffs.
The decision alternatives are the different possible
strategies the decision maker can employ.
The states of nature refer to future events, not
under the control of the decision maker, which
may occur.
States of nature should be defined so that they are
mutually exclusive and collectively exhaustive.
Payoff Tables
The consequence resulting from a specific
combination of a decision alternative and a state of
nature is a payoff.

A table showing payoffs for all combinations of decision alternatives and states
of nature is a payoff table.

Payoffs can be expressed in terms of profit, cost, time, distance or any other
appropriate measure.
Preparing a Payoff Table and Opportunity Loss Table
• A flower merchant purchases roses at Rs. 10 per dozen and sells them
for Rs. 30 per dozen. Unsold flowers are donated to a temple. Prepare a payoff
table and an opportunity loss table. Consider the events (demand) and
strategies (stock) in multiples of 5 dozen.
Solution - Payoff
                       States of nature
Strategies    E1 (5)   E2 (10)   E3 (15)   E4 (20)
S1 (5)          100      100       100       100
S2 (10)          50      200       200       200
S3 (15)           0      150       300       300
S4 (20)         -50      100       250       400
Solution – opportunity loss
                       States of nature
Strategies    E1 (5)   E2 (10)   E3 (15)   E4 (20)
S1 (5)            0      100       200       300
S2 (10)          50        0       100       200
S3 (15)         100       50         0       100
S4 (20)         150      100        50         0
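Both tables can be generated from the problem data. The sketch below is illustrative Python, assuming that prices are per dozen and that demand and stock are measured in dozens, which is the reading that reproduces the payoffs above; the names are hypothetical.

```python
COST, PRICE = 10, 30        # Rs. per dozen, from the problem statement
LEVELS = [5, 10, 15, 20]    # demand events E1..E4 and stock strategies S1..S4

def payoff(stock, demand):
    """Profit: revenue on what is sold minus cost of everything stocked."""
    sold = min(stock, demand)       # unsold roses are donated, earning nothing
    return PRICE * sold - COST * stock

# Rows are strategies (stock levels), columns are events (demand levels).
payoff_table = [[payoff(s, e) for e in LEVELS] for s in LEVELS]

# Opportunity loss = best payoff attainable for that event minus actual payoff.
best_per_event = [max(col) for col in zip(*payoff_table)]
loss_table = [[best - p for p, best in zip(row, best_per_event)]
              for row in payoff_table]

for stock, pay_row, loss_row in zip(LEVELS, payoff_table, loss_table):
    print(f"S({stock}): payoff {pay_row}  loss {loss_row}")
```

Each opportunity loss entry measures how much profit is forgone by not having chosen the best stock level for the demand that actually occurred.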
Decision Making Under Certainty
The manager knows which event will occur and
picks the alternative with the best payoff.
Possible Future Demand
Alternative Low High
Small facility 200 270
Large facility 160 800
Do nothing 0 0

What is the best choice if future demand will be low?


Decision Making Under Uncertainty

Decision making without probabilities
(no probabilities of occurrence are assigned)

Decision making with probabilities
(probabilities can be assigned)
Decision Making with Probabilities

Expected Value Approach


If probabilistic information regarding the states of
nature is available, one may use the expected value
(EV) approach.
Expected value is computed by multiplying each
decision outcome under each state of nature by the
probability of its occurrence and summing the products.
The decision yielding the best expected return is
chosen.
Decision Making with Probabilities
Once we have defined the decision alternatives and
states of nature for the chance events, we focus on
determining probabilities for the states of nature.
The classical method, relative frequency method, or
subjective method of assigning probabilities may be
used.
Because one and only one of the N states of nature can
occur, the probabilities must satisfy two conditions:

$$P(s_j) \ge 0 \quad \text{for all states of nature}$$
$$\sum_{j=1}^{N} P(s_j) = P(s_1) + P(s_2) + \cdots + P(s_N) = 1$$
Expected Value Approach
The expected value of a decision alternative is the
sum of weighted payoffs for the decision alternative.
The expected value (EV) of decision alternative di is
defined as
$$\mathrm{EV}(d_i) = \sum_{j=1}^{N} P(s_j)\, V_{ij}$$

where: N = the number of states of nature
P(sj) = the probability of state of nature sj
Vij = the payoff corresponding to decision
alternative di and state of nature sj
Problem
• Warren Bubby, a wealthy investor, is offered three major investments, viz.
conservative, speculative and countercyclical. The profits under three
scenarios, (i) Improving economy, (ii) Stable economy and (iii) Worsening
economy, are given in the following payoff matrix (in dollars)

Investment pattern    Improving economy    Stable economy    Worsening economy
Conservative               30 m                 5 m               -10 m
Speculative                40 m                10 m               -30 m
Countercyclical           -10 m                 0                  15 m
• If the prior probabilities for improving economy, stable economy and worsening
economy are 0.1, 0.5 and 0.4, which investment would Warren consider?
Solution
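E(Conservative) = 0.1*30 + 0.5*5 - 0.4*10 = 1.5 m
E(Speculative) = 0.1*40 + 0.5*10 - 0.4*30 = -3 m
E(Countercyclical) = -0.1*10 + 0.5*0 + 0.4*15 = 5 m
The countercyclical investment has the largest expected value, so Warren would choose it.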
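The expected values for this problem can be checked with a short Python sketch. It is illustrative only; the dictionary layout and names are assumptions, and it uses the worsening-economy payoff of 15 m for the countercyclical investment as given in the payoff matrix.

```python
# Prior probabilities of the three states of nature.
probs = {"improving": 0.1, "stable": 0.5, "worsening": 0.4}

# Payoff matrix in millions of dollars, copied from the problem statement.
payoffs = {
    "conservative":    {"improving": 30,  "stable": 5,  "worsening": -10},
    "speculative":     {"improving": 40,  "stable": 10, "worsening": -30},
    "countercyclical": {"improving": -10, "stable": 0,  "worsening": 15},
}

def expected_value(row):
    """EV(d_i) = sum over states j of P(s_j) * V_ij."""
    return sum(probs[state] * row[state] for state in probs)

ev = {name: expected_value(row) for name, row in payoffs.items()}
best = max(ev, key=ev.get)   # alternative with the largest expected value

for name, value in ev.items():
    print(f"{name}: {value:.1f} m")
print("choose:", best)
```

The probabilities are checked nowhere in this sketch; in practice one would first confirm that they are non-negative and sum to 1.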
Decision Trees

A decision tree is a chronological representation of the decision problem. It is a
graphical device that forces the decision maker to examine all possible outcomes,
including unfavorable ones.
• It makes the computation of the expected values easier.
• It makes the process of making a decision easy to understand.
Each decision tree has two types of nodes; round
nodes correspond to the states of nature while square
nodes correspond to the decision alternatives.

The branches leaving each round node represent the
different states of nature while the branches leaving
each square node represent the different decision
alternatives.

At the end of each limb of a tree are the payoffs
attained from the series of branches making up that
limb.
How to Draw a Decision Tree?
Expected Value Approach
Example: Burger Prince
Burger Prince Restaurant is considering opening a new
restaurant on Main Street. It has three different
restaurant layout models (A, B, and C), each with a
different seating capacity.

Burger Prince estimates that the average number
of customers served per hour will be 80, 100, or 120.
The payoff table for the three models is on the next
slide.
Expected Value Approach
Payoff Table

             Average Number of Customers Per Hour
             s1 = 80     s2 = 100    s3 = 120
Model A      $10,000     $15,000     $14,000
Model B      $ 8,000     $18,000     $12,000
Model C      $ 6,000     $16,000     $21,000
Expected Value Approach
Calculate the expected value for each decision.
The decision tree on the next slide can assist in this
calculation.
Here d1, d2, d3 represent the decision alternatives of
Models A, B, and C.

And s1, s2, s3 represent the states of nature of 80, 100,
and 120 customers per hour, with probabilities 0.4, 0.2,
and 0.4 respectively.
Expected Value Approach

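Decision Tree

Node 1 (decision node)
  d1 → Node 2 (chance node)
         s1 (.4) → payoff 10,000
         s2 (.2) → payoff 15,000
         s3 (.4) → payoff 14,000
  d2 → Node 3 (chance node)
         s1 (.4) → payoff 8,000
         s2 (.2) → payoff 18,000
         s3 (.4) → payoff 12,000
  d3 → Node 4 (chance node)
         s1 (.4) → payoff 6,000
         s2 (.2) → payoff 16,000
         s3 (.4) → payoff 21,000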
Expected Value Approach

Model A (d1, node 2): EMV = .4(10,000) + .2(15,000) + .4(14,000) = $12,600
Model B (d2, node 3): EMV = .4(8,000) + .2(18,000) + .4(12,000) = $11,600
Model C (d3, node 4): EMV = .4(6,000) + .2(16,000) + .4(21,000) = $14,000

Choose the model with the largest EV: Model C
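The same rollback can be written as a small recursive procedure over the tree. The Python sketch below is one possible representation, not the method of the original slides; the node dictionaries and the helper names `rollback` and `chance` are assumptions made for illustration.

```python
def rollback(node):
    """Return (value, chosen_branch) for a decision-tree node."""
    kind = node["kind"]
    if kind == "payoff":
        return node["value"], None
    if kind == "chance":
        # Round node: probability-weighted sum over the states of nature.
        value = sum(p * rollback(child)[0] for p, child in node["branches"])
        return value, None
    if kind == "decision":
        # Square node: pick the alternative with the largest value.
        values = {label: rollback(child)[0]
                  for label, child in node["branches"].items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node kind: {kind}")

def chance(payoffs, probs=(0.4, 0.2, 0.4)):
    """Build a chance node over s1, s2, s3 with the given payoffs."""
    return {"kind": "chance",
            "branches": [(p, {"kind": "payoff", "value": v})
                         for p, v in zip(probs, payoffs)]}

tree = {"kind": "decision", "branches": {
    "Model A": chance((10_000, 15_000, 14_000)),
    "Model B": chance((8_000, 18_000, 12_000)),
    "Model C": chance((6_000, 16_000, 21_000)),
}}

value, choice = rollback(tree)
print(choice, round(value))
```

Representing the tree as nested nodes keeps the procedure unchanged if further decision or chance nodes are added deeper in the tree.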


√ END
