COURSE GUIDEBOOK
Business Statistics
Professor George T. Geis
University of California at Los Angeles
Part I
Lecture 1: Overview of Probability and Statistics
Lecture 2: Descriptive Statistics
Lecture 3: Probability Concepts
Lecture 4: Combining Event Probabilities
Lecture 5: Simulating Business Situations
Lecture 6: Random Variables
Lecture 7: The Binomial and Poisson Distributions
Lecture 8: The Normal Distribution
1-800-TEACH-12
1-800-832-2412
The Teaching Company
www.TEACH12.com
Table of Contents
Business Statistics
Part I
Professor Biography
Purpose of Course
Lecture One: Overview of Probability and Statistics
Lecture Two: Descriptive Statistics
Lecture Three: Probability Concepts
Lecture Four: Combining Event Probabilities
Lecture Five: Simulating Business Situations
Lecture Six: Random Variables
Lecture Seven: The Binomial and Poisson Distributions
Lecture Eight: The Normal Distribution
Answers
Bibliography
©1997 The Teaching Company Limited Partnership
George T. Geis, Ph.D.
Anderson Graduate School of Management
University of California at Los Angeles
George T. Geis was born in Chicago, Illinois, in 1944. He received a B.S.
“summa cum laude” with “Honors in Mathematics” from Purdue University in
1966. Dr. Geis earned his Ph.D. in 1977 at the University of Southern California
and his MBA from the University of California, Los Angeles in 1981.
Dr. Geis was a National Science Foundation and Woodrow Wilson Honorary
Fellow. In the field of Finance, he has been honored with the Financial
Executives Institute Award for outstanding achievement.
During his teaching career as an Adjunct Professor at the Anderson Graduate
School of Management at UCLA, Professor Geis has been voted outstanding
teacher three times. His academic experiences include serving as Research
Coordinator at the Center for Human Resource Management. Presently, he is
serving as a member of the faculty advisory board for the Entrepreneurial Studies
program at UCLA.
Geis is also an author. He has published dozens of professional articles and five
books. His books include Desktop Computing and the Essence of Management
(Prentice Hall, 1990) and Micromanaging (Prentice Hall, 1987). Currently, he is
researching the application of computer technology to visually represent
dynamics in converging technology and communication markets and the use of
interactive media in illustrating statistical analysis.
He has extensive consulting experience and is a frequent lecturer on emerging
trends in the computer, communications, and media markets. In his spare time
Professor Geis plays three-on-three basketball, struggles to lower his golf
handicap, and paints his seven-color Victorian-style home in Pasadena,
California.
Business Statistics
Purpose of the Course
In our tightly wired world, business executives make decisions under pressure.
Almost always, these decisions must be made with less than complete
information. This course is about how to effectively use data that is currently
available (or can be obtained within a reasonable time frame and cost) to
improve business decision-making.
We will use business examples from functional areas such as finance, marketing,
human resources, and operations to illustrate the role of data analysis in decision
making. This course is not designed to be a dry sleepy-time set of abstract,
mathematical lectures. My goal is to make statistics come alive in the context of
life and in the context of real business problems demanding solution.
Quantitative methods such as statistical analysis must not be viewed as the be-all
and end-all of decision making. The vital role that seasoned business intuition
plays in effective decision making cannot be overemphasized. Nevertheless,
analytical techniques are a central part of many decisions. In fact, we illustrate in
this course how statistics and probability can effectively work together with
managerial intuition in business problem solving.
The advent of personal computer statistical software that readily generates visual
representations of data and performs sophisticated analyses enables a manager to
concentrate on the meaning of data. The burden of computation has largely been
eliminated, and business people are now free to focus on probing issues and
searching for creative solutions. In this course, we illustrate the use of computer-
generated output that promotes visualization of data.
Students tell me that statistics was obscure and inaccessible for them as
undergraduates. On the first day of class, they enter my MBA course on
Statistics and Data Analysis prepared for the worst. Fortunately, I am often able
to help them build intuition for statistics, appreciate how the content can be
applied and actually enjoy the experience.
Whatever previous experience you have had with statistics (if any), our main
objective will be to make the content useful to you in business decision-making
and relevant to decisions we all make in everyday life.
In addition to questions at the end of each lecture, problems have been
provided where relevant. For your convenience, answers are available at the end
of this outline.
Statistical Software Credits
For further information on Crystal Ball Software
please contact Decisioneering Inc.
1.800.289.2550
1.308.337.3560 (F)
JMP-IN 3 for Windows © is available from
Duxbury Press,
An International Thomson Publishing Company
Belmont, CA
1-800-876-2350
Images Copyrighted New Visions Technologies Inc.
All rights reserved. No part of this book may be reproduced in any manner
whatsoever without written permission except in the case of brief quotations
embodied in critical articles and reviews. For information, send complete
description of intended use to The Teaching Company/Rights and Permissions,
7405 Alban Station Ct. Suite B-215, Springfield, VA 22150, USA
Lecture One
Overview of Probability and Statistics
Scope: Professor Geis explains how skill in obtaining and analyzing data can
provide a business leader with significant competitive advantage.
Effective decision-making is not over-quantified but takes into account
experience and intuition as factors. Decisions must almost always be
made with less than complete data. Therefore one must make decisions
based on inferences from the data one does have. Statistics helps in
developing a model for refining business decisions. Professor Geis
compares and contrasts statistics with probability. He also details and
illustrates the major activities of statistical analysis.
Outline
I. Statistics can provide a substantial business advantage.
A. Statistical analysis refines decision-making and choices in business.
Meaningful information gives a business a distinct advantage in a
competitive world. Rapid decisions must almost always be made with
less than complete data. Therefore one must make decisions based on
inferences from the data one does have.
B. Statistics helps in effectively analyzing data as well as in developing a
model for refining business decisions. We build models to simplify
complex business situations. Effective decision-making is not over-quantified
but takes into account experience and intuition as factors.
II. Statistics takes inert data and brings it to life.
A. Statistics involves collecting, analyzing, and understanding data for
effective decision-making. Statistics makes sense of the big picture. The
key characteristics of the data provide insight into the problem under
consideration.
B. Good statistical analysis pays attention to the outlier. An outlier is a
data point that falls away from most of the others. The outlier(s) may be
included, excommunicated, or accommodated. An outlier may indicate
that the model is not adequate or expansive enough. On the other hand,
the outlier may be so unusual that it really does not belong in our data.
C. Business statistics may utilize a variety of data sets. Some examples of
data sets from the business world include: financial, marketing,
attendance patterns, production quality, human resources. Data analysis
cuts across all the functional areas of business.
III. The core activities of statistics concern themselves with data collection,
representation and usage.
A. It is necessary to design a plan for data collection. Data may be acquired
from another source, known as downloading. Data may also be
developed by gathering information.
B. The aim of descriptive statistics is to represent the data or results of
research in tabular, graphical, or numerical form. The data must be
summarized in some way in order to describe and visualize it. The key
characteristics of a set of data emerge and provide a picture of the
situation.
C. Inferential statistics refers to a group of methods used to draw
inferences about a population from data available on a sample of the
population. Inferential statistics moves from a sample group analysis to
draw conclusions about a parameter in the population at large.
D. Hypothesis testing seeks to determine which of two competing ideas is
correct. It addresses a problem by testing an expected solution to the
problem. An example of hypothesis testing in the business world might
be: Is the current production quality under control or out of control?
E. A forecasting model predicts what is likely to occur in a future situation
based on a number of different factors. For example, statistics can be
used to predict what the sales per square foot will be in a new restaurant
based on location, demographics, and other factors.
IV. Probability is a number expressing the likelihood that a specific event will
occur, expressed as the ratio of the number of actual occurrences to the
number of possible occurrences.
A. Probability is used to predict a future outcome. Based on assumptions
about how the world works, probability quantifies the likelihood of a
future outcome.
B. We can view probability as the “inverse” of statistics. Statistics begins
with what is observable and draws conclusions about how the world
works. Probability starts with a view of how the world works and tries
to forecast what will occur.
V. Classifying data sets makes the information they contain more useful.
A. Data sets consist of measurements for individual records called
elementary units. Example: salaries for individual employees.
B. Data sets may be classified according to the number of variables for
each elementary unit as univariate, bivariate, or multivariate. Univariate
data sets concern themselves with one measured variable. Bivariate data
sets contain two measurements for each elementary unit. Multivariate
data sets contain three or more measurements for each elementary unit.
C. Data sets can be classified by whether or not time sequence is
important.
1. Time is important in time series data sets since time is one of the
dimensions. The daily Dow Jones Average is an example of time
series data.
2. Time sequence is not considered in cross-sectional data. The data is
only meaningful for one slice of time. One example might be the
college ranking of basketball teams. A measurement is given for
each of the elementary units for a particular slice of time.
D. Data sets can be classified by the kind of measurements recorded for
each unit.
1. Quantitative data can either be discrete or continuous.
a. Discrete data can be counted. For example, How many times
did you drink coffee today?
b. Continuous data is any variable that cannot be listed as a
discrete number. Time and distance are common examples of
continuous data.
2. Qualitative data is classified as ordinal or nominal.
a. Ordinal data uses a scale to rank or order objects or persons on
a continuum. This provides information about the rank order
on a particular variable. These numbers have meaning in the
ranking but they are not generally added or subtracted.
b. Nominal data uses numbers to stand for names or categories
representing the way objects or persons differ. Common
examples of nominal data include sex (male or female), race,
etc. There is no implied ranking.
Refer to Dodger Data Set
Category          Type of data
Games attended    quantitative, discrete
Age               quantitative, discrete
First-age         quantitative, discrete
Race              qualitative, nominal
Distance          quantitative, continuous
Sex               qualitative, nominal
Refer to Dodger Attendance Histogram
The histogram is a useful way of visualizing the data. This histogram helps
portray the fact that there are two different groups of fans: the regular fans and
the season ticket holder or “super” fan.
Slicing the Dodger Data Set
Games  Age  First-age  Race  Distance  Sex
1      18   6          C     10        M
5      32   12         H     5         M
3      4%   2          C     20        M
4      47   7          C     25        F
35     42   7          H     30        M
[Figure: Dodger Attendance Histogram. Horizontal axis: Games/Season; vertical axis: Frequency.]
Questions for Lecture One
1. True or False: A model is essentially a reduction technique for making
sense out of one’s world.
2. Define statistics.
3. What is an outlier?
4. Statistics cuts across all functional areas of business. Explain.
5. Distinguish between inferential and descriptive statistics.
6. In what sense is probability the “inverse” of statistics?
7. You are developing a data set that provides prices of an IPO (Initial Public
Offering) at issue, one month after issue, and six months after issue. Is this
data set univariate, bivariate, or multivariate?
8. Is a data set that provides NCAA college basketball rankings as of 1/15/97
time series or cross sectional?
9. An investment bank ranks stocks 1, 2, or 3 (1 being most timely for
investment, 3 being least timely). Should this data be thought of as nominal,
ordinal or quantitative?
Essential Reading for Lecture One
Aczel, Complete Business Statistics, Chapter 1, Irwin, Third Edition, 1996.
Recommended Reading for Lecture One
Deming, “On Probability as a Basis for Action,” American Statistician, Vol. 29,
1975, 146-152.
Hanke and Reitsch, Understanding Business Statistics, Chapter 1, Irwin, 1994.
Lecture Two
Descriptive Statistics
Scope: In this lecture we discuss the purpose of descriptive statistics. What are
some important ways to view and summarize data? Why is variability
so important in analyzing a business situation?
Outline
I. Descriptive statistics portrays and summarizes a set of data so that its key
characteristics become evident. Raw data is transformed into useful, refined
information.
II. Various graphing techniques have been developed for viewing and
summarizing data.
A. Bar charts and histograms use the area of rectangles or bars to portray
differences and trends in data. Bar charts usually portray the measure of
frequency of a qualitative variable. Histograms visualize how a
quantitative variable is distributed.
1. The bar chart below answers the question, How many films a year
does each of the film companies produce?
[Figure: Bar chart of the number of films per year for MGM, Fox, and Disney.]
2. This histogram starts with the quantitative variable, in this case,
salary levels. It visualizes how the salary levels are distributed and
the likelihood of being in a certain category.
[Figure: Histogram of the likelihood of salary levels. Horizontal axis: salary level ($80,000 to $120,000); vertical axis: percentage.]
B. Contingency tables use the cross tabulation of occurrences in two
dimensions to display key characteristics. Ben Franklin had as a goal
perfection in thirteen identified virtues. He kept a record of his
violations of these virtues with a cross tabulation of the virtue and day
of the week.
[Table: Franklin's record of violations. Rows: the virtues (1. Temperance, 2. Silence, 3. Order, and so on); columns: the days of the week (Su, M, T, W, Th, F, Sa).]
C. A scattergram plots points in two dimensions and then tries to fit the
data in some way to a line or curve. A production curve is a common
application of a scattergram. An outlier is a data point that does not lie
on the curve.
[Figure: Scattergram with a fitted curve.]
D. A time series graph tracks the movement of some variable across the
dimension of time. An example might be the share price of Netscape
over the past year. From a time series graph we can try to visualize what
triggered some of the events represented by the upturns or downturns.
[Figure: Time series graph of share price, June - August.]
III. Statistics describes data by selecting one or more values to summarize the
entire data set. Very select values try to describe the entire data set as a
whole using only one or two numbers. The spread of the data set may also
be visualized to provide an at-a-glance summary in a graph like a box plot or
ogive.
A. In statistics we would like to speak with some confidence about the
population at large. It is important to understand the difference between
a population parameter and a sample statistic.
1. A population parameter is a numerical measure for the entire
population. An example might be the average age of all possible
purchasers. The Greek letter μ stands for the population parameter.
2. A sample statistic is a numerical measure of a sample group. The
sample is taken from the population, and the statistic computed from
it is an estimate of μ. The symbol x̄, read “x-bar,” represents the
average of the sample.
[Figure: A sample drawn from a population. Sample statistic: average age of the sample group. Population parameter: actual average age.]
B. When a statistic is used to estimate a parameter, it is called an estimator.
The sample mean, x̄, is a statistic which is used as an estimator of the
population mean, μ. The closer x̄ is to μ, the better an estimator it is.
For example, μ might be the average age of a customer base. x̄ is
calculated from a sample so that one can approximate with confidence
the actual average age, μ.
C. There are several ways of measuring the central tendency of a set of
data. It is useful to have one number to represent the data set. The mean,
median, and mode are different ways of summarizing a data set. The
summary measure used affects the number chosen to represent the data.
For example, use a data set with customer ages of 30, 30, 40, 50, 100
years.
1. The mean is the arithmetic average. The ages are all added together
and divided by the number of customers. 250 divided by 5 equals
50. So the mean age is 50 years. In this case the extreme value of
the 100 year old customer may unduly affect the mean.
2. The median is the central data point when the data is ordered from
least to greatest. In this case the median age is 40 years.
3. The mode is the most common data point. In this case the mode is
30 years. There can be no mode, one mode, or multiple modes.
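The three measures of central tendency above can be checked with a short sketch. It assumes Python and its standard statistics module, neither of which is part of the lecture; the variable names are illustrative. The last step drops the 100-year-old customer to show how strongly one extreme value pulls the mean while leaving the other measures almost unchanged.

```python
from statistics import mean, median, multimode

# Customer ages from the lecture example.
ages = [30, 30, 40, 50, 100]

summary = {
    "mean": mean(ages),        # arithmetic average: 250 / 5
    "median": median(ages),    # middle value of the sorted data
    "modes": multimode(ages),  # list of the most common value(s)
}
print(summary)

# Illustrative assumption: remove the extreme value and recompute the mean.
ages_without_extreme = [age for age in ages if age != 100]
trimmed_mean = mean(ages_without_extreme)
print(trimmed_mean)
```

multimode returns a list, which matches the remark that a data set can have one mode or several.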
D. The box plot is a useful visual data summary because it makes obvious
the median, the middle fifty percent of the data, the least and greatest
data points and any outliers, as well as how the data is spread out.
[Figure: Box plot, labeled with the 75th percentile, the median, the 25th percentile, the largest non-outlier, the smallest non-outlier, and an outlier.]
1. The box plot contains a box, a line through the box, and whiskers
coming out of the box.
2. The line in the middle is the median value.
3. The top of the box is the 75th percentile; the bottom is the 25th
percentile. The center of the data is contained within the box.
4. One of the two “whiskers” extends from the largest non-outlier to the
75th percentile edge of the box. The other whisker extends from the
smallest non-outlier to the 25th percentile edge of the box.
5. Any outliers are indicated by a star.
E. A cumulative-frequency graph or ogive is a graph that builds up toward
the right with each increment including the data preceding it.
1. Discrete data, for example age, looks like steps in an ogive.
2. Continuous data produces a smoother curve.
[Figure: Cumulative-frequency graph, or ogive. Horizontal axis: age in years (10 to 100); vertical axis: cumulative percentage.]
IV. Variability takes into account the differences in response. It is helpful to
know how close the data set is to the mean.
A. Variability has a central role in statistical analysis.
B. The traditional choice for measuring variability is the standard
deviation. The standard deviation is a measure of how close each of the
data points is on average to the mean. The standard deviation is given
by the mathematical relationship below.
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
Most data points tend to be within two to three standard deviations of
the mean. Some way-out events like a stock market crash may be
twenty standard deviations from the mean.
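As a hedged illustration of the standard deviation, the sketch below reuses the customer ages from the central-tendency example (30, 30, 40, 50, 100). It assumes Python's standard statistics module, where stdev divides by n - 1 (the sample form) and pstdev divides by n (the population form). Reusing this data set and the variable names are assumptions made for illustration only.

```python
from statistics import mean, pstdev, stdev

# Customer ages reused from the earlier example (an illustrative choice).
ages = [30, 30, 40, 50, 100]

center = mean(ages)
sample_sd = stdev(ages)       # sample form: divides by n - 1
population_sd = pstdev(ages)  # population form: divides by n

# How many sample standard deviations the oldest customer lies from the mean.
distance_in_sds = (max(ages) - center) / sample_sd

print(round(sample_sd, 2), round(population_sd, 2), round(distance_in_sds, 2))
```

Even the 100-year-old customer sits within two standard deviations of the mean here, partly because that one value inflates the standard deviation itself.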
V. There are other summary measures that are useful for describing a data set.
A. Skewness measures the asymmetry of a distribution. When skewed data
is plotted, the curve tends to have a tail on one side, so it rises but is not
symmetrical. There can be right-skewed and left-skewed distributions.
The bulk of the data is opposite the tail or skew.
B. Kurtosis measures the peakedness of a distribution. The shape of a
distribution can range from very peaked to very flat.
1. Leptokurtic describes a distribution that is very peaked. In this case
the data is clumped together.
2. Platykurtic describes a distribution that is very flat. In this case the
data is very spread out.
Questions for Lecture Two
1. Describe the purpose of descriptive statistics in business.
2. What does the height of each bar in a histogram represent?
3. True or False. A scattergram provides a cross tabulation of the data by key
characteristics.
4. Distinguish between a parameter and a statistic.
5. What does the 95th percentile in a data set represent?
6. True or False. A box plot serves a similar purpose as a histogram - to
provide a visual image of the distribution of the data.
7. True or False. The standard deviation summarizes how far from the median
the data typically are.
Problems for Lecture Two
The following data represent monthly housing rents paid by employees in your
accounting department: $500, 600, 600, 600, 700, 800, 800, 900, 900, and 3,000.
Answer the following questions relating to this data set.
1. Calculate the mean, mode, and median values of the data set.
2. Draw a histogram of the data.
3. Is the data skewed to the right or skewed to the left? What does this
skewness mean?
4. What is the standard deviation of the data set?
5. What is the standard deviation if the highest rent (arguably an outlier) is
excluded?
Essential Reading for Lecture Two
Aczel, Complete Business Statistics, Chapter 1, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Two
Hanke and Reitsch, Understanding Business Statistics, Chapter 4, Irwin, 1994.
Velleman and Hoaglin, Applications, Basics, and Computing of Exploratory
Data Analysis, Duxbury, 1981.
Lecture Three
Probability Concepts
Scope: Probability plays an important role in analyzing business situations and
in refining intuition. In business situations, complexity is often the
norm. Therefore, in order for a model to be adequate, it must have some
room for complexity and subtlety. Probability helps us move from the
known to the unknown and to assess the likelihood of future events.
Various ways of determining probability will be introduced in this
lecture.
Outline
I. The goal of probability is to understand what is likely to happen in uncertain
future situations within “known” systems. Good business decision making
uses probability to obtain assessments of various possibilities: competitors
entering the market, cost patterns, cash flows, etcetera.
A. Probability starts with the known and moves to the unknown.
B. If a good model is developed then we have a good way to make
predictions.
The difference between probability and statistics may be illustrated as
follows:
What happened? > STATISTICS > How the world works
How the world works > PROBABILITY > What will happen
II. To predict the future a model for refining intuition must be developed. The
following flowchart serves this purpose.
[Figure: Model for Refining Intuition. Flowchart from Raw Intuition to Refined Intuition to Implemented Intuition, with inputs labeled Analytical, Access to Information, and Access to Expertise.]
III. Experimental probability provides a basis for probability judgments. The
results of random experiments or customer surveys help in the decision
making process. Desired product features can be determined by the
information obtained.
A. The random experiment: a procedure that produces an outcome that can't
be perfectly predicted.
B. The sample space: a list of all possible outcomes of the experiment. For
example, the sample space for flipping a coin one time is {H, T}; for
rolling a die it is {1, 2, 3, 4, 5, 6}.
C. The outcome: the result that occurs each time a random experiment is
run. For example, the outcome of five coin flips may be heads, heads,
tails, heads, tails.
D. The event: a collection of outcomes specified in advance; a subset of the
sample space. For example, when flipping a coin, we may be interested
in the event heads.
IV. Probability related to events helps deal with complex situations.
A. Probability indicates how likely it is that an event will occur by a numerical
value between zero and one. This value may be expressed as a fraction,
decimal or percent. A probability of zero means the event is never going to
happen. A probability of one means the event is always going to
happen. Even a very unlikely event might occur over a long period of
time. We have to keep in mind the time horizon in calculating
probability.
B. The probability of an event can be expressed in terms of “the odds”.
For example, in a horse race, a certain horse may have 9 to 1 odds. The
probability of that horse winning is p = 1/10, as calculated by the formula:
$$ p = \frac{1}{\text{odds} + 1} $$
where p = probability
and odds refers to the odds (9 in the case of 9 to 1 odds).
With 1 to 1 odds p = 1/2; with 3 to 1 odds p = 1/4.
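The conversion from odds to probability can be sketched in a few lines. This is a minimal illustration in Python, which the lecture does not use. The function name probability_from_odds is hypothetical, and the in_favor parameter, which generalizes the lecture's "odds to 1" case, is an added assumption.

```python
from fractions import Fraction


def probability_from_odds(against: int, in_favor: int = 1) -> Fraction:
    """Convert 'against to in_favor' odds into a winning probability.

    With in_favor = 1 this matches the lecture's formula 1 / (odds + 1).
    The general in_favor argument is an assumption added for illustration.
    """
    return Fraction(in_favor, against + in_favor)


# The three cases quoted in the outline.
nine_to_one = probability_from_odds(9)
even_odds = probability_from_odds(1)
three_to_one = probability_from_odds(3)

print(nine_to_one, even_odds, three_to_one)
```

Fractions are used so the results stay exact and can be compared directly with the values in the text.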
C. There are various ways of deriving probability.
1. In experimentation the relative frequency is expressed as a ratio:
p = (number of times the event occurs) / (number of trials)
For example, if two thousand golf clubs are sampled and three are
found to be defective, the probability of a defective golf club would
be expressed as p = 3/2000 = 0.0015, or less than 1%.
2. Sometimes history will not be a good predictor of the future since
the future may deviate from the past. Probability may be estimated
subjectively based on experience and intuition. The resulting
estimate may or may not be accurate depending on the
expertise of the individual.
3. Counting possibilities is another way of determining probability.
For example, take the chances of getting two daughters in a three-child
family. Let m=son, f=daughter.
a. Make a list of all the possible outcomes: mmm, mmf, mfm, mff,
fmm, ffm, fmf, fff.
b. Determine which outcomes correspond to a two-daughter
family: mff, ffm, fmf.
c. Express the probability as the ratio of the number of two-daughter
outcomes to the total number of outcomes: p = 3/8.
4. Mathematical calculation of probability lets us count without
counting. For example, if there is a 60% chance of sinking a free
throw, the probability of sinking ten in a row is given by p = (0.6)^10
= 0.006. This is equivalent to 6 times in a thousand tries.
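Three of the ways of deriving probability listed above (relative frequency, counting possibilities, and mathematical calculation) can be reproduced with a short sketch. It assumes Python, which the lecture does not use, and the variable names are illustrative. The counting step assumes that sons and daughters are equally likely, as the lecture's list of eight outcomes implies.

```python
from fractions import Fraction
from itertools import product

# 1. Relative frequency: 3 defective golf clubs in a sample of 2,000.
relative_frequency = Fraction(3, 2000)

# 3. Counting possibilities: every three-child family, m = son, f = daughter.
outcomes = ["".join(children) for children in product("mf", repeat=3)]
two_daughter_outcomes = [o for o in outcomes if o.count("f") == 2]
p_two_daughters = Fraction(len(two_daughter_outcomes), len(outcomes))

# 4. Mathematical calculation: ten free throws in a row at 60% each,
#    treating the shots as independent.
p_ten_in_a_row = 0.6 ** 10

print(float(relative_frequency))
print(sorted(two_daughter_outcomes), p_two_daughters)
print(round(p_ten_in_a_row, 3))
```

Listing the outcomes explicitly only works for small problems; the multiplication in step 4 is what makes larger cases tractable.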
V. We cannot necessarily predict the future but probability provides a means
of quantifying what is likely to happen.
Questions for Lecture Three
1. Describe the overall goal of using probability analysis in business
situations.
2. True or False. The sample space is a listing of all possible outcomes of a
random experiment.
3. What is the difference between an event and an outcome?
4. True or False. The probability of an event, a number between -1 and +1,
expresses how likely it is that an event will occur.
5. You want to estimate the probability of a film doing more than $100
million at the box office (ticket sales) and ask an industry expert for her
opinion. What means of deriving probability are you using?
6. If you examine a database of last year's movies to estimate the probability
of the film doing $100 million in box office, what means of deriving
probability are you using?
Problems for Lecture Three
1. If a racehorse has been given 5 to 1 odds of winning, what is the implied
probability that the horse will win?
2. If you are a 70% free throw shooter in basketball, what are the chances that
you will “sink” 10 free throws in a row?
3. Say your chances of picking a “winner” stock are 50% and the chances of
picking a “loser” stock are 50%. If you select four stocks, what are your
chances of picking exactly three winners?
Essential Reading for Lecture Three
Aczel, Complete Business Statistics, Chapter 2, Irwin, Third Edition 1996.
Recommended Reading for Lecture Three
Clemen, Making Hard Decisions, Chapter 7, PWS-Kent, 1991.
Hanke and Reitsch, Understanding Business Statistics, Chapter 5, Irwin, 1994.
Lecture Four
Combining Event Probabilities
Scope: In this lecture we discuss how to obtain probabilities associated with
more complex events. We introduce the notion of simulation, how it
relates to probability, and how it can be used in business decision
making. Business situations are usually complex. We develop the
intuition behind conditional probability, independence, and mutual
exclusivity.
Outline
I. Simulation provides the means for forecasting success or failure, potential
revenue, or how the market may respond to a new product launch. Monte
Carlo simulation is used extensively in business to deal with uncertainty.
There are many popular software packages available that incorporate Monte
Carlo simulation inside a spreadsheet.
A. There are three steps in the Monte Carlo simulation.
1. Generate assumption cells.
2. Given the assumed values above, calculate the value inside a
spreadsheet, perhaps a hundred or a thousand times. This develops a
tracking mechanism for the range of possibilities.
3. Display the range of possibilities in a forecast chart. The forecast chart
shows the range as well as the probability corresponding to each
potential value.
B. Refer to the revenue forecast model below. Each entry also has behind it a
range of assumed values. The distribution of these values can vary. For
example the price assumption could be a triangular distribution.
Competitor entry can also be an assumed value.
[Table: Revenue Forecast Model. Revenue simulation for a product launch, with entries for price (no competitor), price (competitor), volume (no competitor), volume (with competitor), competitor entry, and sales.]
C. Refer to the Forecast Chart below. This simulates the cash flow resulting
from a project launch. Simulation goes beyond stipulating the range; it
also gives the probability associated with each value in that range.
[Figure: Forecast Chart. Forecast: 97 Revenue; frequency chart, 500 trials shown.]
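The three Monte Carlo steps described in this section can be sketched outside a spreadsheet. The code below is a rough Python illustration, not the Crystal Ball model used in the lecture. Every number in it (the 40% chance of competitor entry, the price and volume ranges, the seed) is a hypothetical assumption chosen only to make the example run. The function names simulate_revenue and forecast_chart are also illustrative.

```python
import random


def simulate_revenue(trials=500, seed=1997):
    """Steps 1 and 2: draw assumed values and recalculate revenue each trial."""
    rng = random.Random(seed)
    revenues = []
    for _ in range(trials):
        # Assumption cell: does a competitor enter? (hypothetical 40% chance)
        competitor_enters = rng.random() < 0.4
        if competitor_enters:
            price = rng.triangular(8.0, 12.0, 10.0)    # low, high, most likely
            volume = rng.uniform(40_000, 60_000)
        else:
            price = rng.triangular(10.0, 15.0, 12.0)
            volume = rng.uniform(70_000, 100_000)
        revenues.append(price * volume)
    return revenues


def forecast_chart(revenues, bins=10):
    """Step 3: group the results into ranges and attach a probability to each."""
    low, high = min(revenues), max(revenues)
    width = (high - low) / bins
    counts = [0] * bins
    for revenue in revenues:
        index = min(int((revenue - low) / width), bins - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, count / len(revenues))
        for i, count in enumerate(counts)
    ]


revenues = simulate_revenue()
chart = forecast_chart(revenues)
for start, end, probability in chart:
    print(f"{start:>12,.0f} to {end:>12,.0f}: {probability:.1%}")
```

The printed rows play the role of the forecast chart: each revenue range is paired with the share of trials that landed in it.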
II. Venn diagrams depict probabilities spatially.
A. The sample space, X, is a rectangular region containing the entire range of
possibilities of what could happen.
[Figure: Venn diagram of the sample space X.]
B. An event, A, is represented spatially by a circular area. Everything inside
the circle corresponds to the desired event. Not-A is represented by
everything outside the circle. A and not-A are complements.
[Figure: Venn diagram of event A inside the sample space X.]
C. An intersection occurs when there is an event, A, and an event, B, and the
two circular areas overlap. The area that they share in common represents both
events occurring. To calculate the probability of A or B occurring, do not
simply add the probabilities together. The area in common must be subtracted or
it will be counted twice. Therefore, p(A or B) = p(A) + p(B) - p(A and B).
[Figure: Venn diagram of overlapping events A and B inside the sample space X.]
D. Mutually exclusive events, A and B, share no common points. For
example, A could represent an earthquake occurring in the next 30
seconds. B could represent no earthquakes today. If event A occurs, it is
impossible for event B to occur at the same time and vice versa.
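The rule p(A or B) = p(A) + p(B) - p(A and B) can be checked on a small sample space. The sketch below uses a single roll of a fair die, an example that is not in the lecture. It assumes Python, and the events chosen (even number, number greater than 3, and so on) are illustrative.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (illustrative example).
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event):
    """Probability of an event, assuming all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


a = {2, 4, 6}  # event A: an even number
b = {4, 5, 6}  # event B: a number greater than 3

# Overlapping events: subtract the shared area so it is not counted twice.
p_a_or_b = probability(a) + probability(b) - probability(a & b)

# Mutually exclusive events share no outcomes, so nothing is subtracted.
low = {1, 2}
high = {5, 6}
p_low_or_high = probability(low) + probability(high)

print(p_a_or_b, probability(a | b))
print(p_low_or_high, probability(low | high))
```

In both cases the formula agrees with counting the outcomes in the union directly.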
III. Conditional probability applies when you revise the probability of an
event to reflect information that another event has occurred.
A. The probability of A, given B, is expressed as p(A|B).
B. For example, a sales price will be affected by whether or not a competitor
enters the market.
IV. Independent events occur when one event does not have any relationship to
another. What happened in the past does not impact or change the future.
A. For example, the roulette wheel is not influenced by what occurred
before, whereas blackjack is, since the cards dealt affect what cards remain.
B. To calculate the probability of A and B when the events are independent,
multiply: p(A and B) = p(A)p(B).
C. If two events are mutually exclusive, then almost certainly they are not
independent. One event does have a relationship to the other.
V. Check your understanding of combining information about events with the
following problem:
You apply for four jobs and have a 1/4 chance of getting each one. Assuming
independence, what are the chances of getting at least one?
A. Since the events are not mutually exclusive, you might be offered more
than one job, one job, or no job at all. So you cannot merely add the
probabilities together or multiply them.
B. Instead, work with the complement. The probability of not getting a given
job is 3/4, so the probability of getting none of the four jobs is
(3/4)^4 = 81/256, or about 32%. Subtracting the probability of not
getting a job from one, the
probability of getting at least one job offer, or maybe more, is 68%.
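The complement calculation can be checked with a few lines of Python. This is only a sketch: the function name prob_at_least_one is an illustrative choice, not something from the lecture, and it assumes the applications are independent, as the problem states.

```python
from fractions import Fraction


def prob_at_least_one(p_success, n_trials):
    """P(at least one success) = 1 - P(no successes), assuming independence."""
    p_none = (1 - p_success) ** n_trials
    return 1 - p_none


# Four job applications, each with a 1/4 chance of an offer.
p = prob_at_least_one(Fraction(1, 4), 4)
print(p, round(float(p), 2))
```

Using exact fractions keeps the intermediate value (3/4)^4 = 81/256 visible before it is rounded to a percentage.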
Questions for Lecture Four
Why is it important in business to be able to analyze probabilities that
involve combinations of events?
True or False. Monte Carlo simulation is one way to effectively combine
probability information about events.
Define what is meant by conditional probability.
Explain what is implied if two events are mutually exclusive.
Define independent events in your own words.
True or False. If two events are mutually exclusive, then they are
independent.
Problems for Lecture Four
The following problems relate to this situation. Assume you apply to three
colleges and that you have a 1/3 chance of getting into each one. Assume
independence.
1. Are the chances of being admitted to at least one college 1/3 + 1/3 + 1/3 = 1?
Why or why not?
2. Are the chances of being admitted to at least one college 1/3 x 1/3 x 1/3 =
1/27? Why or why not?
3. What are the chances of being admitted to at least one college?
4. What are the chances of being turned down by all colleges?
Essential Reading for Lecture Four
Aczel, Complete Business Statistics, Chapter 2, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Four
Clemen, Making Hard Decisions, Chapter 7, PWS-Kent, 1991.
Hanke and Reitsch, Understanding Business Statistics, Chapter 5, Irwin, 1994.
Lecture Five
Simulating Business Situations
Scope: In this lecture we show how simulation builds on our understanding of
probability. We review the steps in setting up a Monte Carlo simulation.
Understanding how probabilities work and how distributions are built is
essential to building a good simulation model. The model is
good to the extent that it approximates the reality of the business
situation that it represents. Since it is a reduction tool, no model is
perfect, but it helps us to deal with uncertainties in trying to predict the
future. Simulation models help us deal with risks and make a decision
using analytical as well as intuitive power.
Outline
I. Simulation is a useful technique for modeling business situations with
uncertain conditions.
A. Assumptions are built into each cell of the simulation model, which is
usually run inside of a spreadsheet. The probability of a given value
coming up will be driven by the distribution which sits behind each
assumption.
B. Historical information along with other factors is used to obtain the
probability estimates for the distributions.
II. Monte Carlo simulation was developed in the 1940s by John von
Neumann, who used it in physics. Now Monte Carlo simulation has many
other applications, including business. Random numbers are selected for
each assumption cell, drawn from the related distribution.
A. The steps involved in Monte Carlo simulation:
1. Generating random numbers that conform to the model assumptions
2. Calculating one iteration (recalculation of the model) of the event
3. Displaying simulation results in a forecast chart
B. Monte Carlo simulation is not the only technique for modeling.
III. Simulation is useful in business decisions, as a product can have many
lives on paper before final decisions are made. Business people often think in
terms of a triangular distribution: worst case, most likely and best case
scenarios. These distributions can be randomly sampled in a simulation
model. The result is not only the range of likely possibilities but also their
probabilities.
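The three Monte Carlo steps can be sketched in Python rather than a spreadsheet. Everything numeric here is a hypothetical assumption chosen for illustration (the 40% chance of competitor entry and the worst, most likely and best case values for price and volume); none of it comes from the lecture's model. The function name simulate_revenue is also illustrative.

```python
import random
import statistics


def simulate_revenue(trials=500, p_competitor=0.4, seed=1997):
    """Run a simple Monte Carlo revenue simulation and return one revenue per trial."""
    rng = random.Random(seed)
    revenues = []
    for _ in range(trials):
        # Step 1: draw random values for each assumption cell.
        # Competitor entry is a discrete (yes/no) assumption.
        competitor_enters = rng.random() < p_competitor
        if competitor_enters:
            # random.triangular takes (low, high, mode): worst, best, most likely.
            price = rng.triangular(80, 100, 90)
            volume = rng.triangular(2000, 4000, 3000)
        else:
            price = rng.triangular(90, 120, 100)
            volume = rng.triangular(3000, 6000, 4500)
        # Step 2: one iteration (recalculation) of the model.
        revenues.append(price * volume)
    return revenues


# Step 3: summarize the results, as a forecast chart would.
revenues = sorted(simulate_revenue())
print("mean revenue:", round(statistics.mean(revenues)))
print("5th percentile:", round(revenues[int(0.05 * len(revenues))]))
print("95th percentile:", round(revenues[int(0.95 * len(revenues))]))
```

A fixed seed makes the run repeatable; changing it gives a different but statistically similar set of 500 trials.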
IV. Forecasting revenue for a product launch is a type of business situation that
can be simulated. Refer to the chart below.
One Iteration of the Model
Revenue simulation: Product Launch
1997
Price: No Competitor eee Td
Price: Competitor 87
Volume No Competitor sis
Volume With Competicor asp
Competitor Entry? i
Sales Price $97
Sales Volume 3572
Sales Revenue $200833
A. Each of the first five shaded areas is an assumption cell used to forecast
the projected sales revenue, also shaded in. In this example different kinds
of distributions are built into the assumption cells.
1. The competitor entry cells are a discrete distribution.
2. The price entry cells are a triangular distribution. A triangular
1, the standard error of the mean
gets smaller.
B. As the sample size gets large, the standard error of the mean gets
smaller. This is another application of the central limit theorem.
C. Take information from a sample group: n = 100,
x̄ = $20, σ = $8.00. Then
σ_x̄ = σ/√n = $8.00/√100 = $0.80.
This allows us to say something about μ. I'm using $20 to estimate μ.
My standard error of the mean is $0.80. This is a much tighter
distribution than the standard deviation of the raw data.
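As a quick check of the arithmetic, the standard error of the mean for a standard deviation of $8.00 and a sample of 100 can be computed directly. The function name standard_error_of_mean is an illustrative choice.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


se = standard_error_of_mean(8.00, 100)
print(f"standard error of the mean: ${se:.2f}")

# Quadrupling the sample size halves the standard error.
print(f"with n = 400: ${standard_error_of_mean(8.00, 400):.2f}")
```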
The sampling distribution for the sample proportion is related to the binomial
distribution and approaches the normal distribution by application of the
Central Limit Theorem.
A. The sampling distribution for the sample proportion is related to the
binomial distribution. The count of successes, x, follows the binomial
distribution with parameters n and p, where n is the sample size and p is
the population proportion.
B. As the sample size increases the Central Limit Theorem applies, so the
sampling distribution of the sample proportion approaches a normal
distribution as n gets large. As a rule of thumb, we can use the normal
approximation if np(1-p) > 5.
Business applications using the central limit theorem
A. You survey some of your customers to determine if sales will go up if
you cut prices. The sample proportion is p̂ = x/n, for n = 35. Since x
follows a binomial distribution, the sampling distribution of p̂ approaches
a normal distribution with mean = p and standard deviation = √(p(1-p)/n).
B. You survey 400 voters on an upcoming ballot initiative. You assume
p = 0.5. You survey and find out that p̂ = .425. The distribution of p̂
will be a normal curve with a mean of 0.5 and a standard deviation of
√(0.5 × 0.5/400) = .025. This means that I am three standard
errors below my proposed p. So this would be evidence that the
population parameter is probably not 0.5.
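The voter survey numbers can be verified with a short calculation. This sketch uses the assumed proportion of 0.5, the sample proportion of .425 and the sample size of 400 from the example; the function name proportion_z is illustrative.

```python
import math


def proportion_z(p_hat, p0, n):
    """Return (z, standard error) for a sample proportion under an assumed p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return z, se


z, se = proportion_z(0.425, 0.5, 400)
print(f"standard error = {se:.3f}, z = {z:.1f}")
```

A z value near -3 means the observed proportion sits about three standard errors below the assumed value.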
Central Limit Theorem
Central limit theorem: for a data set of n independent observations of a
random variable representing a population
- for both the average and the sum, the distribution becomes more and more
normal, as n gets large
- the mean & standard deviation of the
Construct the confidence interval for the sample proportion: another real
estate example. In this case the confidence interval is again given by the
point estimate ± z multiplier times the standard error estimate.
A. What percent of my client base has previously owned a home? In a
survey of 100 clients, 60 have been previous homeowners. Construct a
99% confidence interval.
B. The point estimate is p̂ = .60. The standard error estimate is
s_p̂ = √(p̂(1 - p̂)/n) = √((0.6)(0.4)/100) = √(.24/100) ≈ .049
C. Using 2.576 as the z multiplier, a 99% confidence interval can be
established as .60 ± .126 having previously owned a home. This
means that we can be 99% confident that 47.4% to 72.6% of our clients
have been previous homeowners.
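The homeowner interval can be reproduced with a small helper. This is a sketch under the normal approximation described in the lecture; the function name proportion_ci is illustrative, and the z multiplier is passed in by hand rather than looked up from a table.

```python
import math


def proportion_ci(successes, n, z):
    """Confidence interval for a proportion: point estimate +/- z * standard error."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin


# 60 previous homeowners out of 100 clients, 99% confidence (z = 2.576).
low, high = proportion_ci(60, 100, 2.576)
print(f"99% confidence interval: {low:.1%} to {high:.1%}")
```

Swapping in 1.645 or 1.96 for the multiplier gives the 90% and 95% intervals in the same way.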
IV. Confidence intervals are valid only if certain requirements are observed.
A. Be sure the data set is a random sample from the population of interest.
For example, it is impossible to sample the future.
B. Be sure the quantity being measured is normally distributed. This is
not a rigid requirement, since the central limit theorem tells us that
means and other measures are approximately normally distributed when the
sample is large enough.
Questions for Lecture Twelve
Explain what is meant by 1-α, the level of confidence.
For a large scale sample, what does 1-α typically depict in relation to a
normal curve?
For a large scale sample, what does α/2 typically depict in relation to a
normal curve?
In reporting on an election poll, a newswoman states that 52% ± 3% of the
electorate say they will vote for a given candidate. Is this a confidence
interval, and if so, what parameter is being estimated?
True or False. In order to construct a valid confidence interval, the data set
utilized must be a random sample from the population of interest.
Problems for Lecture Twelve
A new pizza topping is being tested in your supermarket. A sample of 500
shoppers try the product and 240 say that they like it.
1. What is the sample statistic for the proportion of shoppers that like the
topping?
2. Construct a 90% confidence interval for the percentage of shoppers that
like the topping.
3. Construct a 95% confidence interval for this percentage.
4. Interpret in your own words what the 95% confidence interval means.
Essential Reading for Lecture Twelve
Aczel, Complete Business Statistics, Chapter 6, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Twelve
Hanke and Reitsch, Understanding Business Statistics, Chapter 8, Irwin, 1994.
Lecture Thirteen
Hypothesis Testing
Scope: In this lecture we explore the use of hypothesis testing in business. In a
business situation our data is limited to a sample of reality. Statistical
techniques can test how large a part chance plays in the results reflected
by the designated sample. In designing a hypothesis test, we intend to
determine whether or not a claim, such as response rate from an
advertising campaign, should be allowed to stand. We will examine the
steps in conducting a hypothesis test.
Outline
I. Assume that the experimental results reflect only the random variation
caused by chance. This assumption is called the null hypothesis. The
object of our research is to be able to reject or fail to reject the null
hypothesis. Stating the null and alternative hypotheses:
A. The null hypothesis can be viewed as the status quo; it is valid until
proven otherwise. It is usually denoted by H0.
B. The alternative hypothesis is the competing theory which you are trying
to establish. The alternative hypothesis bears the burden of proof. It is
usually denoted by H1.
II. The task of hypothesis testing is to reject the null hypothesis or fail to
reject the null hypothesis.
III. Errors in hypothesis testing.
A. Type I error: rejecting the null hypothesis when it is true, also
known as an alpha error.
B. Type II error: failing to reject the null hypothesis when it is false, also
known as a beta error.
C. Examples: the ding letters and true love
1. H0: You should be hired. H1: You should be dinged.
                     Correct decision
Company decision     hire              ding
hire                 correct           Type II error
ding                 Type I error      correct
2. H0: You should pursue this romantic relationship.
H1: You should not pursue this romantic relationship.
                     Truth: what you should do
What you decide      pursue                       not pursue
pursue               correct: True love           Type II error: Looking for
                                                  love in all the wrong places
not pursue           Type I error: Golden         correct: Thank God for
                     chances pass me by           unanswered prayer
IV. A two-tailed test is used when the difference between the population
parameter and a sample statistic is non-directional. The statistic could be
very large or very small. When the direction of difference between the
population mean and a particular value is specified, the alternative
hypothesis is directional, or one-tailed. In a one-tailed test, consider under
what circumstances to take action. This will determine the alternative
hypothesis.
A. Use a right-hand-tailed test to take action if a parameter is greater than
some value, since the alternative hypothesis will state that the parameter
is greater than some value.
B. Use a left-hand-tailed test to take action if a parameter is less than some
value, since the alternative hypothesis will state that the parameter is
less than some value.
V. The steps involved in hypothesis testing
A. Set up the null and alternative hypotheses.
B. Choose α, the level of significance.
C. Define the test statistic, for example z.
D. Define a rejection region. In this region, the value of the test statistic
results in rejecting the null hypothesis.
E. Calculate the value of the test statistic and carry out the test.
F. State a conclusion for the original question.
VI. A hypothesis test can be used to test product quality claims. Suppose you
produce a professorial punching bag with the claim that it's good for 400
punches. Check out the claim using hypothesis testing as outlined above.
A. H0: μ = 400, H1: μ ≠ 400.
Test 100 punching bags: n = 100, x̄ = 420, s = 50.
B. α = 0.05
C. z = (x̄ - μ0)/(σ/√n) ≈ (x̄ - 400)/(s/√n) = (x̄ - 400)/5
D. Reject H0 if z > 1.96 or z < -1.96.
E. z = (420 - 400)/5 = 4
F. Since the z-value is so extreme, we reject the null hypothesis. The
likelihood of being wrong is less than 5%.
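The punching bag test can be walked through in code. This is a minimal sketch of the two-tailed z test using the figures from the example (claimed mean 400, sample mean 420, sample standard deviation 50, n = 100) and the 1.96 cutoff for α = 0.05; the function name z_statistic is illustrative.

```python
import math


def z_statistic(sample_mean, mu0, s, n):
    """z = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


z = z_statistic(420, 400, 50, 100)

# Two-tailed rejection region at alpha = 0.05: z > 1.96 or z < -1.96.
reject_null = abs(z) > 1.96
print(f"z = {z:.1f}, reject H0: {reject_null}")
```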
Questions for Lecture Thirteen
What is the null hypothesis of a test?
How does the alternative hypothesis relate to the null?
Explain what is meant by Type I error.
What is a Type II error?
When would you use a hypothesis test as opposed to simply constructing a
confidence interval?
Problems for Lecture Thirteen
Suppose you manufacture small packages of tissue paper and want to know how
many tissues should be put in your package. You decide to test the industry
wisdom that the average person uses 40 tissues during a cold. You conduct a
random sample of 100 customers with a cold and find the average customer uses
235 tissues with a standard deviation of 25. You set α at 5%.
1. Write the null and alternative hypotheses for your test.
2. What is the test statistic you will use?
3. Define the rejection region for the null hypothesis.
4. Calculate the value of the test statistic.
5. Should the null hypothesis be rejected? Explain.
Essential Reading for Lecture Thirteen
Aczel, Complete Business Statistics, Chapter 7, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Thirteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 9, Irwin, 1994.
Lecture Fourteen
Simple Linear Regression
Scope: Linear regression is a method for modeling the relationship between
two variables, such as advertising and sales or training and job
performance. Regression is a widely used technique and often provides a
useful mathematical formulation of a real world situation. This lecture
will explore the basics of simple linear regression.
Outline
I. Regression and modeling
A. Simple linear regression involves two variables, x (independent) and y
(dependent), assumed to have a straight-line relationship.
B. Linear regression is one of the most widely used statistical techniques
in describing the relationship between two variables such as
advertising and sales, or training and job performance.
C. A good model captures and extracts the systematic behavior of the
data, leaving out factors that are nonsystematic and cannot be
foreseen, namely random error.
II. The purpose of simple linear regression is to provide a best model for a
straight-line relationship between two variables.
A. Simple linear regression assumes an intercept parameter and a slope
parameter: y = β0 + β1x + ε, where b0 is an estimate of the
intercept, b1 is an estimate of the slope and ε represents random error.
1. The intercept parameter provides the value of the dependent
variable when the independent variable is equal to 0.
2. A positive slope parameter will occur when increasing values of
the independent variable are associated with increasing values of
the dependent variable.
3. A negative slope parameter will occur when increasing values of
the independent variable are associated with decreasing values
of the dependent variable.
B. The method used to estimate the regression parameters is called least
squares. This technique minimizes the sum of the squared errors.
C. The MSE (Mean Square Error) is used in estimating error variance.
The smaller the error variance, the closer the points are to the line. If
the error variance is too large when using simple linear regression,
then it is more difficult to make accurate and meaningful forecast
predictions. Error variance can be represented as
s² = MSE = SSE/(n - 2)
D. Consider the following example concerning the relationship between
housing square footage and sales price.
x (ft²)    y (price)
1500       25K
200        230K
1800       290K
3000       340K
[Figure: scatter plot of sales price (vertical axis, 0 to 350,000) against
square footage (horizontal axis, 0 to 3,000) with a fitted line]
Possible regression line: y = $50,000 + 80x
Intercept is $50,000, slope is 80
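Least squares and the MSE can be sketched without any statistics package. The data below are hypothetical values invented for illustration (they are not the lecture's table), chosen to scatter around a line with intercept 50,000 and slope 80. The function names least_squares and mean_square_error are illustrative.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the intercept and slope that minimize the squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


def mean_square_error(xs, ys, b0, b1):
    """MSE = SSE / (n - 2) for simple linear regression."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (len(xs) - 2)


# Hypothetical square footage and sales prices.
square_feet = [1500, 1800, 2000, 2500, 3000]
prices = [172000, 190000, 214000, 246000, 293000]

b0, b1 = least_squares(square_feet, prices)
print(f"intercept = {b0:,.0f}, slope = {b1:.1f}")
print(f"MSE = {mean_square_error(square_feet, prices, b0, b1):,.0f}")
```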
III. Correlation must be distinguished from regression.
A. When we do correlation analysis, we assume that both x and y are
random variables. With regression, we assume that x is fixed. The
correlation between x and y is a measure of the degree of linear
association between the two variables.
B. The sample correlation is denoted by r and can take values from -1 to
+1. With 0 correlation, there will be little if any linear association between
the two variables, for example shoe size and eye color. R², the
coefficient of determination, is the square of the correlation for simple
linear regression and has a special meaning in regression analysis.
C. Correlation is a measure of how closely two variables stick together
in a straight-line relationship. Both variables are treated as random
variables. In regression analysis, one variable is independent and one is
dependent.
Questions for Lecture Fourteen
1. Describe in your own words the purpose of simple linear regression.
2. True or False. A good statistical model will often explain all of the
systematic behavior of the data, eliminating all of the random error.
3. What information does the intercept parameter in simple linear regression
provide?
4. Give an example of when the slope parameter in linear regression would be
negative.
5. True or False. There is one line that minimizes the squares of the error
from the points to that line. That line is the regression line.
6. True or False. The Root Mean Square Error is used in constructing
confidence curves for the regression line.
7. State a major difference between regression and correlation.
8. True or False. Correlation ranges from -1 to +1.
9. Give an example of two variables that have correlation of around 0.
Essential Reading for Lecture Fourteen
Aczel, Complete Business Statistics, Chapter 10, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Fourteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 14, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression
Analysis, Chapter 2, Dellen, 1993.
Lecture Fifteen
The Validity and Usefulness of a Regression
Scope: Just because we run a regression does not guarantee that it is useful or
valid. A regression may be valid only for a small range of values. In
this lecture, we explain how to determine whether or not the regression
equation is meaningful for business analysis. We also discuss what
conditions must be met in order for a regression to be valid. The goal
of regression is not just to fit a line to a set of data points, but to be able
to use the line to forecast and predict.
Outline
I. Testing for a linear relationship in the model y = β0 + β1x + ε.
A. When there is no linear relationship between x and y, the population
regression slope, β1, is equal to 0. Therefore the most important
statistical test in simple linear regression is whether or not the slope
parameter is 0. In every other situation there is a linear relationship
which exists, either positive or negative.
1. The slope parameter may be 0 when y is a constant value.
2. As x increases there is no systematic influence on y. They are
completely independent and the data points are randomly
distributed.
B. The statistical test for a linear relationship between x and y.
1. Use hypothesis testing. Set the null and alternative hypotheses,
H0: β1 = 0, H1: β1 ≠ 0. The test statistic t is b1 divided by the
standard error of b1. If we can reject the null hypothesis then we can
conclude there is a linear relationship between the two variables.
2. Enter the data into a statistical software package which will
calculate the regression line and all the parameters. Suppose that
the output is as follows:
Testing for a linear relationship in
the regression line y = 50,000 + 80x
Variable          Estimate   Standard error   t ratio   p value
b0 (intercept)    50,000     25,000
b1 (slope)        80         30               2.7       .006
3. If the t ratio is high enough we can reject the null hypothesis and
assume a linear relationship exists. Generally speaking a linear
relationship exists when t is larger than two.
4. The p value is the value of α at which the hypothesis test would
change conclusions. Since our α is generally .05, any p value
less than .05 (.006 is less than .05) allows us to reject the null
hypothesis.
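The slope test reduces to a single division followed by two comparisons. The sketch below assumes a slope estimate of 80 with a standard error of 30 and takes the p value of .006 as given by the software output rather than computing it; the function name t_ratio is illustrative.

```python
def t_ratio(estimate, standard_error):
    """t ratio for testing H0: slope = 0."""
    return estimate / standard_error


t = t_ratio(80, 30)
p_value = 0.006  # taken from the software output, not computed here
alpha = 0.05

# Rule of thumb: t larger than about two suggests a linear relationship.
print(f"t ratio = {t:.1f}, exceeds 2: {t > 2}")
# Formal decision: reject H0 when the p value is below alpha.
print(f"reject H0 at alpha = {alpha}: {p_value < alpha}")
```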
II. The usefulness of a regression can be measured and quantified.
A. The mean square error (MSE) is an estimate of regression error,
measuring the variation of the data about the regression line. MSE,
however, depends on the nature (units and scale) of the data, so it is not
a relative measure.
B. R² is a relative measure that compares the variation of y about the
regression line with the variation without the regression line. The
coefficient of determination (R²) is the proportion of the variation in y
that is explained by the regression relationship of y with x. R² ranges
from 0 to +1.
C. The regression line always goes through the mean (x̄, ȳ). R²
tells you how much work the regression line is doing as x moves
away from x̄ and y moves away from ȳ. R² = 0 means that the
regression line does not explain the movement away from the mean.
R² = 1 means that the line is a perfect fit.
III. Residual analysis of a regression checks for equality of error variance,
tests for missing variables in the regression and helps detect if there is a
possible curvilinear relationship.
A. If the residuals are plotted, a pattern may emerge known as
heteroscedasticity, in which the residuals get larger as x gets larger (a
funnel shape). This implies that the error variance is not equal and
thus brings into question the validity of the regression. The desired
outcome is homoscedasticity, in which the residuals are scattered
randomly.
B. Sometimes when the residuals are plotted the points form a linear
pattern, which often indicates that a variable should be included in the
model. It may also indicate a curvilinear relationship.
IV. Constructing a prediction interval: ŷ ± interval
A. The width of the prediction interval depends on the distance of x from
the mean.
B. For example, there is a significant linear relationship between January
stock prices and how stocks perform for the year. However, the root
mean square error is so large that the regression line is of little or no
use in predicting stock prices.
Questions for Lecture Fifteen
1. Describe in your own words the test for determining whether or not there is
a regression relationship between x and y.
2. True or False. MSE (Mean Square Error) is a relative measure of how good
the regression fits.
3. True or False. R² essentially tells you what percentage of the variation in y
is explained by the regression line.
4. Explain how residual analysis is used to check the validity of the
regression.
5. True or False. If the plot of the residuals against x yields an upside down
U-shaped curve, the linear regression is confirmed.
6. You determine that there is a valid regression relationship between
movement in January stock prices and the stock price movement for the
entire year. Nevertheless, you determine that your prediction interval is not
useful. How can this be?
7. What is heteroscedasticity?
8. True or False. A prediction interval consists of two lines parallel to the
regression line.
Problems for Lecture Fifteen
Problems 1 through 3 relate to the following situation. Suppose that a regression
line for ice cream sales at a ball park has been developed using historical data.
The regression equation is: y = 12000 + 200x, where y represents sales in dollars,
and x represents average temperature in degrees Fahrenheit.
1. Does the slope of the regression line appear to be in the direction you
would expect? Explain.
2. What is the expected difference in ice cream sales at the park between a
day when the average temperature is 60° and a day when the average
temperature is 70°?
3. Would you expect temperature to explain most of the variation in ice cream
sales at the park? Explain.
Essential Reading for Lecture Fifteen
Aczel, Complete Business Statistics, Chapter 10, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Fifteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 14, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression
Analysis, Chapter 3, Dellen, 1993.
Lecture Sixteen
Introduction to Multiple Regression
Scope: In this lecture we will provide an introduction to multiple regression.
Multiple regression is an extension of simple linear regression in that
more than one independent variable is used in attempting to explain
variation in the dependent variable. We also explore the use of dummy
variables in regression models. Nevertheless, just because a model can
be built, it does not necessarily follow that the model will be good for
prediction. In business situations, statistical modeling is generally not
an end in itself, but when analytical and statistical modeling are
combined with business experience and intuition, more effective
decision making will often be the result.
Outline
I. When two or more independent variables are included in a regression
model, we are using multiple regression.
II. Parsimony is important in building regression models.
A. Given n points, we can find an (n-1) dimensional surface that will fit
the data perfectly. It is possible to overfit the data by introducing too
many variables.
B. y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk
C. Utilize the minimum number of independent variables to get the job
done.
III. The Analysis of Variance (ANOVA) test using data from residential real
estate sales as an example.
A. The ANOVA test asks: is there a regression relationship between y and
any of the independent variables?
Consider the following data in our example:
Residential property   Sales price   Square feet   Lot size
2                      $300,000      2,200         12,000
                       900000        000           18,000
The statistical test or overall test is as follows:
H0: β1 = β2 = 0; H1: not all the βs are = 0. If all the βs are
equal to zero then the mean of the data set is doing all the
work and the regression is not helping us.
B. ANOVA is included in most statistical or spreadsheet software
applications. The statistical package runs the regression and
calculations once you've entered the data. The resulting ANOVA
table includes source of variation, degrees of freedom (k relates to the
number of independent variables in the regression), sums of the
squares (SSR), mean square from the regression (MSR), F ratio and p
value.
Source       df          SS     MS                      F ratio     p value
Regression   k           SSR    MSR                     MSR/MSE     0.010
Error        n-(k+1)     SSE    MSE = SSE/(n-(k+1))
Total        n-1
1. The F ratio test indicates whether or not there is a regression
relationship between y and any of the independent variables.
The higher the F value, the more likely that the regression has
explanatory and predictive power. A rough rule of thumb for
larger sample sizes is that an F ratio greater than five indicates
that there is a regression relationship between the dependent
variable and at least one of the independent variables.
2. It should also be remembered that the p value also needs to be
less than 0.05 to indicate a regression relationship. For example,
in the ANOVA table above, if the p value were 0.10 you would
conclude that there was not a regression relationship.
3. ANOVA is important because a series of t tests to compare pairs
of means are not independent of each other. This is especially
true when there are three or more independent variables. This is
due to the fact that one variable may be robbing another variable
of its predictive power. Thus, the ANOVA test is done first in
situations involving multiple regression.
C. Note that we still need separate tests to determine which of the slope
parameters are different from 0. In this case t tests have been used:
Variable    Estimate   Standard Error of Estimate   t value   p value
Constant    36,000
X1          70         12                           5.8       <0.001
X2          7          3.4                          2.1       .047
1. Since the model passed the overall F test, there is a relationship
between the variables. Both of the independent variables, X1 and
X2, should be included in the model since p < 0.05.
2. The model would be ŷ = 36,000 + 70x1 + 7x2.
3. To predict the price of a piece of residential real estate with a
2,000 square foot house and a 10,000 square foot lot, substitute
X1 = 2,000, X2 = 10,000. The regression model equation
calculates the sales price as follows:
36,000 + 70(2,000) + 7(10,000) = $246,000
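The prediction step is a direct substitution into the fitted equation. This sketch uses the estimates from the example (constant 36,000, 70 per square foot of house, 7 per square foot of lot); the function name predict_price is illustrative.

```python
def predict_price(square_feet, lot_size, b0=36_000, b1=70, b2=7):
    """Predicted sales price from the fitted model: b0 + b1*x1 + b2*x2."""
    return b0 + b1 * square_feet + b2 * lot_size


price = predict_price(square_feet=2_000, lot_size=10_000)
print(f"predicted price: ${price:,}")
```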
IV. The usefulness and accuracy of the multiple regression is indicated by the
root mean square error and the R² value.
A. The mean square error (MSE) estimates the population square error.
The root mean square error (SE) is √MSE. The SE is generally used
as a multiplier in the prediction interval.
B. R², which corresponds to the multiple coefficient of determination,
measures the proportion of variation explained by the regression
model. R² tends to go up as more variables are included.
V. Dummy variables are also used in a regression. In a dummy variable the
"switch" is either on or off; the value is either 0 or 1.
A. A dummy or indicator variable expresses levels of a quality, such as
whether the house is on a golf course, type of coffee or genre of
B. Use of a dummy variable in regression analysis is straightforward.
Simply code the indicator variable to 1 if the level is obtained or to 0
if the level is not obtained.
C. Consider the regression equation:
y = β0 + β1x1 + β2x2 + β3x3. Let x3 represent whether or not the house
is on a golf course. If the house is on the golf course, x3 = 1. Thus in
the following regression equation, y = $40,000 + 85x1 + 10x2 +
50,000x3, the dummy variable x3 adds $50,000 to the sales price if
the house is located on the golf course.
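The effect of a dummy variable is easiest to see by predicting the same house twice, once on the golf course and once off it. In this sketch the constant of 40,000 and the dummy coefficient of 50,000 follow the example; the coefficients 85 and 10, the meaning of x1 and x2 (taken here as house and lot square footage) and the function name predict_price_with_dummy are assumptions made for illustration.

```python
def predict_price_with_dummy(x1, x2, on_golf_course,
                             b0=40_000, b1=85, b2=10, b3=50_000):
    """Predicted price with a 0/1 indicator for golf course location."""
    x3 = 1 if on_golf_course else 0  # the "switch" is either on or off
    return b0 + b1 * x1 + b2 * x2 + b3 * x3


on_course = predict_price_with_dummy(2_000, 10_000, on_golf_course=True)
off_course = predict_price_with_dummy(2_000, 10_000, on_golf_course=False)
print(f"golf course premium: ${on_course - off_course:,}")
```

Whatever values x1 and x2 take, the difference between the two predictions is the dummy coefficient.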
(©1997 The Teaching Comoany Limited Protein ”Questions for Lecture Sixteen
1. Explain what is meant by parsimony in building a multiple regression
model.
2. True or False. The maximum number of independent variables that should
be used in multiple regression is three.
3. Multiple regression often provides a more adequate way of modeling
complex business situations than simple linear regression. Explain this
statement.
4. True or False. The Analysis of Variance (ANOVA) table is used to
determine which of the independent variables have a regression
relationship with the dependent variable.
5. Assume you are attempting to build a multiple regression model to explain
the price of properties in a real estate development located near a golf
course. What are some of the independent variables you might use?
6. True or False. Unlike simple linear regression where R² must be less than
1, in multiple regression, it is possible for R² to be greater than 1.
7. What are dummy variables and why are they coded as 0 or 1?
8. Suppose you are attempting to build a regression model to explain box
office sales for upcoming movies. Which of the following are dummy
variables: production cost budget, advertising budget, whether or not a
major star is in the film, whether or not the film is a sequel?
Essential Reading for Lecture Sixteen
Aczel, Complete Business Statistics, Chapter 11, Irwin, Third Edition, 1996.
Recommended Reading for Lecture Sixteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 15, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression
Analysis, Chapter 4, Dellen, 1993.
Answers
Answers to Questions for Lecture Nine
A random sample provides a “representative” sample; using a random
sample, you can often describe how your results differ from those of the
population.
A parameter is a number computed for the entire population.
A statistic is a number computed from your sample data.
False
False
A sampling distribution lists, for each possible value of the statistic, the
fraction of all possible samples with that value.
No. The units must also be chosen independently.
Answers to Questions for Lecture Ten
Many data sets we work with in business will be normally distributed.
Other data sets will not be normally distributed. However, given the central
limit theorem, the distributions of means or sums of the data will be
approximately normal if our sample is large enough.
False
When sampling from a population, the distribution of means will tend
toward a normal distribution as the sample size gets large.
True
True
Because of the central limit theorem
No
Answers to Problems for Lecture Ten
Yes. You can use the normal distribution to approximate the binomial,
since np(1-p) is large (greater than 5).
About 90.1% (using a normal distribution table)
About 21.5% (using a normal distribution table)
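To see why the np(1-p) rule of thumb matters, the sketch below compares an exact binomial probability with its normal approximation. The values of n, p, and k are hypothetical and are not the ones from the lecture problem. The 0.5 continuity correction is a standard refinement added here as an assumption; the answers above do not mention it.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a 0.5 continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)

# Hypothetical values, not the ones from the lecture problem.
n, p, k = 100, 0.3, 35
rule_ok = n * p * (1 - p) > 5  # the rule of thumb quoted in the answer above
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
```

With np(1-p) = 21, well above 5, the two probabilities agree closely; rerunning with a small n or an extreme p shows the approximation degrading.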
Answers to Questions for Lecture Eleven
An interval of numbers within which we expect the true value of the
population parameter to lie
True
The sample size is large enough so that the central limit theorem can be
applied
A wider confidence interval
True
False.
Given the possibility of very remote events, a 100% confidence interval (if
obtainable) is too large to be useful.
Answers to Problems for Lecture Eleven
From $24.85 to $25.15.
From $24.88 to $25.12. Note that going to a 95% confidence interval does
not “cost” you much in interval width, given the large sample size.
People who send in for rebates may not be a random sample of your
customers.
Answers to Questions for Lecture Twelve
This is the fraction of all confidence intervals that would include the true
value of the population parameter
The area under the curve that excludes the tails
The area in one tail of the distribution
Yes, this is a confidence interval to estimate p, the population proportion.
True
Answers to Problems for Lecture Twelve
48%
44.3% to 51.7%
43.6% to 52.4%
We are 95% sure that between 43.6% and 52.4% of our customers like the
new pizza topping.
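The interval arithmetic behind these answers can be sketched as follows. The sample proportion 0.48 and the 95% level come from the answers above. The sample size n = 500 and the 90% level for the narrower interval are assumptions, since the problem statement is not part of this section; they were chosen because they give intervals close to the ones listed.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# p_hat = 0.48 is the answer above; n = 500 is an assumption, since the
# problem statement (and its sample size) is not part of this section.
low90, high90 = proportion_interval(0.48, 500, 0.90)
low95, high95 = proportion_interval(0.48, 500, 0.95)
```

Raising the confidence level from 90% to 95% widens the interval because the critical value grows from about 1.645 to about 1.96 while the standard error stays the same.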
Answers to Questions for Lecture Thirteen
What is claimed to be correct: the status quo
The alternative hypothesis competes with the null
The chances of rejecting the null hypothesis when it is indeed true
Failing to reject the null hypothesis when it is false
When you are testing a specific claim for a population parameter
Answers to Problems for Lecture Thirteen
Null hypothesis: mean = 35; alternative hypothesis: mean ≠ 35
The z-statistic
Rejection region: z < -1.96 or z > 1.96
20
Yes. Since z* falls in the rejection region, we conclude that there is
evidence that the average number of tissues used is not 40.
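The decision rule used in these answers (reject when the test statistic lands in either tail beyond the critical value) can be sketched as below. The function name and the two test statistics are hypothetical and serve only to show both outcomes; they are not the values from the lecture problem.

```python
from statistics import NormalDist

def two_sided_z_test(z_star, alpha=0.05):
    """Return (critical value, reject?) for a two-tailed z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z_crit, abs(z_star) > z_crit

# Hypothetical test statistics, chosen only to show both outcomes.
crit, reject_far = two_sided_z_test(2.5)
_, reject_near = two_sided_z_test(1.2)
```

Here `alpha` is the chance of rejecting a true null hypothesis, split evenly between the two tails, which is where the ±1.96 cutoffs for a 5% test come from.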
Answers to Questions for Lecture Fourteen
The purpose of linear regression is to provide a “best model” for a
straight-line relationship between two variables.
False
The value of the dependent variable when the independent variable is equal
to 0.
This will occur when increasing values of the independent variable are
associated with decreasing values of the dependent variable. For example,
using age to predict the time that it takes adults to run a 100 yard dash may
produce a negative slope parameter estimate.
True
True
With correlation, we assume that both x and y are random variables,
whereas with regression we assume that x is not random.
True
With 0 correlation, there will be little if any linear association between the two
variables. An example might be height and intelligence of company
CEOs.
Answers to Questions for Lecture Fifteen
The test is a t-test that examines whether or not the slope parameter is
equal to 0.
False
True
As x increases, check the residuals to see if the error variance is staying
approximately constant.
False
The root mean square error may be large, and the prediction interval may
be too large to be useful.
Unequal error variance
False
Answers to Problems for Lecture Fifteen
Yes. It makes sense for sales to go up as temperature rises.
$2,000
Not necessarily. Other factors such as attendance may be very important.
Answers to Questions for Lecture Sixteen
Building a good regression model with the minimum number of
independent variables
False
Many business variables (such as sales) are complex and are better
explained by using more than one independent variable.
False
Lot size, interior square footage, number of bedrooms, and whether or not
the property is on the golf course are some examples.
False
Dummy variables are used to indicate whether or not a quality is present.
A value of 0 means that the quality is not present, and a value of 1 means
the quality is present.
Whether or not a major star is in the film, whether or not the film is a
sequel
Bibliography
Aczel, Complete Business Statistics, Irwin, Third Edition, 1996.
Clemen, Making Hard Decisions, PWS-Kent, 1991.
Cochran, Sampling Techniques, Wiley, 1973.
Crystal Ball Users Manual, Decisioneering, 1995.
Deming, “On Probability as a Basis for Action,” American Statistician, Vol. 29,
1975, 146-152.
Derman, Gleser, and Olkin, A Guide to Probability Theory and Applications,
Holt, Rinehart and Winston, 1973.
Hanke and Reitsch, Understanding Business Statistics, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression
Analysis, Chapter 4, Dellen, 1993.
Schleifer and Bell, Data Analysis, Regression, and Forecasting, Chapter 2,
Course Technology, 1995.
Winston, Simulation Modeling Using @RISK, Duxbury, 1995.