
Business Statistics

COURSE GUIDEBOOK

Business Statistics
Part I

Professor George T. Geis
University of California at Los Angeles

The Teaching Company
1-800-TEACH-12 (1-800-832-2412)
www.TEACH12.com

© 1997 The Teaching Company Limited Partnership

Table of Contents

Business Statistics, Part I

Professor Biography
Purpose of the Course
Lecture One: Overview of Probability and Statistics
Lecture Two: Descriptive Statistics
Lecture Three: Probability Concepts
Lecture Four: Combining Event Probabilities
Lecture Five: Simulating Business Situations
Lecture Six: Random Variables
Lecture Seven: The Binomial and Poisson Distributions
Lecture Eight: The Normal Distribution
Answers
Bibliography

George T. Geis, Ph.D.
Anderson Graduate School of Management
University of California at Los Angeles

George T. Geis was born in Chicago, Illinois, in 1944. He received a B.S. summa cum laude with Honors in Mathematics from Purdue University in 1966. Dr. Geis earned his Ph.D. in 1977 at the University of Southern California and his MBA from the University of California, Los Angeles in 1981. Dr. Geis was a National Science Foundation and Woodrow Wilson Honorary Fellow. In the field of finance, he has been honored with the Financial Executives Institute Award for outstanding achievement. During his teaching career as an Adjunct Professor at the Anderson Graduate School of Management at UCLA, Professor Geis has been voted outstanding teacher three times. His academic experience includes serving as Research Coordinator at the Center for Human Resource Management.
Presently, he is serving as a member of the faculty advisory board for the Entrepreneurial Studies program at UCLA. Geis is also an author. He has published dozens of professional articles and five books. His books include Desktop Computing and the Essence of Management (Prentice Hall, 1990) and Micromanaging (Prentice Hall, 1987). His current focus is the application of computer technology to visually represent dynamics in converging technology and communication markets and the use of interactive media in illustrating statistical analysis. He has extensive consulting experience and is a frequent lecturer on emerging trends in the computer, communications, and media markets. In his spare time Professor Geis plays three-on-three basketball, struggles to lower his golf handicap, and paints his seven-color Victorian-style home in Pasadena, California.

Business Statistics

Purpose of the Course

In our tightly wired world, business executives make decisions under pressure. Almost always, these decisions must be made with less than complete information. This course is about how to effectively use data that is currently available (or can be obtained within a reasonable time frame and cost) to improve business decision-making. We will use business examples from functional areas such as finance, marketing, human resources, and operations to illustrate the role of data analysis in decision making.

This course is not designed to be a dry, sleepy-time set of abstract mathematical lectures. My goal is to make statistics come alive in the context of life and in the context of real business problems demanding solution.

Quantitative methods such as statistical analysis must not be viewed as the be-all and end-all of decision making. The vital role that seasoned business intuition plays in effective decision making cannot be overemphasized. Nevertheless, analytical techniques are a central part of many decisions.
In fact, we illustrate in this course how statistics and probability can effectively work together with managerial intuition in business problem solving.

The advent of personal computer statistical software that readily generates visual representations of data and performs sophisticated analyses enables a manager to concentrate on the meaning of data. The burden of computation has largely been eliminated, and business people are now free to focus on probing issues and searching for creative solutions. In this course, we illustrate the use of computer-generated output that promotes visualization of data.

Students tell me that statistics was obscure and inaccessible for them as undergraduates. On the first day of class, they enter my MBA course on Statistics and Data Analysis prepared for the worst. Fortunately, I am often able to help them build intuition for statistics, appreciate how the content can be applied, and actually enjoy the experience. Whatever previous experience you have had with statistics (if any), our main objective will be to make the content useful to you in business decision-making and relevant to decisions we all make in everyday life.

In addition to questions at the end of each lecture, problems have been provided where relevant. For your convenience, answers are available at the end of this outline.

Statistical Software Credits

For further information on Crystal Ball software, please contact Decisioneering Inc., 1.800.289.2550, 1.308.337.3560 (F).

JMP-IN 3 for Windows is available from Duxbury Press, an International Thomson Publishing Company, Belmont, CA, 1-800-876-2350.

Images copyrighted New Visions Technologies Inc.

All rights reserved. No part of this book may be reproduced in any manner whatsoever without written permission except in the case of brief quotations embodied in critical articles and reviews.
For information, send complete description of intended use to The Teaching Company/Rights and Permissions, 7405 Alban Station Ct., Suite B-215, Springfield, VA 22150, USA.

Lecture One

Overview of Probability and Statistics

Scope: Professor Geis explains how skill in obtaining and analyzing data can provide a business leader with significant competitive advantage. Effective decision-making is not over-quantified but takes into account experience and intuition as factors. Decisions must almost always be made with less than complete data. Therefore one must make decisions based on inferences from the data one does have. Statistics helps in developing a model for refining business decisions. Professor Geis compares and contrasts statistics with probability. He also details and illustrates the major activities of statistical analysis.

Outline

I. Statistics can provide a substantial business advantage.
   A. Statistical analysis refines decision-making and choices in business. Meaningful information gives a business a distinct advantage in a competitive world. Rapid decisions must almost always be made with less than complete data. Therefore one must make decisions based on inferences from the data one does have.
   B. Statistics helps in effectively analyzing data as well as in developing a model for refining business decisions. We build models to simplify complex business situations. Effective decision-making is not over-quantified but takes into account experience and intuition as factors.
II. Statistics takes inert data and brings it to life.
   A. Statistics involves collecting, analyzing, and understanding data for effective decision-making. Statistics makes sense of the big picture. The key characteristics of the data provide insight into the problem under consideration.
   B. Good statistical analysis pays attention to the outlier. An outlier is a data point that falls away from most of the others.
      The outlier(s) may be included, excommunicated, or accommodated. An outlier may indicate that the model is not adequate or expansive enough. On the other hand, the outlier may be so unusual that it really does not belong in our data.
   C. Business statistics may utilize a variety of data sets. Some examples of data sets from the business world include: financial, marketing, attendance patterns, production quality, human resources. Data analysis cuts across all the functional areas of business.
III. The core activities of statistics concern themselves with data collection, representation, and usage.
   A. It is necessary to design a plan for data collection. Data may be acquired from another source, known as downloading. Data may also be developed by gathering information.
   B. The aim of descriptive statistics is to represent the data or results of research in tabular, graphical, or numerical form. The data must be summarized in some way in order to describe and visualize it. The key characteristics of a set of data emerge and provide a picture of the situation.
   C. Inferential statistics refers to a group of methods used to draw inferences about a population from data available on a sample of the population. Inferential statistics moves from a sample group analysis to draw conclusions about a parameter in the population at large.
   D. Hypothesis testing seeks to determine which of two competing ideas is correct. It addresses a problem by testing an expected solution to the problem. An example of hypothesis testing in the business world might be: Is the current production quality under control or out of control?
   E. A forecasting model predicts what is likely to occur in a future situation based on a number of different factors. For example, statistics can be used to predict what the sales per square foot will be in a new restaurant based on location, demographics, and other factors.
IV. Probability is a number expressing the likelihood that a specific event will occur, expressed as the ratio of the number of actual occurrences to the number of possible occurrences.
   A. Probability is used to predict a future outcome. Based on assumptions about how the world works, probability quantifies the likelihood of a future outcome.
   B. We can view probability as the "inverse" of statistics. Statistics begins with what is observable and draws conclusions about how the world works. Probability starts with a view of how the world works and tries to forecast what will occur.
V. Classifying data sets makes the information they contain more useful.
   A. Data sets consist of measurements for individual records called elementary units. Example: salaries for individual employees.
   B. Data sets may be classified according to the number of variables for each elementary unit as univariate, bivariate, or multivariate. Univariate data sets concern themselves with one measured variable. Bivariate data sets contain two measurements for each elementary unit. Multivariate data sets contain three or more measurements for each elementary unit.
   C. Data sets can be classified by whether or not time sequence is important.
      1. Time is important in time series data sets since time is one of the dimensions. The daily Dow Jones Average is an example of time series data.
      2. Time sequence is not considered in cross-sectional data. The data is only meaningful for one slice of time. One example might be the college ranking of basketball teams. A measurement is given for each of the elementary units for a particular slice of time.
   D. Data sets can be classified by the kind of measurements recorded for each unit.
      1. Quantitative data can be either discrete or continuous.
         a. Discrete data can be counted. For example: How many times did you drink coffee today?
         b. Continuous data is any variable that cannot be listed as a discrete number.
            Time and distance are common examples of continuous data.
      2. Qualitative data is classified as ordinal or nominal.
         a. Ordinal data uses a scale to rank or order objects or persons on a continuum. This provides information about the rank order on a particular variable. These numbers have meaning in the ranking, but they are not generally added or subtracted.
         b. Nominal data uses numbers to stand for names or categories representing the way objects or persons differ. Common examples of nominal data include sex (male or female), race, etc. There is no implied ranking.

Refer to the Dodger Data Set.

Category          Type of data
Games attended    quantitative, discrete
Age               quantitative, discrete
First-age         quantitative, discrete
Race              qualitative, nominal
Distance          quantitative, continuous
Sex               qualitative, nominal

Refer to the Dodger Attendance Histogram. The histogram is a useful way of visualizing the data. This histogram helps portray the fact that there are two different groups of fans: the regular fans and the season-ticket holder or "super" fan.

Slicing the Dodger Data Set

Games   Age   First-age   Race   Distance   Sex
1       18    6           C      10         M
5       32    12          H      5          M
3       4?    2           C      20         M
4       47    7           C      25         F
35      42    7           H      30         M

(? marks a digit that is illegible in the source.)

[Figure: Dodger Attendance Histogram. Axes: games per season and frequency.]

Questions for Lecture One

1. True or False: A model is essentially a reduction technique for making sense out of one's world.
2. Define statistics.
3. What is an outlier?
4. Statistics cuts across all functional areas of business. Explain.
5. Distinguish between inferential and descriptive statistics.
6. In what sense is probability the "inverse" of statistics?
7. You are developing a data set that provides prices of an IPO (Initial Public Offering) at issue, one month after issue, and six months after issue. Is this data set univariate, bivariate, or multivariate?
8. Is a data set that provides NCAA college basketball rankings as of 1/15/97 time series or cross-sectional?
9. An investment bank ranks stocks 1, 2, or 3 (1 being most timely for investment, 3 being least timely). Should this data be thought of as nominal, ordinal, or quantitative?

Essential Reading for Lecture One

Aczel, Complete Business Statistics, Chapter 1, Irwin, Third Edition, 1996.

Recommended Reading for Lecture One

Deming, "On Probability as a Basis for Action," American Statistician, Vol. 29, 1975, 146-152.
Hanke and Reitsch, Understanding Business Statistics, Chapter 1, Irwin, 1994.

Lecture Two

Descriptive Statistics

Scope: In this lecture we discuss the purpose of descriptive statistics. What are some important ways to view and summarize data? Why is variability so important in analyzing a business situation?

Outline

I. Descriptive statistics portrays and summarizes a set of data so that its key characteristics become evident. Raw data is transformed into useful, refined information.
II. Various graphing techniques have been developed for viewing and summarizing data.
   A. Bar charts and histograms use the area of rectangles or bars to portray differences and trends in data. Bar charts usually portray the measure of frequency of a qualitative variable. Histograms visualize how a quantitative variable is distributed.
      1. The bar chart below answers the question: How many films a year does each of the film companies produce?
         [Figure: Bar chart. Vertical axis: number of films; horizontal axis: film company (MGM, Fox, Disney).]
      2. This histogram starts with the quantitative variable, in this case salary levels. It visualizes how the salary levels are distributed and the likelihood of being in a certain category.
         [Figure: Histogram of the likelihood of salary levels. Horizontal axis: salary ($80,000, $100,000, $120,000); vertical axis: percentage.]
   B. Contingency tables use the cross tabulation of occurrences in two dimensions to display key characteristics. Ben Franklin had as a goal perfection in thirteen identified virtues.
      He kept a record of his violations of these virtues with a cross tabulation of the virtue and day of the week.
      [Table: virtues (1. Temperance, 2. Silence, 3. Order, ...) by day of the week (Su, M, T, W, Th, F, Sa), with a mark for each violation.]
   C. A scattergram plots points in two dimensions and then tries to fit the data in some way to a line or curve. A production curve is a common application of a scattergram. An outlier is a data point that does not lie on the curve.
      [Figure: Scattergram.]
   D. A time series graph tracks the movement of some variable across the dimension of time. An example might be the share price of Netscape over the past year. From a time series graph we can try to visualize what triggered some of the events represented by the upturns or downturns.
      [Figure: Time series graph of share price, June to August.]
III. Statistics describes data by selecting one or more values to summarize the entire data set. Very select values try to describe the entire data set as a whole using only one or two numbers. The spread of the data set may also be visualized to provide an at-a-glance summary in a graph like a box plot or ogive.
   A. In statistics we would like to speak with some confidence about the population at large. It is important to understand the difference between a population parameter and a sample statistic.
      1. A population parameter is a numerical measure for the entire population. An example might be the average age of all possible purchasers. The Greek letter $\mu$ stands for the population parameter, here the population mean.
      2. A sample statistic is a numerical measure of a sample group. The sample is taken from the population, and the sample statistic is an estimate of $\mu$. The symbol $\bar{X}$, read "x-bar," represents the average of the sample.
      [Figure: a sample drawn from a population. Sample statistic: average age of the sample group. Population parameter: actual average age.]
   B. When a statistic is used to estimate a parameter, it is called an estimator.
      The sample mean, $\bar{X}$, is a statistic which is used as an estimator of the population mean, $\mu$. The closer $\bar{X}$ is to $\mu$, the better an estimator it is. For example, $\mu$ might be the average age of a customer base. $\bar{X}$ is calculated from a sample so that one can approximate with confidence the actual average age, $\mu$.
   C. There are several ways of measuring the central tendency of a set of data. It is useful to have one number to represent the data set. The mean, median, and mode are different ways of summarizing a data set. The summary measure used affects the number chosen to represent the data. For example, use a data set with customer ages of 30, 30, 40, 50, 100 years.
      1. The mean is the arithmetic average. The ages are all added together and divided by the number of customers: 250 divided by 5 equals 50. So the mean age is 50 years. In this case the extreme value of the 100-year-old customer may unduly affect the mean.
      2. The median is the central data point when the data are ordered from least to greatest. In this case the median age is 40 years.
      3. The mode is the most common data point. In this case the mode is 30 years. There can be no mode, one mode, or multiple modes.
   D. The box plot is a useful visual data summary because it makes obvious the median, the middle fifty percent of the data, the least and greatest data points, and any outliers, as well as how the data is spread out.
      [Figure: Box plot, labeled with the 75th percentile, 25th percentile, median, largest non-outlier, smallest non-outlier, and an outlier.]
      1. The box plot contains a box, a line through the box, and whiskers coming out of the box.
      2. The line in the middle is the median value.
      3. The top of the box is the 75th percentile; the bottom is the 25th percentile. The center of the data is contained within the box.
      4. One of the two "whiskers" extends from the largest non-outlier to the 75th percentile edge of the box. The other whisker extends from the smallest non-outlier to the 25th percentile edge of the box.
      5. Any outliers are indicated by a star.
   E. A cumulative-frequency graph or ogive is a graph that builds up toward the right, with each increment including the data preceding it.
      1. Discrete data, for example age, looks like steps in an ogive.
      2. Continuous data looks more continuous.
      [Figure: Cumulative-frequency graph or ogive. Horizontal axis: age in years (10 to 100); vertical axis: cumulative percentage.]
IV. Variability takes into account the differences in response. It is helpful to know how close the data set is to the mean.
   A. Variability has a central role in statistical analysis.
   B. The traditional choice for measuring variability is the standard deviation. The standard deviation is a measure of how close each of the data points is on average to the mean. The formula is not legible in the source; for a sample of $n$ data points $x_1, \dots, x_n$ with mean $\bar{X}$, the usual sample form is
      $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$
      Most data points tend to be within two to three standard deviations of the mean. Some way-out events like a stock market crash may be twenty standard deviations from the mean.
V. There are other summary measures that are useful for describing a data set.
   A. Skewness measures the asymmetry of a distribution. If the data is plotted, a skewed distribution has a tail on one side, so the curve is not symmetrical. There can be right-skewed and left-skewed distributions. The bulk of the data is opposite the tail or skew.
   B. Kurtosis measures the peakedness of a distribution. The shape of a distribution can range from very peaked to very flat.
      1. Leptokurtic describes a distribution that is very peaked. In this case the data is clumped together.
      2. Platykurtic describes a distribution that is very flat. In this case the data is very spread out.

Questions for Lecture Two

1. Describe the purpose of descriptive statistics in business.
2. What does the height of each bar in a histogram represent?
3. True or False. A scattergram provides a cross tabulation of the data by key characteristics.
4. Distinguish between a parameter and a statistic.
5. What does the 95th percentile in a data set represent?
6. True or False. A box plot serves a similar purpose as a histogram: to provide a visual image of the distribution of the data.
7. True or False. The standard deviation summarizes how far from the median the data typically are.

Problems for Lecture Two

The following data represent monthly housing rents paid by employees in your accounting department: $500, 600, 600, 600, 700, 800, 800, 900, 900, and 3,000. Answer the following questions relating to this data set.

1. Calculate the mean, mode, and median values of the data set.
2. Draw a histogram of the data.
3. Is the data skewed to the right or skewed to the left? What does this skewness mean?
4. What is the standard deviation of the data set?
5. What is the standard deviation if the highest rent (arguably an outlier) is excluded?

Essential Reading for Lecture Two

Aczel, Complete Business Statistics, Chapter 1, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Two

Hanke and Reitsch, Understanding Business Statistics, Chapter 4, Irwin, 1994.
Velleman and Hoaglin, Applications, Basics, and Computing of Exploratory Data Analysis, Duxbury, 1981.

Lecture Three

Probability Concepts

Scope: Probability plays an important role in analyzing business situations and in refining intuition. In business situations, complexity is often the norm. Therefore, in order for a model to be adequate, it must have some room for complexity and subtlety. Probability helps us move from the known to the unknown and to assess the likelihood of future events. Various ways of determining probability will be introduced in this lecture.

Outline

I. The goal of probability is to understand what is likely to happen in uncertain future situations within "known" systems. Good business decision making uses probability to obtain assessments of various possibilities: competitors entering the market, cost patterns, cash flows, etcetera.
   A. Probability starts with the known and moves to the unknown.
   B. If a good model is developed, then we have a good way to make predictions.
   The difference between probability and statistics may be illustrated as follows:
      What happened? > STATISTICS > How the world works
      How the world works > PROBABILITY > What will happen?
II. To predict the future, a model for refining intuition must be developed. The following flowchart serves this purpose.
   [Figure: Model for Refining Intuition. Raw Intuition -> Refined Intuition -> Implemented Intuition, with inputs labeled "Analytical" (label truncated in the source), "Access to Information," and "Access to Expertise."]
III. Experimental probability provides a basis for probability judgments. The results of random experiments or customer surveys help in the decision making process. Desired product features can be determined by the information obtained.
   A. The random experiment: a procedure that produces an outcome that can't be perfectly predicted.
   B. The sample space: a list of all possible outcomes of the experiment. For example, the sample space for flipping a coin one time is {H, T}; for rolling a die it is {1, 2, 3, 4, 5, 6}.
   C. The outcome: the result that occurs each time a random experiment is run. For example, the outcome of five coin flips may be heads, heads, tails, heads, tails.
   D. The event: a collection of outcomes specified in advance; a subset of the sample space. For example, when flipping a coin, we may be interested in the event heads.
IV. Probability related to events helps deal with complex situations.
   A. Probability indicates how likely it is that an event will occur by a numerical value between zero and one. This value may be expressed as a fraction, decimal, or percent. A probability of zero means it is not ever going to happen. A probability of one means the event is always going to happen.
      Even a very unlikely event might occur over a long period of time. We have to keep in mind the time horizon in calculating probability.
   B. The probability of an event can be expressed in terms of "the odds." For example, in a horse race, a certain horse may have 9 to 1 odds. The probability of that horse winning is $p = \frac{1}{10}$, as calculated by the formula $p = \frac{1}{x + 1}$, where $p$ is the probability and $x$ refers to the odds ($x$ to 1). With 1 to 1 odds, $p = \frac{1}{2}$; with 3 to 1 odds, $p = \frac{1}{4}$.
   C. There are various ways of deriving probability.
      1. In experimentation the relative frequency is expressed as a ratio: $p = \frac{\text{number of times the event occurs}}{\text{number of trials}}$. For example, if two thousand golf clubs are sampled and three are found to be defective, the probability of a defective golf club would be expressed as $p = \frac{3}{2000} = 0.0015$, or less than 1%.
      2. Sometimes history will not be a good predictor of the future since the future may deviate from the past. Probability may be estimated subjectively based on experience and intuition. The resulting estimate may or may not be accurate, depending on the expertise of the individual.
      3. Counting possibilities is another way of determining probability. For example, take the chances of getting two daughters in a three-child family. Let m = son, f = daughter.
         a. Make a list of all the possible outcomes: mmm, mmf, mfm, mff, fmm, fmf, ffm, fff.
         b. Determine which outcomes correspond to a two-daughter family: mff, fmf, ffm.
         c. Express the probability as the ratio of the number of two-daughter outcomes to the total number of outcomes, treating each outcome as equally likely: $p = \frac{3}{8}$.
      4. Mathematical calculation of probability lets us count without counting. For example, if there is a 60% chance of sinking a free throw, the probability of sinking ten in a row is given by $p = (0.6)^{10} \approx 0.006$. This is equivalent to 6 times in a thousand tries.
V. We cannot necessarily predict the future, but probability provides a means of quantifying what is likely to happen.
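
The odds conversion in item IV.B and three of the ways of deriving probability in item IV.C (relative frequency, counting, and calculation) can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the course materials: the function names are hypothetical, and the counting example assumes that sons and daughters are equally likely and that births are independent, which the outline does not state explicitly.

```python
from fractions import Fraction
from itertools import product


def odds_to_probability(odds_against, odds_for=1):
    """Convert 'x to 1' odds into a probability of winning: p = 1 / (x + 1)."""
    return Fraction(odds_for, odds_against + odds_for)


def relative_frequency(occurrences, trials):
    """Experimental probability: times the event occurred / number of trials."""
    return Fraction(occurrences, trials)


def probability_by_counting(children=3, daughters=2):
    """Count outcomes, assuming every son/daughter sequence is equally likely."""
    outcomes = list(product("mf", repeat=children))
    favorable = [o for o in outcomes if o.count("f") == daughters]
    return Fraction(len(favorable), len(outcomes))


def probability_of_streak(p_single, attempts):
    """Probability that an event with chance p_single happens on every one
    of `attempts` independent tries (multiply instead of counting)."""
    return p_single ** attempts


p_horse = odds_to_probability(9)                    # 9 to 1 odds -> 1/10
p_defective = relative_frequency(3, 2000)           # 3 defective clubs in 2000
p_two_daughters = probability_by_counting()         # 3 of 8 outcomes
p_ten_free_throws = probability_of_streak(0.6, 10)  # roughly 0.006

print(p_horse, p_defective, p_two_daughters, round(p_ten_free_throws, 4))
```

Using `Fraction` keeps the small ratios exact, which makes it easier to compare the results with the fractions in the outline; the free-throw example uses an ordinary float because its input is already a decimal estimate.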

Questions for Lecture Three

1. Describe the overall goal of using probability analysis in business situations.
2. True or False. The sample space is a listing of all possible outcomes of a random experiment.
3. What is the difference between an event and an outcome?
4. True or False. The probability of an event, a number between -1 and +1, expresses how likely it is that an event will occur.
5. You want to estimate the probability of a film doing more than $100 million at the box office (ticket sales) and ask an industry expert for her opinion. What means of deriving probability are you using?
6. If you examine a database of last year's movies to estimate the probability of the film doing $100 million in box office, what means of deriving probability are you using?

Problems for Lecture Three

1. If a racehorse has been given 5 to 1 odds of winning, what is the implied probability that the horse will win?
2. If you are a 70% free throw shooter in basketball, what are the chances that you will "sink" 10 free throws in a row?
3. Say your chances of picking a "winner" stock are 50% and the chances of picking a "loser" stock are 50%. If you select four stocks, what are your chances of picking exactly three winners?

Essential Reading for Lecture Three

Aczel, Complete Business Statistics, Chapter 2, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Three

Clemen, Making Hard Decisions, Chapter 7, PWS-Kent, 1991.
Hanke and Reitsch, Understanding Business Statistics, Chapter 5, Irwin, 1994.

Lecture Four

Combining Event Probabilities

Scope: In this lecture we discuss how to obtain probabilities associated with more complex events. We introduce the notion of simulation, how it relates to probability, and how it can be used in business decision making. Business situations are usually complex. We develop the intuition behind conditional probability, independence, and mutual exclusivity.

Outline

I. Simulation provides the means for forecasting success or failure, potential revenue, or how the market may respond to a new product launch. Monte Carlo simulation is used extensively in business to deal with uncertainty. There are many popular software packages available that incorporate Monte Carlo simulation inside a spreadsheet.
   A. There are three steps in the Monte Carlo simulation.
      1. Generate assumption cells.
      2. Given the assumed values above, calculate the value inside a spreadsheet, perhaps a hundred or a thousand times. This develops a tracking mechanism for the range of possibilities.
      3. Display the range of possibilities in a forecast chart. The forecast chart shows the range as well as the probability corresponding to each potential value.
   B. Refer to the revenue forecast model below. Each entry also has behind it a range of assumed values. The distribution of these values can vary. For example, the price assumption could be a triangular distribution. Competitor entry can also be an assumed value.

      Revenue Forecast Model
      Revenue simulation: Product Launch
      Price: No Competitor        0
      Price: Competitor           0
      Volume: No Competitor       0
      Volume: With Competitor     0
      Competitor Entry?           0
      Sales P[illegible]          0

   C. Refer to the Forecast Chart below. This simulates the cash flow resulting from a project launch. Simulation goes beyond stipulating the range; it also gives the probability associated with each value in that range.
      [Figure: Forecast Chart. Forecast: 97 Revenue. Frequency chart, 500 trials.]
II. Venn diagrams depict probabilities spatially.
   A. The sample space, X, is a rectangular region containing the entire range of possibilities of what could happen.
      [Figure: Venn diagram showing the sample space X as a rectangle.]
   B. An event, A, is represented spatially by a circular area. Everything inside the circle corresponds to the desired event. Not-A is represented by everything outside the circle. A and not-A are complements.
      [Figure: Venn diagram showing event A as a circle inside X.]
   C. Intersections occur when there is an event, A, and an event, B.
The two circular areas overlap. The area that they share in common represeats both events occurring, To calculate the probability of A or B occurring, do not add the probabilities together. The area in common must be subtracted or it will be counted twice. Therefore, p(A or B) = p(A) + p(B) - p(A-B) x (©1997 The Teaching Company Limited Partnership 2 D, Mutually exclusive events, A and B, share no common points. For ‘example, A could represent an earthquake occurring in the next 30 seconds. B could represent no earthquakes today. Ifevent A occurs, itis impossible for event B to occur atthe same time and vice versa. IIL. Conditional probability when you revise the probability of an event to reflect information that another event has occurred. A. Probability of A, given B is expressed as p(A(B).. B. For example, a sales price will be affected by whether or not a competitor centers the market, TV. Independent events occur when one event does not have any relationship to another. What happened in the past does not impact or change the future A. For example, the roulette wheel is not influenced by what occurred before, whereas blackjack is since the cards dealt affect what cards B. To calculate the probability of A and B, multiply. P(A and B) = p(A)p(B) C. If two events are mutually exclusive, then almost certainly they are not independent. One event docs have a relationship on the other. 'Y. Check your understanding of combining information about events with the following problem: ‘You apply for four jobs and have a 1/4 chance of getting each one. assuming independence, what are the chances of getting at least one” A. Since the events are not mutually exclusive, you might be offered more than one job, one job, or no job at all, So you cannot merely add the probabilities together or muitiply them, 2 (©1997 The Teaching Company Limited Parertip L b.wt i ty meio en Thepramnny ofotentagenhot eons S03 S42. Ehret edger erect 3 qrsinaly Soe ‘getting a job from one. 
The probability of getting at least one job offer, or maybe more, is 1 - (3/4)^4, or about 68%.

Questions for Lecture Four
1. Why is it important in business to be able to analyze probabilities that involve combinations of events?
2. True or False. Monte Carlo simulation is one way to effectively combine probability information about events.
3. Define what is meant by conditional probability.
4. Explain what is implied if two events are mutually exclusive.
5. Define independent events in your own words.
6. True or False. If two events are mutually exclusive, then they are independent.

Problems for Lecture Four
The following problems relate to this situation. Assume you apply to three colleges and that you have a 1/3 chance of getting into each one. Assume independence.
1. Are the chances of being admitted to at least one college 1/3 + 1/3 + 1/3 = 1? Why or why not?
2. Are the chances of being admitted to at least one college 1/3 x 1/3 x 1/3 = 1/27? Why or why not?
3. What are the chances of being admitted to at least one college?
4. What are the chances of being turned down by all colleges?

Essential Reading for Lecture Four
Aczel, Complete Business Statistics, Chapter 2, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Four
Clemen, Making Hard Decisions, Chapter 7, PWS-Kent, 1991.
Hanke and Reitsch, Understanding Business Statistics, Chapter 5, Irwin, 1994.

Lecture Five
Simulating Business Situations

Scope: In this lecture we show how simulation builds on our understanding of probability. We review the steps in setting up a Monte Carlo simulation. Understanding how probabilities work and how distributions are built is an essential element in building a good simulation model. The model is good to the extent that it approximates the reality of the business situation that it represents.
Since it is a reduction tool, no model is perfect, but it helps us to deal with uncertainties in trying to predict the future. Simulation models help us deal with risks and make a decision using analytical as well as intuitive power.

Outline
I. Simulation is a useful technique for modeling business situations with uncertain conditions.
A. Assumptions are built into each cell of the simulation model, which is usually run inside a spreadsheet. The probability of a given value coming up is driven by the distribution that sits behind each assumption.
B. Historical information, along with other factors, is used to obtain the probability estimates for the distributions.
II. Monte Carlo simulation was developed in the 1940s by Stanislaw Ulam and John von Neumann, who used it in physics. Now Monte Carlo simulation has many other applications, including business. Random numbers are selected for each assumption cell, drawn from the related distribution.
A. The steps involved in Monte Carlo simulation:
1. Generating random numbers to conform to model assumptions
2. Calculating one iteration (recalculation of the model) of the event
3. Displaying simulation results in a forecast chart
B. Monte Carlo simulation is not the only technique for modeling.
III. Simulation is useful in business decisions, as a product can have many lives on paper before final decisions are made. Business people often think in terms of a triangular distribution: worst case, most likely, and best case scenarios. These distributions can be entered in a simulation model and sampled at random. The result is not only the range of likely possibilities but also their probabilities.
IV. Forecasting revenue for a product launch is a type of business situation that can be simulated. Refer to the chart below.

One Iteration of the Model
Revenue simulation: Product Launch 1997
Price: No Competitor | [illegible]
Price: Competitor | 87
Volume: No Competitor | [illegible]
Volume: With Competitor | [illegible]
Competitor Entry? | [illegible]
Sales Price | $97
Sales Volume | 3572
Sales Revenue | $200833

Each of the first five shaded areas is an assumption cell used to forecast the projected sales revenue, which is also shaded. In this example different kinds of distributions are built into the assumption cells.
1. The competitor entry cell is a discrete distribution.
2. The price entry cells are triangular distributions. A triangular [...]

[...] the standard error of the mean gets smaller.
B. As the sample size gets large, the standard error of the mean gets smaller. This is another application of the central limit theorem.
C. Take information from a sample group: \(n = 100\), \(\bar{x} = 20\) dollars, \(\sigma = 8.00\) dollars. Then \(\sigma_{\bar{x}} = \sigma / \sqrt{n} = 8 / \sqrt{100} = 0.80\). This allows us to say something about \(\mu\). I'm using $20 to estimate \(\mu\). My standard error of the mean is $0.80. This is a much tighter distribution than the standard deviation of the raw data, $8.00.

The sampling distribution for the sample proportion approaches the normal distribution by application of the Central Limit Theorem.
A. The sampling distribution for the sample proportion is related to the binomial distribution. It is the binomial distribution with parameters n and p, where n is the sample size and p is the population proportion.
B. As the sample size increases, the Central Limit Theorem applies. So the sampling distribution of the sample proportion approaches a normal distribution as n gets large. As a rule of thumb, we can use the normal approximation if \(np(1-p) > 5\).

Business applications using the central limit theorem
A. You survey some of your customers to determine if sales will go up if you cut prices. The sample proportion is \(\hat{p} = x/n\); for \(n = 35\), \(\hat{p} = x/35\). Since x follows a binomial distribution, the sampling distribution of \(\hat{p}\) approaches a normal distribution with mean \(p\) and standard deviation \(\sqrt{p(1-p)/n}\).
B. You survey 400 voters on an upcoming ballot initiative. You assume \(p = 0.5\).
You survey and find out that \(\hat{p} = 0.425\). The distribution of \(\hat{p}\) will be a normal curve with a mean of 0.5 and a standard deviation of \(\sqrt{(0.5)(0.5)/400} = 0.025\). This means that I am three standard errors below my proposed p. So this would be evidence that the population parameter is probably not 0.5.

Central Limit Theorem
For a data set of n independent observations of a random variable representing a population:
- for both the average and the sum, the distribution becomes more and more normal as n gets large
- the mean and standard deviation of the [...]

III. Construct the confidence interval for the sample proportion: another real estate example. In this case the confidence interval is again given by the point estimate plus or minus the z multiplier times the standard error estimate.
A. What percent of my client base has previously owned a home? In a survey of 100 clients, 60 have been previous homeowners. Construct a 99% confidence interval.
B. The point estimate is \(\hat{p} = 0.60\). The standard error estimate is \(\sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{(0.6)(0.4)/100} = \sqrt{0.24/100} \approx 0.049\).
C. Using 2.576 as the z multiplier, a 99% confidence interval of .60 ± .126 can be established for the proportion of clients having previously owned a home. This means that we can be 99% confident that 47.4% to 72.6% of our clients have been previous homeowners.
IV. Confidence intervals are valid only if certain requirements are observed.
A. Be sure the data set is a random sample from the population of interest. For example, it is impossible to sample the future.
B. Be sure the quantity being measured is normally distributed. This is not a rigid requirement, since the central limit theorem tells us that means and other measures are approximately normally distributed when the sample is large.

Questions for Lecture Twelve
1. Explain what is meant by \(1-\alpha\), the level of confidence.
2. For a large scale sample, what does \(1-\alpha\) typically depict in relation to a normal curve?
3. For a large scale sample, what does \(\alpha/2\) typically depict in relation to a normal curve?
4. In reporting on an election poll, a newswoman states that 52% ± 3% of the electorate say they will vote for a given candidate. Is this a confidence interval, and if so, what parameter is being estimated?
5. True or False. In order to construct a valid confidence interval, the data set utilized must be a random sample from the population of interest.

Problems for Lecture Twelve
A new pizza topping is being tested in your supermarket. A sample of 500 shoppers try the product and 240 say that they like it.
1. What is the sample statistic for the proportion of shoppers that like the topping?
2. Construct a 90% confidence interval for the percentage of shoppers that like the topping.
3. Construct a 95% confidence interval for this percentage.
4. Interpret in your own words what the 95% confidence interval means.

Essential Reading for Lecture Twelve
Aczel, Complete Business Statistics, Chapter 6, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Twelve
Hanke and Reitsch, Understanding Business Statistics, Chapter 8, Irwin, 1994.

Lecture Thirteen
Hypothesis Testing

Scope: In this lecture we explore the use of hypothesis testing in business. In a business situation our data is limited to a sample of reality. Statistical techniques can test how large a part chance plays in the results reflected by the designated sample. In designing a hypothesis test, we intend to determine whether or not a claim, such as the response rate from an advertising campaign, should be allowed to stand. We will examine the steps in conducting a hypothesis test.

Outline
I. Assume that the experimental results reflect only the random variation caused by chance. This assumption is called the null hypothesis. The object of our research is to be able to reject or fail to reject the null hypothesis.
II. Stating the null and alternative hypotheses
A. The null hypothesis can be viewed as the status quo; it is valid until proven otherwise. It is usually denoted by \(H_0\).
B. The alternative hypothesis is the competing theory which you are trying to establish. The alternative hypothesis bears the burden of proof. It is usually denoted by \(H_1\).
C. The task of hypothesis testing is to reject the null hypothesis or fail to reject the null hypothesis.
III. Errors in hypothesis testing
A. Type I error: rejecting the null hypothesis when it is true, also known as an alpha error.
B. Type II error: failing to reject the null hypothesis when it is false, also known as a beta error.
C. Examples: the ding letter and true love
1. \(H_0\): You should be hired. \(H_1\): You should be dinged.

Company decision | Correct decision: hire | Correct decision: ding
hire | correct | Type II error
ding | Type I error | correct

2. \(H_0\): You should pursue this romantic relationship. \(H_1\): You should not pursue this romantic relationship.

What you decide to do | Truth: pursue | Truth: not pursue
pursue | correct: true love | Type II error: looking for love in all the wrong places
not pursue | Type I error: golden chances pass me by | correct: thank God for unanswered prayer

IV. A two-tailed test is used when the difference between the population parameter and a sample statistic is non-directional. The statistic could be very large or very small. When the direction of difference between the population mean and a particular value is specified, the alternative hypothesis is directional, or one-tailed. In a one-tailed test, consider under what circumstances to take action. This will determine the alternative hypothesis.
A. Use a right-hand-tailed test to take action if a parameter is greater than some value, since the alternative hypothesis will state that the parameter is greater than some value.
B. Use a left-hand-tailed test to take action if a parameter is less than some value, since the alternative hypothesis will state that the parameter is less than some value.
V. The steps involved in hypothesis testing
A. Set up the null and alternative hypotheses.
B. Choose \(\alpha\), the level of significance.
C. Define the test statistic, for example z.
D. Define a rejection region. In this region, the value of the test statistic results in rejecting the null hypothesis.
E. Calculate the value of the test statistic and carry out the test.
F. State a conclusion for the original question.
VI. A hypothesis test can be used to test product quality claims. Suppose you produce a professorial punching bag with the claim that it's good for 400 punches. Check out the claim using hypothesis testing as outlined above.
A. \(H_0: \mu = 400\); \(H_1: \mu \neq 400\). Test 100 punching bags: \(n = 100\), \(\bar{x} = 420\), \(s = 50\).
B. \(\alpha = 0.05\)
C. \(z = (\bar{x} - 400) / s_{\bar{x}} = (\bar{x} - 400) / (s/\sqrt{n}) = (\bar{x} - 400) / 5\)
D. Reject \(H_0\) if \(z > 1.96\) or \(z < -1.96\).
E. \(z = (420 - 400) / 5 = 4\)
F. Since the z-value is so extreme, we reject the null hypothesis. If the null hypothesis were true, a result this extreme would occur less than 5% of the time.

Questions for Lecture Thirteen
1. What is the null hypothesis of a test?
2. How does the alternative hypothesis relate to the null?
3. Explain what is meant by a Type I error.
4. What is a Type II error?
5. When would you use a hypothesis test as opposed to simply constructing a confidence interval?

Problems for Lecture Thirteen
Suppose you manufacture small packages of tissue paper and want to know how many tissues should be put in your package. You decide to test the industry wisdom that the average person uses 40 tissues during a cold. You conduct a random sample of 100 customers with a cold and find the average customer uses 35 tissues with a standard deviation of 25. You set \(\alpha\) at 5%.
1. Write the null and alternative hypotheses for your test.
2. What is the test statistic you will use?
3. Define the rejection region for the null hypothesis.
4. Calculate the value of the test statistic.
5. Should the null hypothesis be rejected? Explain.

Essential Reading for Lecture Thirteen
Aczel, Complete Business Statistics, Chapter 7, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Thirteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 9, Irwin, 1994.

Lecture Fourteen
Simple Linear Regression

Scope: Linear regression is a method for modeling the relationship between two variables, such as advertising and sales or training and job performance. Regression is a widely used technique and often provides a useful mathematical formulation of a real world situation. This lecture will explore the basics of simple linear regression.

Outline
I. Regression and modeling
A. Simple linear regression involves two variables, x (independent) and y (dependent), assumed to have a straight-line relationship.
B. Linear regression is one of the most widely used statistical techniques for describing the relationship between two variables, such as advertising and sales, or training and job performance.
C. A good model captures and extracts the systematic behavior of the data, leaving out factors that are nonsystematic and cannot be foreseen, namely random error.
II. The purpose of simple linear regression is to provide a best model for a straight-line relationship between two variables.
A. Simple linear regression assumes an intercept parameter and a slope parameter: \(y = \beta_0 + \beta_1 x + \varepsilon\), where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) represents random error.
1. The intercept parameter provides the value of the dependent variable when the independent variable is equal to 0.
2. A positive slope parameter will occur when increasing values of the independent variable are associated with increasing values of the dependent variable.
3. A negative slope parameter will occur when increasing values of the independent variable are associated with decreasing values of the dependent variable.
B. The method used to estimate the regression parameters is called least squares. This technique minimizes the sum of the squared errors.
C. The MSE (Mean Square Error) is used in estimating error variance. The smaller the error variance, the closer the points are to the line. If the error variance is too large when using simple linear regression, then it is more difficult to make accurate and meaningful forecasts. The error variance can be represented as \(s^2 = \mathrm{MSE} = \mathrm{SSE} / (n-2)\).
D. Consider the following example concerning the relationship between housing square footage and sales price.

x (ft²) | y (price)
1500 | 25K
200 | 230K
1800 | 290K
3000 | 340K

[Scatter plot: sales price versus square footage]

Possible regression line: \(y = 50{,}000 + 80x\). The intercept is $50,000 and the slope is 80.
III. Correlation must be distinguished from regression.
A. When we do correlation analysis, we assume that both x and y are random variables. With regression, we assume that x is fixed. The correlation between x and y is a measure of the degree of linear association between the two variables.
B. The sample correlation is denoted by r and can take values from -1 to +1. With 0 correlation, there will be little if any association between the two variables, for example shoe size and eye color. \(R^2\), the coefficient of determination, is the square of the correlation for simple linear regression and has a special meaning in regression analysis.
C. Correlation is a measure of how closely two variables stick together in a straight-line relationship. Neither variable is treated as dependent on the other. In regression analysis, one variable is independent and one is dependent.
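
The least squares estimates and the sample correlation described in this outline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names fit_line and correlation are hypothetical, and the square footage and price figures are made-up values, not the data from the lecture.

```python
import math

def fit_line(xs, ys):
    """Return the least squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

def correlation(xs, ys):
    """Return the sample correlation r, a value between -1 and +1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (assumed for illustration only)
square_feet = [1500, 2000, 1800, 3000]
price = [175000, 205000, 199000, 287000]

b0, b1 = fit_line(square_feet, price)
r = correlation(square_feet, price)
print(f"intercept = {b0:.0f}, slope = {b1:.2f}")
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

Squaring r gives the coefficient of determination for simple linear regression, which is why the two quantities are printed together.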

Questions for Lecture Fourteen
1. Describe in your own words the purpose of simple linear regression.
2. True or False. A good statistical model will often explain all of the systematic behavior of the data, eliminating all of the random error.
3. What information does the intercept parameter in simple linear regression provide?
4. Give an example of when the slope parameter in linear regression would be negative.
5. True or False. There is one line that minimizes the squares of the error from the points to that line. That line is the regression line.
6. True or False. The Root Mean Square Error is used in constructing confidence curves for the regression line.
7. State a major difference between regression and correlation.
8. True or False. Correlation ranges from -1 to +1.
9. Give an example of two variables that have a correlation of around 0.

Essential Reading for Lecture Fourteen
Aczel, Complete Business Statistics, Chapter 10, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Fourteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 14, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression Analysis, Chapter 2, Dellen, 1993.

Lecture Fifteen
The Validity and Usefulness of a Regression

Scope: Just because we run a regression does not guarantee that it is useful or valid. A regression may be valid only for a small range of values. In this lecture, we explain how to determine whether or not the regression equation is meaningful for business analysis. We also discuss what conditions must be met in order for a regression to be valid. The goal of regression is not just to fit a line to a set of data points, but to be able to use the line to forecast and predict.

Outline
I. \(y = \beta_0 + \beta_1 x + \varepsilon\)
A. When there is no linear relationship between x and y, the population regression slope, \(\beta_1\), is equal to 0.
Therefore the most important statistical test in simple linear regression is whether or not the slope parameter is 0. In every other situation there is a linear relationship, either positive or negative.
1. The slope parameter may be 0 when y is a constant value.
2. The slope parameter may also be 0 when, as x increases, there is no systematic influence on y. They are completely independent and the data points are randomly distributed.
B. The statistical test for a linear relationship between x and y.
1. Use hypothesis testing. Set the null and alternative hypotheses, \(H_0: \beta_1 = 0\), \(H_1: \beta_1 \neq 0\). The test statistic, t, is the estimate of \(\beta_1\) divided by the standard error of the estimate. If we can reject the null hypothesis then we can conclude there is a linear relationship between the two variables.
2. Enter the data into a statistical software package, which will calculate the regression line and all the parameters. Suppose that the output is as follows.

Testing for a linear relationship in the regression line
Variable | Estimate | Standard error | t-ratio | p-value
\(\beta_0\) intercept | 50,000 | 25,000 | |
\(\beta_1\) slope | 80 | 30 | 2.7 | .006

3. If the t-ratio is high enough we can reject the null hypothesis and assume a linear relationship exists. Generally speaking, a linear relationship exists when the absolute value of t is larger than two.
4. The p-value is the value of \(\alpha\) at which the hypothesis test would change conclusions. Since our \(\alpha\) is generally .05, any p-value less than .05 (.006 is less than .05) allows us to reject the null hypothesis.
II. The usefulness of a regression can be measured and quantified.
A. The mean square error (MSE) is an estimate of regression error, measuring the variation of the data about the regression line. MSE, however, depends on the nature of the data.
B. \(R^2\) is a relative measure that compares the variation of y about the regression line with the variation without the regression line. The coefficient of determination (\(R^2\)) is the proportion of the variation in y that is explained by the regression relationship of y with x. \(R^2\) ranges from 0 to +1.
C. The regression line always goes through the mean \((\bar{x}, \bar{y})\).
\(R^2\) tells you how much work the regression line is doing as x moves away from \(\bar{x}\) and y moves away from \(\bar{y}\). \(R^2 = 0\) means that the regression line does not explain the movement away from the mean. \(R^2 = 1\) means that the line is a perfect fit.
III. Residual analysis of a regression checks for equality of error variance, tests for missing variables in the regression, and helps detect if there is a possible curvilinear relationship.
A. If the residuals are plotted, a pattern may emerge known as heteroscedasticity, in which the residuals get larger as x gets larger (a funnel shape). This implies that the error variance is not equal and thus brings into question the validity of the regression. The desired outcome is homoscedasticity, in which the residuals are scattered randomly.
B. Sometimes when the residuals are plotted the points form a linear pattern, which often indicates that a variable should be included in the model. It may also indicate a curvilinear relationship.
IV. Constructing a prediction interval: \(\hat{y}\) ± interval
A. The width of the prediction interval depends on the distance of x from the mean.
B. For example, there is a significant linear relationship between January stock prices and how stocks perform for the year. However, the root mean square error is so large that the regression line is of little or no use in predicting stock prices.

Questions for Lecture Fifteen
1. Describe in your own words the test for determining whether or not there is a regression relationship between x and y.
2. True or False. MSE (Mean Square Error) is a relative measure of how good the regression fits.
3. True or False. \(R^2\) essentially tells you what percentage of the variation in y is explained by the regression line.
4. Explain how residual analysis is used to check the validity of the regression.
5. True or False. If the plot of the residuals against x yields an upside down U-shaped curve, the linear regression is confirmed.
6. You determine that there is a valid regression relationship between movement in January stock prices and the stock price movement for the entire year. Nevertheless, you determine that your prediction interval is not useful. How can this be?
7. What is heteroscedasticity?
8. True or False. A prediction interval consists of two lines parallel to the regression line.

Problems for Lecture Fifteen
Problems 1 through 3 relate to the following situation. Suppose that a regression line for ice cream sales at a ball park has been developed using historical data. The regression equation is \(y = 12000 + 200x\), where y represents sales in dollars and x represents average temperature in degrees Fahrenheit.
1. Does the slope of the regression line appear to be in the direction you would expect? Explain.
2. What is the expected difference in ice cream sales at the park between a day when the average temperature is 60° and a day when the average temperature is 70°?
3. Would you expect temperature to explain most of the variation in ice cream sales at the park? Explain.

Essential Reading for Lecture Fifteen
Aczel, Complete Business Statistics, Chapter 10, Irwin, Third Edition, 1996.

Recommended Reading for Lecture Fifteen
Hanke and Reitsch, Understanding Business Statistics, Chapter 14, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression Analysis, Chapter 3, Dellen, 1993.

Lecture Sixteen
Introduction to Multiple Regression

Scope: In this lecture we will provide an introduction to multiple regression. Multiple regression is an extension of simple linear regression in that more than one independent variable is used in attempting to explain variation in the dependent variable. We also explore the use of dummy variables in regression models. Nevertheless, just because a model can be built, it does not necessarily follow that the model will be good for prediction. In business situations, statistical modeling is generally not an end in itself, but when analytical and statistical modeling are combined with business experience and intuition, more effective decision making will often be the result.

Outline
I. When two or more independent variables are included in a regression model, we are using multiple regression.
II. Parsimony is important in building regression models.
A. Given n points, we can find an (n-1) dimensional surface that will fit the data perfectly. It is possible to overfit the data by introducing too many variables.
B. \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon\)
C. Utilize the minimum number of independent variables to get the job done.
III. The Analysis of Variance (ANOVA) test, using data from residential real estate sales as an example.
A. The ANOVA test asks: is there a regression relationship between y and any of the independent variables? Consider the following data in our example:

Residential property | Sales price | Square feet | Lot size
2 | $300,000 | 2,200 | 12,000
[remaining rows illegible]

The statistical test, or overall test, is as follows: \(H_0: \beta_1 = \beta_2 = 0\); \(H_1\): not all the \(\beta\)s are 0. If all the \(\beta\)s are equal to zero, then the mean of the data set is doing all the work and the regression is not helping us.
B. ANOVA is included in most statistical or spreadsheet software applications. The statistical package runs the regression and calculations once you've entered the data. The resulting ANOVA table includes source of variation, degrees of freedom (k relates to the number of independent variables in the regression), sums of squares (SSR), mean square from the regression (MSR), F-ratio and p-value.

Source | df | SS | MS | F-ratio | p-value
Regression | k | SSR | MSR | MSR/MSE | 0.010
Error | n-(k+1) | SSE | MSE = SSE/(n-(k+1)) | |
Total | n-1 | | | |

1. The F-ratio test indicates whether or not there is a regression relationship between y and any of the independent variables. The higher the F value, the more likely that the regression has explanatory and predictive power. A rough rule of thumb for larger sample sizes is that an F-ratio greater than five indicates that there is a regression relationship between the dependent variable and at least one of the independent variables. It should also be remembered that the p-value needs to be less than 0.05 to indicate a regression relationship. For example, in the ANOVA table above, if the p-value were 0.10 you would conclude that there was not a regression relationship.
2. ANOVA is important because a series of t tests to compare pairs of means are not independent of each other. This is especially true when there are three or more independent variables. This is due to the fact that one variable may be robbing another variable of its predictive power. Thus, the ANOVA test is done first in situations involving multiple regression.
C. Note that we still need separate tests to determine which of the slope parameters are different from 0. In this case t tests have been used:

Variable | Estimate | Standard error of estimate | t-value | p-value
Constant | 36,000 | | |
X1 | 70 | 12 | 5.8 | <0.001
X2 | 7 | 3.4 | 2.1 | .047

1. Since the model passed the overall F test, there is a relationship between the variables.
2. Both of the independent variables, X1 and X2, should be included in the model since p < 0.05. The model would be \(\hat{y} = 36{,}000 + 70x_1 + 7x_2\).
3. To predict the price of a piece of residential real estate with a 2,000 square foot house and a 10,000 square foot lot, substitute X1 = 2,000 and X2 = 10,000. The regression model equation calculates the sales price as follows: 36,000 + 70(2,000) + 7(10,000) = $246,000.
IV. The usefulness and accuracy of the multiple regression is indicated by the root mean square error and the \(R^2\) value.
A. The mean square error (MSE) estimates the population error variance. The root mean square error (SE) is \(\sqrt{\mathrm{MSE}}\). The SE is generally used as a multiplier in the prediction interval.
B. \(R^2\), which corresponds to the multiple coefficient of determination, measures the proportion of variation explained by the regression model. \(R^2\) tends to go up as more variables are included.
V. Dummy variables are also used in a regression. In a dummy variable the "switch" is either on or off; the value is either 0 or 1.
A. A dummy or indicator variable expresses levels of a quality, such as whether the house is on a golf course, type of coffee, or genre.
B. Use of a dummy variable in regression analysis is straightforward. Simply code the indicator variable to 1 if the level is obtained or to 0 if the level is not obtained.
C. Consider the regression equation \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\). Let \(x_3\) represent whether or not the house is on a golf course. If the house is on the golf course, \(x_3 = 1\). Thus in the regression equation \(\hat{y} = 40{,}000 + 85x_1 + 10x_2 + 50{,}000x_3\), the dummy variable \(x_3\) adds $50,000 to the sales price if the house is located on the golf course.

Questions for Lecture Sixteen
1. Explain what is meant by parsimony in building a multiple regression model.
2. True or False. The maximum number of independent variables that should be used in multiple regression is three.
3. Multiple regression often provides a more adequate way of modeling complex business situations than simple linear regression. Explain this statement.
4. True or False. The Analysis of Variance (ANOVA) table is used to determine which of the independent variables have a regression relationship with the dependent variable.
5. Assume you are attempting to build a multiple regression model to explain the price of properties in a real estate development located near a golf course. What are some of the independent variables you might use?
6. True or False.
Unlike simple linear regression where R® must be less than 1, in multiple regression, it is possible for R? to be greater than 1. ‘What are dummy variables and why are they coded as 0 or 1? ‘Suppose you are attempting to build a regression model to explain box office sales for upcoming movies. Which of the following are dummy variables: production cost budget, advertising budget, whether or not a ‘major star is in the film, whether or not the film isa sequel Essential Reading for Lecture Sixteen Aczel, Complete Business Statistics, Chapter 11, Irwin, Third Edition, 1996. Recommended Reading for Lecture Sixteen Hanke and Reitsch, Understanding Business Statistics, Chapter 15, Irwin, 1994. Mendenhall and Sincich, A Second Course in Business Statistics: Regression Analysis, Chapter 4, Dellen, 1993. 34 {©1997 Te Teaching Company Limited Partership Answers (©1997 The Teaching Company Limit Parmership 35 2 3. 4. 5. 6. 36 Answers to Questions for Lecture Nine ‘A random sample provides a “representative” sample; using a random sample, you can often describe how your resulls differ from those of the population, ‘A parameter is a number computed for the entie population. {A statistic is number computed from your sample data. False False A sampling distribution lists, for each possible value of the statistic, the . fraction ofall possible samples with a given value. 2 No. The units must also be chosen independently. Answers to Questions for Lecture Ten Many data sets we work with in business will be normally distributed. Other data sets will not be normally distributed. However, given the central L limit theorem, the distributions of means or sums of the data will be approximately normal if our sample is large enough, False ‘When sampling from a population, the distribution of means will tend toward a normal distribution as the sample size gets large True True Because ofthe central limit theorem No Answers to Problems for Lecture Ten Yes. 
You can use the normal distribution to approximate the binomial, since np(1-p) is large (greater than 5).
About 90.1% (using a normal distribution table)
About 21.5% (using a normal distribution table)

Answers to Questions for Lecture Eleven

An interval of numbers within which we expect the true value of the population parameter to lie
True
The sample size is large enough so that the central limit theorem can be applied
A wider confidence interval
True
False. Given the possibility of very remote events, a 100% confidence interval (if obtainable) is too large to be useful.

Answers to Problems for Lecture Eleven

From $24.85 to $25.15.
From $24.88 to $25.12. Note that going to a 95% confidence interval does not "cost" you much in interval width, given the large sample size.
People who send in for rebates may not be a random sample of your customers.

Answers to Questions for Lecture Twelve

This is the fraction of all confidence intervals that would include the true value of the population parameter
The area under the curve that excludes the tails
The area in one tail of the distribution
Yes, this is a confidence interval to estimate p, the population proportion.
True

Answers to Problems for Lecture Twelve

48%
44.3% to 51.7%
43.6% to 52.4%
We are 95% sure that between 43.6% and 52.4% of our customers like the new pizza topping.

Answers to Questions for Lecture Thirteen

What is claimed to be correct: the status quo
The alternative hypothesis competes with the null
The chances of rejecting the null hypothesis when it is indeed true
Failing to reject the null hypothesis when it is false
When you are testing a specific claim for a population parameter

Answers to Problems for Lecture Thirteen

Null hypothesis: mean = 35; alternative hypothesis: mean ≠ 35
The z-statistic
Rejection region: z < -1.96 or z > 1.96
20
Yes.
Since z* falls in the rejection region, we conclude that there is evidence that the average number of tissues used is not 40.

Answers to Questions for Lecture Fourteen

The purpose of linear regression is to provide a "best model" for a straight line relationship between two variables.
False
The value of the dependent variable when the independent variable is equal to 0.
This will occur when increasing values of the independent variable are associated with decreasing values of the dependent variable. For example, using age to predict the time that it takes adults to run a 100 yard dash may produce a negative slope parameter estimate.
True
True
With correlation, we assume that both x and y are random variables, whereas with regression we assume that x is not random.
True
With 0 correlation, there will be little if any association between the two variables. An example might be height and intelligence of company CEOs.

Answers to Questions for Lecture Fifteen

The test is a t-test that examines whether or not the slope parameter is equal to 0.
False
True
As x increases, check the residuals to see if the error variance is staying approximately constant.
False
The root mean square error may be large, and the prediction interval may be too large to be useful.
Unequal error variance
False

Answers to Problems for Lecture Fifteen

Yes. It makes sense for sales to go up as temperature rises.
$2,000
Not necessarily. Other factors such as attendance may be very important.

Answers to Questions for Lecture Sixteen

Building a good regression model with the minimum number of independent variables
False
Many business variables (such as sales) are complex and are better explained by using more than one independent variable.
False
Lot size, interior square footage, number of bedrooms, and whether or not the property is on the golf course are some examples.
False
Dummy variables are used to indicate whether or not a quality is present. A value of 0 means the quality is not present, and a value of 1 means the quality is present.
Whether or not a major star is in the film, whether or not the film is a sequel

Bibliography

Aczel, Complete Business Statistics, Irwin, Third Edition, 1996.
Clemen, Making Hard Decisions, PWS-Kent, 1991.
Cochran, Sampling Techniques, Wiley, 1973.
Crystal Ball Users Manual, Decisioneering, 1995.
Deming, "On Probability as a Basis for Action," American Statistician, Vol. 29, 1975, 146-152.
Derman, Gleser, and Olkin, A Guide to Probability Theory and Applications, Holt, Rinehart and Winston, 1973.
Hanke and Reitsch, Understanding Business Statistics, Irwin, 1994.
Mendenhall and Sincich, A Second Course in Business Statistics: Regression Analysis, Chapter 4, Dellen, 1993.
Schleifer and Bell, Data Analysis, Regression, and Forecasting, Chapter 2, Course Technology, 1995.
Winston, Simulation Modeling Using @RISK, Duxbury, 1995.
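
Supplementary sketch for Lecture Sixteen (dummy variables)

The Lecture Sixteen outline describes a dummy variable as a 0/1 "switch" that adds a fixed amount to the prediction when a quality is present. The short Python sketch below illustrates that idea with the house price example. The intercept of 40,000 and the golf course coefficient of 50,000 come from the outline; the coefficients on x1 and x2 (-85 and 10) are partly illegible in the source and are treated here as assumptions, and the function name and input values are hypothetical.

```python
def predicted_price(x1, x2, on_golf_course):
    """Evaluate y = 40,000 - 85*x1 + 10*x2 + 50,000*x3.

    x1 and x2 are unspecified numeric predictors (their meaning and
    coefficients are assumptions); x3 is the dummy variable.
    """
    # Code the dummy variable: 1 if the quality is present, 0 otherwise.
    x3 = 1 if on_golf_course else 0
    return 40_000 - 85 * x1 + 10 * x2 + 50_000 * x3


# Two otherwise identical houses (hypothetical inputs).
off_course = predicted_price(10, 2_000, on_golf_course=False)
on_course = predicted_price(10, 2_000, on_golf_course=True)

# The difference equals the dummy coefficient, whatever x1 and x2 are.
print(on_course - off_course)
```

Because the dummy enters the equation additively, switching it from 0 to 1 shifts the predicted price by the same amount for every house; it changes the intercept of the fitted relationship, not the slopes on the other variables.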
