Introduction To Econometrics Data

This document provides an introduction and overview for using MINITAB statistical software to complete assignments for an econometrics course. It discusses the following key points: 1. MINITAB will be used to complete assignments and a modeling project, allowing students to learn regression analysis tools and perform economic analyses. 2. The manual contains 4 assignments to make students familiar with regression techniques and MINITAB. Each builds on the previous one and incorporates interpreting statistical output. 3. MINITAB has different window types (worksheet, session, graph, etc.) for entering and viewing data and results. Basic navigation and functions are described.


An Introduction to Econometrics

Data Analysis Using MINITAB

Georgia Kosmopoulou
Kodrat Wibowo

Department of Economics
University of Oklahoma
Norman
2000
An Introduction to Regression Analysis Page: 1

CHAPTER 1

ORIENTATION

1.1. Introduction to MINITAB.

MINITAB is an easy-to-use statistical package which offers the student and practitioner
various statistical and graphical tools. You will need to use this package to complete
computer assignments and the main modeling project for Econ 4223 (Intermediate
Business Statistics). Econ 4223 assumes that the students are familiar with elementary
statistical concepts such as estimation and hypothesis testing.

Econometrics uses statistical inference as one of its tools. It has many applications in
academia and business. Most applications are concerned with gathering information
(sample data from a population), summarizing the data with sample statistics, and using
the sample statistics to make inferences and tests about the population. MINITAB will
allow you to greatly increase the speed at which these functions are performed.

1.2. Goals of the Manual.

This manual is designed to help you learn all the basic tools needed to understand the
class materials better and to perform some basic economic analyses through regression.
The computer requirements of the course are divided into two main parts. The first part
consists of four computer assignments. These are designed to make you familiar with the
techniques of regression analysis and the statistical package MINITAB. The second part
is the Modeling Project, which will incorporate all the techniques you have learned
during the course of the term. This will be discussed in detail by your instructor, and will
require you to formulate an original model, find the data and perform a detailed analysis.

1.3. Computer Assignments.

Chapters three through six of the manual each represent a computer assignment. All the
assignments are related to one another, and together they will form a coherent computer
analysis resembling that which you will do for your modeling project.

In the manual, we work through a specific problem with data given in the text. For your
assignments, each one of you will be provided with a data set.

Each assignment is made up of 3 sections: the actual assignment, the tools required to
complete the assignment, and interpretation or analysis of any printouts for the
assignment.

(a) Actual assignments.

The first section in each chapter describes the assignment in detail. The assignment
is designed to help you understand the class materials better. The
problems are phrased in terms of a real economic scenario in which you are
working at the International Monetary Fund, performing statistical tests for a country
that will be funded. As much as possible, the assignments are designed so you can
achieve mastery of the regression tools and also gain an intuitive feel for what the
statistics really mean.

(b) Procedures for using the statistical tools.

The second section in each chapter gives an in-depth description of the procedures
needed to complete the assignment. To minimize the "cookbook" aspect of the
manual, we have separated the assignment from the instructions for completing it.
Each assignment will build on what you were taught in the previous assignment. The
procedures described in each assignment will not be repeated. It is your responsibility
to remember the appropriate tools. When necessary, there will be hints to remind you
what they are.

(c) Interpretations of the printouts.

You will get a number of printouts containing the results from each assignment.
A brief explanation of each printout is given to help you extract the necessary
information. The ability to read and interpret a statistical printout is an
essential tool in the business world. Being able to "eyeball" your results and perform
simple tests is fundamental to effective and quick decision making.

1.4. Conventions of the Manual.

The conventions used in the manual are straightforward.

A word typed in:

(i) Italics means that you have to come up with something to replace it:

Examples:
filename means you need to type in a file name of your choice, for instance, you
might give your file the name: ASSIGN1

variable means you need to select a variable from a menu or type in a variable name
of your choice, for instance if you have data on real money, you might want to call
your variable: realmon.

(ii) Bold means that it is a tool (or menu item) that comes with MINITAB:

Example: Graph and Stat are both main menu items.

(iii) Any keyboard keys to be pressed will be denoted with bold type with a < in front of
the key and a > behind it. For example, the "Escape" key is denoted as <Esc>.

1.5. Additional Information.

The MINITAB computer package is available in many OU computer labs (e.g., the
Physical Science Computer Lab, the Bizzell Library Computer Lab and others). You
just need your 4x4 OUNETID and your password to log on.

Every time you need to use the computers, you have to have a formatted disk. You will
need to carry with you a 3.5 inch double sided double density diskette.

Data from your analyses must be saved on the diskette which you insert into the floppy
drive at the beginning of each session. Remove the diskette from the computer when you
are done. This diskette and any printouts obtained are the only records you have from
your work on the computer! Data left on computers have been known to disappear by the
time a student returns to retrieve them. Re-entering the data is a boring and
time-consuming task.

Once you have logged on to the computer, you are ready to use MINITAB. It is located
under the Programs menu in the main WINDOWS 95 menu. Select the MINITAB option
and click the mouse or hit <Enter>. You are now in the MINITAB program.

CHAPTER 2

INTRODUCTION TO MINITAB

MINITAB is a menu-driven software package. All functions are grouped under menus.
When you enter the program, the screen should look like this:

Figure 1

We will call this the main MINITAB menu screen throughout the manual. Using the
arrow keys or the mouse cursor, you can highlight the options that are located at the top
of the screen. Whenever an option is highlighted, MINITAB gives you a brief description
of what it does at the bottom of the screen. If you need more information, hit <F1> to
bring up the Help screen. Hit <Enter> or click the mouse to activate the highlighted
option. Hit <Esc> to close the menu. Use the left and right arrow keys or the mouse to
move between menu options.

MINITAB has four types of windows, and they can all be opened at the same time. The
“Project Manager” allows you to view and access the worksheet, session, history and
graph windows (see figure 2). The “Worksheet” window displays your worksheet. You
can enter and edit data here. The “Session” window displays the output. Students
who are familiar with the DOS version of MINITAB can also type commands here.
The “History” window contains a record of previously executed commands. Each time
you plot variables, a “Graph” window appears automatically. The worksheet and session
windows are the defaults in the opening window.

Figure 2

Figure 2 shows these two windows and the “Project manager” window. You can
reproduce this setting by double-clicking the icon located at the bottom left corner of
your screen. You can browse over its contents using the arrow located on the right hand
side of the title bar. Another way to open a specific window is by selecting it directly
from the Window menu option (see Figure 3).

Going back to the main menu, the other functions you will use most frequently are File,
Stat, Graph and Help. The Help option offers a brief description of the various functions.
You can also open it by hitting <F1>.

Pulling down the File menu offers you several options for specifying what data you will
use (see Figure 4). If you have some new data to enter into a file, you need to choose the
option “New” and then select “Minitab Worksheet”. If you are using data entered in a
previous session, the option “Open Worksheet” will pull up the old file. “Save Current
Worksheet” and “Save Current Worksheet As” will save the data worksheet that you
have most recently used. “Print Worksheet” will print the data appearing on your screen
(make sure the cursor is on the “worksheet” window).

Figure 3

If you go back to the main MINITAB menu and place the cursor in any other window,
the File menu will give you a different set of options that corresponds to the window you
are in. If you place the cursor in the “Session” window, “Save Session Window As” is the
option for saving the statistical output. This output can be printed by selecting the option
“Print Session Window” (see Figure 5). After producing a graph, the option “Save
Graph” will allow you to save it.

Figure 4

Figure 5

The next menu item is Stat. The option we need for now is “Regression”. This option
will occupy most of your time in this class. Another commonly used tool is “Basic
Statistics”, which calculates various statistics for your variables, such as
the mean, median and variance. In this menu option we can also find the options
“Correlation” and “Covariance” that create correlation and covariance matrices for any
number of variables.

Figure 6

Figure 7

There are a number of graphing options provided by MINITAB which you might find
useful for this and other classes. They are located under the main menu option Graph.
The most commonly used option will be “Plot”. Less frequently, we will also use the
options “3D Plot” and “3D Surface Plot”. The option “Plot” creates a scatter plot of
your data and is helpful for picking up the general relationship between variables.

CHAPTER 3

ASSIGNMENT 1

3.1. Goal of this Assignment.

This assignment will help us learn to:

(1) Create a data file.
This is very important because the data file contains all your data and MINITAB will
not perform any regression analysis for you without the data file.

(2) Plot a graph.
You will learn how to plot a scatter graph of two variables. A scatter graph is useful
for “eyeballing” the data, i.e. whether the relationship is positive or negative, linear or
non-linear, and how strong it is.

(3) Compute the sample correlation between two variables.
A scatter plot can only provide you a very rough estimate of the strength of the linear
relationship between two variables. To get a more precise answer, you can ask
MINITAB to compute the correlation matrix. This will give you the sample
correlation coefficients.

(4) Print the results.
It is always convenient to have a hard copy of the statistical results so that you can
look at your results later. You will learn here how to send the results to the printer.

3.2. Assignment 1.

After graduating from the University of Oklahoma, you have a job as a data analyst at
the International Monetary Fund. Your first task is to help the Director make sense of the
26 annual observations on money per capita, GDP per capita, interest rates and CPI of Sri
Lanka (Ceylon), a country that needs the IMF’s assistance to stimulate its economic growth.

Overview:

Sri Lanka is a low-income country, near the top of the low-income group according to
the World Bank classification. In 1996, per capita GNP was US$753. Total GNP in 1996
was $13,800m. Average yearly per capita GNP growth was 3.1% over the period 1985-
96. Despite the damage to development caused by internal political conflict in recent
years, Sri Lanka still has the world’s highest ranking for achieved quality of life above
material quantity – reaching 0.711 in the UN’s Human Development Index for 1997,
over 40 places above its rank in purely GDP terms. The purchasing power of Sri Lankan
incomes is also proportionately higher. GDP growth over the year 1996 was 3.8%.

Sri Lanka has followed a trade liberalization policy since 1977. While agriculture is
central to Sri Lanka’s economy (accounting for a fifth of GDP), manufacturing and
services are of increasing importance, with exports of textiles and clothing now well
ahead of the traditional agricultural exports as foreign exchange earners. The banking
and financial services sector is also developing. The former policies of nationalization
have been superseded by an extensive liberalization program since late in 1989. In this
previously largely centralized economy, privatization is under way in various sectors –
commercial and agricultural enterprises, banking, transport services and utilities. Sri
Lanka is aiming at achieving newly industrialized country (NIC) status by the year 2000.
However, ethnic conflict has adversely affected the economy, notably in the areas of
foreign investment and tourism. In the recent political history of this country, the period
1983-1989 was characterized by intense civil unrest. Since the beginning of 1990 there
has been substantial reduction in the instances of conflict.

Year    Nominal    Nominal    Nominal          CPI
        Money      GDP        Interest Rate
1970    155.67     1,090      6.5              13.7
1971    168.75     1,114      6.5              14.07
1972    191.37     1,186      6.5              14.96
1973    210.47     1,410      6.5              16.4
1974    220.11     1,790      6.5              18.42
1975    227.03     2,004      6.5              19.64
1976    301.30     2,224      6.5              19.9
1977    382.44     2,648      10               20.14
1978    415.61     3,142      10               22.59
1979    528.16     3,795      10               25.01
1980    633.26     4,637      12               31.55
1981    663.80     5,640      14               37.22
1982    768.45     6,421      14               41.25
1983    946.35     7,732      13               47.01
1984    1067.12    9,445      13               54.83
1985    1178.38    9,962      11               55.64
1986    1306.14    10,699     11               60.08
1987    1521.97    11,541     10               64.72
1988    1938.68    13,190     10               73.77
1989    2087.83    14,770     14               82.31
1990    2330.14    18,708     15               100
1991    2702.86    21,444     17               112.19
1992    2876.18    24,233     17               124.96
1993    3368.81    28,362     17               139.63
1994    3944.14    32,419     17               151.43
1995    4152.88    36,571     17               163.05

Source: International Monetary Fund, 1996


Note: All numbers are in national currency units per capita except Interest Rate
Sri Lankan currency = SLR

(a) Your first assignment is to create a data file and input all the numbers from the table.

(b) The Director wants to know what kind of relationship exists between Real Money per
capita and Real GDP per capita (see footnote 1). Since the Director would like to get an intuitive feel for
the correlation first, it is better to plot a graph. It does not matter which variable is on the
X and Y axes. From the graph, what can you tell the Director about the relationship
between the two variables and the strength of this relationship? Make an intelligent guess
of what the correlation coefficient is, and write it down.

(c) Suppose the Director is also curious about the relationship between Real Money and
Real Interest Rate (see footnote 2), and Real GDP and Real Interest Rate. Repeat what you did in (b) for
these pairs of variables. Write down your guess of the correlation coefficient.

Estimated Coefficient Actual Coefficient

Real Money and Real Interest Rate ________________ ________________

Real Money and Real GDP ________________ ________________

Real GDP and Real Interest Rate ________________ ________________

(d) Your boss, the chief data analyst, wants more precise answers on the strength of the
correlation for those three pairs of variables. It is your job to provide him with the
correlation table for the three variables. Print a copy of the correlation table and check
your guesses from (b) and (c) with the actual values from the table to see how close you
were.

(e) Your boss is also interested in knowing whether the population correlation for each of
the 3 pairs of variables is significantly different from zero. Perform a hypothesis test at
the 5% level for each of the three pairs and present your results to him.
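
The test in (e) can also be checked by hand or in a few lines of code. The sketch below is written in Python rather than MINITAB and assumes the usual t-test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The manual does not spell this formula out, so confirm it against your class notes. The function names and the value r = 0.5 are made up for illustration and are not results from the Sri Lanka data.

```python
import math

# Two-tailed 5% critical value of the t distribution with 24 degrees of freedom
# (n - 2 = 26 - 2), as found in a standard t table.
T_CRITICAL_5PCT_DF24 = 2.064


def correlation_t_statistic(r, n):
    """Test statistic for H0: population correlation = 0 (assumed textbook formula)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def is_significant(r, n, critical_value):
    """Reject H0 when the absolute t statistic exceeds the critical value."""
    return abs(correlation_t_statistic(r, n)) > critical_value


# r = 0.5 is a hypothetical sample correlation used only to show the mechanics.
t_stat = correlation_t_statistic(0.5, 26)
print(round(t_stat, 3), is_significant(0.5, 26, T_CRITICAL_5PCT_DF24))
```

Replace 0.5 with each coefficient from your own correlation table; the sample size n = 26 matches the number of annual observations in the assignment.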

[1] To adjust GDP and Money from nominal to real values, divide the nominal value by the
CPI and multiply by 100. Example: $\text{Real GDP}_t = (\text{Nominal GDP}_t / \text{CPI}_t) \times 100$,
where $t$ indicates the year under consideration.

[2] The Interest Rate you use should be in real values, not nominal ones. To adjust the interest
rate we need to know the inflation rate. To calculate the inflation rate use the following formula:
$\text{Inflation Rate}_t = ((\text{CPI}_t / \text{CPI}_{t-1}) - 1) \times 100$. The Real Interest Rate is the
nominal interest rate minus inflation, i.e., $\text{Real IR}_t = \text{Nominal IR}_t - \text{Inflation Rate}_t$.
Notice that the software will adjust all values but the value for 1970. To make the adjustment
for 1970 we need the 1969 value of the CPI, which is $\text{CPI}_{1969} = 12.94$. Plug the corresponding
values into the following equation:
$\text{Inflation Rate}_{1970} = ((\text{CPI}_{1970} / \text{CPI}_{1969}) - 1) \times 100$.
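
The adjustments described in footnotes 1 and 2 can be sketched outside MINITAB as well. The Python example below is only an illustration: it applies the two footnote formulas to the first five rows of the data table (1970 to 1974), and the function and variable names are invented for this sketch. The remaining years would be handled the same way.

```python
# First five rows of the Sri Lanka table (1970-1974).
years = [1970, 1971, 1972, 1973, 1974]
nominal_money = [155.67, 168.75, 191.37, 210.47, 220.11]
nominal_gdp = [1090, 1114, 1186, 1410, 1790]
nominal_ir = [6.5, 6.5, 6.5, 6.5, 6.5]
cpi = [13.7, 14.07, 14.96, 16.4, 18.42]
CPI_1969 = 12.94  # given in footnote 2


def to_real(nominal, price_index):
    """Footnote 1: real_t = (nominal_t / CPI_t) * 100."""
    return [n / p * 100 for n, p in zip(nominal, price_index)]


def inflation_rates(price_index, previous_value):
    """Footnote 2: inflation_t = ((CPI_t / CPI_{t-1}) - 1) * 100.

    previous_value is the CPI of the year before the first observation.
    """
    lagged = [previous_value] + price_index[:-1]
    return [(p / q - 1) * 100 for p, q in zip(price_index, lagged)]


real_money = to_real(nominal_money, cpi)
real_gdp = to_real(nominal_gdp, cpi)
inflation = inflation_rates(cpi, CPI_1969)
# Footnote 2: real interest rate = nominal interest rate - inflation rate.
real_ir = [i - pi for i, pi in zip(nominal_ir, inflation)]

for row in zip(years, real_money, real_gdp, inflation, real_ir):
    print("%d  realmon=%.2f  realgdp=%.2f  infl=%.2f  realir=%.2f" % row)
```

Seeding the lagged CPI list with the 1969 value is what makes the 1970 inflation rate computable, which is the special case footnote 2 points out.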

3.3. Step by Step Procedure.

Creating a Data File.

Select File from the main menu, choose the option “New” and then select “Minitab
Worksheet”. Now you are ready to enter, edit and view your data. Your cursor
should be in the “worksheet” window.

Figure 8 shows a “worksheet” window. The current cell is highlighted. You can move
about in the window as you do in other window applications, using scroll bars, the arrow
keys and the <PgUp> and <PgDn> keys. Click in a cell to make it the current cell.

Entering Data.

Use the arrow keys to get to the cell you want, enter the number and then move to the
next cell of the same row by hitting <Tab>. (If you press <Enter> instead, the cursor
will move to the next row.) Repeat the process until you have entered all the
numbers. Figure 8 shows the “worksheet” window after the data on Nominal Money,
Nominal GDP and Nominal Interest Rate for 1970 have been entered.

If at any time you realize the number in a cell is incorrect, move to that cell and type the
correct number.

Figure 8

Naming each Variable.

You can enter or change the name of any variable placed in the data sheet. Go to the cell
just below the column indication (C1, C2 etc) and type the name of the variable whose
data you have entered below.

Saving Worksheets.

Once you have entered your data set into MINITAB, you will usually want to save it on
your disk so that you can use it again or edit the data in another session. Note that the
extension of any worksheet file is *.mtw.

Select File from the menu and choose the option Save Worksheet As. You will be given
options in a dialog box. Type the name of your worksheet (ASSIGN1.MTW) and click
OK.

Figure 9 shows the “worksheet” window after some of the new data have been entered
and the variables have been named.

Figure 9

Creating Graphs.

- Highlight your data set in the “worksheet” window and select Graph on the menu bar.
You will see the options of available graphing tools. Using those you can create different
types of graphs and charts.
- Select Plot from this menu. The Plot function will create a scatter plot. Once you click
on Plot, MINITAB will display a dialog box.
- Select the variables you would like to plot. Notice that you need to specify which one
should be depicted on the Y-axis and which one on the X-axis. Go to the first entry of the
Y column and then click twice on the corresponding variable from the variable list.
- Repeat the process for X.
- To put a title on the graph, choose Annotation and type the title.
- Click OK or press <Enter>. After a while the plot will appear on your screen.

Note: For this first assignment, it does not matter which variable you choose for your X
and Y axes. We are only interested in the sign and strength of the linear relationship.

To plot the next graph, follow the same steps, choosing the new variables for the X and Y
axes.

To save the graph in a MINITAB Graphics Format (MGF) file, choose File and then
Save Graph Window As from an active “Graph” window. You can open the MGF file in
your next MINITAB session.

Figure 10

Computing the Correlation between Variables.

Make sure that your cursor is on the “worksheet” window.

- Select Stat from the menu bar. Choose Basic Statistics and then choose Correlation.
A dialog box (figure 10) will appear giving you the choices of variables. It allows you to
pick the variables you want to analyze. If you select more than two variables, MINITAB
will create a correlation matrix showing you the sample correlation coefficients for
different pairs of variables.
- Select Variables. Highlight the variables you want to analyze and click twice on them.
- Click OK. The correlation matrix will appear after a few seconds in the “Session”
window.
- You can save the matrix in the “Session” window by clicking “Save Session Window
As” in the File menu.
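
To see what the Correlation option computes, the following Python sketch builds a small correlation matrix by hand. It is an illustration under stated assumptions, not MINITAB's own code: the function names are invented, and the data are the nominal 1970 to 1974 values from the table in section 3.2, used only to show the mechanics. For the assignment you would use the real series for all 26 years.

```python
import math


def sample_correlation(x, y):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation of every column with every other column, as nested dicts."""
    names = list(columns)
    return {a: {b: sample_correlation(columns[a], columns[b]) for b in names}
            for a in names}


# Nominal values for 1970-1974, taken from the data table in section 3.2.
data = {
    "money": [155.67, 168.75, 191.37, 210.47, 220.11],
    "gdp": [1090, 1114, 1186, 1410, 1790],
    "cpi": [13.7, 14.07, 14.96, 16.4, 18.42],
}

matrix = correlation_matrix(data)
for name in data:
    print(name.ljust(8), "  ".join("%6.3f" % matrix[name][other] for other in data))
```

The matrix is symmetric with ones on the diagonal, which is why MINITAB only needs to print the lower triangle.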

CHAPTER 4

ASSIGNMENT 2

MAKE SURE YOU BRING THE DISK ON WHICH YOU ENTERED YOUR DATA
FROM THE PREVIOUS ASSIGNMENT. YOU WILL USE THIS DATA FOR THE NEXT
THREE ASSIGNMENTS.

4.1. Goals of this Assignment.

This assignment will help us learn how to:

(1) Run a simple regression.


A simple regression estimates the linear relationship between a dependent variable and a
single independent variable. Make sure you know what your dependent and independent
variables are.

(2) Conduct hypothesis testing on the significance of an independent variable.


Whenever you run a regression, you always want to test whether the independent
variable is significant or not. If it is not, you want to reconsider the formulation of
your model. You will also learn how to use the ANOVA (analysis of variance) table
to test the joint significance of the independent variables included in the model.

(3) Run a multiple regression.


If you believe that more than one independent variable causes changes in the
dependent variable, you must run a multiple regression.

(4) Check the significance of an independent variable without formal hypothesis testing.
The ability to “eyeball” your data and make quick hypothesis tests in your head is a
critical skill.

4.2. Assignment 2.

The Director wants to know how real GDP per capita affects real money per capita in the
country for the last 26 years. He knows regression would give him the relationship, but
does not have the time to perform the calculations.

(a) Help him by showing him a plot of the data with the regression line superimposed on
it. This will help him (and you) get an intuitive feel of what a regression is.

**Hint: Make sure:

- you retrieve the data file before you start your analysis.
- you correctly identify the dependent and independent variables.
- the Y-axis is represented by the dependent variable and the X-axis is represented by
the independent variable in the graph.

(b) The next thing you need to do is show him the equation that represents the regression
line in the graph. Compute the regression line with MINITAB and write down the
sample regression line in equation form. Make sure you include the standard errors in
parentheses below the regression parameter estimates. Attach a copy of the printout.

(c) Please write down the following information from your regression results:
SSR:                R²:
SSE:                Adjusted R²:
SST:

(d) From the data given above, find the sample correlation coefficient between the two
variables. Does this match the one in the correlation table (in Assignment 1)?

**Hint: What is the relationship between R² and the sample correlation coefficient in a
simple regression?

(e) Can you tell the director if Real GDP has any significant impact on Real Money at the
5% level? A sample test is given in Chapter 7 of this guide.

(f) Suppose your boss told you that you should also include Real Interest Rate as an
additional independent variable in your regression. Follow his advice and run a
multiple regression. Write down the multiple regression equation and attach a copy
of the computer printout. Similarly, write down the information about SSR, SSE,
SST, R² and adjusted R² from your regression results.

(g) Compare the R² and the adjusted R² between the simple and multiple regression lines.
Do they go up or down between the two regressions? Does R² ever go down when
adding another independent variable? What hint does the change in the adjusted R²
give you about real interest rate as an independent variable?
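
For parts (c), (d) and (g) it may help to see how the quantities on the printout relate to one another. The Python sketch below fits a simple regression by least squares on a made-up five-point data set (not the Sri Lanka data) and computes SSR, SSE, SST, R² and adjusted R². The names are invented for this example, and the adjusted R² formula, 1 - [SSE/(n - k - 1)] / [SST/(n - 1)], is the standard definition assumed here rather than something stated in the manual.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with the usual sums of squares."""
    n = len(x)
    k = 1  # one independent variable in a simple regression
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error
    ssr = sst - sse                                         # regression
    r_squared = ssr / sst
    adj_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    correlation = sxy / math.sqrt(sxx * sst)
    return {"a": intercept, "b": slope, "SSR": ssr, "SSE": sse, "SST": sst,
            "R2": r_squared, "adjR2": adj_r_squared, "r": correlation}


# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
result = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for key, value in result.items():
    print("%-6s %.4f" % (key, value))
```

Comparing the printed `R2` with the square of `r` is one way to explore the hint in part (d).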

4.3. Step by Step Procedure.

Retrieving Previously Created Files.

You will often need data entered during one work session in an ensuing session. If the
file was saved when it was entered, this presents no problem. If it was not, you will have
to reenter the entire data set.

To retrieve previous data you must complete the following steps:



1. Select File from the main menu. A submenu will appear giving you several choices.
The one you want is the option “Open worksheet” which allows you to retrieve a data
file.
2. Select “Open Worksheet”. A box will appear showing you all the available files on
your disk.
3. Select the filename under which you saved your data set in the previous session (for
example: ASSIGN1.MTW) and click OK.

What to Do After Retrieving a Data File.

If you want to change your data, you can do so by editing the data. This was covered in
Chapter 3. If you want to perform other analysis, just follow the instructions listed under
each procedure.

Obtaining the Fitted Line Plot of Two Variables

In the previous computer assignment, you plotted the data in a simple two-dimensional
graph. It is helpful to be able to see the data plotted along with the sample regression line
to get a preliminary estimate of how good your regression is.

1. Select Stat then “Regression” and then “Fitted Line Plot”.
A dialog box will appear giving you the choices of:
- Variables: which one is the independent variable or predictor (X) and which one
is the dependent variable or response (Y).
- Type: lets you choose the way you want to fit the line, i.e., linear, quadratic or
cubic.
2. After you select the variables, choose linear and click OK.
3. The scatter plot together with the regression line will appear after a while.
4. Save the plot by selecting File from the menu and then “Save Graph As”.
5. You can also print it by selecting File and then “Print Graph”.

Running a Simple (or Multiple) Regression.

The computer program will calculate all statistics relevant to regression analysis. Before
making any difficult calculations, make sure that the statistic you are looking for has not
already been provided.

1. Select Stat from the main menu. A submenu will appear showing you all the
available statistics tools. The one you want is “Regression” which allows you to run
a regression analysis.
2. Click on Regression twice. A dialog box will appear giving you the choices of
variables. Pick the dependent (or response) and independent (or predictor) variables
for your model and click OK.

Note: If you are running a multiple regression, you should select all the independent
variables of your model here.

Interpretation of the Printout.

1. The Regression Results.

The following tables present the summary of the results produced by MINITAB.

The regression equation is

Y = a + b1X1 + b2X2 + b3X3

PREDICTOR   COEFFICIENT   STD DEVIATION   T         P
CONSTANT    a             sa              a/sa      p-value
X1          b1            sb1             b1/sb1    p-value
X2          b2            sb2             b2/sb2    p-value
X3          b3            sb3             b3/sb3    p-value

S =         R-Sq =        R-Sq (adj) =

You must be familiar with the three terms at the bottom of the regression table: S (the
standard error of the regression), R-Sq (R²) and R-Sq (adj) (the adjusted R²).

You should also get familiar with the following two columns:
(i) T column: reports the t-value computed with the formula $t = b_i / s_{b_i}$,
(ii) P column: reports the p-value for a two-tail test.

2. The Analysis of Variance (ANOVA) Table.

SOURCE       DF           SUM-OF-SQUARES   MEAN-SQUARE        F-RATIO   P
REGRESSION   k            SSR              SSR/k              F         p
ERROR        n - k - 1    SSE              SSE/(n - k - 1)
TOTAL        n - 1        SST              SST/(n - 1)

Where:

$F = \dfrac{SSR \, (n - k - 1)}{SSE \cdot k} = \dfrac{SSR / k}{SSE / (n - k - 1)}$

p = p-value associated with the test of the significance of the independent variables as a
group

DF =Degrees of Freedom
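
The T column and the F-ratio can be reproduced from the other entries on the printout, which is a useful check when you read a table by eye. The Python sketch below simply restates the two formulas given above; the function names and the numbers plugged in are hypothetical and do not come from any MINITAB run.

```python
def t_ratio(coefficient, std_error):
    """T column: the coefficient divided by its standard deviation."""
    return coefficient / std_error


def anova_table(ssr, sse, n, k):
    """Mean squares and F-ratio from the sums of squares.

    n is the number of observations and k the number of independent variables.
    """
    msr = ssr / k            # REGRESSION mean square
    mse = sse / (n - k - 1)  # ERROR mean square
    return {"MSR": msr, "MSE": mse, "SST": ssr + sse, "F": msr / mse}


# Hypothetical values used only to show the arithmetic.
print(t_ratio(2.0, 0.5))
table = anova_table(ssr=3.6, sse=2.4, n=5, k=1)
print(table)
```

The p-values in the P column still have to come from MINITAB or a statistical table; this sketch only covers the ratios.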

CHAPTER 5

ASSIGNMENT 3

5.1. Goals of this Assignment.

This assignment will help us learn how to:

(1) Open old data files and enter new data.


When using MINITAB, you may decide to add another variable to your data file. This
may occur when you run a new regression with slightly different data.

(2) Perform an F-test on all regression parameters.


This is the standard test of whether your independent variables as a group are
significant.

(3) Perform an F-test on a subset of variables.


Whenever you drop variables, you must check if the variables as a subset were
significant. If they were, you want to leave them in your regression.

(4) Test for approximate multicollinearity between your independent variables.


Although not a violation of the Gauss-Markov assumptions, approximate multicollinearity
can adversely affect your regression by inflating the standard errors of your estimates.

Note: There are no new computer procedures presented in this Chapter. All functions
you will perform have already been discussed in Assignments 1 and 2.
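
Goals (3) and (4) involve calculations you do by hand from regression printouts. As a rough guide, the Python sketch below assumes the standard F-statistic for a subset of variables, which compares the error sum of squares of the restricted model (variables dropped) with that of the unrestricted model, and the variance inflation factor 1 / (1 - R²_j) as one common multicollinearity check. Neither formula is stated in this manual, so treat both as assumptions and use the versions taught in class. All names and numbers are hypothetical.

```python
def subset_f_statistic(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for dropping q variables from a model with k independent variables.

    sse_restricted:   SSE from the regression without the q variables
    sse_unrestricted: SSE from the full regression
    n:                number of observations
    """
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


def variance_inflation_factor(r_squared_j):
    """VIF for variable j, where r_squared_j is the R-squared from regressing
    that variable on the other independent variables."""
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical sums of squares: 26 observations, 5 regressors, 2 of them dropped.
f_stat = subset_f_statistic(sse_restricted=120.0, sse_unrestricted=100.0,
                            q=2, n=26, k=5)
print(f_stat)
print(variance_inflation_factor(0.75))
```

The resulting F-statistic would be compared with the critical F value for q and n - k - 1 degrees of freedom; a large VIF suggests that a variable is closely related to the other independent variables.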

5.2. Assignment 3.

The Director has decided to incorporate the information on trade policy, ethnic conflict
and the effort to privatize a large part of the public sector in his analysis. The variable
indicating if there was civil unrest is qualitative and will enter the model as a dummy
variable that takes the value of “1” if there is either intense or moderate civil unrest and
“0” otherwise. The variable indicating trade policy is another qualitative variable that
takes the value of “1” in the period of trade liberalization and “0” before that. Finally
privatization is captured in the last dummy variable that is given a value of “1” in the
period that the extensive liberalization program policy was in effect and “0” before that.
The Director wants you to run thorough tests on the significance of these data. He also
wants you to test for the presence of approximate multicollinearity between the
independent variables.

(a) Enter the data into your original data file.

Hint: You must open the old data file using the options File and then “Open Worksheet”
(described in section 4.3) and then add more variables to your data set (described in
section 3.3).

Year    Civil Unrest    Trade Policy    Privatization

1970         0               0                0
1971         0               0                0
1972         0               0                0
1973         0               0                0
1974         0               0                0
1975         0               0                0
1976         0               0                0
1977         0               1                0
1978         0               1                0
1979         0               1                0
1980         0               1                0
1981         0               1                0
1982         0               1                0
1983         1               1                0
1984         1               1                0
1985         1               1                0
1986         1               1                0
1987         1               1                0
1988         1               1                0
1989         1               1                0
1990         1               1                1
1991         1               1                1
1992         1               1                1
1993         1               1                1
1994         1               1                1
1995         1               1                1
Source: International Monetary Fund, 1996
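
Each of the three dummy variables in the table switches from 0 to 1 in a single year and stays at 1 afterwards. If you prefer to generate the columns rather than type them, the following Python sketch reproduces the table from the three switch years (1983, 1977, and 1990, read off the table above). The helper name `dummy_from` is our own, and this is only an illustration; in the assignment you still enter the values into your MINITAB worksheet.

```python
years = list(range(1970, 1996))  # 1970 through 1995, 26 observations


def dummy_from(start_year, years):
    """Return 1 for every year at or after start_year and 0 before it."""
    return [1 if year >= start_year else 0 for year in years]


civil_unrest = dummy_from(1983, years)   # unrest from 1983 onward
trade_policy = dummy_from(1977, years)   # trade liberalization from 1977 onward
privatization = dummy_from(1990, years)  # program in effect from 1990 onward

for row in zip(years, civil_unrest, trade_policy, privatization):
    print(*row)
```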

(b) Using Real GDP, Real Interest Rate, Civil Unrest, Trade policy and Privatization as
your independent variables, run a multiple regression analysis with Real Money as
the dependent variable. Print the results and write out the sample regression line (with
standard errors in parentheses under each coefficient).

(c) Test at the 5% level the null hypothesis that the five variables taken as a group have
no significant impact on Real Money (See Chapter 7.).

(d) Which of the five variables has no significant individual impact on Real Money?
Conduct your test at the 5% level.

Hint: You were taught how to interpret an ANOVA table in class.



(e) Based on your results in (d), run a new regression dropping the insignificant
variable(s). As before, print the results and write down the new sample regression line
(with standard errors in parentheses under each coefficient).

(f) Do you think the variables you dropped, taken as a group, explain variation in the
dependent variable? Conduct your test at the 5% level. What can you conclude about
your decision to drop the variables? (That is, was it the correct thing to do, or should you
have left the variables in?)

Hint: You must perform this test by hand using the SSE from the original and new
regressions.
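
The hand calculation can be sketched in Python as follows. The sketch assumes the usual form of the subset F-statistic, F = [(SSE_R - SSE_U)/q] / [SSE_U/(n-k-1)], where SSE_U comes from the original regression with k independent variables, SSE_R from the new regression, and q is the number of variables dropped. The function name and the SSE values are hypothetical and are there only to show the arithmetic; use the SSE figures from your own printouts.

```python
def subset_f(sse_restricted, sse_unrestricted, n, k, q):
    """F-statistic for the null hypothesis that the q dropped coefficients
    are all zero. k counts the independent variables in the ORIGINAL model."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical SSE values; n = 26 years, k = 5 original variables, q = 2 dropped
f_stat = subset_f(sse_restricted=120.0, sse_unrestricted=100.0, n=26, k=5, q=2)
print(f_stat)

# Compare with the 5% critical value of F with (q, n-k-1) = (2, 20)
# degrees of freedom, which is 3.49 in standard F tables.
dropped_variables_matter = f_stat > 3.49
print(dropped_variables_matter)
```

If the statistic exceeds the critical value, the dropped variables are significant as a group and should be left in the regression.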

(g) Test for the presence of multicollinearity between the independent variables Real
Interest Rate and Real GDP. Conduct your test at the 5% level.

Hint: Compute the correlation coefficient and its p-value. Instructions on how to do this
were given in Assignment 1. Print the results.
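
For readers who want to see what lies behind the correlation output, here is a minimal Python sketch. It computes the sample correlation coefficient r and the usual t-statistic for testing whether the correlation is zero, t = r·sqrt(n-2)/sqrt(1-r²), which has n-2 degrees of freedom. The function names and the five data points are made up for illustration; the p-value itself is left to MINITAB.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t-statistic (n - 2 degrees of freedom) for H0: correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Made-up data standing in for two independent variables
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = correlation_t(r, len(x))
print(round(r, 4), round(t, 4))
```

A large absolute t-statistic (equivalently, a small p-value) indicates that the two independent variables are significantly correlated.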

CHAPTER 6

ASSIGNMENT 4

6.1. Goals of this Assignment.

In this assignment you will:

(1) Test for heteroscedasticity in your error terms.


Heteroscedasticity violates the Gauss-Markov assumptions and will cause Least Squares
to produce inefficient estimators. This can be both detected and corrected.

(2) Test for autocorrelation in your error terms.


Autocorrelation violates the Gauss-Markov assumptions and will cause Least Squares to
produce inefficient estimators. This can be both detected and corrected.

6.2. Assignment 4.

In your previous assignment, you have shown the Director how real money per capita in
Sri Lanka is affected by real GDP per capita, real interest rates, political unrest, trade and
privatization policy by building a multiple regression model. You have also tested for
multicollinearity. However, the Director is interested in knowing if the model you built
has any additional statistical flaws in it. In other words, he wants to know if your model
has problems with autocorrelation and heteroscedasticity.

Hint: You first need to open the original data file. This was discussed in Assignment 2.
The tests for heteroscedasticity and autocorrelation apply to the revised regression
model, where you dropped the insignificant variables.

(a) Test for the presence of positive autocorrelation in the revised model. Conduct your
test at the 5% level.

Hint: Run the revised regression analysis again. Ask the computer to save the residuals,
and to compute the Durbin-Watson statistic used to test for autocorrelation.
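
MINITAB reports the Durbin-Watson statistic directly, but it may help to see how it is built from the saved residuals. The Python sketch below uses the standard definition, DW = Σ(e_t - e_{t-1})² / Σe_t²; the function name and the two short residual series are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic computed from residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Residuals that alternate in sign give a large value (negative autocorrelation)
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Residuals that never change give zero (extreme positive autocorrelation)
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(dw_alternating, dw_constant)
```

Values near 2 suggest no first-order autocorrelation, while values well below 2 point toward positive autocorrelation. The formal test compares the statistic with the lower and upper bounds from a Durbin-Watson table for your sample size and number of independent variables.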

(b) Test for the presence of heteroscedasticity in the revised model. Conduct your test at
the 5% level.

Hint: You have to run an auxiliary regression in order to test for heteroscedasticity. The
independent variable in the auxiliary regression is the fitted (expected) value of the
dependent variable from the original regression. The dependent variable is the squared
error term from the original regression. These data can be found in the file in which you
saved your residuals. Go back and use this data set. Select File then “Open” from the menu
bar and you will see a file with several variables. The ones you are interested in are RESI
and FITS. RESI is your error term (the residual). FITS is the fitted value of the dependent
variable. Create a new variable in this file which is the squared value of RESI. This
procedure is described below. Then run a regression using this new variable as your
dependent variable and FITS as your independent variable. Record the R² from this
regression. Print the results from the auxiliary regression.
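
The auxiliary regression can be sketched in Python as follows. The hint does not say what to do with the recorded R², so the sketch assumes the common n·R² form of the test, in which n·R² is compared with the 5% critical value of the chi-square distribution with one degree of freedom (3.841). Check this against the version of the test taught in class. The function names and the data are hypothetical.

```python
def auxiliary_r_squared(residuals, fits):
    """R-squared from regressing the squared residuals on the fitted values.
    With one independent variable this equals the squared correlation."""
    squared = [e * e for e in residuals]
    n = len(squared)
    mean_fit = sum(fits) / n
    mean_sq = sum(squared) / n
    sxy = sum((f - mean_fit) * (s - mean_sq) for f, s in zip(fits, squared))
    sxx = sum((f - mean_fit) ** 2 for f in fits)
    syy = sum((s - mean_sq) ** 2 for s in squared)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation to explain
    return (sxy * sxy) / (sxx * syy)


CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom


def looks_heteroscedastic(residuals, fits, critical=CHI2_1DF_5PCT):
    """Assumed decision rule: reject homoscedasticity when n * R² > critical."""
    n = len(residuals)
    return n * auxiliary_r_squared(residuals, fits) > critical


# Hypothetical RESI and FITS columns: the squared residuals grow with the fits
resi = [1.0, -2.0, 3.0, -4.0]
fits = [1.0, 4.0, 9.0, 16.0]
print(auxiliary_r_squared(resi, fits), looks_heteroscedastic(resi, fits))
```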

6.3. Step by Step Procedure.

When you do the regression procedure, MINITAB can create a file that contains the
predicted value of the dependent variable and the sample error terms. It can also compute
the Durbin-Watson Statistic, which you need to use to test for autocorrelation.

1. Go through the steps (described in chapter 4) to run a multiple regression. You should
now have your dependent and independent variables chosen.

2. When you do step 2, MINITAB will show you the dialog box. Click the Options
button and check the box for the Durbin-Watson statistic. Click OK.

3. After finishing step 2, click the Storage button. You will see the storage options for
data that can be generated automatically when you run the regression. Check the boxes
for Fits and Residuals. Click OK.

This step will give you the estimated values and the residuals in your worksheet, with
column name FITS1 for the estimated values and RESI1 for the residuals (error terms).
You can change the column names if you want by following the appropriate steps
described in chapter 3. Now, you have to save your residuals and estimates under a new
file name.

4. Select File and then Save Worksheet As. Type in a file name and click OK.

You should choose a name that you will not confuse with your original data file. For
example, if you have called your original data file ASSIGN, call the file that includes the
residuals RESID1.

Note: If you are testing for heteroscedasticity, create a variable that will be the square of
the error term first and then run the regression described in the last paragraph of the
previous subsection.

To estimate the square of the error term, go to Calc then “Matrices” and then
“Arithmetic” and multiply the column of the residuals by itself. Store the results in
another column and go back to step 5.

CHAPTER 7

PREPARING WRITTEN CASES AND ANALYSIS REPORTS

The cases that you will tackle in your modeling projects require you to communicate
your statistical results. You will be discussing your results in class or in the computer
lab with your instructor and your classmates. In this final assignment, you will be
asked to prepare a written report of your findings. Writing effectively will be important in
your professional career, and so will speaking effectively, because you will often be
asked to give a formal presentation of your findings. Both oral and written
communication skills are highly valued in the workplace.

Preparing the written summary of your project’s results will probably be just as difficult
as doing the computation. Many students say it takes about as long to write the report as
it does to finish the statistical work.

7.1. Writing Style.

Once you are done with the computer work, in the real world you must interpret these
findings for someone, perhaps a manager or your boss. A good report should be written
so that anyone can understand it. Unless you are writing to someone you know is well-
versed in statistical techniques, avoid using statistical and mathematical jargon in your
report.

For instance, if you were a vice president of marketing who had never taken a statistics
course, which of the following two summaries would you find more valuable?

• The regression model is given by SALES = 13,000 + 5 ADVERTISING, which
means that the y-intercept is $13,000 and the slope is $5.
• According to the model: (a) if the firm does no advertising, sales will be $13,000, and
(b) each additional dollar spent on advertising will bring in five additional dollars in
sales.

It may be hard at first to write in a non-technical way because you will be using jargon
throughout the course. So will your instructor. So will your textbook. You will get used
to using specific statistical terms like “least squares” and “p-values,” but do not presume
that your reader understands them. You will have to find a way to translate statistical
concepts, methodologies, and outcomes for the uninitiated. Just by virtue of spending
time in your statistics class, you may well forget that certain statistical concepts do not
exist in most people’s vocabularies. Here are some useful guidelines.

1. Do you have any words in the report that you did not commonly use before you
walked into this class? If you used a word or phrase on a regular basis before signing
up for statistics, it is probably acceptable to use in your write-up.
2. Would your next-door neighbor or roommate understand the essence of your report?
3. What would the editor of your local newspaper think about publishing your report in
tomorrow’s newspaper?

7.2. Introductory Material.

It sometimes helps to provide background information about the problem at hand. This
might include a statement of the problem or situation and the data available to answer this
question. Since your report involves statistical analyses, you might also provide initial
descriptive statistics (e.g., mean and/or standard deviation) of some of the more important
variables, unless that information would simply distract from the point you are trying to
make. By providing this type of background information, you can put the problem in
perspective and also ease the way into the upcoming material.

Needless to say, good grammar, correct spelling, and appropriate punctuation are all
important components of an effective report. You probably cannot convince a reader that
your statistical results are valid if your writing is poor. Sloppy, misspelled, and otherwise
disorganized reports send the message that you do not think the report is very important
anyway. Brush up on your writing skills. You will discover that it is both fun and valuable
to write effectively. Word processors may help; a spelling checker is useful too. Some
major universities also have a writing center that helps students with their papers and
written reports.

7.3. The Junked-Up Appendix Problem.

Consider putting supporting statistical documentation, such as graphs, tables, and other
statistical output at the end of your written report. Within the report, refer to these
appendices for guidance. On the other hand, if one particular table or graph really
contains the essence of the point you are making, you probably want to put it right in with
the text, where your reader can see it quickly. A critical graph probably belongs in with
the text. A table that provides a huge amount of information and provides relatively
minor support to your argument probably belongs in an appendix.

You may include the important statistical printouts you produced in the text. If you cannot
decide which ones are important, you are not done with your analysis yet. Once you have
decided, the following rule of thumb may help:

Do not append a statistical exhibit if you do not refer to it in the report, and of course, do
not refer to an exhibit that is not there.

7.4. Length of Report.

Your project report should not exceed 10 pages. Remember that you are trying to present
the essence of your statistical findings, not a comprehensive validation of every step of
your work. You must identify the fine line between too much detail and not enough detail.
Your report should be neither so short that it is deceptive nor so long that it bores or
intimidates your intended reader with unnecessary details. Only you, as the reporter of
the data, can decide what to include.

We suggest that you make the very first page an “executive summary.” An executive
summary should be written for someone who has never had a statistics class and does not
care to have statistics explained. It should include only the important results and
implications derived from the data. It should also include any necessary caveats or
limitations of your findings. Try to limit this executive summary to one page,
single-spaced.

Next, we present two sample summary reports³ of a statistical analysis that aims to
assess the success of a company’s new wellness program.

The first of these reports is neither particularly well structured nor very informative. It
contains a lot of technical language that only a person with a statistical background could
understand.

TO: Tom Jones, Director of Personnel
FROM: Jana Smith, Statistician
DATE: 5/19
SUBJECT: Analysis of Personnel Data

Per your request, I have analyzed the data on sick days and the company’s new wellness
program. The results are summarized below.

The regression model suggests that there is a statistically significant relationship between
the two variables. The correlation between dollars contributed to the wellness program
and absenteeism is a positive 0.63. The standard error is 0.073. The R-squared statistic
on the simple regression model 56%, which is pretty good for cross-sectional data.
Moreover, the F-statistic measuring the statistical significance of the model as a whole is
54.90, indicating a good model.

When analyzing the relationship by gender, there is no statistically significant impact
here. The p-value on the categorical variable called GENDER (see EXHIBIT II) is .46.
This means that absenteeism does not seem to be correlated with the gender of the
employee. On the other hand, it just might be a multicollinearity problem with the two
independent variables.

I’d be happy to assist in any further analysis of this data at your request.

³ Peter G. Bryant and Marlene A. Smith, “Practical Data Analysis: Case Studies in Business
Statistics,” Volume II, University of Colorado at Denver, Irwin, Chicago, 1989.

The second report, on the other hand, is well structured and designed to appeal to
people who are unfamiliar with statistics. It contains a minimum of technical
language, but by including the exhibits it also gives readers with a technical
background the opportunity to evaluate the results on their own.

TO: Tom Jones, Director of Personnel
FROM: Jana Smith, Statistician
DATE: 5/19
SUBJECT: Analysis of Personnel Data

As you asked, I have analyzed the data on sick days and the company’s new wellness
program. Here are my results.

Data:
The personnel department provided a sample of 125 randomly chosen employee files.
From those files, we obtained:
• the company’s payment for that employee’s participation in our wellness program,
• that employee’s absentee record (measured in number of absent days) over the past
two years, and
• the gender of the employee.

72% of the sample group was female.

Results:
1. On average, our employees missed 15 days of work in the first year. The typical
fluctuation around the average of 15 days was seven days.
2. In the second year, after starting the wellness program, the average number of absent
days declined to 10 days and the typical fluctuation also decreased to 2 days. We
committed about $50 per employee to the wellness program last year.
3. A statistical model (see EXHIBIT below) of the relationship between dollars
committed to wellness, absenteeism, and gender indicates that the wellness program
has a statistically significant relationship to absenteeism. The model suggests that:
• each additional dollar committed to our wellness program was associated with a
two-hour decline in absenteeism, and
• there is no statistically significant relationship between gender and absenteeism.

The model would generally be considered a statistically strong one, since the model
explains 56% of the variation in absenteeism. Although this leaves 44% of the variation
unexplained, it is difficult to do much better with the type of data available for this
analysis.

Recommendation:
Because of the decline in absenteeism after the institution of the wellness program, I
recommend that the program be continued.

Limitations:
Consider redoing this study in another year with a larger sample size. It is not clear
whether one year is enough to observe the full benefits of the wellness program. There
are some troubling aspects of the statistical results that might be alleviated with a larger
sample size. For instance, many of the model’s standard errors (that is, measures of the
accuracy of the estimates) are quite large in my opinion.

I’d be happy to answer any further questions that you might have about my analysis or
report.
EXHIBIT

The regression equation is

ABSENT = 12.2 - 0.26 DOLLARS + 0.07 GENDER

Predictor    Coef     Stdev    t-ratio    p
Constant     12.2     3.37      3.62      0.00
DOLLARS     -0.26     0.04     -6.50      0.00
GENDER       0.07     0.25      0.28      0.80

S = 7.9    R-sq = 56.2%    R-sq (adj) = 55.5%

Analysis of Variance

SOURCE        DF       SS       MS      F       p
Regression     2     9770     4885     78.3    0.000
Error        122     7614     62.4
Total        124    17384

CHAPTER 8

MODELING PROJECT

“If you torture the data long enough, Nature will confess”.
RONALD COASE

For the modeling project you must construct a regression model of a real world system of
interest to you. First you must decide what your dependent variable is. What do you wish
to explain? Sales? Housing prices? Capital spending in the economy? New business
incorporation? The poverty rate? Find a topic that interests you. Then find variables that
could potentially explain the variation in the dependent variable.

Finding data on the variables you selected can be frustrating. In fact, finding data is
probably the most difficult part of this project, and you should not underestimate the
effort it requires. Plan to devote at least two weeks to data collection. The following are
some mistakes that students often make while working on their projects:

1. They think that they do not need data on the dependent variable. This is
incorrect. You must find data for both the dependent and independent variables.
2. They think they have to find both time series and cross-sectional data. There are in
fact panel data techniques to handle this type of information, but we do not cover
them here.
3. They get the idea that all of the numerical information has to be in rates, whole
numbers, indexes, or percentages. This is not the case.
4. They put off doing the assignment or finding the data.

Students often face some of the following problems:

1. Choosing between constant and current dollars: which set of figures should I use?
2. Data for the same item and the same year may vary between editions of a source. This
may happen because different editions use a different base year (e.g., for the
CPI). It may also be that the way the data are collected or categorized has changed and
older figures have been recalculated or adjusted. We recommend using the most
recent edition.
3. They look for information that is not quantifiable, e.g., mental health, lonely people,
or the happiness of couples in long-distance marriages.
4. They spend too much time trying to find information that is not available. I
suggest avoiding models on the following topics: abortion, adoption,
alcoholism, drug use, drunk driving, mental health, and religion. Data on these
subjects are unavailable, not collected uniformly, or difficult to locate.
Long-term socioeconomic and demographic concerns, as well as financial topics, are
the best areas from which to select a topic.
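
The first two problems come down to simple arithmetic with a price index. As a hedged illustration, the Python sketch below converts a current-dollar (nominal) figure to constant (real) dollars by dividing by a price index, and moves an index to a different base year. The function names and all of the numbers are made up; the index is assumed to equal 100 in its base year.

```python
def to_real(nominal, price_index, base=100.0):
    """Convert a current-dollar figure to constant dollars, assuming the
    price index equals `base` in the base year."""
    return nominal / price_index * base


def rebase(index_by_year, new_base_year):
    """Re-express a price index so that it equals 100 in new_base_year."""
    base_value = index_by_year[new_base_year]
    return {year: value / base_value * 100.0
            for year, value in index_by_year.items()}


# Made-up example: $200 in a year when the index stands at 125
real_value = to_real(200.0, 125.0)
print(real_value)

# Made-up index with base year 1995, moved to base year 1990
rebased = rebase({1990: 80.0, 1995: 100.0}, new_base_year=1990)
print(rebased)
```

Whichever base year you choose, use the same one for every deflated variable in your model and state it in your report.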

Chapter 9 of this manual contains sources for finding data on the internet. It is an
excellent list of sources, and you should start looking for your data there. We will discuss
strategies in class in more detail. An enormous range of government statistics is
available, from Gross National Product to sales of Girl Scout cookies. Do not give
up on finding the data before you have tried.⁴ Once you have found the data, you must use
MINITAB to analyze your model.

The functions and procedures needed to complete your modeling project have been
covered in the description of the first four assignments in this manual. The modeling
project will be much like your class assignments, except that you will be using your own
data and model. You must justify the model you have chosen, and your results must be
written up in the form of a report. If you have a hard time performing a particular test, go
back to chapters three through six to refresh your memory on the correct procedures.

Modeling Project Write-up.

You may discuss methodology with your fellow students; however, you must work on the
Modeling Project independently.

The Modeling Project must be typewritten (double-spaced) and should not exceed 10
pages. On your title page, you should have the name of the course (i.e., ECON 4223),
your section number, the semester (e.g., Fall 2001), the title of your paper, your Social
Security Number, and your name.

Part 1. Executive Summary.

1. Refer to the previous chapter for more details.

2. Present the original population model in variable notation.

a) Define the dependent and independent variables and specify the units in which
each is measured. (If you are using real rather than nominal data, you must specify
the base year.)

b) State the data sources for each variable. Note any difficulties you had obtaining
the data, and your techniques to overcome them.

⁴ You could look at demographic issues, federal funds, taxes, spending, employment,
education, social services, health, transportation, issues of interest to the local
government, natural resources, science, technology, exports, imports, stock prices, etc.

Part 2: Statistical Analysis.

1. State (mathematically and explain in words) all the assumptions you need to make in
order to estimate the model.

2. Write out the estimated regression equation for the first computer run, with standard
errors in parentheses under each coefficient. Also present the R², adjusted R²,
F-statistic, and Durbin-Watson statistic.

3. Evaluate and interpret R² and adjusted R².

4. Test for multicollinearity. Discuss the consequences of multicollinearity (if any) for
your model.

5. Perform a test for the overall significance of the regression (F-test).

6. Perform the tests of significance for the individual regression coefficients (t-tests). If
these tests indicate that some of the regression coefficients are insignificant, then drop
the corresponding variables and estimate a revised model (obtain a second run).

7. If you drop more than one independent variable, test whether the variables you
dropped are significant as a group (F-test on subset).

8. Write out the estimated regression equation for the second computer run, with
standard errors in parentheses under each coefficient. Also present the R², adjusted
R², F-statistic, and Durbin-Watson statistic.

9. Repeat steps (3), (5), and (6) for the revised model. Compare R² and adjusted R² for
the two models and comment on the differences (i.e., what do the changes in these
numbers between models tell you?).

10. Based on the analysis so far, select between the original and revised models the one
that best fits the data. Interpret each estimated regression parameter in the context of
the problem (i.e., interpret the intercept and the coefficients).

11. Calculate and interpret confidence intervals for the regression coefficients.

12. Test for (i) autocorrelation and (ii) heteroscedasticity. Discuss the consequences of
autocorrelation and heteroscedasticity (if any) for your model. Are your estimators
still BLUE? Are inferential procedures such as hypothesis testing and confidence
intervals reliable?
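
Several of the steps above involve small calculations that are easy to check by hand. The Python sketch below illustrates three of them using the numbers from the sample exhibit in Chapter 7: the t-ratio of a coefficient (step 6), a confidence interval for a coefficient (step 11), and the adjusted R² (step 3). The function names are our own, and the critical value of 1.98 is an approximate two-sided 5% t-value for 122 degrees of freedom; look up the exact value for your own degrees of freedom.

```python
def t_ratio(coefficient, standard_error):
    """t-ratio for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


def confidence_interval(coefficient, standard_error, t_critical):
    """Interval: coefficient plus or minus t_critical * standard_error."""
    margin = t_critical * standard_error
    return (coefficient - margin, coefficient + margin)


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Numbers from the Chapter 7 exhibit: DOLLARS coefficient -0.26 with
# standard error 0.04, R-sq = 56.2%, n = 125, k = 2
t_dollars = t_ratio(-0.26, 0.04)
ci_dollars = confidence_interval(-0.26, 0.04, t_critical=1.98)  # approximate
adj = adjusted_r_squared(0.562, n=125, k=2)
print(t_dollars, ci_dollars, round(adj, 3))
```

Because the interval for DOLLARS does not contain zero, it agrees with the t-test: the coefficient is significantly different from zero at the 5% level.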

Part 3: Conclusions.

State your conclusions regarding the model(s) you have estimated. Review the original
and revised models. Comment on your testing procedure. Discuss any problems your
model might have. Finally, offer any interesting implications of your model.

Part 4: Data Appendix.

Present the data set that you used. Include a printout of your data set from the computer.

Part 5: Computer Printouts.

Include the computer printouts for all runs. Write your name and which model (original
or revised) on each page of the printout.

Put the paper and the printouts in an envelope, or staple everything together.

The Proposal for the Modeling Project.

Specify what you wish to predict or explain (the subject of your paper). Explain clearly
why this subject is interesting. Identify the variables (dependent and independent)
involved in your analysis. Discuss the theoretical relationship between the dependent and
independent variables. You need not only to predict what sign you
believe each sample regression coefficient will take, but also to explain why.
For example, suppose your dependent variable is deforestation and one of your
independent variables is the price of cattle. You need to predict that the coefficient on
this variable is positive and also explain why you believe it is positive. (Perhaps higher
cattle prices induce ranchers to clear more land to raise more cattle.) Collect the data that
will be used for the analysis. Enter your data into a MINITAB worksheet and attach a copy
of the data file to this proposal. The proposal should be typed.

Make sure you include the following information:

A. Title of Project

Are you trying to explain cross-sectional or time series variations in the dependent
variable?
What is your sample size?

B. Definition of variables: Write out your hypothesized population regression model.

(1) a. Define your dependent variable.
b. In what units is it measured?
c. Where did you obtain data on this variable?

(2) Provide the following information on each independent variable separately:

a. Define the independent variable.
b. In what units is it measured?
c. What sign do you expect the coefficient to have?
d. Briefly explain why you think this independent variable affects the dependent
variable in the way stated.
e. Where did you obtain data on this variable?

INCLUDE AT LEAST 6 INDEPENDENT VARIABLES.

Reminder: When you submit your proposal to me don’t forget to include Xerox copies
of the actual tables that included the data and a printout of your data set from the
computer.

CHAPTER 9

DATA SOURCES

http://midas.ac.uk/macro_econ MIDAS.
http://dawww.essex.ac.uk ESRC Data Archives.
ftp://stats.bls.gov/ US Labor Time Series.
http://www.worldbank.org World Bank.
http://www.tri.org.au/ Theoretical Research Institute tr(I), Sydney.
http://nilesonline.com/data/ A journalist's guide, by Robert Niles.

U.S. Macro and Regional Data:

http://www.fedstats.gov/ FEDSTATS.
http://www.whitehouse.gov/fsbr/esbr.html Economic Statistics Briefing Room (ESBR) -
White House.
http://www.gpo.ucop.edu/info/econind.html Economic Indicators 104th Congress.
http://www.csufresno.edu/Economics/econ_EDL.htm Econ Data & Links.
telnet://ebb.stat-usa.gov EBB at the Commerce Department.
http://www.stat-usa.gov U.S. Department of Commerce (STAT-USA).
http://www.bea.doc.gov/ Bureau of Economic Analysis.
http://nces01.ed.gov/NCES/ National Center for Education Statistics.
http://www.inform.umd.edu:8080/EdRes/Topic/Economics/EconData EconData.
http://stats.bls.gov/blshome.html Bureau of Labor Statistics (LABSTAT).
http://www.cdc.gov Centers for Disease Control and Prevention (CDC).
http://www.gpo.ucop.edu/catalog/erp97.html 1997 Economic Report of the President via
GPO Gateway at UCSD.
http://www.globalexposure.com Business Cycle Indicators from Media Logic.
http://www.nber.org/databases/macrohistory/contents/index.html NBER's
Macro-Historical Database.
http://www.lib.virginia.edu/socsci/reis/reis1.html Regional Economic Information
System.
http://www.lib.virginia.edu/socsci/ccdb/ County and City Data Books, Interactive Data
Resources, University of Virginia Social Sciences Data Center.
http://bos.business.uab.edu/charts.htm Economic Chart Dispenser.
http://bos.business.uab.edu/data/data.htm Economic Time Series Page.
http://www.lib.virginia.edu/socsci/nipa/ National Income and Product Accounts,
Interactive Data Resources, University of Virginia Social Sciences Data Center.
http://www.bog.frb.fed.us Board of Governors of the Federal Reserve System.
http://www.econ-line.com National Economic Research & Data Services (NERDS).

Other U.S. Data:

http://www.law.vill.edu/fed-agency The Federal Web Locator.
http://www.house.gov/jec/welcome.htm Joint Economic Committee.
ftp://ftp.cu.nih.gov/NARA_ELECTRONIC National Archives Center for Electronic
Records.
http://www.ssa.gov Social Security Administration (OSS-IS).
http://www.oseda.missouri.edu/usinfo.html Missouri State Census Data Center -
Summary U.S. Census Info.
http://www.census.gov U.S. Census Bureau.
http://govinfo.kerr.orst.edu Government Information Sharing Project.
http://www.ustreas.gov U.S. Department of the Treasury.
http://www.ustr.gov The Office of the United States Trade Representative.
http://www.usitc.gov United States International Trade Commission (USITC).
http://usda.mannlib.cornell.edu/usda USDA Economics and Statistics System (Cornell
University).
http://www.econ.ag.gov USDA Agriculture Economic Research Service.
http://www.bts.gov National Transportation Statistics.
http://www.eia.doe.gov Energy Information Administration (DOE).
http://www.eia.doe.gov/energy Energy Resources Board (DOE).
http://www.fdic.gov Federal Deposit Insurance Corporation (FDIC).
http://www.stat-usa.gov/BEN/Services/ntdbhome.html National Trade Data Bank.
http://www.hist.umn.edu/~ipums Integrated Public Use Microdata Sample (IPUMS).
http://www.umich.edu/~psid Panel Study on Income Dynamics.
http://DPLS.DACC.WISC.EDU/SAF/ Study of American Families, 1994.
http://www.icpsr.umich.edu/GSS/ General Social Survey.
http://www.icpsr.umich.edu Inter-university Consortium for Political and Social
Research (ICPSR).
http://www.umich.edu/~hrswww The Health and Retirement Study (HRS) and Asset and
Health Dynamics Among the Oldest Old (AHEAD).
http://www-leland.stanford.edu/~doncram/crsp.html CRSP Data Access and Analysis.

World and Non-U.S. Data:

http://www.access.digex.net/~grimes/gate.html International Economics Gateway.
http://www.patriot.net/users/bernkopf Central Bank Resource Center.
http://nmg.clever.net/wew World Economic Window (Bonaparte Inc.).
http://lissy.ceps.lu/index.htm Luxembourg Income Study (LIS).
http://www.oecd.org Organization for Economic Cooperation and Development
(OECD).
http://www.imf.org International Monetary Fund (IMF).
http://www.worldbank.org World Bank.

http://www.ciesin.org/IC/wbank/sid-home.html World Bank Social Indicators of
Development.
http://www.ciesin.org/IC/wbank/tde-home.html World Bank Social Trends in
Developing Economies (TIDE).
http://www.worldbank.org/html/prdmg/grthweb/growth_t.htm Economic Growth
Research (World Bank).
http://www.worldbank.org/html/fpd/technet World Bank's Technet: Think Tank
Electronic Conferences.
http://www.worldbank.org/html/prdph/lsms/lsmshome.html World Bank Living
Standards Measurement (LSMS).
http://bizednet.bris.ac.uk:8080/home.htm BizEd Net - CSO Data (U.K.).
http://www.worldbank.org/html/lat World Bank Latin America and Caribbean Region
Technical Department.
http://www.asiandevbank.org Asian Development Bank.
http://europa.eu.int/eurostat.html Eurostat.
http://www.iadb.org Inter-American Development Bank.
http://datacentre.epas.utoronto.ca:5680/pwt/pwt.html Penn World Tables at the EPAS
Computing Facility at the University of Toronto.
http://www.un.org/Depts/unsd/index.html United Nations Statistics Division/DESIPA.
http://www.unicc.org/itc/Welcome.html The International Trade Center
UNCTAD/GATT (ITC).
gopher://gopher.un.org/11/esc/cn17 UN Commission on Sustainable Development
(CSD).
http://www.fao.org Food and Agriculture Organization of the UN.
http://pacific.commerce.ubc.ca/xr PACIFIC Exchange Rate Service Retrieval Interface.
http://midas.ac.uk Manchester Information Datasets and Associated Services (MIDAS).
http://www.statcan.ca Statistics Canada.
http://strategis.ic.gc.ca/sc_ecnmy/sio/homepage.html Canadian Industry Overviews.
http://www.shcp.gob.mx Ministry of Finance (Mexico).
http://www.ex.ac.uk/~ajgibson/scotdata/scot_database_home.html Scottish Economic
History Database, 1550 – 1780.
http://www.hm-treasury.gov.uk H.M. Treasury (U.K.).
http://www.etla.fi Research Institute of the Finnish Economy (ETLA).
http://rabobank.info.nl/engels/default.htm Summary Information on the Dutch Economy
at Rabobank.
gopher://tcmb580.tcmb.gov.tr Central Bank of Turkey.
http://www.mfa.gov.tr/GRUPC/GRUPC.HTM Turkish Ministry of Foreign Affairs,
Business & Economy Data.
http://alfa.nic.in/indiabudget India Economic Survey 1996 and Budget.
http://www.bundesbank.de/index_e.html Bundesbank.
http://www.brandenburg.de/statreg Statistik Regional (German Regional Statistics).
http://www.statistik-bund.de/e_home.htm German Federal Statistical Office.
http://www.rbnz.govt.nz Reserve Bank of New Zealand.
http://www.treasury.govt.nz New Zealand Treasury.

http://www.statistics.gov.au Australian Bureau of Statistics.
http://www.boj.go.jp/en/index.htm Bank of Japan.
http://econom10.cc.sophia.ac.jp/needs/index.htm Japanese Macro Data.

Financial Markets:

http://www.finweb.com FINWeb.
http://www.cob.ohio-state.edu/dept/fin/osudata.htm Financial Data Finder at Ohio State.
http://www.wsrn.com Wall Street Research Net.
http://www.wsdinc.com Wall Street Directory.
http://www.tsi.it/finanza/index.html Finance Area by Top Services International.
http://www.globalfindata.com Global Financial Data.
http://www.briefing.com Briefing by Charter Media.
http://www.bloomberg.com Bloomberg Personal.
http://www.sec.gov/edgarhp.htm EDGAR (SEC).
http://turnpike.net/metro/holt/index.html Martin Wong's and George Holt's Market
Report.
ftp://sunsite.unc.edu/pub/archives/misc.invest Public Domain Financial Data.
http://www.quote.com QuoteCom Data Service.
http://www.secapl.com/cgi-bin/qs Security APL QuoteServer.
http://www.jpmorgan.com JP Morgan.
http://www.wiwi.uni-frankfurt.de/AG/JWGI Student Investment Club at Johann
Wolfgang Goethe University.
http://www.charm.net/~lordhill New Zealand Investment Center.
http://www.asiawind.com/pub/hksr InTechTra's Hong Kong Stocks Reports.
http://www.fid-inv.com Fidelity Investment.
http://www.vanguard.com The Vanguard Group, Inc.
http://networth.galt.com NETworth.
http://www.schwab.com Schwab Online.
http://www.etrade.com E*Trade Securities, Inc.
http://www.lombard.com Lombard Institutional Brokerage, Inc.
http://www.gwdg.de/~ifbg/bank_2.html Banks of the World via the Institute for Finance
and Banking at the University of Göttingen.
http://www.moneyline.com MoneyLine - Real Time Fixed Income Data.
http://www.yahoo.com/r/sq Yahoo Finance, Stock Quotes.

CHAPTER 10

CASE STUDIES

This chapter presents a collection of case studies describing real world phenomena that
we will analyze in class. The corresponding data are provided in the appendix. We will
use this information to explain the behavior of variables of interest to us and discuss the
econometric issues arising from the nature of typical economic data.

10.1. Worldwide Fertility Rate5.

The World Development Report, among other sources, indicates that the average number
of births per woman is consistently higher in less developed countries than in developed
countries. This phenomenon presents both economic and humanitarian challenges
worldwide. From a public planning perspective, it confronts governments with several
institutional challenges, most notably the formidable task of providing public health
care, education and nutrition to a large population of newborns.

To investigate this problem, a study is conducted to identify the factors that may
contribute to the number of children a woman has. Potential factors included in the
data set are the infant mortality rate, life expectancy at birth, the share of women in
the labor force, and so on. The specific variables, their definitions and units of
measurement are described below. All data refer to 1995. There are 103 countries in
the sample and 7 explanatory variables in the model.

Definitions:

Tot Fert Rt Total Fertility Rate is the number of children who would be born to a
woman if she were to live to the end of her childbearing years and bear
children in accordance with the age-specific fertility rates measured as the
estimated births per woman per country.

Inf Mort Infant Mortality Rate is the number of infants who die before reaching one
year of age, expressed per 1000 live births per year.

% Urban The percentage of population that lives in urban areas.


5
The data for this case come from a variety of International Organization sources, including: the 1998/1999
World Development Report, World Bank; Yearbook of Labor Statistics, International Labor Organization;
Surveys and Report on Educational, Scientific and Cultural, United Nations Educational, Scientific, and
Cultural Organization (UNESCO); Regional Economics Surveys, United Nations Regional Economic
Commissions (UNREC). The data set was collected by William L. Ridley.

%F in LF Females as a Percentage of the Labor Force is a measure of the extent to
which females are active in the work force.

Gini Ind Gini Index is a measure of inequality in the distribution of wealth: the
higher the index, the lower the economic equality.

TV Sets/K Television sets per 1000.

GNP/PPP Gross National Product measured at Purchasing Power Parity is a measure
of real GNP converted to 1993 USD by the purchasing power exchange
rate, sometimes called international dollars.

Life Exp Life Expectancy at Birth is defined as the number of years a newborn
child would live if patterns of mortality rate prevailing at the time of its
birth were to stay the same throughout its life.

Based on the information above and the data provided in the appendix, perform the
following analysis:

1. Estimate the model:

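$$\text{Tot Fert Rt}_i = \alpha + \beta_1\,\text{Inf Mort Rate}_i + \beta_2\,\text{\% Urban}_i + \beta_3\,\text{\%F in LF}_i + \beta_4\,\text{Gini Index}_i + \beta_5\,\text{TV Sets/K}_i + \beta_6\,\text{GNP/PPP}_i + \beta_7\,\text{Life Exp}_i + \varepsilon_i$$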

2. Write out the sample regression equation and the appropriate t-statistics below the
coefficients in the equation.

3. Interpret all the coefficients in this equation.

4. Interpret the R² and adjusted R² produced by the regression.

5. Perform a test on all the regression parameters taken as a group.

6. It can be verified that at a 5% significance level the variables “% Urban” and
“Gini Index” are insignificant. Drop them and rerun the regression. Comparing the
revised model to the original described earlier, which equation do you think is
better at explaining the relationship between total fertility rate and its
determinant factors?

7. Is there any discrepancy in the estimated coefficients of the two models? What could
be the reason for the difference?
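
Items 4 and 5 rest on a few summary formulas. The short Python sketch below computes the adjusted R² and the overall F-statistic from R², using the n = 103 countries and k = 7 explanatory variables of this case. The R² value of 0.80 is a made-up placeholder for illustration, not a result from the data; replace it with the value your own regression output reports.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """F-statistic for H0: all k slope coefficients are jointly zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


n, k = 103, 7           # sample size and number of regressors in case 10.1
r2_placeholder = 0.80   # hypothetical value, for illustration only

print("adjusted R2:", round(adjusted_r2(r2_placeholder, n, k), 4))
print("F statistic:", round(overall_f(r2_placeholder, n, k), 2))
print("degrees of freedom:", k, "and", n - k - 1)
```

The F-statistic is compared with the critical value of the F distribution with k and n - k - 1 degrees of freedom. The adjusted R² is the more suitable measure when comparing the original model with the revised one in item 6, because it penalizes additional regressors.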

10.2. Norman Housing Prices6.

Lisa is a student at the University of Oklahoma who is planning to open her own business
in Norman after graduation. She is currently living in an apartment but she intends to buy
a house as soon as she graduates. Her friends have suggested that she look for a house on
her own, without going through a real estate agent, to avoid the hefty commission. The
decision is difficult, since it is hard to quantify the variety of characteristics that
make one home more popular than another (e.g., the quality and design of the wallpaper in
the upstairs bathroom or the overall desirability of the neighborhood).

With help from friends, Lisa collected data to get an initial feel for the type of houses
around Norman. The data set contains information on the selling price, the number of
bedrooms and bathrooms, and the number of square feet for 84 houses sold in January 1999,
along with other quantitative and qualitative variables. Three of the variables of interest are
categorical. The variable Half Baths takes the value of 1 if the house has at least one half
bath and 0 otherwise. The variable depicting the age of houses is classified into 7
categories and is incorporated through the use of 6 dummy variables. Zone represents the
geographical area of homes. We included all homes in 5 zones and incorporated them in
the model using 4 dummy variables. The description of all variables in the data set
follows.

Price = the selling prices of houses in dollars;


Beds = the number of bedrooms;
Full Bath = the number of full bathrooms;
Living Area = the total number of living rooms in the house;
Garage = the number of cars that can fit in the garage of a house;
Square Feet = the total number of square feet in the house;
Days on Market = the number of days the house was on the market before it was sold;
Half Baths = 1 if the house has at least one half bath and 0 otherwise;
Age 0 = 1 if the house is new or currently under construction and 0 otherwise;
Age 3 = 1 if the house is 1 to 3 years old and 0 otherwise;
Age 10 = 1 if the house is four to ten years old and 0 otherwise;
Age 20 = 1 if the house is 11 to 20 years old and 0 otherwise;
Age 30 = 1 if the house is 21 to 30 years old and 0 otherwise;
Age 60 = 1 if the house is 31 to 60 years old and 0 otherwise;
Zone 1 = 1 if the house is located on the east side of Norman and 0 otherwise;
Zone 2 = 1 if the house is located on the west side of town and 0 otherwise;
Zone 3 = 1 if the house is in Moore7 and zero otherwise;
Zone 4 = 1 if the house is in Oklahoma county and Oklahoma City and zero otherwise.

6
The data for this case study were collected by Lisa Garrett who got help from Leslie Robertson, a Dillard
Real Estate Associate in Norman.
7
Moore is a neighboring city only 10 miles to the north of Norman. Moore and Norman are two of
the cities located in Cleveland County, Oklahoma.

Using the data set presented in the appendix, perform the following analysis:

1. Estimate the basic model:

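$$\text{Price}_i = \alpha + \beta_1\,\text{BEDS}_i + \beta_2\,\text{LIVING AREA}_i + \beta_3\,\text{FULL BATHS}_i + \beta_4\,\text{HALF BATHS}_i + \beta_5\,\text{AGE0}_i + \beta_6\,\text{AGE3}_i + \beta_7\,\text{AGE10}_i + \beta_8\,\text{AGE20}_i + \beta_9\,\text{AGE30}_i + \beta_{10}\,\text{AGE60}_i + \beta_{11}\,\text{GARAGE}_i + \beta_{12}\,\text{ZONE1}_i + \beta_{13}\,\text{ZONE2}_i + \beta_{14}\,\text{ZONE3}_i + \beta_{15}\,\text{ZONE4}_i + \beta_{16}\,\text{SQ FEET}_i + \beta_{17}\,\text{DAYS ON MARKET}_i + \varepsilon_i$$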

2. Write out the sample regression equation and appropriate standard error below the
coefficient in the equation.

3. Interpret the R² and adjusted R².

4. Perform a hypothesis test that all the independent variables taken together do not
affect prices.

5. Interpret the coefficient of all dummy variables in this model.

6. Test the significance of each variable, qualitative or quantitative. State the null and
alternative hypothesis, the decision rule, and for each case separately your decision
and conclusion. Based on the results of this hypothesis testing, which variables are
insignificant?

7. Drop those insignificant variables, estimate the revised model and write out the new
equation.

8. Perform the test to verify whether the variables you dropped were significant as a
group (F-test on a subset). Which model is better suited for interpreting the estimates? Explain.
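
The F-test on a subset in item 8 compares the error sum of squares of the revised (restricted) model with that of the basic (unrestricted) model. The Python sketch below shows the arithmetic, using the n = 84 houses and the 17 regressors of the basic model. The two SSE values and the number of dropped variables q are hypothetical placeholders, not results from the Norman data.

```python
def subset_f(sse_restricted, sse_unrestricted, q, n, k):
    """F-statistic for H0: the q dropped coefficients are jointly zero.

    k is the number of regressors (excluding the intercept) in the
    unrestricted model, so its error degrees of freedom are n - k - 1.
    """
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


n, k = 84, 17          # houses in the sample, regressors in the basic model
sse_full = 5.0e10      # hypothetical SSE of the basic model
sse_reduced = 5.6e10   # hypothetical SSE after dropping q variables
q = 6                  # hypothetical number of dropped variables

f_stat = subset_f(sse_reduced, sse_full, q, n, k)
print("F statistic:", round(f_stat, 3))
print("degrees of freedom:", q, "and", n - k - 1)
```

If the statistic is below the critical F value with q and n - k - 1 degrees of freedom, the dropped variables are not jointly significant and the revised model is the more parsimonious choice.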

10.3. Apartment Hunting8.

Norman, Oklahoma is the city where the University of Oklahoma is located. The
University is a national leader in meteorology and energy-related disciplines. It is a
doctoral degree-granting research university serving the educational, cultural and
economic needs of the state, region and nation. Created by the Oklahoma Territorial
Legislature in 1890, the university has 18 colleges offering 134 bachelor's degrees, 82
master's degrees, 51 doctoral degrees, four graduate certificates, and one professional
degree. OU enrolls more than 25,000 students. As a result, Norman is highly populated
with college students who often choose to live in apartment complexes located near the
campus to minimize their daily commuting.

8
The data for this case was collected by Mandy Miller in Spring 1999.

In this study, we collected information from 44 apartment complexes to explain the
relationship between monthly rental price (the dependent variable Y) for an apartment in
Norman and the following set of independent variables:

x1 = the size of a given apartment in square feet;


x2 = number of bedrooms in a given apartment;
x3 = number of bathrooms in an apartment;
x4 = the presence of a fireplace;
x5 = access to a washer and dryer;
x6 = the distance to the University of Oklahoma campus;
x7 = a complex’s rules regarding pets;
x8 = the presence of a security alarm in an apartment;
x9 = the presence of a bus stop near the property;
x10 = the presence of some kind of recreation facility;
x11 = whether or not an apartment is furnished;
x12 = the age of the apartment complex;
x13 = whether or not an apartment is “all bills paid”.

The data can be found in the appendix.

With this information in mind and the available data perform the following analysis:

1. Estimate the model:

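$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \beta_6 x_{6i} + \beta_7 x_{7i} + \beta_8 x_{8i} + \beta_9 x_{9i} + \beta_{10} x_{10i} + \beta_{11} x_{11i} + \beta_{12} x_{12i} + \beta_{13} x_{13i} + \varepsilon_i$$
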
Write out the sample regression equation. Be sure to write the appropriate standard
error below each coefficient in the equation.

2. Interpret the coefficient of x5, x10, x11, x12, x13 in the context of this problem.

3. The plot of rental price (y) against the age of the apartment complex (x12) suggests
that the relationship between these variables is nonlinear. Include a quadratic term,
the square of x12, to capture this effect. Call this new variable x14. Rerun the
regression and write down the estimated regression equation.

4. From this last regression it can be verified that the only statistically significant
variables at the 5% significance level are x1, x2, x5, x6, x9, x10, x12, x13 and x14.
Drop the insignificant variables and re-estimate the following model:

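$$y_i = \alpha^* + \beta_1^* x_{1i} + \beta_2^* x_{2i} + \beta_5^* x_{5i} + \beta_6^* x_{6i} + \beta_9^* x_{9i} + \beta_{10}^* x_{10i} + \beta_{12}^* x_{12i} + \beta_{13}^* x_{13i} + \beta_{14}^* x_{14i} + \varepsilon_i^*$$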


Write out the sample regression equation. Be sure to write the appropriate standard
error.

5. Interpret all estimated coefficients of this model.



6. What are the consequences of omitting important explanatory variables if they are
correlated with existing ones in the model?
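
As a hint for item 6, the Python sketch below illustrates the omitted-variable result on a tiny artificial data set (the numbers are invented and are not the apartment data). Here y depends on two regressors, x1 and x2, with true coefficients 2 and 3. When x2 is left out, the estimated slope on x1 absorbs part of the effect of x2: it equals the true coefficient on x1 plus the coefficient on x2 times the slope from regressing x2 on x1.

```python
def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: x2 is correlated with x1, and y depends on both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
beta1, beta2 = 2.0, 3.0
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = simple_slope(x1, y)   # regression of y on x1 only (x2 omitted)
delta = simple_slope(x1, x2)        # slope from regressing x2 on x1

print("true coefficient on x1:", beta1)
print("slope when x2 is omitted:", round(short_slope, 4))
print("beta1 + beta2 * delta:", round(beta1 + beta2 * delta, 4))
```

The example has no error term, so the relationship holds exactly. With real data it holds in expectation, which is why omitting a relevant variable that is correlated with an included one biases the included coefficient.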

10.4 Advertising and Sales9.

“Advertisements contain the only truth to be relied on in a newspaper”


THOMAS JEFFERSON

Advertising has many purposes. An advertisement may inform consumers that a firm has
a new product or the lowest price, or it may help to differentiate the firm’s product from
that of its rivals. A firm uses advertisements to inform consumers of its product’s
strengths.

Advertising has been with us for a long time, although its forms have changed along with
technology. In ancient times, street criers announced to all those who would hear the
imminent sales of slaves, cattle, and imports. Later, when most of the populace was still
illiterate, merchants displayed signs with symbols calling attention to their shops, such as
a loaf of bread for a baker or a horseshoe for a cobbler. Benjamin Franklin pioneered the
use of print advertising in the United States in the 1700s. Today, much advertising
employs electronic media such as radio and television.

Advertising is a fascinating topic because it is an instrument that affects sales and entails
a massive expenditure of funds. Economists believe that variations in aggregate
advertising play an important role in macroeconomic aggregate demand. In the present
study you are asked to examine the relationship between sales (Y) and advertising
expenditure (X), both in thousands of dollars, of the Lydia E. Pinkham company, using the
twenty-five annual observations that are presented in the appendix. In particular, perform
the following analysis:

1. Estimate the model

$$Y_t = \alpha_1 + \beta_1 X_t + \varepsilon_t \qquad (1)$$

2. Estimate the following model:

$$Y_t = \alpha_1 + \beta_1 X_t + \gamma Y_{t-1} + \varepsilon_t \qquad (2)$$

3. Interpret the coefficient on lagged sales, Y(t-1).

4. What would you say about the coefficient of determination in this regression?

5. What is the short run effect on sales of a change in advertising expenditure in the
current period?

9
Data from Paul Newbold, “Statistics for Business & Economics”, fourth edition, University of Illinois,
Urbana-Champaign, Prentice Hall, New Jersey, 1995, page 564 (Additional Topics in Regression Analysis).

6. What is the long run effect on sales of a change in advertising expenditure in the
current period?

7. Interpret the R² and perform the F-test for equation (2).

8. Estimate the transformed model:

$$\ln Y_t = \alpha_1 + \beta_1 X_t + \varepsilon_t \qquad (3)$$

9. Interpret the slope coefficient on X in equation (3). Can you explain the main
difference between equations (1) and (3)?

10. Estimate the following model:

$$\ln Y_t = \alpha_1 + \beta_1 \ln X_t + \gamma \ln Y_{t-1} + \varepsilon_t \qquad (4)$$

11. Finally, consider the following formulation:

$$Y_t = \alpha_1 + \beta_1 X_{1t} + \beta_2 X_{2t} + \gamma Y_{t-1} + \varepsilon_t \qquad (5)$$

where $X_{2t} = (X_{1t})^2$.

12. Which model fits the data best?
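
For items 5 and 6, it helps to see how an advertising change works its way through the lagged-sales term of equation (2). In the Python sketch below, beta1 is the coefficient on current advertising and gamma is a label chosen here for the coefficient on lagged sales; both numerical values are hypothetical, not estimates from the Pinkham data. A one-unit increase in advertising raises sales by beta1 in the same period, by beta1 times gamma in the next period, and so on, so the total effect is a geometric sum.

```python
def long_run_effect(beta1, gamma):
    """Closed-form total effect beta1 / (1 - gamma), valid for |gamma| < 1."""
    if abs(gamma) >= 1:
        raise ValueError("the effect does not settle unless |gamma| < 1")
    return beta1 / (1.0 - gamma)


def cumulative_effect(beta1, gamma, periods):
    """Sum the period-by-period effects beta1 * gamma**j for j = 0..periods-1."""
    return sum(beta1 * gamma ** j for j in range(periods))


beta1, gamma = 0.5, 0.6   # hypothetical coefficient values

print("short-run effect:", beta1)
print("effect accumulated over 5 periods:", round(cumulative_effect(beta1, gamma, 5), 4))
print("long-run effect:", round(long_run_effect(beta1, gamma), 4))
```

The short-run effect is simply beta1. The long-run effect exceeds it whenever gamma lies between 0 and 1, because part of each period's sales carries over into the next.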

10.5. School Lunch Program10.

Lake County, Colorado is located about 100 miles west of Denver in the Colorado
Rockies. The county is home to the highest airport in the continental USA; much of the
county lies above 9,000 feet. The economy is largely dominated by mining (with its
boom and bust cycles) and tourism.

As in many other rural areas, the school lunch program is an important component of
public policy. For many poor children, the lunch they get in school provides most of their
daily nutrition. Over the last few years, the average daily number of lunches served has
generally declined. Since the average per capita income in the county has also declined,
the director of the program has been eager to understand the reasons for the decline. If the
families were getting poorer and simultaneously fewer lunches were being served, it
would raise questions about how well the program is serving the public need.

In the appendix you will find relevant data for eleven years to analyze this issue. The
variables include: YEAR, the calendar year; POP, Lake County population; UNEMPL,
percentage of unemployment in the state of Colorado; LKUNEMP, percentage of
unemployment in Lake County; LUNCH, average daily lunches served; INCOME,

10
Peter G. Bryant and Marlene A. Smith, “Practical Data Analysis; Case Studies in Business Statistics”,
University of Colorado at Denver, Irwin, Chicago, 1989.

average per capita income in Lake County; and ENROLL, enrollment in Lake County
schools.

With this information in mind and the available data perform the following analysis:

1. Estimate the model: $y_t = \alpha^* + \beta_1^* x_{1t} + \varepsilon_t$


where y = average daily lunches served (LUNCH) and
x1 = average per capita income in Lake County (INCOME);

a) Write out the sample regression equation that expresses y as a function of x1. Be
sure to write the appropriate standard error below the coefficient in the equation.

b) Does average income per capita affect the average number of lunches served in
Lake County? Perform a test at a 5% significance level.

2. Estimate the model $y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + \varepsilon_t$

where
y = Average daily lunches served (LUNCH);
x1 = Average per capita income in Lake County (INCOME);
x2 = Percentage of unemployment in Lake County(LKUNEMP);
x3 = Lake County population (POP) and;
x4 = Enrollment in Lake County schools (ENROLL).

a) Write out the sample regression equation that expresses y as a function of x1, x2, x3
and x4. Be sure to write the appropriate standard error below each coefficient in the
equation.

b) Interpret the coefficients of x1 and x2 in the context of this problem.

c) Test the significance of each variable in this model. Perform the tests at a 5%
significance level. State the null and alternative hypothesis, the decision rule, and for
each case separately your decision and conclusion. (Hint: The p-value test is the
easiest to perform.)

d) Perform a test on all the regression parameters taken as a group.

e) Just by looking at the regression results, is there any sign of multicollinearity in
this model? Explain. (Hint: Can you identify any of the standard symptoms of
multicollinearity in this model?)

f) Perform formal tests to verify the existence or absence of multicollinearity in this
model.

g) Do you think there is evidence that the program is not serving the public need
well? Should the director of the program worry about that?
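
For parts e) and f), two common diagnostics are the pairwise correlation between regressors and the variance inflation factor (VIF), which is 1 / (1 - R²) for the auxiliary regression of one regressor on the others. The Python sketch below uses invented numbers for two regressors (they are not the Lake County data). With only two regressors, the auxiliary R² is just the squared correlation.

```python
def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif(r2_aux):
    """Variance inflation factor from the R-squared of an auxiliary regression."""
    return 1.0 / (1.0 - r2_aux)


# Hypothetical values in thousands, for illustration only.
pop = [8.8, 8.5, 8.1, 7.6, 7.0, 6.5]
enroll = [1.90, 1.84, 1.75, 1.62, 1.50, 1.41]

r = correlation(pop, enroll)
print("correlation:", round(r, 4))
print("VIF:", round(vif(r ** 2), 1))
```

A frequently quoted rule of thumb treats a VIF above 10 as a sign of serious multicollinearity, though this is a convention rather than a formal test. With four regressors, as in model 2, each VIF requires the R² from regressing that variable on the other three.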

10.6. Longley Data11.

The purpose of this study is to explain variations in total employment during the period
from 1947 to 1962. The Longley data span the years of the Korean conflict, which ended
in 1953. We employ information on the gross national product (GNP) and the GNP
deflator, the size of the armed forces and the variable YEAR, which is included to
capture any potential time trend. With this information in mind and the data provided in
the appendix perform the following analysis:

1. Estimate the basic model:

$$\text{EMPLOYMENT}_i = \alpha + \beta_1\,\text{YEAR}_i + \beta_2\,\text{GNP Deflator}_i + \beta_3\,\text{GNP}_i + \beta_4\,\text{ARMED FORCES}_i + \varepsilon_i$$

2. Write out the sample regression equation with the appropriate standard error below
each coefficient in the equation.

3. Although the observations of the last year in the data set do not appear to be unusual,
drop them and re-estimate the model.

4. Compare the estimates of the original and revised models. Comment.

10.7. Academic dishonesty12.

Academic dishonesty is a basis for disciplinary action and includes but is not limited to
activities such as cheating and plagiarism (presenting as one's own the intellectual or
creative accomplishments of another without giving credit to the source or sources).
The faculty member in whose course an act of academic dishonesty occurs has the option
of failing the student for the academic hours in question. The faculty member may
consent to refer the case to other academic personnel for further action. Most colleges
have provisions for more severe penalties including expulsion.

Despite all this, cheating is more widespread at the nation's colleges and universities than
it was years ago because it no longer seems to carry the stigma it used to. “Less social
disapproval and increased competition for spots in graduate schools have made students
more willing to do whatever it takes to get the grades” said Professor Donald McCabe, a
researcher at Rutgers University who has done extensive research on student cheating.
He also remarked that “if students feel disadvantaged because others are cheating and
seeming to get away with it, they'll say: I'm not stupid enough to blow my chances by not
doing the same.”
11
Longley, J., “An Appraisal of Least Squares Programs for the Electronic Computer from the Point of
View of the User”. Journal of the American Statistical Association, 62, 1967, pp. 819-841.
12
This survey was conducted by Luke Albright and Jeff Cotner.

According to McCabe's research, incidents of academic dishonesty have increased, even
at schools that have honor codes. We conducted a survey of 75 students at the University
of Oklahoma in an attempt to identify factors that increase the likelihood of cheating.

The variables used in our analysis are:

Times Cheated: The number of times students cheated in the past semester.
Hrs Studied: The number of hours a student studied per week.
Classes Attended: The number of class periods attended throughout a month.
Hrs Worked: The number of hours one works per week.
$ on Alcohol: The average amount of money spent on alcohol in a week.
Hrs TV: The average number of hours one spends watching television a week.
Witness Cheating: The average number of times a student witnessed others around
cheating in the past semester.
Class Status (dummy variables):
Freshman: =1 if the student was a freshman and 0 otherwise.
Sophomore: = 1 if the student was a sophomore and 0 otherwise.
Junior: = 1 if the student was a junior and 0 otherwise.

All data was obtained through a survey given to 75 different students on the University of
Oklahoma campus.

Estimate the model:

Times Cheated = a + b1* Hrs Studied + b2* Classes Attended + b3* Hrs Worked +
b4 * $ on Alcohol + b5* Hrs TV + b6* Witness Cheating + b7* Freshman +
b8* Sophomore + b9* Junior.

Drop the insignificant variables, then refine and re-estimate your model.

Create a dummy variable that takes the value of 1 if the student was a senior and zero
otherwise. Is the behavior of seniors different than that of the rest of the students? Is their
behavior different depending on the time they spend studying?

Test for the presence of heteroskedasticity. Comment.
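
One way to set up the question about seniors is sketched below. It assumes that the three class dummies are replaced by a single Senior dummy (so that the other three classes form the base group) and that the dots stand for whichever of the remaining regressors you keep; the symbols d1, d2 and e are labels chosen here for illustration.

$$\text{Times Cheated}_i = a + b_1\,\text{Hrs Studied}_i + \dots + d_1\,\text{Senior}_i + d_2\,(\text{Senior}_i \times \text{Hrs Studied}_i) + e_i$$

In this form, d1 measures the difference in the intercept for seniors, and d2 measures how the effect of an extra hour of study differs for seniors: the slope on Hrs Studied is b1 for non-seniors and b1 + d2 for seniors. Note that including Senior together with Freshman, Sophomore and Junior and an intercept would make the four dummies sum to one for every student, which is the dummy variable trap.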

10.8. Restaurant Tips13.

13
These data are real, though the name of the restaurant is anonymous. The data were provided by Thomas
J. Kientz, President, Colorado National Bank, Aurora, Colorado, and Sean Schneider.

Foodservers’ tips in restaurants may be influenced by many factors, including the nature
of the restaurant, size of the party, table location in the restaurant, and so forth. To make
appropriate assignments for the foodservers, restaurant managers need to know what
these factors are. They must avoid either the substance or the appearance of unfair
treatment of the foodservers, for whom the tips are a major component of pay.

In one restaurant, a foodserver recorded data on all customers he had served during an
interval of two and a half months in early 1990. This set of data is available in the
appendix. The restaurant, located in a suburban shopping mall, was one of a national
chain and served a varied menu. Pursuant to local law, the restaurant offered seating in a
non-smoking section to patrons who requested it. The data was recorded on those days
and during those times when the foodserver was routinely assigned to work.

Definitions:
TOTBILL : Total bill, including tax, in dollars
TIP : Tip in dollars
SIZE : Size of party
SEX : Sex of persons paying bill (0 = male, 1 = female)
SMOKER : Smoker in party? (0 = no, 1 = yes)
DAY : Day of the week (3 = Thursday, 4 = Friday, 5 = Saturday, 6 = Sunday)
TIME : (0 = day, 1 = night)

With this information in mind and the available data perform the following analysis:

1. Create three dummy variables to incorporate the effect of the qualitative variable
“DAY”. How would you define these variables?

2. Estimate the model:

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + \beta_6 x_{6i} + \beta_7 x_{7i} + \beta_8 x_{8i} + \varepsilon_i$$

where
y = Tip in dollars (TIP);
x1 = Total bill, in dollars (TOTBILL);
x2 = Size of party (SIZE);
x3 = Sex of persons paying bill (SEX);
x4 = Smoking habits (SMOKER);
x5, x6, x7 = Dummy variables related to the days of the week;
x8 = Time of the day (TIME).

Write out the sample regression equation that expresses y as a function of x1, x2, x3, x4,
x5, x6, x7, x8. Be sure to write the appropriate standard error below each coefficient in the
equation.

3. Interpret the coefficients of x1, x2 and x5 in the context of this problem.

4. a) Is there a problem of multicollinearity in this model? Perform a formal test at a 5%
significance level.

b) What are the consequences of the existence of multicollinearity in general?

5. In the above regression it can be verified that the only statistically significant variable
at the 5% significance level is the variable TOTBILL. Drop the insignificant variables
and re-estimate the model:

$$y_i = \alpha^* + \beta_1^* x_{1i} + \varepsilon_i^*$$

where
y = Tip in dollars (TIP);
x1 = Total bill, in dollars (TOTBILL).

(Save the residuals and the fits. You will need them later on.)
Write out the sample regression equation that expresses y as a function of x1. Be sure
to write the appropriate standard error below the coefficient in the equation.

6. Was the decision to drop the additional variables the right one? Perform a test on the
subset of regression parameters at a 5% significance level.

7. Based on the reduced model, perform a White’s test at the 5% significance level for
the existence of heteroskedasticity in the model. In particular, assume that the
variance of the error terms is a function of the expected value of the dependent
variable (i.e., $V(\varepsilon_i) = f(y_i)$).

8. Based on the analysis performed so far are the estimators obtained from this model
BLUE? Explain.

9. If $V(\varepsilon_i) = \sigma^2 y_i^2$, describe how you would correct for the problem of
heteroskedasticity. Estimate the corrected model and report the estimates of the
parameters.

10. Summarize the results of your regression analysis. Are there any patterns of which
you think the food server or the restaurant manager should be aware?

10.9. Crime Rate14.

The overall violent crime rate dropped 7.3 percent to its lowest level since 1985. For the
fifth straight year, violent crimes and all the far more numerous property crimes declined.
Crime reached all-time highs in 1991. Since then the crime rate has decreased at an
increasing rate each year. The FBI reported recently that both murder and robbery rates
reached lows not seen in three decades.

14
This data set was collected by Lisa Garrett.

The major decline in crime has occurred in rural areas and in cities with populations of
25,000 to 100,000. In metro areas, the crime rate is falling at a slower pace. In large
cities, as the population increases, the crime rate increases. However, the increase in the
percent of crimes is much lower than before. Officials credit the aging of the population
in large metro areas as one of the factors.

The consensus on the falling crime rate is that there is no single event, policy
implementation, or social action that can account for the decrease during the last few
years. Individuals and organizations assessing the causes and implications of this decline
are converging on a theory that attributes it to collective efforts and change.

In an effort to provide some input on this issue, and perhaps single out some factors that
may be contributing towards a reduction in the crime rate, we obtained the following
sample15 from 1996:

Definition of variables:

Total Crime: Total crime rate, per state.
Real GDP: Gross Domestic Product in 1996.
Unemployment: The unemployment rate in 1996.
Total Population: Total population in each state.
%High School Grads: Percent of high school graduates.
% In Metro: Percent of the state population living in metro areas.
Minority %: Percent of the population that is African-American or Hispanic.
Rate of Police: Number of policemen per 10,000 residents.

With this information in mind and the available data, perform the following analysis:

Plot the variable Total Crime (y) against % In Metro (x5) and Unemployment (x2).
Comment.

Estimate the basic model:


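$$\text{Total Crime} = \alpha + \beta_1\,\text{Real GDP} + \beta_2\,\text{Unemployment} + \beta_3\,\text{Total Population} + \beta_4\,\text{\%High School Grads} + \beta_5\,\text{\% In Metro} + \beta_6\,\text{Minority \%} + \beta_7\,\text{Rate of Police}$$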

Write out the sample regression equation with the appropriate standard error below each
coefficient in the equation.

Interpret the coefficients of x2, x3, and x6 in the context of this problem.

15
Data collected from: US Census Bureau – Statistical Abstract of the US

In the above regression it can be verified that at the 5% significance level the variable
“Rate of Police” is statistically insignificant. Drop this variable and re-estimate the
model. (Save the residuals and fits. You will need them later on.)

Write out the new regression equation. Be sure to write the appropriate standard error
below the coefficient in the equation.

Based on the reduced model, perform a White’s test at the 5% significance level for the
existence of heteroskedasticity in the model. In particular, assume that the variance of
the error terms is a function of the expected value of the dependent variable.

10.10 Traffic Accidents16.

Almost 42,000 people were killed last year, innocent victims of violent deaths, and
millions more were injured while simply going about their daily business in or near
automobiles. This raises major concerns across the nation.

Forty-one thousand, nine hundred and seven people were killed in automobile accidents
in 1996, up from the year before, and 3,511,000 were injured. A disaster of this
magnitude needs immediate and drastic action. We need efficient and affordable public
transportation nationwide, safe automobiles, and laws that give law enforcement agencies
the ability to protect and control drivers and passengers on roads and highways.

Moving into the new millennium, it is imperative for governments to improve traffic
conditions. By researching possible factors that affect the occurrence of accidents and
fatalities, we hope to be able to provide some input on this issue. The data set included in
the appendix presents information on the total number of crashes and the determinant
factors of automobile accidents in 1996. The variables included are:

Total crashes = The total number of crashes per state in 1996.
Officer/10K = The total number of officers per 10 thousand population.
BAC Limit = The maximum legal blood alcohol limit (0.10% = 1; 0.08% = 0).
Seat Belt = The nature of seat belt laws in each state (primary = 1; secondary = 0).
Max Speed = The maximum interstate speed limit.
Funding = Total federal highway funding in millions of dollars.
Time to work = Average number of minutes required to drive to work.
Pop /sqmi = Total state population per square mile.
%Metro Pop = Percentage of the total state population that lives in metropolitan areas.
Gas Tax Rate = Gasoline tax in cents per gallon.
Licensed Drivers = Total licensed drivers in thousands.

10.11 Foreign Direct Investments17.

Globalization has become an important strategy in the business world. Prudent investors
are feeling less and less bound to invest in their home countries with the hope of realizing
an above average profit. Governments accept foreign investments to help boost their own
economies instead of waiting for investments from their respective citizens. With
investors making decisions on the amount of funds to invest and governments deciding
how much foreign funding is needed to attract business, it is essential for both sides to
identify the factors that really affect the amount of foreign direct investment (FDI) a
country makes. Since the U.S. has actively engaged in FDI for many years, its FDI abroad
could be a good starting point for all interested investors and governments. The following
is a list of variables that can be used to study this issue. The data set containing the
relevant information is in the appendix.

16
This data set was collected by Natalie Nicole Johnson and Christopher M. Nedbalek.
17
The data were collected by Hui-Ming Ho and Khairul Anwar Moho Dewan.

Definitions of the variables in the model

FDI: The amount that U.S. companies spent in a specific foreign country during the
year 1998, measured in millions (Mill.) of dollars.18
GNP per capita/PPP: Gross National Product per capita measured at Purchasing Power
Parity, that is, Real Gross National Product converted to 1993 USD by the purchasing
power exchange rate, expressed in dollars.19 It is also a measure of labor productivity.
R: Real interest rate, a foreign country’s nominal interest rate7 adjusted for its
inflation2.
Tax Rate: The corporate marginal tax rate2, the highest tax rate for the average
company in a foreign country.
Political Risk20: The risk of non-payment or non-service of payment for goods or
services, loans, trade-related finances and dividends and the non-repatriation of
capital, measured in percentage.
Ex. Rate Volatility4: The volatility of the exchange rate for a foreign country in 1998,
expressed in terms of units. It is the standard deviation of the daily exchange rate in
terms of dollars.
Literacy Rate5: The percentage of people over 15 years old that can read and write.
Elec. Cons per capita5: Electricity consumption per person in a foreign country, in
billion kilowatts.
Railroad/km2 5: The length of railroad, in kilometers, for each square kilometer.
Hway/km2 5: The length of highway, in kilometers, for each square kilometer.
Airport/1000 km2 5: The number of airports available within an area of 1000 km2.
WTO5: 1 if the foreign country is a member of the World Trade Organization (WTO);
0 otherwise.

18
Source: World Development Report 1999/2000
19
Source: CIA World Fact Book 1999
20
Source: Euromoney
4
Source: http://www.oanda.com/converter/cc_table
5
Source: CIA World Fact Book 1999
6
Source: www.wto.org
7
Source: International Financial Statistics, Sept. 1999

1. Estimate the model:

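$$\begin{aligned}
\text{FDI} = {} & \alpha + \beta_1\,\text{GNP per capita/PPP} - \beta_2\,R - \beta_3\,\text{Tax Rate} + \beta_4\,\text{Political Risk} - \beta_5\,\text{Literacy Rate} \\
& - \beta_6\,\text{Exchange Rate Vol.} + \beta_7\,\text{Elec. Consumption} + \beta_8\,\text{Railroad} - \beta_9\,\text{Highway} \\
& + \beta_{10}\,\text{Airport} - \beta_{11}\,\text{WTO} + \varepsilon_i
\end{aligned}$$

Write out the sample regression equation and the appropriate t-statistics below the
coefficients in the equation.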

2. Interpret all the coefficients in the context of this problem.

3. Perform the test for multicollinearity. If multicollinearity is present in the model,
what is the consequence? How could you remedy the problem?

4. Perform the test of the significance of each variable, qualitative or quantitative. State
the null and alternative hypotheses, the decision rule, and for each case separately
your decision and conclusion. Based on the results of this hypothesis testing, drop the
insignificant variables.

5. Test whether the variables you dropped are significant as a group.

6. Write out the revised model with the standard errors placed underneath the estimated
coefficients in parentheses.

7. Redo steps 2 and 4 if your final model is different from the original.

8. Perform a formal test to verify the existence or absence of heteroscedasticity in this
model.

9. Based on the analysis performed so far are the estimators obtained from this model
BLUE? Explain. Are your estimates reliable? If not what can you do to improve upon
your current estimates?

10. Summarize the results of your study.

10.12. Auctions of Construction Contracts.

The design and conduct of auctions has occupied the attention of many people over
thousands of years. One of the earliest reports of an auction dates back to the fifth century
B.C. In recent years, auctions in the United States have accounted for an enormous
volume of economic activity. Every week the U.S. Treasury sells a large amount of bills
and notes. The Department of the Interior sells mineral rights on federally-owned properties
at auction. Billions of dollars worth of spectrum licenses have been sold by the U.S.
government since 1994. Auctions are a popular means of selling items not only in the
public but also in the private sector. Sellers auction antiques and artwork, flowers and
livestock, publishing rights and timber rights, property, stamps, wine, books and other
items. Recently, it has been very popular to auction off items over the internet. E-Bay is
one of the most popular companies conducting such auctions and the value of its stock
has tripled in the past year.

A large volume of transactions is currently arranged using four common forms of
auctions: the English auction, the Dutch auction, the first-price sealed-bid auction and the
second-price sealed-bid auction. Whenever a variety of items are placed for sale at the
same auction they are either offered simultaneously or sequentially. The main
consideration in choosing one form over another is the potential to maximize the
seller’s revenues and to achieve efficiency. Which form is more appropriate to use on
each occasion depends on the characteristics of the items that are auctioned off.

The Oklahoma Department of Transportation conducts auctions regularly to award
construction and maintenance contracts. Interested bidders can obtain information on the
specification of the contract and submit detailed sealed bids. The contract is awarded to
the lowest bidder. We have obtained a list of contracts awarded by the DOT in Oklahoma
during January of 1997. All the bids and the identity of bidders were made available.
Most contracts are for road-surface-related work, painting of bridges and traffic signal
installation. Based on information about the type of projects, the bids, engineering
estimates and required timeframe for completion, we will try to analyze bidding behavior.
The variables used in the analysis are:
Variance of Bids = The variance of all bids submitted for each contract.
Winning Bid = The lowest bid on each contract.
Difference = The difference between the first and the second lowest bid.
Estimate = The engineering estimate on the cost of the project.
# of Bids = The total number of bids on each project.
Days for completion = The total number of days that the contract allows till
completion of the project.

Finally we included categorical variables that describe the type of project (Bridgework,
resurfacing etc.). Based on the information included in the appendix, perform the
following analysis:

Run a regression with the winning bid as the dependent variable and the number of
bidders, type of project and days for completion as the independent variables.

Run a regression with the variance of bids as the dependent variable and the estimate,
the number of bidders and the days for completion as the independent variables.

Run a regression with the difference between the lowest and the second lowest bids as
the dependent variable and the estimate, number of bidders and days for completion
as the independent variables.

10.13. Research and Development Expenditures

R&D is a key element of a strong technology based economy. It is one of the major
driving forces for economic development in the United States. Through innovation and
technological development, the pharmaceutical, telecommunication, information
technology and aerospace industries have broken new ground. For almost two decades
the U.S. has ranked consistently higher in R&D expenditures than other countries (in
large part due to enormous defense budgets).

Researchers have suggested that the variance of R&D expenditures is an increasing
function of the size of the firms. The bulk of the expenditure on R&D is coming from the
largest firms in the industry with the smaller firms allocating a lower but more uniform
percentage of profits on such expenditures. In order to investigate this claim we collected
data from 18 firms of various sizes.

Definitions:
yi = R&D expenditures in 1989,
x1i = total sales of a firm in the same period,
x2i = total profit of a firm in 1989.

With this information in mind and the available data perform the following analysis:

Estimate the model: $y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i}$


Interpret the coefficients of x1 and x2 in the context of this model.

Is there a problem of multicollinearity in the data? Perform formal tests at a 5%
significance level.

What are the consequences of the existence of multicollinearity in general?

Do you think a decision to drop the insignificant variables is the right one in this model?
Why? Why not?

Plot R&D expenditures as a function of sales and profit. What do you observe? What
problem could you be facing in your data?

On the original model, perform a formal test at the 5% significance level for the
existence of heteroskedasticity. In particular, assume that the variance of the error terms
is a function of the expected value of the dependent variable (i.e., $V(\varepsilon_i) = f(y_i)$).

Based on the analysis performed so far are the estimators obtained from this model
BLUE? Explain.

If we assume that $V(\varepsilon_i) = \sigma^2 y_i^2$, describe how you would correct for the problem of
heteroskedasticity.

Finally, it has been suggested in class that one way to get rid of the problem of
heteroscedasticity in your data is to transform the variables in the original model into
logarithmic form. Make this transformation and re-estimate the model. Write out the
sample regression equation that expresses ln y as a function of ln x1 and ln x2. Do you still
have the same problem? (Plot ln y against ln x1 and ln x2.) Is it as severe as in the original
model? Comment.

Interpret the coefficients in the context of this model.

10.14. Economic Cycle and Sunspots21.

The idea that the aggregate economy does not climb along a steady trend but experiences
occasional booms of activity and recessions is very old. Virtually every economist
has recognized the existence of strong fluctuations in the general level of economic activity.
But the idea that it exhibits a regular cyclical pattern, that these fluctuations were recurrent
in a precise periodic way, was only put forward late in the nineteenth century by William
Stanley Jevons and Clement Juglar.

W.S. Jevons (1884) related economic cycles to sunspots. He argued that sunspots
affected tangible things such as harvests and/or intangible ones such as people’s moods.
These in turn were creating the fluctuations in economic activity. The cyclical nature of
sunspots could be employed to explain the existence of economic cycles.

To analyze this theory, we will use data on U.S. industrial production (IP) and the
number of sunspots (SPOT) for 120 months (from 1971.1 to 1975.12).
Based on this information perform the following analysis:

1. Estimate the equation:

$$IP_t = \alpha + \beta\,SPOT_t + u_t \qquad (1)$$

where $u_t$ is the error term.

2. How large is the t-statistic? Does this provide support for the sunspots theory?

3. Plot the residuals and produce a scatter diagram of IP vs. SPOT. Does it seem to you
that there might be a serial correlation problem in this case? Explain.

4. Test for the existence of first-order serial correlation using the Durbin-Watson statistic.

5. How would you correct for the problem of autocorrelation in this model?

6. One explanation for the high t-statistic in (1) and the low DW statistic is the
omission of variables. Let’s proxy factor inputs (Materials, Labor, and Capital) using a
time trend and ask whether including this trend in the regression changes the results.
Estimate the following equation:

$$IP_t = \alpha^* + \beta^*\,SPOT_t + \beta_2^*\,T + u_t \qquad (2)$$

where T is a linear time trend.

7. Has this eliminated the problem? Is the coefficient on SPOT still significant? Is serial
correlation still a problem?

8. Perform a hypothesis test in which the null hypothesis is that sunspots do not affect
industrial production.

21
We thank Sangeeta Bishop for providing this data set to us.

10.15. Moving Averages in Stock Prices22.

Dell Computer Corporation, headquartered in Round Rock, Texas, near Austin, is the
world's leading direct computer systems company. Company revenue for the last four
quarters totaled $23.6 billion. Dell is the No. 2 and fastest growing among all major
computer systems companies worldwide, with more than 33,200 employees around the
globe. The company ranks No. 1 in the United States, where it is a leading supplier of
PCs to business customers, government agencies, educational institutions and consumers.

The company was founded in 1984 by Michael Dell, now the computer industry's
longest-tenured chief executive officer, on a simple concept: that by selling personal
computer systems directly to customers, Dell could best understand their needs, and
provide the most effective computing solutions to meet those needs. Today, Dell is
enhancing and broadening the fundamental competitive advantages of the direct model
by increasingly applying the efficiencies of the Internet to its entire business.

In this study, we will use the stock prices of DELL from January 16, 2001 to April 16,
2001. Based on this information perform the following analysis:

1. Plot the observations of the stock price.

2. Arrange the observations in ascending order to find the median and the number of
runs R, in the data set.

3. Use the large-sample variant of the Runs test to test this series for randomness against
the alternative of non-randomness.

4. Compute a simple centered 5-point, 13-point, and 25-point moving average series for
the Dell stock price.

22
The data for this case came from Yahoo Finance. The data set was collected by Kimberly Maggi.

10.16. Exxon23.

Exxon Corporation, one of the world’s first multinational companies, traces its roots to
John D. Rockefeller. Exxon is engaged in the exploration, production, manufacture,
transportation and sale of crude oil, natural gas, and petroleum products. Exxon has a
business presence on every continent except Antarctica.

We will use Exxon’s monthly stock price over the past five years (April, 1996 to March,
2001) to illustrate the seasonal component found in the oil and gas industry. Basically, we
assume that seasonal activities such as the payment of dividends, earnings announcements
and placement of orders affect the stock price. We will incorporate dummy variables to
represent the quarters in which seasonal components are present.

1. Plot the original data on a time series graph with the stock price as the dependent
variable (Y).

2. Run a regression with the stock price as the dependent variable (Yt) and time (t) as
the independent variable.

In order to incorporate seasonal fluctuations in the price of stock introduce the following
three dummy variables:
Q2 = 1 if in 2nd quarter (i.e., months 4-6)
   = 0 otherwise
Q3 = 1 if in 3rd quarter (i.e., months 7-9)
   = 0 otherwise
Q4 = 1 if in 4th quarter (i.e., months 10-12)
   = 0 otherwise

3. Seasonal differences in the price of stock may be due to placement of orders in the
second quarter and earnings announcements as well as dividends payout in the
fourth quarter. Test the significance of seasonal fluctuations. State the null and
alternative hypothesis, the decision rule, and for each case separately state your
decision and conclusion. Based on the results of this hypothesis testing what is your
conclusion about the effect of these factors upon the price of stock?

4. Drop any insignificant variables and rerun the regression.
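
The quarterly dummy variables defined in this case can also be built outside MINITAB. The following is only a minimal Python sketch, assuming months are coded 1 to 12 and that the first quarter is the base category; the function name `quarter_dummies` is illustrative and not part of this manual.

```python
def quarter_dummies(month):
    """Return (Q2, Q3, Q4) for a calendar month 1..12; quarter 1 is the base."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    quarter = (month - 1) // 3 + 1          # months 1-3 -> 1, 4-6 -> 2, ...
    return (int(quarter == 2), int(quarter == 3), int(quarter == 4))


# Example: build the three dummy columns for a short run of months.
months = [4, 5, 6, 7, 10, 1]
rows = [quarter_dummies(m) for m in months]
for m, (q2, q3, q4) in zip(months, rows):
    print(m, q2, q3, q4)
```

Only three dummies are needed for four quarters, because a first-quarter observation is identified by all three being zero.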


10.17. Starbucks24.

Starbucks Coffee brought new meaning to the phrase “Let’s go for coffee.” Starbucks is
the leading retailer, roaster and brand of specialty coffee in the world. They purchase,
roast and sell high quality whole bean, rich-brewed coffees, Italian-style beverages and a
variety of pastries. Starbucks has a presence in North America, the United Kingdom, the
Pacific Rim and the Middle East.

23
The data for this case came from Yahoo Finance. The data set was originally collected by Kimberly
Maggi.
24
The data for this case came from Yahoo Finance. The data set was collected by Kimberly Maggi.

We will use the Starbucks’ monthly stock price for the past five years (1996-2001) to
illustrate the seasonal index method. Basically, we assume that for any given month, in
each year, the effect of seasonality is to raise or lower the observations by a constant
proportionate amount, compared with what they would have been in the absence of
seasonal influences.

With this information in mind and the available data perform the following analysis:

1. Plot the original data using the time series plot with the stock price as the dependent
variable (Y).

2. Compute a simple centered 5-point moving average for the stock price and store the
moving averages as Xt*.

3. Find Xt as a percentage of Xt* for each month.

4. Calculate the seasonal index.

5. Find the seasonally adjusted series.

6. Using the original data, estimate the autoregressive models of orders 1 through 3:

$$X_t = \gamma + \phi_1 X_{t-1} + a_t$$
$$X_t = \gamma + \phi_1 X_{t-1} + \phi_2 X_{t-2} + a_t$$
$$X_t = \gamma + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + a_t$$

where $\gamma, \phi_1, \phi_2, \phi_3$ are autoregressive parameters and $a_t$ is a random variable
that has mean zero and constant variance for all t.

7. For each model test the hypothesis that the last autoregressive parameter is
insignificant ($H_0: \phi_p = 0$; $H_1: \phi_p \neq 0$), starting from the third order autoregressive
model.

CHAPTER 11

HYPOTHESIS TESTING

When you are testing your hypothesis, you have to be explicit with your procedures.
Your null and alternative hypotheses, decision rule, decision and conclusion have to be
clearly stated. In this chapter, we put together, for your convenience, the tests that are
most frequently performed in this course.

Testing the Significance of an Independent Variable.

There are two ways you can test the significance of an independent variable: you can use
the t-test (as outlined below) or you can use the p-value. Remember that the p-value is the
smallest significance level at which you can reject your null hypothesis. MINITAB
automatically computes the p-value of a two-tail test based on your data. One advantage
of using the p-value for your hypothesis testing is that you can make your decision
without using a table. The decision rule for a two-tail test is very simple when you use the
p-value. The two ways to test the significance of an independent variable in the model are
described below:

To test $H_0: \beta_i = 0$
        $H_1: \beta_i \neq 0$

D.R.: Reject $H_0$ if $|t| > t_{n-k-1,\,\alpha/2}$
or
D.R.: Reject $H_0$ if p-value $\le \alpha$.

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that the independent variable does not linearly affect the dependent variable against the
alternative that the independent variable does influence the dependent variable.
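
As a small illustration of the t-test decision rule above, the sketch below computes the t-ratio from an estimated coefficient and its standard error and compares it with a critical value read from a t table. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from any data set in this manual.

```python
def t_statistic(b, se_b, beta_null=0.0):
    """t-ratio for H0: beta = beta_null, given the estimate and its standard error."""
    return (b - beta_null) / se_b


def reject_two_tail(t, t_crit):
    """Two-tail decision rule: reject H0 when |t| exceeds the table value."""
    return abs(t) > t_crit


# Hypothetical numbers: coefficient 0.105, standard error 0.007,
# and a table value of 2.000 (t with 60 degrees of freedom, alpha/2 = 0.025).
t = t_statistic(0.105, 0.007)
print(round(t, 3), reject_two_tail(t, 2.000))
```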

Testing the Significance of a Group of Independent Variables.

MINITAB automatically computes the F-ratio and p-value associated with the hypothesis
testing on the significance of a group of independent variables. These numbers are reported
in the ANOVA table, so it is not necessary to calculate the F-ratio using the formula you
learned in class.

(i) Using F-ratio:

To test $H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
        $H_1$: At least one $\beta_i \neq 0$

D.R.: Reject $H_0$ if $F = \dfrac{SSR/k}{SSE/(n-k-1)} > F_{k,\,n-k-1,\,\alpha}$

(ii) Using p-value:

To test $H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
        $H_1$: At least one $\beta_i \neq 0$

D.R.: Reject $H_0$ if p-value $\le \alpha = 0.05$.

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that taken as a group, the independent variables do not linearly affect the dependent
variable against the alternative that at least one influences the dependent variable.

Testing the Significance of a Group of Dropped Independent Variables.

When you decide to drop a subset of independent variables from your model you have to
test and see if those variables were significant as a group. If it turns out that they had a
significant impact on the dependent variable as a group then you have to re-evaluate your
decision to drop them from the model.

Once again, there are two ways to perform the test. You use either the SSE or the $R^2$. The
values $SSE^*$ and $R^{*2}$ are the error sum of squares and the coefficient of determination,
respectively, of the revised regression.

To test $H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_{k_1} = 0$
        $H_1$: At least one $\beta_i \neq 0$

D.R.: Reject $H_0$ if $F > F_{k_1,\,n-k-1,\,\alpha}$

(i) Using SSE:

$$F = \frac{SSE^* - SSE}{SSE} \cdot \frac{n-k-1}{k_1}$$

(ii) Using $R^2$:

$$F = \frac{R^2 - R^{*2}}{1 - R^2} \cdot \frac{n-k-1}{k_1}$$

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that the subset of independent variables taken as a group do not linearly affect the
dependent variable against the alternative that they have some effect.
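
The two versions of the F-ratio above give the same number, which the following sketch illustrates. It is only an illustration in Python: the function names are assumptions, and the sums of squares are made-up values (SST = 100, so that $R^2 = 1 - SSE/SST$), not results from any case in this manual.

```python
def f_from_sse(sse_reduced, sse_full, n, k, k1):
    """F-ratio for k1 dropped variables, from the error sums of squares."""
    return (sse_reduced - sse_full) / sse_full * (n - k - 1) / k1


def f_from_r2(r2_full, r2_reduced, n, k, k1):
    """The same F-ratio written in terms of the coefficients of determination."""
    return (r2_full - r2_reduced) / (1 - r2_full) * (n - k - 1) / k1


# Made-up example: full model with k = 8 regressors, k1 = 7 of them dropped,
# n = 50 observations, SST = 100, SSE = 20 (full), SSE* = 30 (reduced).
n, k, k1 = 50, 8, 7
f1 = f_from_sse(30.0, 20.0, n, k, k1)
f2 = f_from_r2(0.80, 0.70, n, k, k1)
print(round(f1, 4), round(f2, 4))
```

The computed value is then compared with the table value $F_{k_1,\,n-k-1,\,\alpha}$.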

Testing for Multicollinearity.



You have a problem of multicollinearity when some of your independent variables are
related to one another. To test for the presence of multicollinearity we have to test how
significant the relationship is between pairs of independent variables. If the correlation
coefficient between any pair of independent variables is significantly different from zero,
then your model will have a multicollinearity problem.

There are two ways you can test for the presence of multicollinearity: (i) t-ratio or (ii)
p-value.

(i) t-ratio:

To test $H_0: \rho_{ij} = 0$
        $H_1: \rho_{ij} \neq 0$

D.R.: Reject $H_0$ if $|t| > t_{n-2,\,\alpha/2}$, where $t = \dfrac{r}{\sqrt{(1-r^2)/(n-2)}}$

(ii) p-value:

To test $H_0: \rho_{ij} = 0$
        $H_1: \rho_{ij} \neq 0$

D.R.: Reject $H_0$ if p-value $\le \alpha$.

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that the independent variables $x_i$ and $x_j$ are uncorrelated. Therefore the model
does/does not suffer from multicollinearity.
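
The t-ratio in (i) can be checked by hand or with a few lines of code. The sketch below, with illustrative function names and a tiny made-up pair of series, computes the sample correlation r and the corresponding t-ratio; the critical value $t_{n-2,\,\alpha/2}$ still has to be read from a table.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient r between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t-ratio for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Made-up data for two independent variables.
x_i = [1, 2, 3, 4, 5]
x_j = [2, 1, 4, 3, 5]
r = correlation(x_i, x_j)
t = correlation_t(r, len(x_i))
print(round(r, 3), round(t, 3))
```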

Testing the Presence of Heteroscedasticity.

You need to run an auxiliary regression to obtain the appropriate $R^2$ to test for the
presence of heteroscedasticity.

To test $H_0: \operatorname{var}(\varepsilon_i) = \sigma^2$; (no heteroscedasticity)
        $H_1: \operatorname{var}(\varepsilon_i) = f(y_i)$; (heteroscedasticity)

D.R.: Reject $H_0$ if $nR_A^2 > \chi^2_{1,\,\alpha}$.

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that the variance of the error terms is constant, against the alternative that the variance
depends on the expected value of the dependent variable.
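
The sketch below illustrates the computation of $nR_A^2$ from saved residuals and fits. It assumes that the auxiliary regression is the simple regression of the squared residuals on the fitted values, which is one common way to implement "variance as a function of the expected value of the dependent variable"; the function names and the data are made up for illustration.

```python
def r_squared_simple(x, y):
    """R-squared of the simple regression of y on x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def n_r2_statistic(residuals, fits):
    """n times the R-squared of the auxiliary regression of e^2 on the fits."""
    squared = [e * e for e in residuals]
    return len(residuals) * r_squared_simple(fits, squared)


# Made-up residuals whose spread grows with the fitted value.
fits = [1.0, 2.0, 3.0, 4.0, 5.0]
residuals = [0.5, -1.0, 1.5, -2.0, 2.5]
stat = n_r2_statistic(residuals, fits)
print(round(stat, 4))  # compare with the chi-square table value, 3.84 at the 5% level
```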
Testing the Presence of Positive Autocorrelation.

To test for the presence of positive autocorrelation, we use the residuals from the
regression to compute the Durbin-Watson statistic.

To test $H_0: \rho = 0$; (no autocorrelation)
        $H_1: \rho > 0$; (positive autocorrelation)

D.R.: Reject $H_0$ if $d < d_L$
      Fail to reject $H_0$ if $d > d_U$
      Inconclusive if $d_L < d < d_U$

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that the error terms are not correlated, against the alternative that there is positive
autocorrelation.

Testing the Presence of Negative Autocorrelation.

To test $H_0: \rho = 0$; (no autocorrelation)
        $H_1: \rho < 0$; (negative autocorrelation)

D.R.: Reject $H_0$ if $4-d < d_L$
      Fail to reject $H_0$ if $4-d > d_U$
      Inconclusive if $d_L < 4-d < d_U$.

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis
that the error terms are not correlated, against the alternative that there is negative
autocorrelation.
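
The two Durbin-Watson decision rules can be sketched in a few lines. The code below is an illustration only: the function names are assumptions, the residuals are made up, and the bounds $d_L$ and $d_U$ passed in are hypothetical placeholders for the values you would read from a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper, alternative="positive"):
    """Apply the decision rule; for negative autocorrelation use 4 - d."""
    stat = d if alternative == "positive" else 4 - d
    if stat < d_lower:
        return "reject H0"
    if stat > d_upper:
        return "fail to reject H0"
    return "inconclusive"


# Made-up residuals that alternate in sign (a pattern of negative autocorrelation).
d = durbin_watson([1, -1, 1, -1])
print(d)
print(dw_decision(d, 1.2, 1.4, alternative="negative"))  # 1.2 and 1.4 are placeholders
```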

The Runs Test

One sided:

$H_0$: Series is random
$H_1$: Positive association between adjacent observations

D.R.: Reject $H_0$ if

$$\frac{R - (n/2) - 1}{\sqrt{\dfrac{n^2 - 2n}{4(n-1)}}} < -z_{\alpha}$$

where R is the number of runs and n is the number of observations.

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis of
randomness against the alternative of positive association between observations that are
adjacent in time.

Two sided:

$H_0$: Series is random
$H_1$: Some association between adjacent observations

D.R.: Reject $H_0$ if

$$\frac{R - (n/2) - 1}{\sqrt{\dfrac{n^2 - 2n}{4(n-1)}}} < -z_{\alpha/2} \quad \text{or} \quad \frac{R - (n/2) - 1}{\sqrt{\dfrac{n^2 - 2n}{4(n-1)}}} > z_{\alpha/2}$$

Conclusion: At the $100\alpha\%$ significance level, we reject/fail to reject the null hypothesis of
randomness against the alternative of some association between observations that are
adjacent in time.
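
The runs test can be sketched as follows. This is only an illustration in Python with assumed function names: it counts runs above and below the median and evaluates the large-sample statistic given above. It assumes an even number of observations with no values equal to the median; any such values are simply dropped here, which is a simplification.

```python
import math
from statistics import NormalDist, median


def count_runs(series):
    """Return (R, n): the number of runs above/below the median and the count used."""
    med = median(series)
    signs = [x > med for x in series if x != med]   # drop values equal to the median
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return runs, len(signs)


def runs_z(runs, n):
    """Large-sample runs statistic: (R - n/2 - 1) / sqrt((n^2 - 2n) / (4(n - 1)))."""
    return (runs - n / 2 - 1) / math.sqrt((n * n - 2 * n) / (4 * (n - 1)))


# Made-up, steadily trending series: only two runs, so z is strongly negative.
series = [1, 2, 3, 4, 5, 6, 7, 8]
R, n = count_runs(series)
z = runs_z(R, n)
p_one_sided = NormalDist().cdf(z)   # P(Z < z), for the one-sided alternative
print(R, n, round(z, 4), round(p_one_sided, 4))
```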

Interpretation of Confidence Intervals.

If you construct a lot of confidence intervals in the same fashion, using different samples
with the same number of observations, approximately $100(1-\alpha)\%$ of the intervals will
contain the true population parameter.
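
This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the course material: it draws many samples from a normal population with a known standard deviation, builds a z-based interval for the mean from each sample, and reports the share of intervals that contain the true mean. All parameter values and the function name `coverage` are made up.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=10.0, sigma=2.0, n=25, alpha=0.05, reps=2000, seed=1):
    """Share of 100(1 - alpha)% intervals for the mean that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5       # known-sigma interval, for simplicity
    hits = 0
    for _ in range(reps):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


share = coverage()
print(share)   # should be close to 0.95
```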

CHAPTER 12

USEFUL FORMULAS

n n

cov( x, y )
 (xi 1
i  x )( yi  y ) x y
i 1
i i nx y
r  
SxSy n n n n

 (x
i 1
i  x ) . ( y i  y )
2

i 1
2
( xi2  nx 2 )( yi2  ny 2 )
i 1 i 1

r
t
1 r2 ti 
bi   i
S x2 
 (x i  x)2

x 2
i
 nx 2
S bi n 1 n 1
n2

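$$r = b\,\frac{S_x}{S_y} \qquad b = \frac{\operatorname{cov}(x,y)}{S_x^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}$$

$$a = \bar{y} - b\bar{x}$$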

$$SST = SSR + SSE$$

$$SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2$$

$$SSR = \sum (\hat{y}_i - \bar{y})^2 = b\left(\sum x_i y_i - n\bar{x}\bar{y}\right)$$

$$SSE = \sum e_i^2 = \sum y_i^2 - a\sum y_i - b\sum x_i y_i$$

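$$R^2 = b^2\,\frac{S_x^2}{S_y^2} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad \bar{R}^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = R^2 - \frac{k}{n-k-1}\,(1-R^2)$$

$$S_e^2 = \frac{SSE}{n-k-1} \qquad S_b^2 = \frac{S_e^2}{\sum (x_i-\bar{x})^2} = \frac{S_e^2}{\sum x_i^2 - n\bar{x}^2}$$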
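
As a worked illustration of the simple-regression formulas above (with k = 1), the following Python sketch computes the slope, intercept, sums of squares and R² from the computational sum formulas. The function name and the returned dictionary keys are hypothetical; the data are the first five rows of Case Study 4 (Advertising and Sales), used here only as sample input.

```python
import math


def simple_regression(x, y):
    """Simple (k = 1) least-squares fit using the computational sum formulas."""
    n = len(x)
    k = 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    syy = sum(yi ** 2 for yi in y) - n * y_bar ** 2

    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept
    sst = syy
    ssr = b * sxy
    sse = sst - ssr
    se2 = sse / (n - k - 1)        # estimated error variance
    return {
        "a": a, "b": b,
        "r": sxy / math.sqrt(sxx * syy),
        "SST": sst, "SSR": ssr, "SSE": sse,
        "R2": ssr / sst,
        "adj_R2": 1 - (sse / (n - k - 1)) / (sst / (n - 1)),
        "Se2": se2,
        "Sb2": se2 / sxx,          # estimated variance of the slope
    }


# First five (X, Y) rows of Case Study 4.
x = [339, 562, 745, 749, 862]
y = [1103, 1266, 1473, 1423, 1767]
fit = simple_regression(x, y)
print(fit["b"], fit["a"], fit["R2"])
```

With real data sets, compare the result with the MINITAB or Excel output; the computational forms can lose precision when the values are very large.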

$$b \pm t_{n-2,\alpha/2}\, S_b$$

2  R2 A

$$\hat{y}_{n+1} = a + b x_{n+1}$$

$$\hat{y}_{n+1} \pm t_{n-2,\alpha/2}\,\sqrt{1 + \frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{\sum x_i^2 - n\bar{x}^2}}\; S_e \qquad \hat{y}_{n+1} \pm t_{n-2,\alpha/2}\,\sqrt{\frac{1}{n} + \frac{(x_{n+1}-\bar{x})^2}{\sum x_i^2 - n\bar{x}^2}}\; S_e$$

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{n-k-1}{k}\cdot\frac{R^2}{1-R^2} \qquad F = \frac{(SSE^* - SSE)/k_1}{SSE/(n-k-1)} = \frac{n-k-1}{k_1}\cdot\frac{R^2 - R^{*2}}{1-R^2}$$
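
The R² forms of the two F statistics are convenient when only the regression output is at hand. A minimal Python sketch, with hypothetical function names, assuming that the starred quantities refer to the restricted model (the one that omits the k1 variables being tested):

```python
def overall_f(r2, n, k):
    """F statistic for the joint significance of all k regressors."""
    return (n - k - 1) / k * r2 / (1 - r2)


def partial_f(r2_full, r2_restricted, n, k, k1):
    """F statistic for a subset of k1 regressors.

    r2_full comes from the model with all k regressors,
    r2_restricted from the model that leaves out the k1 variables tested.
    """
    return (n - k - 1) / k1 * (r2_full - r2_restricted) / (1 - r2_full)


print(overall_f(0.5, n=12, k=1))
print(partial_f(0.8, 0.6, n=25, k=4, k1=2))
```

Compare each value with the critical value of the F distribution with (k, n-k-1) or (k1, n-k-1) degrees of freedom.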

$$d = \frac{\sum (e_t - e_{t-1})^2}{\sum e_t^2} \approx 2(1-r) \qquad r \approx 1 - \frac{d}{2} \qquad h = r\sqrt{\frac{n}{1 - n s_c^2}}$$

Price indices: $100\left(\dfrac{P_{1i}}{P_{0i}}\right)$, $\quad 100\left(\sum \dfrac{P_{1i}}{P_{0i}}\right)\Big/ k$, $\quad 100\left(\dfrac{\sum q_{0i} P_{1i}}{\sum q_{0i} P_{0i}}\right)$.

Quantity indices: $100\left(\dfrac{\sum q_{1i} P_{0i}}{\sum q_{0i} P_{0i}}\right)$
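
The weighted price and quantity indices above both use base-period weights. A small Python sketch with made-up numbers (two goods; all prices and quantities are invented for illustration, and the function names are hypothetical):

```python
def weighted_price_index(p0, p1, q0):
    """100 * sum(q0 * p1) / sum(q0 * p0): base-period quantities as weights."""
    numerator = sum(q * p for q, p in zip(q0, p1))
    denominator = sum(q * p for q, p in zip(q0, p0))
    return 100 * numerator / denominator


def weighted_quantity_index(p0, q0, q1):
    """100 * sum(q1 * p0) / sum(q0 * p0): base-period prices as weights."""
    numerator = sum(q * p for q, p in zip(q1, p0))
    denominator = sum(q * p for q, p in zip(q0, p0))
    return 100 * numerator / denominator


p0, p1 = [2.0, 5.0], [3.0, 5.0]     # base-period and current prices
q0, q1 = [10, 4], [12, 4]           # base-period and current quantities
print(weighted_price_index(p0, p1, q0))      # 125.0
print(weighted_quantity_index(p0, q0, q1))   # 110.0
```

In the example the base-period basket costs 40 at base prices and 50 at current prices, which gives a price index of 125.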

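Moving Averages:

$$x_t^* = \frac{\sum_{j=-m}^{m} x_{t+j}}{2m+1} \qquad \text{where } t = m+1, \ldots, n-m$$

$$x_{t+0.5}^* = \frac{\sum_{j=-(S/2)+1}^{S/2} x_{t+j}}{S} \qquad \text{where } t = \frac{S}{2},\ \frac{S}{2}+1, \ldots, n-\frac{S}{2}$$

$$x_t^* = \frac{x_{t-0.5}^* + x_{t+0.5}^*}{2} \qquad \text{where } t = \frac{S}{2}+1, \ldots, n-\frac{S}{2}$$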
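
The index bookkeeping in the moving-average formulas is easy to get wrong, so here is a hedged Python sketch. The function names are hypothetical. Results are returned as dictionaries keyed by the 1-based time index t used in the formulas, while the input list is 0-based, so x_t is stored in x[t - 1].

```python
def centered_moving_average(x, m):
    """(2m+1)-point centered moving average for t = m+1, ..., n-m."""
    n = len(x)
    return {
        t: sum(x[t - 1 + j] for j in range(-m, m + 1)) / (2 * m + 1)
        for t in range(m + 1, n - m + 1)
    }


def centered_even_moving_average(x, s):
    """Centered s-point moving average for even s (e.g. s = 4 for quarters)."""
    n = len(x)
    half = s // 2
    # half_step[t] holds x*_{t+0.5} for t = s/2, ..., n - s/2.
    half_step = {
        t: sum(x[t - 1 + j] for j in range(-half + 1, half + 1)) / s
        for t in range(half, n - half + 1)
    }
    # x*_t averages the two neighbouring half-step values.
    return {
        t: (half_step[t - 1] + half_step[t]) / 2
        for t in range(half + 1, n - half + 1)
    }


series = [1, 2, 3, 4, 5, 6, 7, 8]
print(centered_moving_average(series, 1))
print(centered_even_moving_average(series, 4))
```

For a perfectly linear series both averages reproduce the original values at the interior points, which is a quick sanity check on the indexing.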

Autoregressive model of order p

$$x_t = \gamma + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + a_t$$

$$Z = \frac{\hat{\phi}_p}{s_p}; \qquad \hat{x}_{n+h} = \hat{\gamma} + \hat{\phi}_1 \hat{x}_{n+h-1} + \cdots + \hat{\phi}_p \hat{x}_{n+h-p} \qquad (h = 1, 2, 3, \ldots)$$
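
The forecasting recursion for the autoregressive model can be sketched as follows. This is an illustration only: the function name and the coefficient values are made up, and in practice the intercept and the autoregressive coefficients come from the fitted model. Observed values are used where they exist, and earlier forecasts are fed back in once the recursion runs past the end of the sample.

```python
def ar_forecast(history, intercept, coeffs, steps):
    """Forecast the next `steps` values of an AR(p) model.

    coeffs[0] multiplies the most recent value, coeffs[1] the one
    before it, and so on, so p = len(coeffs).
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p observations")
    values = list(history)
    for _ in range(steps):
        next_value = intercept + sum(
            c * values[-(lag + 1)] for lag, c in enumerate(coeffs)
        )
        values.append(next_value)      # forecasts feed later forecasts
    return values[len(history):]


# Hypothetical AR(2) coefficients, for illustration only.
print(ar_forecast([10, 12], intercept=1.0, coeffs=[0.5, 0.25], steps=2))
```

The first forecast uses only observed values (1 + 0.5·12 + 0.25·10 = 9.5); the second already depends on the first forecast.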

CHAPTER 13

REGRESSION ANALYSIS IN MICROSOFT EXCEL

General guidelines

To perform regression analysis in Microsoft Excel, you use the “Data Analysis” tool
available in Microsoft Office 97 or Office 2000. The “Data Analysis” tool is not enabled
by default in Excel, so you may need to activate it first. To check whether it is already
active, open the “Tools” menu and look for the “Data Analysis” option. If you find it, go
directly to Step 2; otherwise, go to Step 1 to activate the “Data Analysis” tool in Excel.

Step 1.

Go to the “Tools” menu and click on “Add-Ins”.

A pop-up window (Figure 1) will ask you to check the add-ins you want to activate. Check
the boxes for “Analysis ToolPak” and “Analysis ToolPak-VBA”, and then click “OK”. The
computer may ask you to insert the Microsoft Office/Excel installation CD in your CD drive.

Figure 1.

Go back to the “Tools” menu and check once more that the “Data Analysis” option
is now available.

Step 2

Select “Tools” and then “Data Analysis”. Select “Regression” from the list of “Analysis
Tools” and then click OK. The Regression dialog box appears, as shown in Figure 2. To run
the regression you need to fill in several edit boxes and check boxes. The most important
ones are:

Input Y Range. Enter the range for the dependent variable in this edit box. Alternatively,
highlight the column that contains the dependent variable data.
Input X Range. Enter the range for the independent variable in this edit box. Again, you
can instead highlight the column that contains the independent variable data. For
multiple regression, highlight the columns that correspond to all your independent
variables at once. If the highlighted ranges include the variable names, check the
“Labels” box before you continue.
New Worksheet Ply. Select this option and enter a name in the edit box to place the
regression output in a separate worksheet with that name.
Residuals and Residual Plots. Check these two boxes if you need to test for
heteroscedasticity or autocorrelation. Excel will list the residuals and the estimated
values of the dependent variable, and will produce the plot of the residuals.
Line Fit Plots. Select this option if you wish to obtain the scatter diagram with the fitted
regression line in a simple regression analysis.

Figure 2.

Using Microsoft Excel to test for autocorrelation.

The easiest way to detect the problem of autocorrelation in a set of data is to plot the
residuals through time. Go back to step 2 to recall how to produce the plot of the
residuals. The formal way to test the existence of autocorrelation is using the Durbin-
Watson statistic. The Durbin-Watson statistic (D) is defined as follows:
$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

Unfortunately, Excel does not produce this statistic automatically. To generate the
Durbin-Watson statistic you need to calculate its value from the column of residuals.
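
As a cross-check on a hand or spreadsheet calculation, the statistic can also be computed from the residual column with a few lines of Python. This is a minimal sketch; the function name is hypothetical and the residuals are assumed to be listed in time order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic from residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1, -1, 1, -1]))   # alternating signs give a large D
print(durbin_watson([1, 1, 1, 1]))     # identical residuals give D = 0
```

Within Excel itself, one possible worksheet formula, assuming the residuals happen to sit in cells C2:C26 (a hypothetical range), is `=SUMXMY2(C3:C26,C2:C25)/SUMSQ(C2:C26)`: SUMXMY2 sums the squared differences between the two offset ranges and SUMSQ sums the squared residuals.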

APPENDIX

DATA SETS FOR THE CASE STUDIES

Case Study 1.
Worldwide Fertility Rate.

Country | Tot Fert Rate | Inf Mort Rate | %Urban | %F in LF | Gini Index | TV Sets/K | GNP/PPP | Life Exp
Algeria 3.5 34 56 24 35.3 68 19.6 70
Argentina 2.7 22 88 31 47.5 347 30.8 73
Armenia 1.8 16 69 48 39.39 216 8.4 71
Australia 1.9 6 85 43 33.7 666 70.2 77
Austria 1.5 6 56 41 23.1 493 78.8 77
Azerbaijan 2.3 25 56 44 48 212 5.4 70
Bangladesh 3.5 79 18 42 28.3 7 5.1 58
Belarus 1.4 13 71 49 21.6 292 15.6 70
Belgium 1.6 8 97 40 25 464 80.3 77
Benin 6 95 42 48 44 73 6.5 50
Bolivia 4.5 69 58 41 42 202 9.4 60
Botswana 4.4 56 31 46 54.21 27 20.7 68
Brazil 2.4 44 78 35 60.1 289 20 67
Bulgaria 1.2 15 71 48 30.8 361 16.6 71
Burkina Faso 6.7 99 27 47 39 6 2.9 49
Cameroon 5.7 56 45 38 49 75 7.8 57
Canada 1.7 6 77 45 31.5 709 78.3 78
Central African Rep. 5.1 98 39 47 55 5 4 48
Chad 5.9 117 21 44 35 2 2.6 48
Chile 2.3 12 86 32 56.5 280 35.3 72
China 1.9 34 30 45 41.5 252 10.8 69
[China] Hong Kong 1.2 5 95 37 45 388 85.1 79
Colombia 2.8 26 73 37 57.2 188 22.7 70
Congo Republic 6 90 59 43 41 8 7.6 51
Costa Rica 2.8 13 50 30 47 220 21.7 77
Cote d'Ivoire 5.3 86 44 33 36.9 60 5.9 55
Czech Rp. 1.3 8 65 47 26.6 406 36.2 73
Denmark 1.8 6 85 46 24.7 533 78.7 75
Dominican Rep. 2.9 37 65 29 50.5 84 14.3 71
Ecuador 3.2 36 58 26 46.6 148 15.6 69
Egypt 3.4 56 45 29 32 126 14.2 63
El Salvador 3.7 36 45 34 49.9 250 9.7 67
Estonia 1.3 14 73 49 39.5 449 15.6 70
Ethiopia 7 112 13 41 32.42 4 1.7 49
Finland 1.8 5 63 48 25.6 605 65.8 76
France 1.7 6 73 44 32.7 598 78 78
Germany 1.2 6 87 42 28.1 493 74.4 76
Ghana 5.1 73 36 51 33.9 41 7.4 59
Greece 1.4 8 65 36 37 442 43.4 78
Guatemala 4.7 44 42 26 59.6 122 12.4 66
Haiti 4.4 72 32 43 40.2 5 3.4 57
Honduras 4.6 45 48 30 53.7 80 3.9 67
Hungary 1.6 11 65 44 27.9 444 23.8 70
India 3.2 68 27 32 29.7 64 5.2 62
Indonesia 2.7 51 34 40 34.2 232 14.1 64
Ireland 1.9 6 98 53.1 35.9 77 58.1 77
Italy 1.2 7 66 38 31.2 436 73.7 78
Jamaica 2.4 13 55 46 41.1 326 82 80
Japan 1.5 4 78 41 35 700 13.1 74
Jordan 4.8 31 72 21 43.4 175 15.1 70
Kazakhstan 2.3 27 60 47 32.7 275 11.2 69
Kenya 4.7 58 28 46 57.5 19 5.1 58
Korea Republic 1.8 10 81 40 31.4 326 42.4 72
Kyrgyz Republic 3.3 30 39 47 35.3 238 6.7 68
Latvia 1.3 16 73 50 27 598 12.5 69
Lesotho 4.6 76 23 37 56 13 6.6 61
Lithuania 1.5 14 72 48 33.6 376 15.3 69
Madagascar 5.8 89 27 45 43.4 24 2.4 52
Malaysia 3.4 42 54 37 48.4 228 33.4 71
Mali 6.8 123 27 46 54 11 2 50
Mauritania 5.2 96 54 44 42.4 82 5.7 51
Mauritius 2.2 16 41 32 36.69 219 49 71
Mongolia 3.4 55 60 46 33.2 63 7.2 65
Morocco 3.4 55 49 35 39.2 145 12.4 65
Nepal 5.3 91 14 40 36.7 4 4.3 55
Netherlands 1.6 6 89 40 31.5 495 73.9 78
New Zealand 2.1 7 84 44 40.2 517 60.6 76
Nicaragua 4.1 46 62 36 50.3 170 7.4 68
Niger 7.4 119 23 44 36.1 23 2.8 47
Nigeria 5.5 80 39 36 45 55 4.5 53
Norway 1.9 5 73 46 25.2 569 81.3 78
Pakistan 5.2 90 35 26 31.2 24 8.3 60
Panama 2.7 23 56 34 56.8 229 22.2 73
Papua New Guinea 4.8 64 16 42 50.9 4 9 57
Paraguay 4 41 54 29 59.1 144 13.5 68
Peru 3.1 47 72 29 44.9 142 14 66
Philippines 3.7 39 53 37 42.9 125 10.6 66
Poland 1.6 14 65 46 27.2 418 20 70
Portugal 1.4 7 36 43 35.63 367 47 75
Romania 1.4 23 55 44 25.5 226 16.2 70
Russian Federation 1.4 18 73 49 31 386 16.6 65
Senegal 5.7 62 42 42 54.1 38 6.6 50
Sierra Leone 6.5 179 39 36 62.9 17 2.2 40
Singapore 1.7 4 100 38 39 361 84.4 76
Slovak Republic 1.5 11 59 48 19.5 384 13.4 72
South Africa 3.9 50 51 37 58.4 123 18.6 64
Spain 1.2 7 77 36 32.5 509 53.8 77
Sri Lanka 2.3 16 22 35 30.1 82 12.1 72
Sweden 1.7 4 83 48 25 476 68.7 79
Switzerland 1.5 6 61 40 36.1 493 95.9 78
Thailand 1.8 35 20 46 46.2 167 28 69
Togo 6.4 88 31 40 33.8 14 4.2 56
Trinidad & Tobago 2.1 13 68 36 41.7 318 31.9 72
Tunisia 2.9 39 57 30 40.2 156 18.5 69
Turkey 2.7 48 70 35 44 309 20.7 67
Uganda 6.7 98 12 48 40.8 26 5.5 42
Ukraine 1.5 15 70 49 25.7 341 8.9 69
United Kingdom 1.7 6 90 43 32.6 612 71.4 77
United States 2.1 8 76 46 40.1 806 100 77
Uruguay 2.2 18 90 40 42 305 24.6 73
Venezuela 3.1 23 93 33 46.8 180 29.3 71
Zambia 5.7 109 45 51 46.2 80 3.5 45
Zimbabwe 3.8 55 32 44 56.8 29 7.5 57

Case Study 2.
Norman Housing Prices.

No | Asking Price | Price | Beds | Living Areas | Full Baths | Half Baths | AGE | Age 0 | Age 3 | Age 10
1 34950 34950 3 1 2 0 30 0 0 0
2 34950 34950 3 1 2 0 30 0 0 0
3 79900 72500 2 1 2 1 20 0 0 0
4 47500 41000 2 1 1 0 20 0 0 0
5 189900 185000 2 2 2 0 10 0 0 1
6 29000 29000 3 1 1 0 60 0 0 0
7 39500 37000 2 1 1 0 60 0 0 0
8 139900 130000 5 3 2 0 61 0 0 0
9 34500 34500 2 1 1 0 60 0 0 0
10 67000 68000 2 1 2 0 10 0 0 1
11 71000 71000 3 1 2 0 0 1 0 0
12 82500 79000 3 1 2 1 3 0 1 0
13 89900 89900 3 1 2 0 10 0 0 1
14 98500 93500 4 1 2 0 10 0 0 1
15 103900 101000 3 1 2 0 20 0 0 0
16 116000 114000 4 1 2 0 3 0 1 0
17 120000 120000 3 1 2 0 0 1 0 0
18 89900 89900 3 1 2 0 60 0 0 0
19 92500 91500 3 2 2 0 30 0 0 0
20 174500 169900 3 2 2 1 60 0 0 0
21 226000 219000 4 3 4 0 60 0 0 0
22 34900 34900 3 1 2 0 30 0 0 0
23 44500 44500 3 1 2 0 20 0 0 0
24 64950 58900 3 1 2 0 30 0 0 0
25 78900 78900 4 1 2 0 0 1 0 0
26 79900 79200 3 1 2 0 10 0 0 1
27 91900 87500 3 1 2 0 3 0 1 0
28 169000 165000 4 2 2 1 60 0 0 0
29 52000 48000 3 1 1 0 30 0 0 0
30 52500 50000 3 1 1 0 30 0 0 0
31 60000 60000 2 2 1 1 60 0 0 0
32 67500 67500 3 1 2 0 20 0 0 0
33 81500 79500 3 2 1 1 60 0 0 0
34 84500 81000 3 2 1 0 60 0 0 0
35 83000 83000 3 1 2 0 10 0 0 1
36 94900 94000 3 1 2 0 10 0 0 1
37 98585 97722 3 1 2 0 0 1 0 0
38 109000 105000 5 2 2 1 60 0 0 0
39 107500 107500 3 1 2 0 20 0 0 0
40 119900 117000 2 1 2 0 3 0 1 0
41 124900 124900 3 2 2 0 0 1 0 0
42 125000 125000 3 2 2 0 60 0 0 0
43 139900 135000 4 1 2 0 60 0 0 0
44 255000 248000 3 2 2 1 20 0 0 0
45 53900 49000 3 2 1 0 60 0 0 0
46 67000 65737 4 1 2 0 30 0 0 0
47 73900 73500 3 1 2 0 10 0 0 1
48 47950 45000 3 1 2 0 20 0 0 0
49 48500 47000 3 1 1 1 20 0 0 0
50 99000 95000 3 2 2 0 20 0 0 0
51 88500 88000 3 1 2 0 20 0 0 0
52 88500 88500 3 1 2 0 10 0 0 1
53 93500 93500 3 1 2 0 20 0 0 0
54 109500 106000 3 2 2 0 10 0 0 1
55 109900 107000 3 1 2 0 3 0 1 0
56 115000 111500 3 1 2 0 20 0 0 0
57 129900 126000 3 2 2 0 30 0 0 0
58 147900 143500 3 2 2 0 10 0 0 1
59 149900 149900 3 2 2 0 10 0 0 1
60 162900 161250 4 2 2 1 3 0 1 0
61 162000 162000 4 1 2 0 0 1 0 0
62 226500 217500 4 2 4 0 10 0 0 1
63 295000 281000 4 3 4 1 10 0 0 1
64 95997 95997 3 1 2 0 0 1 0 0
65 119637 119637 4 1 2 0 0 1 0 0
66 59400 52000 3 1 1 1 60 0 0 0
67 59900 60000 2 2 0 1 60 0 0 0
68 80400 76400 3 1 2 0 20 0 0 0
69 119900 117500 3 2 2 0 20 0 0 0
70 122900 118500 3 1 2 0 10 0 0 1
71 118500 119500 3 1 2 0 0 1 0 0
72 149900 144900 3 2 2 0 20 0 0 0
73 189900 188900 4 1 3 0 0 1 0 0
74 204900 207500 4 2 3 0 0 1 0 0
75 209900 209900 4 2 3 0 0 1 0 0
76 212000 210000 3 2 2 1 10 0 0 1
77 225000 223000 4 2 3 0 0 1 0 0
78 89900 87500 3 1 1 1 60 0 0 0
79 89900 89900 4 2 2 1 61 0 0 0
80 149500 147000 4 3 3 0 20 0 0 0
81 85000 75000 3 2 2 0 30 0 0 0
82 385000 375000 4 3 3 1 20 0 0 0
83 86900 85500 3 2 2 0 60 0 0 0
84 122650 114000 4 2 2 1 20 0 0 0

Continues…

No | Age 20 | Age 30 | Age 60 | Garage | ZONE (ORIG) | ZONE | Zone 1 | Zone 2 | Zone 3 | Zone 4 | SQ FT | DAYS ON MARKET
1 0 1 0 0 CSE 1 1 0 0 0 1150 156
2 0 1 0 0 CSE 1 1 0 0 0 1150 137
3 1 0 0 2 NWI 2 0 1 0 0 1540 293
4 1 0 0 1 CSE 1 1 0 0 0 950 244
5 0 0 0 2 SWI 2 0 1 0 0 2250 27
6 0 0 1 1 CCE 1 1 0 0 0 1020 29
7 0 0 1 1 CCE 1 1 0 0 0 820 107
8 0 0 0 2 CCE 1 1 0 0 0 2980 54
9 0 0 1 0 CNE 1 1 0 0 0 720 1
10 0 0 0 2 CNE 1 1 0 0 0 1250 35
11 0 0 0 2 CNE 1 1 0 0 0 1130 260
12 0 0 0 2 CNE 1 1 0 0 0 1570 111
13 0 0 0 2 CNE 1 1 0 0 0 1770 308
14 0 0 0 2 CNE 1 1 0 0 0 1700 76
15 1 0 0 2 CNE 1 1 0 0 0 1940 55
16 0 0 0 3 CNE 1 1 0 0 0 1810 144
17 0 0 0 2 CNE 1 1 0 0 0 530 5
18 0 0 1 2 CNW 2 0 1 0 0 1600 3
19 0 1 0 2 CNW 2 0 1 0 0 1790 214
20 0 0 1 2 CNW 2 0 1 0 0 2800 109
21 0 0 1 0 CNW 2 0 1 0 0 4200 93
22 0 1 0 0 CSE 1 1 0 0 0 1150 118
23 1 0 0 2 CSE 1 1 0 0 0 1250 28
24 0 1 0 2 CSE 1 1 0 0 0 1700 84
25 0 0 0 2 CSE 1 1 0 0 0 1400 1
26 0 0 0 2 CSE 1 1 0 0 0 1650 10
27 0 0 0 2 CSE 1 1 0 0 0 1710 78
28 0 0 1 2 CSE 1 1 0 0 0 2630 41
29 0 1 0 1 CSW 2 0 1 0 0 990 5
30 0 1 0 1 CSW 2 0 1 0 0 850 32
31 0 0 1 2 CSW 2 0 1 0 0 1350 389
32 1 0 0 2 CSW 2 0 1 0 0 1230 29
33 0 0 1 2 CSW 2 0 1 0 0 1550 55
34 0 0 1 0 CSW 2 0 1 0 0 1750 7
35 0 0 0 2 CSW 2 0 1 0 0 1500 108
36 0 0 0 2 CSW 2 0 1 0 0 1620 122
37 0 0 0 2 CSW 2 0 1 0 0 1550 235
38 0 0 1 0 CSW 2 0 1 0 0 2280 56
39 1 0 0 2 CSW 2 0 1 0 0 2170 117
40 0 0 0 2 CSW 2 0 1 0 0 1720 603
41 0 0 0 2 CSW 2 0 1 0 0 1960 162
42 0 0 1 2 CSW 2 0 1 0 0 2000 3
43 0 0 1 0 CSW 2 0 1 0 0 2320 240
44 1 0 0 3 CSW 2 0 1 0 0 2930 33
45 0 0 1 0 MOR 3 0 0 1 0 1150 5
46 0 1 0 2 MOR 3 0 0 1 0 1500 93
47 0 0 0 2 MOR 3 0 0 1 0 1450 25
48 1 0 0 1 NOB 5 0 0 0 0 1600 15
49 1 0 0 1 NOB 5 0 0 0 0 1240 111
50 1 0 0 2 NOB 5 0 0 0 0 1850 30
51 1 0 0 2 NWI 2 0 1 0 0 1450 60
52 0 0 0 2 NWI 2 0 1 0 0 1560 161
53 1 0 0 2 NWI 2 0 1 0 0 1540 1
54 0 0 0 2 NWI 2 0 1 0 0 1890 45
55 0 0 0 2 NWI 2 0 1 0 0 1740 34
56 1 0 0 2 NWI 2 0 1 0 0 2380 68
57 0 1 0 2 NWI 2 0 1 0 0 1980 141
58 0 0 0 2 NWI 2 0 1 0 0 2070 42
59 0 0 0 2 NWI 2 0 1 0 0 2150 13
60 0 0 0 2 NWI 2 0 1 0 0 2370 64
61 0 0 0 3 NWI 2 0 1 0 0 2250 238
62 0 0 0 2 NWI 2 0 1 0 0 2810 17
63 0 0 0 2 NWI 2 0 1 0 0 3750 60
64 0 0 0 2 OCO 4 0 0 0 1 1540 187
65 0 0 0 3 OCO 4 0 0 0 1 2000 92
66 0 0 1 0 OKC 4 0 0 0 1 1570 149
67 0 0 1 2 OKC 4 0 0 0 1 1300 72
68 1 0 0 2 SWI 2 0 1 0 0 1220 95
69 1 0 0 2 SWI 2 0 1 0 0 1960 32
70 0 0 0 2 SWI 2 0 1 0 0 1800 76
71 0 0 0 2 SWI 2 0 1 0 0 1600 99
72 1 0 0 2 SWI 2 0 1 0 0 2420 217
73 0 0 0 3 SWI 2 0 1 0 0 2390 156
74 0 0 0 3 SWI 2 0 1 0 0 2570 145
75 0 0 0 3 SWI 2 0 1 0 0 2640 65
76 0 0 0 2 SWI 2 0 1 0 0 2600 94
77 0 0 0 3 SWI 2 0 1 0 0 2830 162
78 0 0 1 2 CNE 1 1 0 0 0 1450 17
79 0 0 0 2 CNE 1 1 0 0 0 1920 88
80 1 0 0 2 ECC 1 1 0 0 0 2810 42
81 0 1 0 2 MCC 5 0 0 0 0 1590 41
82 1 0 0 2 NWI 2 0 1 0 0 4500 0
83 0 0 1 0 RNE 1 1 0 0 0 1790 17
84 1 0 0 0 RSE 1 1 0 0 0 2370 22

Case Study 3.
Apartment Hunting.

Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13


345 600 1 1 1 1 1 1 0 0 1 1 14 0
471 800 2 2 1 1 1 1 0 0 1 1 14 0
335 540 1 1 0 0 1.5 0 0 1 1 0 29 0
375 832 2 1.5 0 0 1.5 0 0 1 1 0 29 0
595 1350 3 3 0 0 1.5 0 0 1 1 0 29 0
800 1400 4 2 1 0 1.5 0 0 1 1 0 5 0
340 550 1 1 0 0 1.5 0 0 1 1 0 5 0
360 681 1 1 0 0 1.5 0 0 1 1 0 5 0
370 750 1 1 0 0 1.5 0 0 1 1 0 5 0
450 831 2 1.5 0 0 1.5 0 0 1 1 0 5 0
600 1300 3 2 1 0 1.5 0 0 1 1 0 5 0
375 610 1 1 0 0 1.5 0 0 1 1 0 5 0
460 1000 2 2 0 0 1.5 0 0 1 1 0 5 0
915 1057 3 2 0 0 3 0 0 0 1 1 0 0
1040 1297 4 2 0 0 3 0 0 0 1 0 0 0
1050 998 3 3 0 0 3 0 0 0 1 1 0 0
360 500 1 1 0 0 5 1 0 1 1 1 15 0
410 650 1 1 0 0 5 1 1 1 1 0 15 0
465 750 2 1 0 1 5 1 1 1 1 0 15 0
550 900 2 2 1 1 5 1 1 1 1 0 15 0
375 600 1 1 0 1 2 1 1 1 1 0 13 1
409 650 1 1 0 1 2 1 1 1 1 0 13 1
419 650 1 1 1 1 2 1 1 1 1 0 13 1
565 950 2 2 1 1 2 1 1 1 1 0 13 1
315 700 1 1 0 0 0.25 0 0 1 1 0 25 1
370 800 1 1 0 0 0.25 0 0 1 1 0 25 1
470 900 2 1 0 0 0.25 0 0 1 1 0 25 1
385 715 1 1 0 0 3.5 1 0 1 1 0 29 0
425 900 1 1.5 1 0 3.5 1 0 1 1 0 29 0
475 1000 2 2 1 1 3.5 1 0 1 1 0 29 0
545 1200 2 2.5 1 1 3.5 1 0 1 1 0 29 0
700 1400 2 2 1 1 3.5 1 0 1 1 0 29 0
675 1500 3 2.5 1 1 3.5 1 0 1 1 0 29 *
245 700 1 1 0 0 0.25 0 0 0 0 1 22 0
395 600 1 1 0 0 2 1 0 0 1 0 14 0
485 700 2 2 0 1 2 1 0 0 1 0 14 0
650 900 2 2 0 0 2 1 0 0 1 0 14 0
475 1000 2 2 0 1 0.5 1 1 1 1 0 17 1
280 700 1 1 0 0 0.25 0 0 0 0 1 22 1
309 471 1 1 0 0 5 1 0 1 0 0 22 0
354 651 1 1 0 0 5 1 0 1 0 0 22 0
374 651 1 1 0 0 5 1 0 1 0 0 22 0
409 876 1 1 0 0 5 1 0 1 0 0 22 0
410 876 2 1 0 0 5 1 0 1 0 0 22 0

Case Study 4.
Advertising and Sales.

Y X
1103 339
1266 562
1473 745
1423 749
1767 862
2161 1034
2336 1054
2602 1164
2518 1102
2637 1145
2177 1012
1920 836
1910 941
1984 981
1787 974
1689 766
1866 920
1896 964
1684 811
1633 789
1657 802
1569 770
1390 639
1387 644
1289 564

Case Study 5.
School Lunch Program.

Year POP UNEMPL LKUNEMP LUNCH INCOME ENROLL


1980 8830 5.9 9.1 940 10996 1918
1981 8677 5.5 9 929 11728 1843
1982 8523 7.7 29.2 924 10190 1630
1983 7788 6.7 31.3 793 8207 1419
1984 7325 5.6 16.7 746 9269 1355
1985 6980 5.9 13.2 804 9814 1290
1986 6667 7.4 18.2 718 9253 1217
1987 6292 7.7 21.3 660 9092 1117
1988 6010 6.4 15.9 725 8908 1154
1989 6197 5.8 12 719 8515 1163
1990 5975 5 11 723 8115 1147
Case Study 6.
Longley Data.

Year GNP Deflator GNP(Millions) ARMED FORCES EMPLOYMENT
1947 83 234289 1590 60323
1948 88.5 259426 1456 61122
1949 88.2 258054 1616 60171
1950 89.5 284599 1650 61187
1951 96.2 328975 3099 63221
1952 98.1 346999 3594 63639
1953 99 365385 3547 64989
1954 100 363112 3350 63761
1955 101.2 397469 3048 66019
1956 104.6 419180 2857 67857
1957 108.4 442769 2798 68169
1958 110.8 444546 2637 66513
1959 112.6 482704 2552 68655
1960 114.2 502601 2514 69564
1961 115.7 518173 2572 69331

Case Study 7.
Academic Dishonesty.

Times Cheated | Hrs Studied | Class Attended | Hrs Worked | Alcohol $ | Hrs TV | Times Witnessed | Freshman | Sophomore | Junior
0 12 12 0 0 6 1 1 0 0
4 2 9 9 35 12 3 1 0 0
2 6 12 6 16 8 6 1 0 0
0 8 12 5 8 9 2 1 0 0
0 4 11 12 5 5 0 1 0 0
0 8 12 0 30 10 0 1 0 0
5 4 10 12 48 15 8 1 0 0
2 4 10 6 30 8 6 1 0 0
3 7 12 5 25 16 4 1 0 0
1 8 12 0 12 7 1 1 0 0
4 2 8 24 16 13 5 1 0 0
1 8 11 5 25 2 2 1 0 0
0 6 12 0 10 4 1 1 0 0
1 4 8 0 60 9 0 1 0 0
4 4 10 16 100 20 6 1 0 0
6 2 9 25 50 15 8 0 1 0
0 7 12 4 15 6 0 0 1 0
0 10 12 0 0 4 2 0 1 0
8 0 7 25 35 25 8 0 1 0
2 5 11 9 72 14 3 0 1 0
10 0 5 30 48 24 10 0 1 0
2 4 12 16 55 6 3 0 1 0
2 9 10 8 4 8 2 0 1 0
0 8 11 0 12 4 0 0 1 0
1 6 11 9 30 12 2 0 1 0
1 6 12 3 35 2 1 0 1 0
0 8 12 0 28 1 4 0 1 0
1 4 10 4 10 10 2 0 1 0
2 8 12 8 45 11 2 0 1 0
2 7 8 12 70 5 4 0 1 0
0 12 12 8 25 0 0 0 1 0
0 10 12 0 5 6 0 0 1 0
0 8 10 0 50 9 3 0 1 0
6 2 7 28 35 15 7 0 1 0
4 4 8 18 46 10 5 0 1 0
2 6 11 6 10 12 2 0 1 0
3 4 10 15 0 20 4 0 1 0
7 0 9 22 50 11 7 0 1 0
1 8 12 5 15 9 2 0 1 0
0 10 12 0 10 5 1 0 0 1
0 9 10 0 6 16 0 0 0 1
2 5 10 10 18 7 2 0 0 1
5 6 7 20 25 10 6 0 0 1
3 4 9 16 35 8 2 0 0 1
2 7 10 6 38 20 3 0 0 1
0 4 12 10 8 1 5 0 0 1
1 11 12 0 5 12 1 0 0 1
0 8 11 2 20 9 3 0 0 1
4 8 10 6 30 16 2 0 0 1
6 2 8 24 45 5 7 0 0 1
2 3 11 12 25 10 5 0 0 1
1 7 12 6 10 2 2 0 0 1
0 9 11 0 5 4 1 0 0 1
0 10 12 0 15 1 0 0 0 1
0 6 12 3 25 6 3 0 0 1
0 8 10 4 8 5 1 0 0 1
5 2 8 16 32 8 7 0 0 1
7 1 9 18 40 12 6 0 0 1
2 6 11 9 20 5 2 0 0 1
1 9 12 2 20 5 1 0 0 1
1 9 12 0 5 3 3 0 0 1
3 5 8 15 15 7 3 0 0 1
5 3 9 6 60 15 6 0 0 0
1 7 12 0 12 10 3 0 0 0
0 10 11 0 5 6 1 0 0 0
0 12 12 0 0 4 0 0 0 0
7 1 7 16 80 11 9 0 0 0
10 0 5 30 100 10 10 0 0 0
6 3 11 9 52 22 7 0 0 0
3 5 10 12 20 9 4 0 0 0
3 6 9 15 25 7 3 0 0 0
4 5 12 3 45 10 5 0 0 0
5 5 11 10 30 16 6 0 0 0
2 7 10 5 20 9 4 0 0 0
0 12 12 0 0 5 2 0 0 0

Case Study 8.
Restaurant Tips.

TOTBILL TIP SEX SMOKER DAY TIME SIZE
16.99 1.01 1 0 6 1 2
10.34 1.66 0 0 6 1 3
21.01 3.5 0 0 6 1 3
23.68 3.31 0 0 6 1 2
24.59 3.61 1 0 6 1 4
25.29 4.71 0 0 6 1 4
8.77 2 0 0 6 1 2
26.88 3.12 0 0 6 1 4
15.04 1.96 0 0 6 1 2
14.78 3.23 0 0 6 1 2
10.27 1.71 0 0 6 1 2
35.26 5 1 0 6 1 4
15.42 1.57 0 0 6 1 2
18.43 3 0 0 6 1 4
14.83 3.02 1 0 6 1 2
21.58 3.92 0 0 6 1 2
10.33 1.67 1 0 6 1 3
16.29 3.71 0 0 6 1 3
16.97 3.5 1 0 6 1 3
20.65 3.35 0 0 5 1 3
17.92 4.08 0 0 5 1 2
20.29 2.75 1 0 5 1 2
15.77 2.23 1 0 5 1 2
39.42 7.58 0 0 5 1 4
19.82 3.18 0 0 5 1 2
17.81 2.34 0 0 5 1 4
13.37 2 0 0 5 1 2
12.69 2 0 0 5 1 2
21.7 4.3 0 0 5 1 2
19.65 3 1 0 5 1 2
9.55 1.45 0 0 5 1 2
18.35 2.5 0 0 5 1 4
15.06 3 1 0 5 1 2
20.69 2.45 1 0 5 1 4
17.78 3.27 0 0 5 1 2
24.06 3.6 0 0 5 1 3
16.31 2 0 0 5 1 3
16.93 3.07 1 0 5 1 3
18.69 2.31 0 0 5 1 3
31.27 5 0 0 5 1 3
16.04 2.24 0 0 5 1 3
17.46 2.54 0 0 6 1 2
13.94 3.06 0 0 6 1 2
9.68 1.32 0 0 6 1 2
30.4 5.6 0 0 6 1 4
18.29 3 0 0 6 1 2
22.23 5 0 0 6 1 2
32.4 6 0 0 6 1 4
28.55 2.05 0 0 6 1 3
18.04 3 0 0 6 1 2
12.54 2.5 0 0 6 1 2
10.29 2.6 1 0 6 1 2
34.81 5.2 1 0 6 1 4
9.94 1.56 0 0 6 1 2
25.56 4.34 0 0 6 1 4
19.49 3.51 0 0 6 1 2
38.01 3 0 1 5 1 4
26.41 1.5 1 0 5 1 2
11.24 1.76 0 1 5 1 2
48.27 6.73 0 0 5 1 4
20.29 3.21 0 1 5 1 2
13.81 2 0 1 5 1 2
11.02 1.98 0 1 5 1 2
18.29 3.76 0 1 5 1 4
17.59 2.64 0 0 5 1 3
20.08 3.15 0 0 5 1 3
16.45 2.47 1 0 5 1 2
3.07 1 1 1 5 1 1
20.23 2.01 0 0 5 1 2
15.01 2.09 0 1 5 1 2
12.02 1.97 0 0 5 1 2
17.07 3 1 0 5 1 3
26.86 3.14 1 1 5 1 2
25.28 5 1 1 5 1 2
14.73 2.2 1 0 5 1 2
10.51 1.25 0 0 5 1 2
17.92 3.08 0 1 5 1 2
27.2 4 0 0 3 0 4
22.76 3 0 0 3 0 2
17.29 2.71 0 0 3 0 2
19.44 3 0 1 3 0 2
16.66 3.4 0 0 3 0 2
10.07 1.83 1 0 3 0 1
32.68 5 0 1 3 0 2
15.98 2.03 0 0 3 0 2
34.83 5.17 1 0 3 0 4
13.03 2 0 0 3 0 2
18.28 4 0 0 3 0 2
24.71 5.85 0 0 3 0 2
21.16 3 0 0 3 0 2
28.97 3 0 1 4 1 2
22.49 3.5 0 0 4 1 2
5.75 1 1 1 4 1 2
16.32 4.3 1 1 4 1 2
22.75 3.25 1 0 4 1 2
40.17 4.73 0 1 4 1 4
27.28 4 0 1 4 1 2
12.03 1.5 0 1 4 1 2
21.01 3 0 1 4 1 2
12.46 1.5 0 0 4 1 2
11.35 2.5 1 1 4 1 2
15.38 3 1 1 4 1 2
44.3 2.5 1 1 5 1 3
22.42 3.48 1 1 5 1 2
20.92 4.08 1 0 5 1 2
15.36 1.64 0 1 5 1 2
20.49 4.06 0 1 5 1 2
25.21 4.29 0 1 5 1 2
18.24 3.76 0 0 5 1 2
14.31 4 1 1 5 1 2
14 3 0 0 5 1 2
7.25 1 1 0 5 1 1
38.07 4 0 0 6 1 3
23.95 2.55 0 0 6 1 2
25.71 4 1 0 6 1 3
17.31 3.5 1 0 6 1 2
29.93 5.07 0 0 6 1 4
10.65 1.5 1 0 3 0 2
12.43 1.8 1 0 3 0 2
24.08 2.92 1 0 3 0 4
11.69 2.31 0 0 3 0 2
13.42 1.68 1 0 3 0 2
14.26 2.5 0 0 3 0 2
15.95 2 0 0 3 0 2
12.48 2.52 1 0 3 0 2
29.8 4.2 1 0 3 0 6
8.52 1.48 0 0 3 0 2
14.52 2 1 0 3 0 2
11.38 2 1 0 3 0 2
22.82 2.18 0 0 3 0 3
19.08 1.5 0 0 3 0 2
20.27 2.83 1 0 3 0 2
11.17 1.5 1 0 3 0 2
12.26 2 1 0 3 0 2
18.26 3.25 1 0 3 0 2
8.51 1.25 1 0 3 0 2
10.33 2 1 0 3 0 2
14.15 2 1 0 3 0 2
16 2 0 1 3 0 2
13.16 2.75 1 0 3 0 2
17.47 3.5 1 0 3 0 2
34.3 6.7 0 0 3 0 6
41.19 5 0 0 3 0 5
27.05 5 1 0 3 0 6
16.43 2.3 1 0 3 0 2
8.35 1.5 1 0 3 0 2
18.64 1.36 1 0 3 0 3
11.87 1.63 1 0 3 0 2
9.78 1.73 0 0 3 0 2
7.51 2 0 0 3 0 2
14.07 2.5 0 0 6 1 2
13.13 2 0 0 6 1 2
17.26 2.74 0 0 6 1 3
24.55 2 0 0 6 1 4
19.77 2 0 0 6 1 4
29.85 5.14 1 0 6 1 5
48.17 5 0 0 6 1 6
25 3.75 1 0 6 1 4
13.39 2.61 1 0 6 1 2
16.49 2 0 0 6 1 4
21.5 3.5 0 0 6 1 4
12.66 2.5 0 0 6 1 2
16.21 2 1 0 6 1 3
13.81 2 0 0 6 1 2
17.51 3 1 1 6 1 2
24.52 3.48 0 0 6 1 3
20.76 2.24 0 0 6 1 2
31.71 4.5 0 0 6 1 4
10.59 1.61 1 1 5 1 2
10.63 2 1 1 5 1 2
50.81 10 0 1 5 1 3
15.81 3.16 0 1 5 1 2
7.25 5.15 0 1 6 1 2
31.85 3.18 0 1 6 1 2
16.82 4 0 1 6 1 2
32.9 3.11 0 1 6 1 2
17.89 2 0 1 6 1 2
14.48 2 0 1 6 1 2
9.6 4 1 1 6 1 2
34.63 3.55 0 1 6 1 2
34.65 3.68 0 1 6 1 4
23.33 5.65 0 1 6 1 2
45.35 3.5 0 1 6 1 3
23.17 6.5 0 1 6 1 4
40.55 3 0 1 6 1 2
20.69 5 0 0 6 1 5
20.9 3.5 1 1 6 1 3
30.46 2 0 1 6 1 5
18.15 3.5 1 1 6 1 3
23.1 4 0 1 6 1 3
15.69 1.5 0 1 6 1 2
19.81 4.19 1 1 3 0 2
28.44 2.56 0 1 3 0 2
15.48 2.02 0 1 3 0 2
16.58 4 0 1 3 0 2
7.56 1.44 0 0 3 0 2
10.34 2 0 1 3 0 2
43.11 5 1 1 3 0 4
13 2 1 1 3 0 2
13.51 2 0 1 3 0 2
18.71 4 0 1 3 0 3
12.74 2.01 1 1 3 0 2
13 2 1 1 3 0 2
16.4 2.5 1 1 3 0 2
20.53 4 0 1 3 0 4
16.47 3.23 1 1 3 0 3
26.59 3.41 0 1 5 1 3
38.73 3 0 1 5 1 4
24.27 2.03 0 1 5 1 2
12.76 2.23 1 1 5 1 2
30.06 2 0 1 5 1 3
25.89 5.16 0 1 5 1 4
48.33 9 0 0 5 1 4
13.27 2.5 1 1 5 1 2
28.17 6.5 1 1 5 1 3
12.9 1.1 1 1 5 1 2
28.15 3 0 1 5 1 5
11.59 1.5 0 1 5 1 2
7.74 1.44 0 1 5 1 2
30.14 3.09 1 1 5 1 4
12.16 2.2 0 1 4 0 2
13.42 3.48 1 1 4 0 2
8.58 1.92 0 1 4 0 1
15.98 3 1 0 4 0 3
13.42 1.58 0 1 4 0 2
16.27 2.5 1 1 4 0 2
10.09 2 1 1 4 0 2
20.45 3 0 0 5 1 4
13.28 2.72 0 0 5 1 2
22.12 2.88 1 1 5 1 2
24.01 2 0 1 5 1 4
15.69 3 0 1 5 1 3
11.61 3.39 0 0 5 1 2
10.77 1.47 0 0 5 1 2
15.53 3 0 1 5 1 2
10.07 1.25 0 0 5 1 2
12.6 1 0 1 5 1 2
32.83 1.17 0 1 5 1 2
35.83 4.67 1 0 5 1 3
29.03 5.92 0 0 5 1 3
27.18 2 1 1 5 1 2
22.67 2 0 1 5 1 2
17.82 1.75 0 0 5 1 2
18.78 3 1 0 3 1 2

Case Study 9.
Research and Development Expenditures.

R&D expenses | Sales | Profits
62.5 6375 185.1
92.9 11626 1569.5
178.3 14655 276.8
258.4 21869 2828.1
494.7 26408 225.9
1083 32406 3751.9
1620.6 35108 2884.1
421.7 40295 4645.7
509.2 70762 5036.4
6620.1 80553 13869.9
3918.6 95294 4487.8
1595.3 101314 10278.9
6107.5 116141 8787.3
4454.1 122316 16438.8
3163.8 141650 9761.4
13210.7 175026 19774.5
1703.8 230615 22626.6
9528.2 293543 18415.4

Case Study 10.


Crime Rate.

STATE | TOTAL CRIME | REAL GDP | UNEMPLOYMENT | POPULATION | % HS GRAD | % IN METRO | MINORITY % | %POLICE
AL 4820 77.8 5.1 4287178 75.7 67.7 26.6754 23
AK 5450 13.5 7.8 604966 91.4 41.3 7.5118 21
AZ 7067 85.6 5.5 4434340 83.5 87.6 24.9582 23
AR 4699 43.0 5.4 2506293 76.2 48.3 17.7029 23
CA 5208 731.6 7.2 31857646 79.8 96.6 37.6862 22
CO 5119 88.9 4.2 3816179 89.1 84.0 18.3330 26
CT 4228 101.1 5.7 3267293 85.3 95.6 16.8001 26
DE 4895 18.2 5.2 723475 82.7 81.9 21.9235 23
FL 7497 315.8 5.1 14418917 81.5 92.9 29.2816 26
GA 6310 152.9 4.6 7334274 76.5 68.5 30.7754 26
HI 6585 27.2 6.4 1182948 84.4 73.6 10.8236 25
ID 4013 21.4 5.2 1187597 85.9 37.5 7.3940 21
IL 5316 288.0 5.3 11845316 83.2 84.1 24.8642 32
IN 4498 119.4 4.1 5828090 83.7 71.7 10.3897 19
IA 3649 57.6 3.8 2848033 87.4 44.3 3.6960 18
KS 4682 54.0 4.5 2579149 87.7 55.4 10.7922 24
KY 3166 69.5 5.6 3882071 74.0 48.2 7.9667 17
LA 6839 77.4 6.7 4340818 74.6 75.2 34.5285 37
ME 3394 23.6 5.1 1238566 84.7 35.8 1.1425 19
MD 6062 126.8 4.9 5060296 84.6 92.8 30.4655 27
MA 3837 164.2 4.3 6085396 84.9 96.1 11.8632 29
MI 5118 216.6 4.9 9730925 84.2 82.4 16.7694 21
MN 4463 108.1 4.0 4648596 87.9 69.7 4.3784 17
MS 4523 43.1 6.1 2710750 75.2 35.3 37.0827 21
MO 5084 111.6 4.6 5363669 83.9 68.0 12.6067 24
MT* 4494 15.3 5.3 876684 85.6 23.5 2.0523 19
NE 4437 34.3 2.9 1648696 87.4 51.3 7.7741 20
NV 5992 37.8 5.4 1600810 85.4 85.7 21.5683 27
NH 2824 28.1 4.2 1160213 86.4 59.8 2.0658 20
NJ 4333 226.5 6.2 8001850 84.9 100.0 26.0267 35
NM 6602 29.1 8.1 1711256 77.1 56.7 42.1354 24
NY 4132 479.7 6.2 18134226 81.6 91.8 31.5312 39
NC 5526 147.2 4.3 7309055 76.0 66.8 23.9956 23
ND 2669 11.9 3.1 642633 80.2 42.7 1.6348 18
OH 4456 237.4 4.9 11162797 84.9 81.1 12.8357 21
OK 5653 58.4 4.1 3295315 83.8 60.2 11.2242 22
OR 5997 66.9 5.9 3196313 87.5 70.2 7.3516 19
PA 3393 270.9 5.3 12040084 81.6 84.6 12.0672 21
RI 3994 22.0 5.1 988283 78.6 93.8 10.7016 24
SC 6214 66.9 6.0 3716645 73.8 69.6 31.2718 23
SD 2970 13.9 3.2 737561 82.4 33.3 1.6603 20
TN 5449 105.9 5.2 5307381 79.0 68.0 17.4248 23
TX 5709 385.8 5.6 19091207 76.4 84.2 41.0954 25
UT 5986 35.4 3.5 2017573 90.7 77.1 6.9325 18
VT 3003 12.0 4.6 586461 86.9 27.7 1.3414 17
VA 3968 152.4 4.4 6666167 82.0 77.9 23.2250 28
WA 5909 126.3 6.5 5519525 90.2 82.8 9.2578 17
WV 2483 30.0 7.5 1820407 74.7 41.8 3.7239 16
WI 3821 109.0 3.5 5146199 88.7 67.7 7.8765 25
WY 4254 9.4 5.0 480011 90.2 29.7 6.6048 29

Case Study 11.


Traffic Accidents.

State | Officer/10K | BAC Limit | Seat Belt | Max Speed | Funding | Time to work | Pop/sqmi | %Metro Pop | Gas Tax Rate | Licensed Drivers | Total Crashes
AL 23 1 0 70 1064 21.2 85.1 67.7 18 3138 1022
AK 21 1 0 65 453 16.7 1.1 41.3 8 440 71
AZ 23 1 0 75 1532 21.6 40.1 87.6 18 2727 857
AR 23 1 0 65 755 19 48.4 48.3 19 1752 539
CA 22 0 1 70 531 24.6 207 96.6 18 20249 3576
CO 26 1 0 65 922 20.7 37.5 84 22 2757 555
CT 26 1 1 55 1202 21.1 675 95.6 34 2344 296
DE 23 1 0 65 452 20 374 81.9 23 529 105
DC 72 1 0 55 163 27.1 8615 100 20 333 58
FL 26 0 0 70 3472 21.8 272 92.9 12 11400 2496
GA 26 1 0 70 1675 22.7 129 68.5 7.5 4966 1403
HI 25 0 1 55 405 23.8 185 73.6 16 733 134
ID 21 1 0 75 359 17.3 14.6 37.8 21 820 228
IL 32 1 0 65 3097 25.1 214 84.1 19 7610 1312
IN 19 1 0 65 1444 20.4 164 71.7 15 3704 872
IA 18 1 1 65 1128 16.2 51.1 44.3 20 1956 411
KS 24 0 0 70 1162 17.2 31.7 55.4 18 1788 443

* Data collected from: US Census Bureau – Statistical Abstract of the US

KY 17 1 0 65 1372 20.7 98.4 48.2 16 2567 733
LA 37 1 1 65 1417 22.3 99.9 75.2 20 2624 701
ME 19 0 0 65 509 19 40.2 35.8 19 874 157
MD 27 1 0 65 1449 27 521 92.8 24 3377 558
MA 29 1 0 65 2545 22.7 781 96.1 21 4355 392
MI 21 1 0 65 1966 21.2 172 82.4 15 6717 1339
MN 17 1 0 65 1374 19.1 58.9 69.7 20 2830 503
MS 21 1 0 70 826 20.6 58.2 35.3 18 1700 695
MO 24 1 0 70 1402 21.6 78.4 68 15 3749 1006
MT 19 1 0 75 377 14.8 6 23.5 27 574 179
NE 20 1 0 75 595 15.8 21.6 51.3 25 116 240
NV 27 1 0 75 468 19.8 15.3 85.7 24 1117 315
NH 20 0 0 65 346 21.9 131 59.8 19 915 125
NJ 35 1 0 55 2928 25.3 1085 100 11 5486 757
NM 24 0 1 75 532 19.1 14.3 56.7 18 1179 412
NY 39 1 1 65 4424 28.6 384 91.8 22 10484 1422
NC 23 0 1 65 1939 19.8 152 66.8 22 5187 1328
ND 18 1 0 65 266 13 9.3 42.7 18 449 80
OH 21 1 0 65 2709 20.7 273 81.1 22 7853 1247
OK 22 1 0 75 918 19.3 48.3 60.2 17 2396 670
OR 19 0 1 65 995 19.6 33.8 70.2 24 2613 460
PA 21 1 0 65 3118 21.6 268 84.6 22 8221 1353
RI 24 1 0 65 297 19.2 945 93.8 29 669 65
SC 23 1 0 65 678 20.5 125 69.6 16 2575 821
SD 20 1 0 75 289 13.8 9.7 33.3 18 519 142
TN 23 1 0 65 1283 21.5 130 68 20 3806 1120
TX 25 1 1 70 4312 22.2 74.2 84.2 20 12568 3248
UT 18 0 0 75 457 18.9 25.1 77.1 19 1319 284
VT 17 0 0 65 192 18 63.7 27.7 16 469 74
VA 28 0 0 65 2321 24 170 77.9 18 4692 807
WA 17 1 0 75 1766 22 84.3 82.8 23 3908 643
WV 16 1 0 65 935 21 75.4 41.8 25 1274 318
WI 25 1 0 65 1324 18.3 95.2 67.7 23 3724 658
WY 29 1 0 75 283 15.4 4.9 29.7 9 343 121

Case Study 12.


Foreign Direct Investments.

Country | FDI | GNP/PPP | R | tax rate | political risk | Ex. Rate. Vol | Literacy Rate | Elec. Consumption | Railroad | Highway | Airport | WTO
Argentina 11489 8970 6.6775 33 13.73 0.000238347 96.2 1837.6 0.01382 0.07613 0.50207 1
Austria 3838 26850 1.745 34 24.19 0.466487267 99 6892.49 0.07069 1.55988 0.66475 1
Australia 33676 20300 3.68 36 22.6 0.07342276 100 8873.88 0.00506 0.11985 0.05356 1
Belgium 18920 25380 2.0125 39 23.72 1.359769267 99 6979.55 0.11181 4.73619 1.38935 0
Bolivia 328 1000 8.0825 25 11.26 0.07596122 83.1 369.292 0.0034 0.04815 1.04206 1
Botswana 21 3600 -0.385 15 15.9 0.343349442 69.8 1144 0.00166 0.03157 0.15717 1
Brazil 37802 4570 24.935 15 11.69 0.026770083 83.3 1880.76 0.00341 0.23414 0.38609 1
Cameroon 238 610 2.5 39 6.31 22.01135171 63.4 176.629 0.00235 0.07307 0.11077 1
An Introduction to Regression Analysis Page: 92

Canada 10390 20020 4.195 38 22.98 0.049380827 97 16499.4 0.00735 0.09893 0.15129 1
8
Chile 9132 4810 10.9125 15 18.56 8.536221677 95.2 2391.5 0.00906 0.10657 0.50481 1
China 6348 750 13.6925 30 17.08 0.000891166 81.5 797.934 0.00696 0.12974 0.02209 0
Colombia 4317 2600 17.56 35 13.31 90.91734867 91.3 1370.08 0.00325 0.11126 1.07827 1
Costa Rica 2126 2780 0.8925 30 13.48 7.709686852 94.8 1341.95 0.01875 0.70266 3.07935 1
Cote 229 700 -2.5 35 8.73 22.48792979 48.5 118.851 0.00208 0.15849 0.11321 1
D'Ivoire'Dji
bouti
Czech Republic 543 5040 -2.72 35 17.58 1.991887651 99 5852.24 0.12003 0.70556 0.87736 1
Denmark 2628 33260 1.345 34 23.74 0.257600836 99 6572.53 0.07838 1.68892 2.78341 1
Dominican Republic 535 1770 12.425 25 14.34 0.458878059 82.1 824.135 0.01565 0.26044 0.74411 1
Ecuador 952 1530 -1.6375 25 8.13 750.6349495 90.1 672.637 0.00349 0.15487 0.66103 1
Egypt 1955 1290 5.7025 40 14.83 0.007006364 51.4 683.772 0.00477 0.06429 0.08941 1
El Salvador 599 1850 7.5725 25 10.95 0.00336941 71.5 607.459 0.02905 0.48403 4.15058 1
France 39188 24940 2.435 33 24.22 0.220332823 99 6981.28 0.0587 1.63646 0.86872 1
Germany 42853 25850 8.415 30 24.89 0.06615193 99 6206.29 0.13247 1.87707 1.76814 1
Ghana 321 390 4.01 35 10.61 19.09120667 64.5 311.315 0.00414 0.17133 0.05217 1
Greece 660 11650 6.6825 35 18.03 12.59637463 95 3865.46 0.01948 0.8945 0.59633 1
Guatemala 429 1640 -0.8875 30 9.31 0.141230961 55.6 251.306 0.00815 0.12082 4.40837 1
Honduras 186 730 3.9525 15 8.37 0.200119515 72.7 455.87 0.00532 0.12667 1.09036 1
Hong Kong 20802 23670 3.635 17 19.32 0.003599807 92.2 4176.64 0.03263 1.7572 2.87908 1
Indonesia 6932 680 -34.343 30 8.89 2403.654346 83.8 309.104 0.00354 0.18763 0.24255 1
Ireland 15936 18340 -2.1 32 23.02 0.024349236 98 4883.92 0.02826 1.34272 0.6387 1
Italy 14638 20250 1.1025 37 21.66 62.56302439 97 4653.33 0.03154 1.07816 0.46255 1
Jamaica 2105 1680 5.8575 33 16.36 0.373295565 85 2309.19 0.03416 1.72669 3.3241 1
Japan 38153 32380 -0.64 38 23.39 8.864378691 99 7517.38 0.06316 3.09545 0.45364 1
Kenya 238 330 15.445 35 8.96 1.134899583 78.1 138.326 0.00466 0.11208 0.40755 1
Korea, Republic of 7365 7970 4.555 28 15.11 141.4675659 98 4141.28 0.06355 0.64671 1.04899 1
Latvia -32 2430 0.94 25 12.65 0.009878048 100 2625.46 0.03734 0.86612 0.77413 1
Lithuania 42 2440 0.825 29 11.6 0.000897902 98 2672.27 0.03203 1.09058 1.536 0
Malaysia 6193 3600 2.98 28 15.25 0.220977994 83.5 2244426 0.00547 0.28763 0.35002 1
Mexico 25877 3970 -4.5875 34 14.25 0.711567319 89.6 1539.95 0.01615 0.13104 0.93862 1
Namibia 2 1940 6.1425 35 15.25 0.518987106 38 673.433 0.00289 0.0785 0.16355 1
Netherlands 79386 24760 1.09 35 24.84 0.074086453 99 5716602 0.08301 0.37475 0.82623 1
New Zealand 6136 14700 5.3675 33 22.23 0.097091554 99 9702.74 0.01479 0.34317 0.41315 1
Norway 7609 34330 4.0125 28 22.91 0.117205284 99 25317.7 0.01303 0.29617 0.33457 1
Panama 26957 3080 5.3575 15 12.56 0 90.8 1255.34 0.00467 0.14607 1.44756 1
Papua New Guinea 120 890 2.4475 15 10.48 0.193867569 72.2 361.308 0 0.04328 1.08643 1
Paraguay 204 1760 1.9175 30 10.1 182.0772537 92.1 877.423 0.00244 0.07425 2.36849 1
Peru 2587 2460 8.72 30 10.89 0.124470621 88.7 608.873 0.00159 0.05636 0.19063 1
Philippines 3192 1050 1.9 34 13.69 1.956142523 94.7 405.819 0.00301 0.54101 0.25153 1
Poland 1698 3900 6.8325 36 17.6 0.075719325 99 34193.2 0.07984 1.23821 0.24301 1
Portugal 1474 10690 0.4925 37 21.76 6.694341299 85 3218.38 0.03341 0.74749 0.71777 1
Russia 1101 2300 -65.605 35 5.35 5.547440566 98 5383 0.00883 0.05578 0.1481 1
Senegal 67 530 1.7 35 7.74 22.48792979 33.1 72.6229 0.00471 0.07592 0.10417 1
Singapore 19783 30060 4.96 26 23.29 0.053503308 91.1 7928.42 0.06055 4.73255 14.1177 1
South Africa 2363 2880 8.1625 35 14.96 0.525834743 81.8 4177.28 0.01757 0.27155 0.11804 1
Spain 12807 14080 0.84 35 22.24 5.447225436 96 4201.62 0.03019 0.69455 0.19824 1
Sri Lanka 24 810 3.7 35 10.53 2.166877811 90.2 263.778 0.02319 1.53228 0.2008 1
Sweden 6053 25620 0.1575 28 23.25 0.141819826 99 15866.6 0.03265 0.33583 0.62055 1
Switzerland 37616 40080 0.915 45 25 0.061403186 99 7389.9 0.11262 1.78642 1.68469 1
Tanzania 26 210 -10 30 6.21 13.64270001 67.8 58.2012 0.00403 0.09954 0.14559 1
Thailand 5721 2200 6.0075 30 14 4.493464936 93.8 1362.19 0.00903 0.12623 0.20908 1
Turkey 1069 3160 9.465 25 13.48 27254.40765 82.3 1389.65 0.01348 0.49613 0.1518 0
Ukraine 92 850 3.155 30 5.63 0.786598652 98 3493.19 0.03868 0.28585 1.16946 0
United Kingdom 17864 21400 1.7925 31 25 0.008186231 99 5520.27 0.06986 1.5398 2.0572 1
8
Uruguay 567 6180 6.115 30 13.06 0.244762331 97.3 2485.4 0.01724 0.0485 0.37438 1
Zambia 36 330 -30.523 35 4.98 246.9949177 78.2 661.559 0.00292 0.0536 0.1512 1
Zimbabwe 103 610 -2.28 38 9.4 8.190968019 85 964.691 0.00714 0.04743 1.20775 1

Case Study 13.


Auction Data.
Project # Variance of Bids Winning Bid Difference Estimate # of Bids Days for Completion
1465804 125805.216 451130 43470 400000 6 180
1037904 41982.869 191687 22789 204286 5 100
1620704 52933.639 38910 5552 27857 4 30
1530704 30414.534 69881 7794 92386 5 75
1616104 4476.868 45478 947 56753 5 45
1570205 29623.126 84706 2033 66600 7 45
1255904 130859.985 3388034 151516 3252444 4 190
1442804 39163.434 262870 38924 252534 4 100
1617604 63170.118 146125 6058 131271 3 30
1483204 17957.684 129374 25396 124778 2 100
1273904 22094.422 210461 13317 214799 9 90
1622404 4390.073 41249 6326 42166 3 30
637804 205920.779 2007406 60293 2307312 5 300
948605 321306.737 7687638 396580 7675860 3 450
1531404 16517.343 163316 1471 179318 3 100
1532104 10315.319 274299 3247 296168 5 90
1054904 176554.576 1058586 184034 1065977 3 120
1537804 13513.086 77812 6868 99071 5 85
1469204 34852.526 408904 5415 397048 4 65
1312504 60209.261 359747 34934 410193 4 150
930204 120564.188 1196990 137286 1204445 5 165
1490204 146273.65 813595 61322 745998 5 150
1515404 54296.608 512458 76787 532953 2 100
1530804 2287.134 47701 2016 46259 4 60
1294304 51803.768 346279 44591 351743 3 150
1349508 237814.326 1405399 37894 1436064 4 200
1000104 867780.903 1538461 94512 2147013 6 200
1317604 16252.849 333635 9088 340460 5 100
1518604 27570.093 126130 38990 134955 2 90
1622304 39358.23 159789 35539 189972 4 30
1613304 53972.393 272586 44462 292017 4 35
1451304 35928.609 291968 2609 288901 5 150
1323804 12209.865 223430 11710 225183 3 100
1361704 431980.129 5935981 37003 6869667 4 300
1604604 7034.298 179860 9948 173508 2 30
144804 11183.769 129914 12014 121889 8 120
1391604 176383.731 1041251 173249 1032569 3 300
1904 664011.32 4576224 145593 4723568 5 275
1590504 1576.281 8644 479 11658 3 30
1008804 422985.609 7852609 67894 6998523 5 425
1476504 29007.447 233560 14438 226868 6 100
930204 120564.188 1196990 137286 1204445 5 165
1490204 146273.65 813595 61322 745998 5 150
1515404 54296.608 512458 76787 532953 2 100
1530804 2287.134 47701 2016 46259 4 60
1294304 51803.768 346279 44591 351743 3 150
1349508 237814.326 1405399 37894 1436064 4 200
1000104 867780.903 1538461 94512 2147013 6 200
1317604 16252.849 333635 9088 340460 5 100
1518604 27570.093 126130 38990 134955 2 90
1622304 39358.23 159789 35539 189972 4 30
1613304 53972.393 272586 44462 292017 4 35
1451304 35928.609 291968 2609 288901 5 150
1323804 12209.865 223430 11710 225183 3 100
1361704 431980.129 5935981 37003 6869667 4 300
1604604 7034.298 179860 9948 173508 2 30
144804 11183.769 129914 12014 121889 8 120
1391604 176383.731 1041251 173249 1032569 3 300
1904 664011.32 4576224 145593 4723568 5 275
1590504 1576.281 8644 479 11658 3 30
1008804 422985.609 7852609 67894 6998523 5 425
1476504 29007.447 233560 14438 226868 6 100

Enhance Bridge H-Light t-Signals Erosion Resurf Resurf-a MRB Landsc Surf-a Guard Depot Grade
1 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0

Case Study 14.


Economic Cycle and Sunspots.

IP Spot
78.6 2576.96
78.4 2582.67
78.4 2439.17
78.8 2494.10
79.2 2446.63
79.5 2562.63
79.3 2378.22
78.8 2407.07
80.1 2480.48
80.7 2530.85
81.1 2618.55
82.0 2686.20
83.8 2699.74
84.4 2594.36
85.1 2758.82
86.5 2888.10
86.3 3028.40
86.5 3138.27
86.4 3116.82
87.6 3145.43
88.5 3059.57
89.8 3151.60
90.9 3248.15
91.8 3274.4
91.8 3338.81
93.1 3435.49
93.1 3482.76
93.4 3287.14
93.8 3432.44
94.8 3557.74
95.1 3587.15
95.1 3624.74
95.8 3657.14
96.1 3592.16
96.2 3495.45
94.7 3641.65
93.3 3713.79
93.0 3719.93
93.4 3972.32
93.2 3964.05
94.3 3973.36
94.6 4009.09
94.2 4050.24
93.9 4099.67
94.2 3932.73
93.6 3960.86
90.9 3842.59
87.1 3778.68
84.8 3731.44
83.5 3800.93
82.0 3675.79
82.7 3586.29
82.5 3625.46
83.6 3538.69
84.1 3574.25
85.6 3541.78
86.4 3484.80
86.9 3584.60
87.7 3534.74
88.4 3683.37

Case Study 15.


Dell.

Date Stock Price


16-Jan-01 21.5000
17-Jan-01 22.6875
18-Jan-01 24.1875
19-Jan-01 25.6250
22-Jan-01 25.5000
23-Jan-01 26.3750
24-Jan-01 27.1250
25-Jan-01 26.4375
26-Jan-01 26.5000
29-Jan-01 28.4375
30-Jan-01 28.1250
31-Jan-01 26.1250
1-Feb-01 25.9375
2-Feb-01 25.1875
5-Feb-01 24.4375
6-Feb-01 26.8750
7-Feb-01 26.5000
8-Feb-01 26.0625
9-Feb-01 23.5000
12-Feb-01 23.2500
13-Feb-01 22.2500
14-Feb-01 22.9375
15-Feb-01 25.0000
16-Feb-01 23.5000
20-Feb-01 22.0000
21-Feb-01 20.6250
22-Feb-01 21.9375
23-Feb-01 23.2500
26-Feb-01 22.8125
27-Feb-01 22.2500
28-Feb-01 21.8750
1-Mar-01 21.5000
2-Mar-01 22.0625
5-Mar-01 23.4375
6-Mar-01 26.1875
7-Mar-01 25.9375
8-Mar-01 26.1250
9-Mar-01 23.3750
12-Mar-01 22.0625
13-Mar-01 23.9375
14-Mar-01 24.3750
15-Mar-01 24.1875
16-Mar-01 23.6875
19-Mar-01 24.8750
20-Mar-01 24.4375
21-Mar-01 24.6875
22-Mar-01 26.2500
23-Mar-01 27.4375
26-Mar-01 25.6875
27-Mar-01 27.0000
28-Mar-01 26.4375
29-Mar-01 26.9375
30-Mar-01 25.6875
2-Apr-01 24.0625
3-Apr-01 23.4375
4-Apr-01 22.1875
5-Apr-01 25.1875
6-Apr-01 24.8125
9-Apr-01 24.8900
10-Apr-01 26.2600
11-Apr-01 26.7400
12-Apr-01 27.9200

Case Study 16.


Exxon.
Year Month Stock Price (a) Year Month Stock Price
1996 4 40.9341 10 68.9859
5 40.8137 11 72.2366
6 41.8370 12 70.4307
7 39.6097 1999 1 67.6616
8 39.2485 2 64.1100
9 40.0913 3 67.9626
10 42.6798 4 80.0020
11 45.4489 5 76.9320
12 47.1946 6 74.2833
1997 1 49.9034 7 76.4504
2 48.2781 8 76.3544
3 51.8900 9 73.5713
4 54.5386 10 71.6957
5 57.0669 11 77.2352
6 58.9932 12 78.4525
7 61.8827 2000 1 80.7044
8 58.9330 2 73.7511
9 61.7021 3 76.3829
10 59.1738 4 76.0768
11 58.7524 5 82.0281
12 58.9330 6 77.2898
1998 1 57.1271 7 78.9513
2 61.4011 8 80.8207
3 65.1333 9 88.2144
4 70.3705 10 88.2917
5 67.9024 11 87.5409
6 68.7452 12 86.4839
7 67.6616 2001 1 83.7110
8 63.0264 2 81.0500
9 68.0228 3 81.0000

(a) Adjusted for a 2:1 stock split by dividing each stock price before the split by 2.

Case Study 17.


Starbucks.
Year Month Stock Price (b) Year Month Stock Price
1996 4 13.5625 10 18.0938
5 13.5625 11 21.6875
6 14.1250 12 23.0625
7 13.0000 1999 1 28.0625
8 16.3750 2 26.0312
9 16.5000 3 26.4375
10 16.2500 4 28.0625
11 17.3125 5 36.9375
12 14.3125 6 36.8750
1997 1 17.1250 7 37.5625
2 16.8125 8 23.2500
3 14.8125 9 22.8750
4 14.9375 10 24.7812
5 15.7500 11 27.1875
6 19.4688 12 26.5625
7 20.4688 2000 1 24.2500
8 20.5000 2 32.0000
9 20.9062 3 35.1250
10 16.5000 4 44.8125
11 17.4375 5 30.2344
12 19.1875 6 34.0000
1998 1 18.2812 7 38.1875
2 19.7812 8 37.5000
3 22.6562 9 36.6250
4 24.0625 10 40.0625
5 24.0000 11 44.6875
6 26.7188 12 45.5625
7 20.9375 2001 1 44.2500
8 15.7812 2 49.9375
9 13.5625 3 47.6250

(b) Adjusted for two separate 2:1 stock splits by dividing stock prices prior to the first split by 2 and all stock prices
before the second split by 2.
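
The split adjustment described in the footnote can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the raw quotes, and the split dates below are hypothetical and are not taken from the Starbucks data. Each price is divided by the ratio of every split that occurred after it, so a price quoted before two 2:1 splits ends up divided by 4.

```python
from datetime import date


def adjust_for_splits(prices, splits):
    """Return split-adjusted prices.

    prices: list of (date, raw_price) pairs.
    splits: list of (split_date, ratio) pairs, e.g. ratio 2 for a 2:1 split.
    A price is divided by the ratio of every split dated after it.
    """
    adjusted = []
    for day, price in prices:
        factor = 1.0
        for split_day, ratio in splits:
            if day < split_day:
                factor *= ratio
        adjusted.append((day, price / factor))
    return adjusted


# Hypothetical raw quotes and split dates, for illustration only.
raw = [
    (date(2000, 1, 1), 80.0),  # before both splits: divided by 4
    (date(2000, 7, 1), 50.0),  # between the splits: divided by 2
    (date(2001, 1, 1), 30.0),  # after both splits: unchanged
]
splits = [(date(2000, 4, 1), 2), (date(2000, 10, 1), 2)]

adjusted = adjust_for_splits(raw, splits)
for day, price in adjusted:
    print(day, price)
```

Adjusting this way keeps the whole series on the post-split share basis, so a split does not show up as an artificial drop in price when the series is used in a regression.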
