Introduction To Econometrics Data
Georgia Kosmopoulou
Kodrat Wibowo
Department of Economics
University of Oklahoma
Norman
2000
An Introduction to Regression Analysis Page: 1
CHAPTER 1
ORIENTATION
MINITAB is an easy-to-use statistical package which offers the student and practitioner
various statistical and graphical tools. You will need to use this package to complete
computer assignments and the main modeling project for Econ 4223 (Intermediate
Business Statistics). Econ 4223 assumes that the students are familiar with elementary
statistical concepts such as estimation and hypothesis testing.
Econometrics uses statistical inference as one of its tools. It has many applications in
academia and business. Most applications are concerned with gathering information
(sample data from a population), summarizing the data with sample statistics, and using
the sample statistics to make inferences and tests about the population. MINITAB will
allow you to greatly increase the speed at which these functions are performed.
This manual is designed to help you learn all the basic tools needed to understand the
class materials better and to perform some basic economic analyses through regression.
The computer requirements of the course are divided into two main parts. The first part
consists of four computer assignments. These are designed to make you familiar with the
techniques of regression analysis and the statistical package MINITAB. The second part
is the Modeling Project, which will incorporate all the techniques you have learned
during the course of the term. This will be discussed in detail by your instructor, and will
require you to formulate an original model, find the data and perform a detailed analysis.
Chapters three through six of the manual each represent a computer assignment. All the
assignments are related to one another, and together they will form a coherent computer
analysis resembling that which you will do for your modeling project.
In the manual, we work through a specific problem with data given in the text. For your
assignments, each one of you will be provided with a data set.
Each assignment is made up of 3 sections: the actual assignment, the tools required to
complete the assignment, and interpretation or analysis of any printouts for the
assignment.
The first section in each chapter describes the assignment in detail. The assignment
is designed to help you understand the class materials better. The
problems are phrased in terms of a real economic phenomenon, where you are
working at the International Monetary Fund performing statistical tests for the country
that will be funded. As much as possible, the assignments are designed so you can
achieve mastery of the regression tools and also gain an intuitive feel for what the
statistics really mean.
The second section in each chapter gives an in-depth description of the procedures
needed to complete the assignment. To minimize the "cookbook" aspect of the
manual, we have separated the assignment from the instructions for how to complete it.
Each assignment will build on what you were taught in the previous assignment. The
procedures described in each assignment will not be repeated. It is your responsibility
to remember the appropriate tools. When necessary, there will be hints to remind you
what they are.
You will get a number of printouts containing the results from each assignment.
There will be a brief explanation of what they are to help you extract the necessary
information. The ability to read and interpret a statistical printout is an
essential tool in the business world. Being able to "eyeball" your results and perform
simple tests is fundamental to effective and quick decision making.
(i) Italics means that you have to come up with something to replace it:
Examples:
filename means you need to type in a file name of your choice, for instance, you
might give your file the name: ASSIGN1
variable means you need to select a variable from a menu or type in a variable name
of your choice, for instance if you have data on real money, you might want to call
your variable: realmon.
(ii) Bold means that it is a tool (or menu item) that comes with MINITAB:
(iii) Any keyboard keys to be pressed will be denoted with bold type with a < in front of
the key and a > behind it. For example, the "Escape" key is denoted as <Esc>.
The MINITAB computer package is available in many OU computer labs (e.g., the
Physical Science Computer Lab, the Bizzell Library Computer Lab and others). You
just need your 4x4 OUNETID and your password to log on.
Every time you need to use the computers, you have to have a formatted disk. You will
need to carry with you a 3.5 inch double sided double density diskette.
Data from your analyses must be saved on the diskette which you insert into the floppy
drive at the beginning of each session. Remove the diskette from the computer when you
are done. This diskette and any printouts obtained are the only records you have from
your work on the computer! Data left on computers have been known to disappear by the
time a student returns to retrieve them. Re-entering the data is a boring and
time-consuming task.
Once you have logged on to the computer, you are ready to use MINITAB. It is located
under the Programs menu in the main WINDOWS 95 menu. Select the MINITAB option
and click the mouse or hit <Enter>. You are now in the MINITAB program.
CHAPTER 2
INTRODUCTION TO MINITAB
MINITAB is a menu driven software package. All functions have been placed into a
subgroup. When you enter the program, the screen should look like this:
Figure 1
We will call this the main MINITAB menu screen throughout the manual. Using the
arrow keys or the mouse cursor, you can highlight the options that are located at the top
of the screen. Whenever an option is highlighted, MINITAB gives you a brief description
of what it does at the bottom of the screen. If you need more information, hit <F1> to
bring up the Help screen. Hit <Enter> or click the mouse to activate the highlighted
option. Hit <Esc> to close the menu. Use the left and right arrow keys or the mouse to
move between menu options.
MINITAB has four types of windows, and they can all be opened at the same time. The
“Project Manager” allows you to view and access the worksheet, session, history and
graph windows (see figure 2). The “Worksheet” window displays your worksheet. You
can enter and edit data here. The “Session” window displays the output. Students
who are familiar with the DOS version of MINITAB can also type commands here.
The “History” window contains a record of previously executed commands. Each time
you plot variables, a “Graph” window appears automatically. The worksheet and session
windows are the defaults in the opening window.
Figure 2
Figure 2 shows these two windows and the “Project manager” window. You can
reproduce this setting by double-clicking the icon located at the bottom left corner of
your screen. You can browse over its contents using the arrow located on the right hand
side of the title bar. Another way to open a specific window is by selecting it directly
from the Window menu option (see Figure 3).
Going back to the main menu, the other functions you will use most frequently are File,
Stat, Graph and Help. The Help option offers a brief description of the various functions.
It can also be activated by hitting <F1>.
Pulling down the File menu offers you several options for specifying what data you will
use (see Figure 4). If you have some new data to enter into a file, you need to choose the
option “New” and then select “Minitab Worksheet”. If you are using data entered in a
previous session, the option “Open Worksheet” will pull up the old file. “Save Current
Worksheet” and “Save Current Worksheet As” will save the data worksheet that you
have most recently used. “Print Worksheet” will print the data appearing on your screen
(make sure the cursor is on the “worksheet” window).
Figure 3
If you go back to the main MINITAB menu and place the cursor in any other window,
the File menu will give you a different set of options that corresponds to the window you
are in. If you place the cursor in the “Session” window, “Save Session Window As” is the
option for saving the statistical output. This output can be printed by selecting the option
“Print Session Window” (see Figure 5). After producing a graph, the option “Save
Graph” will allow you to save it.
Figure 4
Figure 5
The next menu item is Stat. The option we need for now is “Regression”. This option
will occupy most of your time in this class. Another commonly used tool is “Basic
Statistics”, which calculates various statistics related to your variables, such as
the mean, median and variance. In this menu option we can also find the options
“Correlation” and “Covariance” that create correlation and covariance matrices for any
number of variables.
Figure 6
Figure 7
There are a number of graphing options provided by MINITAB which you might find
useful for this and other classes. They are located under the main menu option Graph.
The most commonly used option will be “Plot”. We will also use, less frequently, the
options “3D Plot” and “3D Surface Plot”. The option “Plot” creates a scatter plot of
your data and is helpful for picking up the general relationship between variables.
CHAPTER 3
ASSIGNMENT 1
3.2. Assignment 1.
After graduating from the University of Oklahoma, you have a job as a data analyst at
the International Monetary Fund. Your first task is to help the Director make sense of the
26 annual observations on money per capita, GDP per capita, interest rates and CPI of Sri
Lanka (Ceylon), a country that needs the IMF’s assistance to stimulate its economic growth.
Overview:
Sri Lanka is a low-income country, near the top of the low-income group according to
the World Bank classification. In 1996, per capita GNP was US$753. Total GNP in 1996
was $13,800m. Average yearly per capita GNP growth was 3.1% over the period 1985-
96. Despite the damage to development caused by internal political conflict in recent
years, Sri Lanka still has the world’s highest ranking for achieved quality of life above
material quantity – reaching 0.711 in the UN’s Human Development Index for 1997,
over 40 places above its rank in purely GDP terms. The purchasing power of Sri Lankan
incomes is also proportionately higher. GDP growth over the year 1996 was 3.8%.
Sri Lanka has followed a trade liberalization policy since 1977. While agriculture is
central to Sri Lanka’s economy (accounting for a fifth of GDP), manufacturing and
services are of increasing importance, with exports of textiles and clothing now well
ahead of the traditional agricultural exports as foreign exchange earners. The banking
and financial services sector is also developing. The former policies of nationalization
have been superseded by an extensive liberalization program since late in 1989. In this
previously largely centralized economy, privatization is under way in various sectors –
commercial and agricultural enterprises, banking, transport services and utilities. Sri
Lanka is aiming at achieving newly industrialized country (NIC) status by the year 2000.
However, ethnic conflict has adversely affected the economy, notably in the areas of
foreign investment and tourism. In the recent political history of this country, the period
1983-1989 was characterized by intense civil unrest. Since the beginning of 1990 there
has been a substantial reduction in the instances of conflict.
(a) Your first assignment is to create a data file and input all the numbers into the table.
(b) The Director wants to know what kind of relationship exists between Real Money per
capita and Real GDP per capita (see footnote 1). Since the Director would like to get an intuitive feel for
the correlation first, it is better to plot a graph. It does not matter which variable is on the
X and Y axes. From the graph, what can you tell the Director about the relationship
between the two variables and the strength of this relationship? Make an intelligent guess
of what the correlation coefficient is, and write it down.
(c) Suppose the Director is also curious about the relationship between Real Money and
Real Interest Rate (see footnote 2), and Real GDP and Real Interest Rate. Repeat what you did in (b) for
these pairs of variables. Write down your guess of the correlation coefficient.
(d) Your boss, the chief data analyst, wants more precise answers on the strength of the
correlation for those three pairs of variables. It is your job to provide him with the
correlation table for the three variables. Print a copy of the correlation table and check
your guesses from (b) and (c) with the actual values from the table to see how close you
were.
(e) Your boss is also interested in knowing whether the population correlation for each of
the 3 pairs of variables is significantly different from zero. Perform a hypothesis test at
the 5% level for each of the three pairs and present your results to him.
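The test in part (e) can be sketched outside MINITAB as follows. This is a minimal illustration, assuming the usual t test for a correlation coefficient with n - 2 degrees of freedom; the sample correlation `r` below is a made-up number, not a result from the Sri Lanka data, and the critical value 2.064 is the two-tail 5% value for 24 degrees of freedom taken from a t table.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 26               # number of annual observations in the assignment
t_critical = 2.064   # two-tail 5% critical value for 24 d.f., from a t table
r = 0.60             # hypothetical sample correlation, for illustration only

t = correlation_t_stat(r, n)
reject = abs(t) > t_critical  # True means the correlation differs significantly from zero

print("t =", round(t, 3), "| reject H0 at the 5% level:", reject)
```

The same function can be reused for each of the three pairs of variables by plugging in the value of r from your correlation table.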
1. To make adjustments from nominal to real values in both the cases of GDP and Money, we have
to divide nominal values by the CPI and multiply by 100. Example:
$\text{Real GDP}_t = (\text{Nominal GDP}_t / \text{CPI}_t) \times 100$, where $t$ indicates the year under consideration.
2. The Interest Rate you use should be in real values, not nominal ones. To adjust the interest rate we
need to know the inflation rate. To calculate the inflation rate use the following formula:
$\text{Inflation Rate}_t = ((\text{CPI}_t / \text{CPI}_{t-1}) - 1) \times 100$.
The Real Interest Rate is the nominal interest rate minus inflation, i.e.,
$\text{Real IR}_t = \text{Nominal IR}_t - \text{Inflation Rate}_t$.
Notice that the software will adjust all values but the value for
1970. In order to make the adjustment for 1970 we need the 1969 value of the CPI. This value is
$\text{CPI}_{1969} = 12.94$. Plug the corresponding values into the following equation:
$\text{Inflation Rate}_{1970} = ((\text{CPI}_{1970} / \text{CPI}_{1969}) - 1) \times 100$.
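The two footnote adjustments can be checked with a short script. The sketch below is only an illustration: the CPI, GDP and interest rate numbers are invented placeholders (they are not the assignment data), and the only value taken from the manual is the 1969 CPI of 12.94.

```python
# Hypothetical nominal series for illustration only; NOT the Sri Lanka data.
years = [1970, 1971, 1972]
cpi = [13.70, 14.10, 15.00]              # hypothetical CPI values
nominal_gdp = [1000.0, 1100.0, 1250.0]   # hypothetical nominal GDP per capita
nominal_ir = [6.5, 6.5, 7.0]             # hypothetical nominal interest rates, in percent
cpi_1969 = 12.94                         # 1969 CPI value given in footnote 2

def real_value(nominal, price_index):
    """Footnote 1: real value = (nominal / CPI) * 100."""
    return nominal / price_index * 100

def inflation_rate(cpi_now, cpi_prev):
    """Footnote 2: inflation = ((CPI_t / CPI_{t-1}) - 1) * 100."""
    return (cpi_now / cpi_prev - 1) * 100

real_gdp = [real_value(g, p) for g, p in zip(nominal_gdp, cpi)]

# Each year's inflation needs the previous year's CPI; 1970 uses the 1969 value.
previous_cpi = [cpi_1969] + cpi[:-1]
inflation = [inflation_rate(c, p) for c, p in zip(cpi, previous_cpi)]

# Real interest rate = nominal interest rate minus inflation.
real_ir = [i - pi for i, pi in zip(nominal_ir, inflation)]

for year, g, pi, r in zip(years, real_gdp, inflation, real_ir):
    print(year, round(g, 2), round(pi, 2), round(r, 2))
```

Replacing the placeholder lists with your own columns gives values you can compare against the ones you compute in the worksheet.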
Select File from the main menu, choose the option “New” and then select “Minitab
Worksheet”. Now you are ready to enter, edit and view your data. Your cursor
should be in the “worksheet” window.
Figure 8 shows a “worksheet” window. The current cell is highlighted. You can move
about in the window as you do in other window applications, using scroll bars, the arrow
keys and the <PgUp> and <PgDn> keys. Click in a cell to make it the current cell.
Entering Data.
Use the arrow keys to get to the cell you want, enter the number and then move to the
next cell of the same row by hitting <Tab> (If you press <Enter> instead the arrow
will move to the next row.). Repeat the process until you have entered all the
numbers. Figure 8 shows the “worksheet” window after the data on Nominal Money,
Nominal GDP and Nominal Interest Rate for 1970 have been entered.
If at any time you realize the number in a cell is incorrect, move to that cell and type the
correct number.
Figure 8
You can enter or change the name of any variable placed in the data sheet. Go to the cell
just below the column indication (C1, C2 etc) and type the name of the variable whose
data you have entered below.
Saving Worksheets.
Once you have entered your data set into MINITAB, you will usually want to save it on
your disk so that you can use it again or edit the data in another session. Note that the
extension of any worksheet file is *.mtw.
Select File from the menu and choose the option Save Worksheet As. You will be given
options in a dialog menu. Type the name of your worksheet (ASSIGN1.MTW ) and click
OK.
Figure 9 shows the “worksheet” window after some of the new data have been entered
and the variables have been named.
Figure 9
Creating Graphs.
- Highlight your data set in the “worksheet” window and select Graph on the menu bar.
You will see the options of available graphing tools. Using those you can create different
types of graphs and charts.
- Select Plot from this menu. The Plot function will create a scatter plot. Once you click
on Plot, MINITAB will give a dialog box.
- Select the variables you would like to plot. Notice that you need to specify which one
should be depicted on the Y-axis and which one on the X-axis. Go to the first entry of the
Y column and then click twice on the corresponding variable from the variable list.
- Repeat the process for X.
- To put a title on the graph, choose Annotation and type the title.
- Click OK or <Enter>. After a while the plot will appear on your window.
Note: For this first assignment, it does not matter which variable you choose for your X
and Y axes. We are only interested in the sign and strength of the linear relationship.
To plot the next graph, follow the same steps, choosing the new variables for the X and Y
axes.
To save the graph in a MINITAB Graphics Format (MGF) file, choose File and then
Save Graph Window As from an active “Graph” window. You can open the MGF file in
your next MINITAB session.
Figure 10
- Select Stat from the menu bar. Choose Basic Statistics and then choose Correlation.
A dialog box (figure 10) will appear giving you the choices of variables. It allows you to
pick the variables you want to analyze. If you select more than two variables, MINITAB
will create a correlation matrix showing you the sample correlation coefficients for
different pairs of variables.
- Select Variables. Highlight the variables you want to analyze and click twice on them.
- Click OK. The correlation matrix will appear after a few seconds in the “Session”
window.
- You can save the matrix in the “Session” window by clicking “Save Session Window
As” in the File menu.
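If you want to double-check a correlation matrix by hand, the sample correlation coefficient can be computed directly from its definition. The sketch below is an illustration only: the column names and numbers are hypothetical placeholders, not the assignment data, and the function is a plain implementation of the Pearson formula rather than MINITAB's own routine.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical columns standing in for the worksheet variables.
data = {
    "realmon": [10.0, 12.0, 13.5, 15.0, 18.0],
    "realgdp": [100.0, 108.0, 115.0, 121.0, 135.0],
    "realir": [2.0, 1.5, 3.0, 0.5, 1.0],
}

# Build the full correlation matrix, one row per variable.
names = list(data)
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a, [round(matrix[a][b], 3) for b in names])
```

Each diagonal entry equals 1, and the matrix is symmetric, so only the entries below the diagonal need to be reported.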
CHAPTER 4
ASSIGNMENT 2
MAKE SURE YOU BRING THE DISK ON WHICH YOU ENTERED YOUR DATA
FROM THE PREVIOUS ASSIGNMENT. YOU WILL USE THIS DATA FOR THE NEXT
THREE ASSIGNMENTS.
(4) Check the significance of an independent variable without formal hypothesis testing.
The ability to “eyeball” your data and make quick hypothesis tests in your head is a
critical skill.
4.2. Assignment 2.
The Director wants to know how real GDP per capita affects real money per capita in the
country for the last 26 years. He knows regression would give him the relationship, but
does not have the time to perform the calculations.
(a) Help him by showing him a plot of the data with the regression line superimposed on
it. This will help him (and you) get an intuitive feel of what a regression is.
Note: In the graph, the dependent variable is plotted on the Y-axis and the independent
variable is plotted on the X-axis.
(b) The next thing you need to do is show him the equation that represents the regression
line in the graph. Compute the regression line with MINITAB and write down the
sample regression line in equation form. Make sure you include the standard errors in
parentheses below the regression parameter estimates. Attach a copy of the printout.
(c) Please write down the following information from your regression results:
SSR:
SSE:
SST:
R2:
Adjusted R2:
(d) From the data given above, find the sample correlation coefficient between the two
variables. Does this match the one in the correlation table (in Assignment 1)?
**Hint: What is the relationship between R2 and the sample correlation coefficient in a
simple regression?
(e) Can you tell the director if Real GDP has any significant impact on Real Money at the
5% level? A sample test is given in Chapter 7 of this guide.
(f) Suppose your boss told you that you should also include Real Interest Rate as an
additional independent variable in your regression. Follow his advice and run a
multiple regression. Write down the multiple regression equation and attach a copy
of the computer printout. Similarly, write down the information about SSR, SSE,
SST, R2 and adjusted R2 from your regression results.
(g) Compare the R2 and the adjusted R2 between the simple and multiple regression lines.
Do they go up or down between the two regressions? Does R2 ever go down when
adding another independent variable? What hint does the change in the adjusted R2
give you about real interest rate as an independent variable?
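The quantities asked for in parts (c), (d) and (g) are linked by a few standard formulas, sketched below. This is an illustration under stated assumptions: SSR denotes the regression sum of squares and SST the total sum of squares, the adjusted R2 uses the usual degrees-of-freedom correction, and the numbers are made up rather than taken from any printout.

```python
import math

def r_squared(ssr, sst):
    """R2 = SSR / SST, the share of total variation explained by the regression."""
    return ssr / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R2 with n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def correlation_from_r2(r2, slope):
    """In a simple regression, r is the square root of R2 with the sign of the slope."""
    return math.copysign(math.sqrt(r2), slope)

# Hypothetical sums of squares, for illustration only.
ssr, sst = 80.0, 100.0
n, k = 26, 1

r2 = r_squared(ssr, sst)
adj_r2 = adjusted_r_squared(r2, n, k)
r = correlation_from_r2(r2, slope=1.9)  # the slope value is also hypothetical

print(round(r2, 4), round(adj_r2, 4), round(r, 4))
```

Because the correction term grows with k, the adjusted R2 can fall when a weak variable is added even though R2 itself cannot.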
You will often need data entered during one work session in an ensuing session. If the
file was saved when it was entered, this presents no problem. If it was not, you will have
to reenter the entire data set.
1. Select File from the main menu. A submenu will appear giving you several choices.
The one you want is the option “Open worksheet” which allows you to retrieve a data
file.
2. Select “Open Worksheet”. A box will appear showing you all the available files on
your disk.
3. Select the filename under which you saved your data set in the previous session (for
example: ASSIGN1.MTW ) and click OK.
If you want to change your data, you can do so by editing the data. This was covered in
Chapter 3. If you want to perform other analysis, just follow the instructions listed under
each procedure.
In the previous computer assignment, you plotted the data in a simple two-dimensional
graph. It is helpful to be able to see the data plotted along with the sample regression line
to get a preliminary estimate of how good your regression is.
The computer program will calculate all statistics relevant to regression analysis. Before
making any difficult calculations, make sure that the statistic you are looking for has not
already been provided.
1. Select Stat from the main menu. A submenu will appear showing you all the
available statistics tools. The one you want is “Regression” which allows you to run
a regression analysis.
2. Click on Regression twice. A dialog box will appear giving you the choices of
variables. Pick the dependent (or response) and independent (or predictor) variables
for your model and click OK.
Note: If you are running a multiple regression, you should select all the independent
variables of your model here.
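To see what the Regression tool computes in the simple (one independent variable) case, the least squares estimates and sums of squares can be reproduced by hand. The sketch below is an illustration with made-up numbers; it assumes the textbook formulas for the slope and intercept and uses SSR for the regression sum of squares and SSE for the error sum of squares.

```python
def simple_ols(x, y):
    """Least squares fit of y = b0 + b1 * x, with the sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    ssr = sst - sse                                         # regression sum of squares
    return {"b0": b0, "b1": b1, "SSR": ssr, "SSE": sse, "SST": sst, "R2": ssr / sst}

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

result = simple_ols(x, y)
for name, value in result.items():
    print(name, round(value, 4))
```

The values in `result` correspond to the coefficient column and the analysis of variance section of the printout.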
The following tables present the summary of the results produced by MINITAB.
You must be familiar with the three terms at the bottom of the regression table.
You should also get familiar with the following two columns:
(i) T column: reports the t-value computed with the formula $t = b_i / s_{b_i}$,
(ii) P column: reports the p-value for a two-tail test.
Where:
$F = \dfrac{SSR}{SSE} \cdot \dfrac{n - k - 1}{k}$
p = p-value associated with the test of the significance of the independent variables as a
group
DF = Degrees of Freedom
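The t-value and the F statistic reported in the regression table can be reproduced with two one-line formulas. In this sketch, n is taken to be the number of observations and k the number of independent variables (an assumption based on the usual notation), and all input numbers are hypothetical.

```python
def t_statistic(b, se_b):
    """t-value of a coefficient: the estimate divided by its standard error."""
    return b / se_b

def f_statistic(ssr, sse, n, k):
    """F = (SSR / SSE) * ((n - k - 1) / k) for the independent variables as a group."""
    return (ssr / sse) * ((n - k - 1) / k)

# Hypothetical values, for illustration only.
t = t_statistic(b=1.9, se_b=0.5)
f = f_statistic(ssr=80.0, sse=20.0, n=26, k=2)

print(round(t, 2), round(f, 2))
```

Comparing these results with the T column and the F value on your printout is a quick way to confirm that you are reading the table correctly.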
CHAPTER 5
ASSIGNMENT 3
Note: There are no new computer procedures presented in this Chapter. All functions
you will perform have already been discussed in Assignments 1 and 2.
5.2. Assignment 3.
The Director has decided to incorporate the information on trade policy, ethnic conflict
and the effort to privatize a large part of the public sector in his analysis. The variable
indicating if there was civil unrest is qualitative and will enter the model as a dummy
variable that takes the value of “1” if there is either intense or moderate civil unrest and
“0” otherwise. The variable indicating trade policy is another qualitative variable that
takes the value of “1” in the period of trade liberalization and “0” before that. Finally
privatization is captured in the last dummy variable that is given a value of “1” in the
period that the extensive liberalization program policy was in effect and “0” before that.
The Director wants you to run thorough tests on the significance of these data. He also
wants you to test for the presence of approximate multicollinearity between the
independent variables.
Hint: You must open the old data file using the options File and then “Open Worksheet”
(described in section 4.3) and then add more variables to your data set (described in
section 3.3).
(b) Using Real GDP, Real Interest Rate, Civil Unrest, Trade policy and Privatization as
your independent variables, run a multiple regression analysis with Real Money as
the dependent variable. Print the results and write out the sample regression line (with
standard errors in parentheses under each coefficient).
(c) Test at the 5% level the null hypothesis that the five variables taken as a group have
no significant impact on Real Money (See Chapter 7.).
(d) Which of the five variables has no significant individual impact on Real Money?
Conduct your test at the 5% level.
(e) Based on your results in (d), run a new regression dropping the insignificant
variable(s). Similarly, print the results and write down the new sample regression line
(with standard errors in parentheses under each coefficient).
(f) Do you think the variables you dropped explain deviation in the dependent variable as
a group? Conduct your test at the 5% level. What conclusion can you make about
your decision to drop variables? (i.e. Was it the correct thing to do or should you have
left the variables in?)
Hint: You must perform this test by hand using the SSE from the original and new
regressions.
(g) Test for the presence of multicollinearity among the independent variables Real
Interest Rate, and Real GDP. Conduct your test at the 5% level.
Hint: Compute the correlation coefficient and the p-value. Instructions on how to do this
were discussed in Assignment 1. Print the results.
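The hand calculation mentioned in the hint for part (f) can be sketched as follows. This is an illustration that assumes the usual partial F test comparing the SSE of the reduced regression with the SSE of the original regression; the sums of squares are invented, and the critical value 3.49 is the 5% value for (2, 20) degrees of freedom taken from an F table.

```python
def partial_f(sse_reduced, sse_full, q, n, k_full):
    """Partial F statistic for dropping q variables from a model with k_full variables."""
    numerator = (sse_reduced - sse_full) / q
    denominator = sse_full / (n - k_full - 1)
    return numerator / denominator

# Hypothetical values: 26 observations, 5 variables originally, 2 of them dropped.
n, k_full, q = 26, 5, 2
sse_full = 20.0      # SSE from the original regression (made up)
sse_reduced = 24.0   # SSE from the new regression (made up)

f = partial_f(sse_reduced, sse_full, q, n, k_full)
f_critical = 3.49    # 5% critical value for (2, 20) degrees of freedom, from an F table

print("F =", round(f, 2), "| dropped variables matter as a group:", f > f_critical)
```

If the statistic is below the critical value, the dropped variables do not explain a significant share of the variation as a group, which supports the decision to drop them.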
CHAPTER 6
ASSIGNMENT 4
6.2. Assignment 4.
In your previous assignment, you have shown the Director how real money per capita in
Sri Lanka is affected by real GDP per capita, real interest rates, political unrest, trade and
privatization policy by building a multiple regression model. You have also tested for
multicollinearity. However, the director is interested in knowing if the model you built
has any additional statistical flaws in it. In other words, he wants to know if your model
has problems with autocorrelation and heteroscedasticity.
Hint: You first need to open the original data file. This was discussed in Assignment 2.
The tests for heteroscedasticity and autocorrelation apply to the revised regression
model where you dropped the insignificant variables.
(a) Test for the presence of positive autocorrelation in the revised model. Conduct your
test at the 5% level.
Hint: Run the revised regression analysis again. Ask the computer to save the residuals,
and to compute the Durbin-Watson statistic used to test for autocorrelation.
(b) Test for the presence of heteroscedasticity in the revised model. Conduct your test at
the 5% level.
Hint: You have to run an auxiliary regression in order to test for heteroscedasticity. The
independent variable in the auxiliary regression is the expected value of the dependent
variable from the original regression. The dependent variable is the squared error term
from the original regression. This data can be found in the file in which you saved your
residuals. Go back and use this data set. Select File then “Open” from the menu bar and
you will see a file with several variables. The ones you are interested in are RESI and
FITS. RESI is your error term. FITS is the expected value of the dependent variable.
Create a new variable in this file which is the squared value of RESI. This procedure is
described below. Then run a regression using this new variable as your dependent
variable and FITS as your independent variable. Record the R2 from this regression.
Print the results from the auxiliary regression.
When you do the regression procedure, MINITAB can create a file that contains the
predicted value of the dependent variable and the sample error terms. It can also compute
the Durbin-Watson Statistic, which you need to use to test for autocorrelation.
1. Go through the steps (described in chapter 4) to run a multiple regression. You should
now have your dependent and independent variables chosen.
2. When you do step 2 (described in chapter 4), MINITAB will show you the dialog box. Click the Options
button and check the box for the Durbin-Watson statistic. Click OK.
3. After finishing step 2, click the Storage button. You will see the storage options for
data that can be generated automatically after you do the regression. Check the boxes
for Fits and Residuals. Click OK.
This step will give you the estimated values and the residuals in your worksheet, with
column name FITS1 for the estimated values and RESI1 for the residuals (error terms).
You can change the column names if you want by following the appropriate steps
described in chapter 3. Now, you have to save your residuals and estimates under a new
file name.
4. Select File and then Save Worksheet As. Type in filename and click OK.
You should choose a name that won't get confused with your original data file. For example, if
you've called your original data file ASSIGN, call your file that includes the residuals
RESID1.
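Once the residuals are stored, the Durbin-Watson statistic that MINITAB reports can be verified by hand. The sketch below assumes the standard definition of the statistic (the sum of squared changes in successive residuals divided by the sum of squared residuals); the residuals are hypothetical placeholders for your RESI1 column.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, for illustration only.
resi = [0.5, 0.8, 0.6, -0.2, -0.7, -0.9, -0.1, 0.4]

d = durbin_watson(resi)
print("Durbin-Watson d =", round(d, 3))
```

Values of d well below 2 point toward positive autocorrelation; the formal decision still requires comparing d with the lower and upper bounds from a Durbin-Watson table.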
Note: If you are testing for heteroscedasticity, create a variable that will be the square of
the error term first and then run the regression described in the last paragraph of the
previous subsection.
To estimate the square of the error term, go to Calc then “Matrices” and then
“Arithmetic” and multiply the column of the residuals by itself. Store the results in
another column and go back to step 5.
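The auxiliary regression for the heteroscedasticity test can be sketched as follows. The residuals and fitted values are hypothetical stand-ins for the RESI and FITS columns. The manual only asks you to record the R2 of the auxiliary regression, so the final step (comparing n times R2 with the 5% chi-square critical value of 3.841 for 1 degree of freedom) is an assumption about the form of the test; use the version taught in your class.

```python
def simple_r_squared(x, y):
    """R2 from a simple least squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return (sxy ** 2) / (sxx * syy)

# Hypothetical stored columns, for illustration only.
resi = [0.5, -1.0, 1.5, -2.0, 2.5, -1.5]
fits = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# New variable: the squared residuals (the column multiplied by itself).
resi_sq = [e ** 2 for e in resi]

# Auxiliary regression: squared residuals on the fitted values.
aux_r2 = simple_r_squared(fits, resi_sq)

# ASSUMPTION: n * R2 compared with a chi-square critical value (1 d.f., 5% level).
n = len(resi)
test_statistic = n * aux_r2
chi_square_critical = 3.841

print(round(aux_r2, 4), round(test_statistic, 4), test_statistic > chi_square_critical)
```

With real data, replace `resi` and `fits` with the stored columns from your revised regression.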
CHAPTER 7
The cases that you will be tackling in your modeling projects require you to communicate
your statistical results somehow. You will be discussing your results in class or computer
lab with your instructor and your other classmates. In this final assignment, you will be
asked to prepare a written report of your findings. Writing effectively will be important in
your professional career, and so will oral communication, because you will
often be asked to give a formal presentation and report your findings. Both oral and
written communication skills are highly valued in the workplace.
Preparing the written summary of your project’s results will probably be just as difficult
as doing the computation. Many students say it takes about as long to write the report as
it does to finish the statistical work.
Once you are done with the computer work, in the real world you must interpret these
findings for someone, perhaps a manager or your boss. A good report should be written
so that anyone can understand it. Unless you are writing to someone you know is well-
versed in statistical techniques, avoid using statistical and mathematical jargon in your
report.
For instance, if you were a vice president of marketing who had never taken a statistics
course, which of the following two summaries would you find more valuable?
It may be hard at first to write in a non-technical way because you will be using jargon all
throughout the course. So will your instructor. So will your textbook. You will get used
to using specific statistical terms like “least squares” and “p-values,” but do not presume
that your reader understands them. You will have to find a way to translate statistical
concepts, methodologies, and outcomes for the uninitiated. Just by virtue of spending
time in your statistics class, you may well forget that certain statistical concepts do not
exist in most people’s vocabularies. Here are some useful guidelines.
1. Do you have any words in the report that you did not commonly use before you
walked into this class? If you used a word or phrase on a regular basis before signing
up for statistics, it is probably acceptable to use in your write-up.
2. Would your next-door neighbor or roommate understand the essence of your report?
3. What would the editor of your local newspaper think about publishing your report in
tomorrow’s newspaper?
It sometimes helps to provide background information about the problem at hand. This
might include a statement of the problem or situation and the data available to answer this
question. Since your report involves statistical analyses, you might also provide initial
descriptive statistics (e.g., mean and/or standard deviation) of some of the more important
variables, unless that information would simply distract from the point you are trying to
make. By providing this type of background information, you can put the problem in
perspective and also ease the way into the upcoming material.
Needless to say, good grammar, correct spelling, and appropriate punctuation are all-important
components of an effective report. You probably cannot convince a reader that
your statistical results are valid if your writing is poor. Sloppy, misspelled, and otherwise
disorganized reports send the message that you do not think the report is very important
anyway. Brush up on your writing skills. You will discover that it is both fun and valuable to
write effectively. Word processors may help; a spelling checker is useful too. Some
major universities also provide a writing center that helps students with the
papers and written reports for their classes.
Consider putting supporting statistical documentation, such as graphs, tables, and other
statistical output at the end of your written report. Within the report, refer to these
appendices for guidance. On the other hand, if one particular table or graph really
contains the essence of the point you are making, you probably want to put it right in with
the text, where your reader can see it quickly. A critical graph probably belongs in with
the text. A table that provides a huge amount of information and provides relatively
minor support to your argument probably belongs in an appendix.
You may include the important statistical printouts you produced in the text. If you cannot
decide which ones are important, you are not done with your analysis yet. Once you have
decided, the following rule of thumb may help:
Do not append a statistical exhibit if you do not refer to it in the report, and of course, do
not refer to an exhibit that is not there.
Your project report should not exceed 10 pages. Remember you are trying to present the
essence of your statistical findings, not a comprehensive validation of every step of your
work. You must identify the fine line between too much detail and not enough detail.
Your task is neither to write a report so short that it is deceptive nor to bore or intimidate
your intended reader with unnecessary details. Only you, as the reporter of the
data, can decide what to include.
We suggest that you make the very first page an “executive summary.” An executive
summary should be written for someone who has never had a statistics class, and does not
care to have statistics explained. It should include only the important results and
implications derived from the data. It should also include any necessary caveats or
limitations of your findings. Try to limit this executive summary to one page,
single-spaced.
Next, we present two different sample summary reports³ of a statistical analysis aiming to
assess the success of a company’s new wellness program.
The first of these reports is neither particularly well structured nor very informative. It
contains a lot of technical language that only a person with a statistical background could
understand.
Per your request, I have analyzed the data on sick days and the company’s new wellness
program. The results are summarized below.
The regression model suggests that there is a statistically significant relationship between
the two variables. The correlation between dollars contributed to the wellness program
and absenteeism is a positive 0.63. The standard error is 0.073. The R-squared statistic
on the simple regression model is 56%, which is pretty good for cross-sectional data.
Moreover, the F-statistic measuring the statistical significance of the model as a whole is
54.90, indicating a good model.
I’d be happy to assist in any further analysis of this data at your request.
3 Peter G. Bryant and Marlene A. Smith, “Practical Data Analysis: Case Studies in Business Statistics”,
Volume II, University of Colorado at Denver, Irwin, Chicago, 1989.
The second report on the other hand is well structured and designed to appeal also to
people who are unfamiliar with statistics. In fact it contains a minimum of technical
language but at the same time it offers the opportunity to those who have a technical
background to evaluate the results on their own by including the exhibits.
As you asked, I have analyzed the data on sick days and the company’s new wellness
program. Here are my results.
Data:
The personnel department provided a sample of 125 randomly chosen employee files.
From those files, we obtained:
- the company’s payment for that employee’s participation in our wellness program,
- that employee’s absentee record (measured in number of absent days) over the past
  two years, and
- the gender of the employee.
Results:
1. On average, our employees missed 15 days of work in the first year. The typical
fluctuation around the average of 15 days was seven days.
2. In the second year, after starting the wellness program, the average number of absent
days declined to 10 days and the typical fluctuation also decreased to 2 days. We
committed about $50 per employee to the wellness program last year.
3. A statistical model (see EXHIBIT below) of the relationship between dollars
committed to wellness, absenteeism, and gender indicates that the wellness program
has a statistically significant relationship to absenteeism. The model suggests that:
- each additional dollar committed to our wellness program was associated with a
  two-hour decline in absenteeism, and
- there is no statistically significant relationship between gender and absenteeism.
The model would generally be considered a statistically strong one, since the model
explains 56% of the variation in absenteeism. Although this leaves 44% of the variation
unexplained, it is difficult to do much better with the type of data available for this
analysis.
Recommendation:
Because of the decline in absenteeism after the institution of the wellness program, I
recommend that the program be continued.
Limitations:
Consider redoing this study in another year with a larger sample size. It is not clear
whether one year is enough to observe the full benefits of the wellness program. There
are some troubling aspects of the statistical results that might be alleviated with a larger
sample size. For instance, many of the standard errors in the model (that is, measures of the
accuracy of the estimates) are quite large in my opinion.
I’d be happy to answer any further questions that you might have about my analysis or report.
EXHIBIT
Analysis of Variance
SOURCE        DF      SS      MS       F       p
Regression     2    9770    4885    78.3   0.000
Error        122    7614    62.4
Total        124   17384
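The entries of an analysis-of-variance table are tied together by simple arithmetic, so a reader with a technical background can check the exhibit by hand. The short Python sketch below is not MINITAB output; the function name `anova_summary` is ours and is used only for illustration. It recomputes the mean squares, the F-statistic, and the R-squared from the degrees of freedom and sums of squares shown in the exhibit.

```python
def anova_summary(df_reg, ss_reg, df_err, ss_err):
    """Recompute mean squares, F and R-squared from ANOVA table entries."""
    ms_reg = ss_reg / df_reg        # mean square for regression
    ms_err = ss_err / df_err        # mean square for error
    f_stat = ms_reg / ms_err        # F-statistic for the model as a whole
    ss_total = ss_reg + ss_err      # total sum of squares
    r_squared = ss_reg / ss_total   # share of the variation explained
    return {"MS_reg": ms_reg, "MS_err": ms_err, "F": f_stat,
            "SS_total": ss_total, "R2": r_squared}


# Degrees of freedom and sums of squares copied from the exhibit.
result = anova_summary(df_reg=2, ss_reg=9770, df_err=122, ss_err=7614)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The computed R-squared of about 0.56 is the "56% of the variation in absenteeism" quoted in the report.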
CHAPTER 8
MODELING PROJECT
“If you torture the data long enough, Nature will confess”.
RONALD COASE
For the modeling project you must construct a regression model of a real world system of
interest to you. First you must decide what your dependent variable is. What do you wish
to explain? Sales? Housing prices? Capital spending in the economy? New business
incorporation? The poverty rate? Find a topic that interests you. Then find variables that
could potentially explain the variation in the dependent variable.
Finding data on the variables you selected can be frustrating sometimes. As a matter of
fact, finding data is probably the most difficult part of this project and you should not
underestimate the effort you have to exert. Plan to devote at least two weeks to data
collection. The following are some mistakes that students often make while working on
their projects:
1. Students think that they do not need data on the dependent variable. This of course is
incorrect. You must find data for both the dependent and independent variables.
2. Students think they have to find both time series and cross-sectional data. There are in
fact panel data techniques to handle this type of information but we are not dealing
with them here.
3. Students get the idea that all of the numerical information has to be either in rates, whole
numbers, indexes or percentages. This is not the case.
4. Students put off doing the assignment or finding the data.
Chapter 9 of this manual contains sources for finding data on the internet. It is an
excellent list of sources, and you should start looking for your data there. We will discuss
strategies in class in more detail. An enormous amount of government statistics is
available, ranging from Gross National Product to sales of Girl Scout cookies. Do not give
up on finding the data before you've tried.⁴ Once you have found the data, you must use
MINITAB to analyze your model.
The functions and procedures needed to complete your modeling project have been
covered in the description of the first four assignments in this manual. The modeling
project will be much like your class assignments, except that you will be using your own
data and model. You must justify the model you have chosen, and your results must be
written up in the form of a report. If you have a hard time performing a particular test, go
back to chapters three through six to refresh your memory on the correct procedures.
You may discuss methodology with your fellow students; however, you must work on the
Modeling Project independently.
The Modeling Project must be typewritten (double-spaced) and should not exceed 10
pages. On your title page, you should have the name of the course (i.e., ECON 4223),
your section number, the semester (e.g., Fall 2001), the title of your paper, your Social
Security Number, and your name.
a) Define the dependent and independent variables and specify the units in which
each is measured. (If you are using real vs. nominal data, you must specify the
base year.)
b) State the data sources for each variable. Note any difficulties you had obtaining
the data, and your techniques to overcome them.
4 You could look at demographic issues, federal funds, taxes, spending, employment, education, social
services, health, transportation, issues of interest to the local government, natural resources, science,
technology, exports, imports, stock prices, etc.
1. State (mathematically and explain in words) all the assumptions you need to make in
order to estimate the model.
2. Write out the estimated regression equation for the first computer run, with standard
errors in parentheses under each coefficient. Also present the R², adjusted R²,
F-statistic and Durbin-Watson statistic.
4. Test for multicollinearity. Discuss the consequences of multicollinearity (if any) for
your model.
6. Perform the tests of significance for the individual regression coefficients (t-tests). If
these tests indicate that some of the regression coefficients are insignificant, then drop
the corresponding variables and estimate a revised model (obtain a second run).
7. If you drop more than one independent variable, test whether the variables you
dropped are significant as a group (F-test on subset).
8. Write out the estimated regression equation for the second computer run, with
standard errors in parentheses under each coefficient. Also present the R², adjusted
R², F-statistic and Durbin-Watson statistic.
9. Repeat steps (3), (5), and (6) for the revised model. Compare R² and adjusted R² for
the two models and comment on the differences (i.e., what do the changes in these
numbers between models tell you?).
10. Based on the analysis so far, select between the original and revised models the one
that best fits the data. Interpret each estimated regression parameter in the context of
the problem (i.e., interpret the intercept and the coefficients).
11. Calculate and interpret confidence intervals for the regression coefficients.
12. Test for (i) autocorrelation and (ii) heteroscedasticity. Discuss the consequences of
autocorrelation and heteroscedasticity (if any) for your model. Are your estimators
still BLUE? Are inferential procedures such as hypothesis testing and confidence
intervals reliable?
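MINITAB reports the adjusted R² and the Durbin-Watson statistic for you, but it helps to know how they are built. The following Python sketch shows the standard formulas; the function names and the residuals are made up for illustration and are not part of MINITAB or of any assignment data set.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals,
    divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that alternate in sign (illustration only).
example_residuals = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(example_residuals)

# R-squared of 0.56 with 125 observations and 2 explanatory variables.
adj_r2 = adjusted_r_squared(0.56, n=125, k=2)

print(f"Durbin-Watson: {dw:.2f}")
print(f"Adjusted R-squared: {adj_r2:.4f}")
```

A Durbin-Watson value near 2 suggests no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The alternating residuals in this example give a value well above 2.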
Part 3: Conclusions.
State your conclusions regarding the model(s) you have estimated. Review the original
and revised models. Comment on your testing procedure. Discuss any problems your
model might have. Finally, offer any interesting implications of your model.
Present the data set that you used. Include a printout of your data set from the computer.
Include the computer printouts for all runs. Write your name and which model (original
or revised) on each page of the printout.
Put the paper and the printouts in an envelope, or staple everything together.
Specify what you wish to predict or explain (the subject of your paper). Explain clearly
why this subject is interesting. Identify the variables (dependent and independent)
involved in your analysis. Discuss the theoretical relationship between the dependent and
independent variables. You not only need to make a prediction about what sign you
believe the sample regression coefficients will take, but you also need to explain why!
For example, suppose your dependent variable is deforestation and one of your
independent variables is price of cattle. You need to predict that there is a positive
coefficient on the variable, but also why you believe it is positive. (Perhaps higher cattle
prices induce ranchers to clear more land to raise more cattle.) Collect the data that will
be used for the analysis. Enter your data in a MINITAB worksheet and attach a copy of the
data file to this proposal. The proposal should be typed.
A. Title of Project
Are you trying to explain cross-sectional or time series variations in the dependent
variable?
What is your sample size?
Reminder: When you submit your proposal to me don’t forget to include Xerox copies
of the actual tables that included the data and a printout of your data set from the
computer.
CHAPTER 9
DATA SOURCES
http://midas.ac.uk/macro_econ MIDAS.
http://dawww.essex.ac.uk ESRC Data Archives.
ftp://stats.bls.gov/ US Labor Time Series.
http://www.worldbank.org World Bank.
http://www.tri.org.au/ The Theoretical Research Institute tr(I), Sydney.
http://nilesonline.com/data/ A Journalist's Guide, by Robert Niles.
http://www.fedstats.gov/ FEDSTATS.
http://www.whitehouse.gov/fsbr/esbr.html Economic Statistics Briefing Room (ESBR) -
White House.
http://www.gpo.ucop.edu/info/econind.html Economic Indicators 104th Congress.
http://www.csufresno.edu/Economics/econ_EDL.htm Econ Data & Links.
telnet://ebb.stat-usa.gov EBB at the Commerce Department.
http://www.stat-usa.gov U.S. Department of Commerce (STAT-USA).
http://www.bea.doc.gov/ Bureau of Economic Analysis.
http://nces01.ed.gov/NCES/ National Center for Education Statistics.
http://www.inform.umd.edu:8080/EdRes/Topic/Economics/EconData EconData.
http://stats.bls.gov/blshome.html Bureau of Labor Statistics (LABSTAT).
http://www.cdc.gov Centers for Disease Control.
http://www.gpo.ucop.edu/catalog/erp97.html 1997 Economic Report of the President via
GPO Gateway at UCSD.
http://www.globalexposure.com Business Cycle Indicators from Media Logic.
http://www.nber.org/databases/macrohistory/contents/index.html NBER's Macro-
Historical Database.
http://www.lib.virginia.edu/socsci/reis/reis1.html Regional Economic Information
System.
http://www.lib.virginia.edu/socsci/ccdb/ County and City Data Books, Interactive Data
Resources, University of Virginia Social Sciences Data Center.
http://bos.business.uab.edu/charts.htm Economic Chart Dispenser.
http://bos.business.uab.edu/data/data.htm Economic Time Series Page.
http://www.lib.virginia.edu/socsci/nipa/ National Income and Product Accounts,
Interactive Data Resources, University of Virginia Social Sciences Data Center.
http://www.bog.frb.fed.us Board of Governors of the Federal Reserve System.
http://www.econ-line.com National Economic Research & Data Services (NERDS).
Financial Markets:
http://www.finweb.com FINWeb.
http://www.cob.ohio-state.edu/dept/fin/osudata.htm Financial Data Finder at Ohio State.
http://www.wsrn.com Wall Street Research Net.
http://www.wsdinc.com Wall Street Directory.
http://www.tsi.it/finanza/index.html Finance Area by Top Services International.
http://www.globalfindata.com Global Financial Data.
http://www.briefing.com Briefing by Charter Media.
http://www.bloomberg.com Bloomberg Personal.
http://www.sec.gov/edgarhp.htm EDGAR (SEC).
http://turnpike.net/metro/holt/index.html Martin Wong's and George Holt's Market
Report.
ftp://sunsite.unc.edu/pub/archives/misc.invest Public Domain Financial Data.
http://www.quote.com QuoteCom Data Service.
http://www.secapl.com/cgi-bin/qs Security APL QuoteServer.
http://www.jpmorgan.com JP Morgan.
http://www.wiwi.uni-frankfurt.de/AG/JWGI Student Investment Club at Johann
Wolfgang Goethe University.
http://www.charm.net/~lordhill New Zealand Investment Center.
http://www.asiawind.com/pub/hksr InTechTra's Hong Kong Stocks Reports.
http://www.fid-inv.com Fidelity Investment.
http://www.vanguard.com The Vanguard Group, Inc.
http://networth.galt.com NETworth.
http://www.schwab.com Schwab Online.
http://www.etrade.com E*Trade Securities, Inc.
http://www.lombard.com Lombard Institutional Brokerage, Inc.
http://www.gwdg.de/~ifbg/bank_2.html Banks of the World via the Institute for Finance
and Banking at the University of Göttingen.
http://www.moneyline.com MoneyLine - Real Time Fixed Income Data.
http://www.yahoo.com/r/sq Yahoo Finance, Stock Quotes.
CHAPTER 10
CASE STUDIES
This chapter presents a collection of case studies describing real world phenomena that
we will analyze in class. The corresponding data are provided in the appendix. We will
use this information to explain the behavior of variables of interest to us and discuss the
econometric issues arising from the nature of typical economic data.
The World Development Report, among other sources, indicates that in less developed
countries the average number of births is consistently higher than that in developed
countries. For obvious reasons, this phenomenon presents both economic and
humanitarian challenges worldwide. From a public planning perspective, this problem presents
governments with several institutional challenges, most notably the formidable task of
providing public health care, education and nutrition to a large population of newborns.
To investigate this problem, a study is conducted to identify the possible factors
that may or may not be contributing to the number of children a mother chooses to have.
Potential factors included in the data set are the infant mortality rate, life expectancy at
birth, the percentage of women participating in the labor force, etc. The specific variables,
their definitions and units of measurement are described below. All data refer to 1995.
There are 103 countries in the sample and 7 explanatory variables in the model.
Definitions:
Tot Fert Rt Total Fertility Rate is the number of children who would be born to a
woman if she were to live to the end of her childbearing years and bear
children in accordance with the age-specific fertility rates measured as the
estimated births per woman per country.
Inf Mort Infant Mortality Rate is the number of infants who die before reaching one
year of age, expressed per 1000 live births per year.
Gini Ind Gini Index is a measure of inequality in the distribution of wealth: the
higher the index, the lower the economic equality.
Life Exp Life Expectancy at Birth is defined as the number of years a newborn
child would live if patterns of mortality rate prevailing at the time of its
birth were to stay the same throughout its life.
Based on the information above and the data provided in the appendix, perform the
following analysis:
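\[
\text{Tot Fert Rt}_i = \beta_0 + \beta_1\,\text{Inf Mort Rate}_i + \beta_2\,\text{\% Urban}_i + \beta_3\,\text{\%F in LF}_i + \beta_4\,\text{Gini Index}_i
+ \beta_5\,\text{TV Sets/K}_i + \beta_6\,\text{GNP/PPP}_i + \beta_7\,\text{Life Exp}_i + \varepsilon_i
\]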
2. Write out the sample regression equation and the appropriate t-statistics below the
coefficients in the equation.
7. Is there any discrepancy in the estimated coefficients of the two models? What could
be the reason for the difference?
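The t-statistic that item 2 asks you to write below each coefficient is simply the estimated coefficient divided by its standard error. The Python sketch below illustrates the calculation along with a confidence interval. The coefficient and standard error are made-up numbers, the function names are ours, and the critical value of about 1.985 is the two-sided 5% value from a t-table for 95 degrees of freedom (103 countries minus 7 explanatory variables minus 1); look up the value that matches your own model.

```python
def t_statistic(coefficient, standard_error):
    """t-ratio for the null hypothesis that the coefficient equals zero."""
    return coefficient / standard_error


def confidence_interval(coefficient, standard_error, t_critical):
    """Two-sided confidence interval around an estimated coefficient."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin


# Hypothetical estimate and standard error (not taken from the case data).
coef, se = 0.03, 0.01
t_value = t_statistic(coef, se)
low, high = confidence_interval(coef, se, t_critical=1.985)

print(f"t = {t_value:.2f}")
print(f"95% interval: ({low:.5f}, {high:.5f})")
```

If the absolute value of the t-statistic exceeds the critical value, or equivalently if the interval does not contain zero, the coefficient is statistically significant at that level.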
Lisa is a student at the University of Oklahoma who is planning to open her own business
in Norman after graduation. She is currently living in an apartment but she intends to buy
a house as soon as she graduates. Her friends have suggested that she look for a house on
her own without going to a real estate agent, to avoid the hefty commission. The decision
was difficult since it is hard to quantify a variety of characteristics of a house that cause
one home to be more popular than another (e.g., the quality and design of the wallpaper in
the upstairs bathroom, the overall desirability of the neighborhood and others).
With help from friends, Lisa collected data to get an initial feel for the type of houses
around Norman. The data set contains information on the selling price, the number of
bedrooms and bathrooms, the number of square feet, and other quantitative and
qualitative variables for 84 houses sold in January 1999. Three of the variables of interest are
categorical. The variable Half Baths takes the value of 1 if the house has at least one half
bath and 0 otherwise. The variable depicting the age of houses is classified into 7
categories and is incorporated through the use of 6 dummy variables. Zone represents the
geographical area of homes. We included all homes in 5 zones and incorporated them in
the model using 4 dummy variables. The description of all variables in the data set
follows.
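A categorical variable with m categories enters the regression through m - 1 dummy variables, which is why 7 age categories need 6 dummies and 5 zones need 4. The omitted category becomes the base against which the others are compared; including all m dummies together with an intercept would create perfect multicollinearity. The Python sketch below shows the idea with hypothetical zone labels (the real labels are in the data set), and the function name `make_dummies` is ours.

```python
def make_dummies(values, categories):
    """Build 0/1 dummy columns for every category except the first,
    which serves as the base (omitted) category."""
    non_base = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in non_base}


# Hypothetical labels: five zones, observed for six houses.
zone_labels = ["Zone 1", "Zone 2", "Zone 3", "Zone 4", "Zone 5"]
zones = ["Zone 1", "Zone 3", "Zone 5", "Zone 2", "Zone 3", "Zone 4"]

dummies = make_dummies(zones, zone_labels)
for name, column in dummies.items():
    print(name, column)
```

A house in the base zone has zeros in all four columns, so each dummy coefficient measures the price difference relative to that zone, holding the other variables constant.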
6 The data for this case study were collected by Lisa Garrett, with help from Leslie Robertson, a Dillard
Real Estate Associate in Norman.
7 Moore is a neighboring city about 10 miles north of Norman. Both Moore and Norman are
located in Cleveland County, Oklahoma.
Using the data set presented in the appendix, perform the following
analysis:
2. Write out the sample regression equation and appropriate standard error below the
coefficient in the equation.
4. Perform a hypothesis test that all the independent variables taken together do not
affect prices.
6. Test the significance of each variable, qualitative or quantitative. State the null and
alternative hypothesis, the decision rule, and for each case separately your decision
and conclusion. Based on the results of this hypothesis testing, which variables are
insignificant?
7. Drop those insignificant variables, estimate the revised model and write out the new
equation.
8. Perform the test to verify whether the variables you dropped were significant as a group
(F-test on subset). Which model is the better one to use to interpret the estimates? Explain.
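The F-test on a subset in item 8 compares the error sum of squares of the revised (restricted) model with that of the original (full) model. The Python sketch below shows the usual form of the statistic. All numbers are invented for illustration, including the number of explanatory variables in the full model, and the function name is ours; take the actual sums of squares from your two MINITAB runs.

```python
def subset_f_statistic(sse_restricted, sse_full, num_dropped, n, k_full):
    """F-statistic for testing whether the dropped variables are jointly
    insignificant.

    sse_restricted: error sum of squares of the revised model
    sse_full:       error sum of squares of the original model
    num_dropped:    number of variables dropped
    n:              number of observations
    k_full:         number of explanatory variables in the original model
    """
    numerator = (sse_restricted - sse_full) / num_dropped
    denominator = sse_full / (n - k_full - 1)
    return numerator / denominator


# Invented example: 84 houses, 10 explanatory variables, 2 of them dropped.
f_value = subset_f_statistic(sse_restricted=120.0, sse_full=100.0,
                             num_dropped=2, n=84, k_full=10)
print(f"F = {f_value:.2f} with 2 and 73 degrees of freedom")
```

Compare the result with the critical value from an F-table using the number of dropped variables and n - k - 1 as the degrees of freedom. A value above the critical value means the dropped variables matter as a group.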
Norman, Oklahoma is the city where the University of Oklahoma is located. The
University is a national leader in meteorology and energy-related disciplines. It is a
doctoral degree-granting research university serving the educational, cultural and
economic needs of the state, region and nation. Created by the Oklahoma Territorial
Legislature in 1890, the university has 18 colleges offering 134 bachelor's degrees, 82
master's degrees, 51 doctoral degrees, four graduate certificates, and one professional
degree. OU enrolls more than 25,000 students. As a result, Norman is highly populated
with college students who often choose to live in apartment complexes located near the
campus to minimize their daily commuting.
8
The data for this case was collected by Mandy Miller in Spring 1999.
With this information in mind and the available data perform the following analysis:
2. Interpret the coefficient of x5, x10, x11, x12, x13 in the context of this problem.
3. The plot of rental price (y) against the age of houses (x12) suggests that the
relationship between these variables is nonlinear. Include a quadratic term to capture
this effect. Call this new variable x14. Rerun the regression and write down the
estimated regression equation.
4. From this last regression it can be verified that the only statistically significant
variables at the 5% significance level are x1, x2, x5, x6, x9, x10, x12, x13, and x14. Drop the
insignificant variables and re-estimate the following model:
6. What are the consequences of omitting important explanatory variables if they are
correlated with existing ones in the model?
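The quadratic term in item 3 is just the square of the age variable, stored as a new column. Once both x12 and x14 are in the model, the effect of one more year of age is no longer a single coefficient; it depends on the age itself. The Python sketch below illustrates both points with hypothetical ages and coefficients, which are not estimates from the case data; the function name is ours.

```python
# Hypothetical values of x12 (age in years) for four apartments.
x12 = [1, 5, 10, 20]

# The new variable x14 is the square of x12.
x14 = [age ** 2 for age in x12]


def marginal_effect_of_age(b12, b14, age):
    """Change in predicted rent for one more year of age when the model
    contains b12 * x12 + b14 * x12**2."""
    return b12 + 2 * b14 * age


# Hypothetical coefficients: rent falls with age at first, then levels off.
effect_at_5 = marginal_effect_of_age(b12=-10.0, b14=0.2, age=5)
effect_at_25 = marginal_effect_of_age(b12=-10.0, b14=0.2, age=25)

print(x14)
print(effect_at_5, effect_at_25)
```

In MINITAB the same column can be created with the calculator by squaring the age column before rerunning the regression.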
Advertising has many purposes. An advertisement may inform consumers that a firm has
a new product or the lowest price, or it may help to differentiate the firm’s product from
that of its rivals. A firm uses advertisements to inform consumers of its product’s
strengths.
Advertising has been with us for a long time, although its forms have changed along with
technology. In ancient times, street criers announced to all those who would hear the
imminent sales of slaves, cattle, and imports. Later, when most of the populace was still
illiterate, merchants displayed signs with symbols calling attention to their shops, such as
a loaf of bread for a baker or a horseshoe for a cobbler. Benjamin Franklin pioneered the
use of print advertising in the United States in the 1700s. Today, much advertising
employs electronic media such as radio and television.
Advertising is a fascinating topic because it is an instrument that affects sales and entails
a massive expenditure of funds. Economists believe that variations in aggregate
advertising play an important role in macroeconomic aggregate demand. In the present
study you are asked to examine the relationship between sales (Y) and advertising
expenditure (X), both in thousands of dollars, of Lydia E. Pinkham, using the twenty-five
annual observations that are presented in the appendix. In particular, perform the
following analysis:
4. What would you say about the coefficient of determination in this regression?
5. What is the short run effect on sales of a change in advertising expenditure in the
current period?
9 Data from Paul Newbold, “Statistics for Business & Economics”, fourth edition, University of Illinois,
Urbana-Champaign, Prentice Hall, New Jersey, 1995, page 564 (Additional Topics in Regression Analysis).
6. What is the long run effect on sales of a change in advertising expenditure in the
current period?
9. Interpret the coefficient β1. Can you explain the main difference between equations
(1) and (3)?
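The distinction between short-run and long-run effects arises when last period's sales appear among the explanatory variables. As a sketch only, assume a model of the form Y_t = a + b X_t + c Y_{t-1} + e_t; this form is our assumption for illustration, since the equations for this case are given in class, and the coefficient values below are invented. Under that assumption the short-run effect of advertising is b, and the long-run effect is b / (1 - c), provided c lies between -1 and 1.

```python
def short_run_effect(b_advertising):
    """Immediate effect on sales of a one-unit change in advertising."""
    return b_advertising


def long_run_effect(b_advertising, c_lagged_sales):
    """Cumulative effect on sales once the adjustment has worked through,
    assuming the model Y_t = a + b*X_t + c*Y_{t-1} + e_t."""
    if abs(c_lagged_sales) >= 1:
        raise ValueError("the long-run effect is defined only for |c| < 1")
    return b_advertising / (1 - c_lagged_sales)


# Invented coefficients, not estimates from the Lydia Pinkham data.
b, c = 0.5, 0.6
print(f"short run: {short_run_effect(b):.2f}")
print(f"long run:  {long_run_effect(b, c):.2f}")
```

With these invented values, a one-unit increase in advertising raises sales by 0.5 in the current period and by 1.25 in total.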
Lake County, Colorado is located about 100 miles west of Denver in the Colorado
Rockies. The county is home to the highest airport in the continental USA; much of the
county lies above 9,000 feet. The economy is largely dominated by mining (with its
boom and bust cycles) and tourism.
As in many other rural areas, the school lunch program is an important component of
public policy. For many poor children, the lunch they get in school provides most of their
daily nutrition. Over the last few years, the average daily number of lunches served has
generally declined. Since the average per capita income in the county has also declined,
the director of the program has been eager to understand the reasons for the decline. If the
families were getting poorer and simultaneously fewer lunches were being served, it
would raise questions about how well the program is serving the public need.
In the appendix you will find relevant data for eleven years to analyze this issue. The
variables include: YEAR, the calendar year; POP, Lake County population; UNEMPL,
percentage of unemployment in the state of Colorado; LKUNEMP, percentage of
unemployment in Lake County; LUNCH, average daily lunches served; INCOME,
average per capita income in Lake County; and ENROLL, enrollment in Lake County
schools.
10 Peter G. Bryant and Marlene A. Smith, “Practical Data Analysis: Case Studies in Business Statistics”,
University of Colorado at Denver, Irwin, Chicago, 1989.
With this information in mind and the available data perform the following analysis:
b) Does average income per capita affect the average number of lunches served in
Lake County? Perform a test at a 5% significance level.
where
y = Average daily lunches served (LUNCH);
x1 = Average per capita income in Lake County (INCOME);
x2 = Percentage of unemployment in Lake County(LKUNEMP);
x3 = Lake County population (POP) and;
x4 = Enrollment in Lake County schools (ENROLL).
a) Write out the sample regression equation that expresses y as a function of x1, x2, x3
and x4. Be sure to write the appropriate standard error below the coefficient in the
equation.
c) Test the significance of each variable in this model. Perform the tests at a 5%
significance level. State the null and alternative hypothesis, the decision rule, and for
each case separately your decision and conclusion. (Hint: The p-value test is the
easiest to perform.)
g) Do you think there is evidence that the program is not serving the public need
well? Should the director of the program worry about that?
The purpose of this study is to explain variations in total employment during the period
from 1947 to 1962. The Longley data span the years of the Korean conflict, which ended
in 1953. We employ information on the gross national product (GNP) and the GNP
deflator, the size of the armed forces and the variable YEAR, which is included to
capture any potential time trend. With this information in mind and the data provided in
the appendix perform the following analysis:
2. Write out the sample regression equation with the appropriate standard error below
each coefficient in the equation.
3. Although the observations of the last year in the data set do not appear to be unusual,
drop them and re-estimate the model.
Academic dishonesty is a basis for disciplinary action and includes but is not limited to
activities such as cheating and plagiarism (presenting as one's own the intellectual or
creative accomplishments of another without giving credit to the source or sources).
The faculty member in whose course an act of academic dishonesty occurs has the option
of failing the student for the academic hours in question. The faculty member may
consent to refer the case to other academic personnel for further action. Most colleges
have provisions for more severe penalties including expulsion.
Despite all this, cheating is more widespread at the nation's colleges and universities than
it was years ago because it no longer seems to carry the stigma it used to. “Less social
disapproval and increased competition for spots in graduate schools have made students
more willing to do whatever it takes to get the grades” said Professor Donald McCabe, a
researcher at Rutgers University who has done extensive research on student cheating.
He also remarked that “if students feel disadvantaged because others are cheating and
seeming to get away with it, they'll say: I'm not stupid enough to blow my chances by not
doing the same.”
11
Longley, J. “An Appraisal of Least Squares Programs for the Electronic Computer from the Point of
View of the User.” Journal of the American Statistical Association, 62, 1967, pp. 819-841.
12
This survey was conducted by Luke Albright and Jeff Cotner.
Times Cheated: The number of times students cheated in the past semester.
Hrs Studied: The number of hours a student studied per week.
Classes Attended: The number of class periods attended throughout a month.
Hrs Worked: The number of hours one works per week.
$ on Alcohol: The average amount of money spent on alcohol in a week.
Hrs TV: The average number of hours one spends watching television a week.
Witness Cheating: The average number of times a student witnessed others around
cheating in the past semester.
Class Status (dummy variables):
Freshman: =1 if the student was a freshman and 0 otherwise.
Sophomore: = 1 if the student was a sophomore and 0 otherwise.
Junior: = 1 if the student was a junior and 0 otherwise.
All data was obtained through a survey given to 75 different students on the University of
Oklahoma campus.
Times Cheated = a + b1* Hrs Studied + b2* Classes Attended + b3* Hrs Worked +
b4 * $ on Alcohol + b5* Hrs TV + b6* Witness Cheating + b7* Freshman +
b8* Sophomore + b9* Junior.
Create a dummy variable that takes the value of 1 if the student was a senior and zero
otherwise. Is the behavior of seniors different than that of the rest of the students? Is their
behavior different depending on the time they spend studying?
13
These data are real, though the name of the restaurant is anonymous. The data were provided by Thomas
J. Kientz, President, Colorado National Bank, Aurora, Colorado, and Sean Schneider.
Foodservers’ tips in restaurants may be influenced by many factors, including the nature
of the restaurant, size of the party, table location in the restaurant, and so forth. To make
appropriate assignments for the foodservers, restaurant managers need to know what
these factors are. They must avoid either the substance or the appearance of unfair treatment
of the foodservers, for whom the tips are a major component of pay.
In one restaurant, a foodserver recorded data on all customers he had served during an
interval of two and a half months in early 1990. This set of data is available in the
appendix. The restaurant, located in a suburban shopping mall, was one of a national
chain and served a varied menu. Pursuant to local law, the restaurant offered seating in a
non-smoking section to patrons who requested it. The data was recorded on those days
and during those times when the foodserver was routinely assigned to work.
Definitions:
TOTBILL : Total bill, including tax, in dollars
TIP : Tip in dollars
SIZE : Size of party
SEX : Sex of persons paying bill (0 = male, 1 = female)
SMOKER : Smoker in party? (0 = no, 1 = yes)
DAY : Day of the week (3 = Thursday, 4 = Friday, 5 = Saturday, 6 = Sunday)
TIME : (0 = day, 1 = night)
With this information in mind and the available data perform the following analysis:
1. Create three dummy variables to incorporate the effect of the qualitative variable
“DAY”. How would you define these variables?
where
y = Tip in dollars (TIP);
x1 = Total bill, in dollars (TOTBILL);
x2 = Size of party (SIZE);
x3 = Sex of persons paying bill (SEX);
x4 = Smoking habits (SMOKER);
x5, x6, x7 = Dummy variables related to the days of the week;
x8 = Time of the day (TIME).
Write out the sample regression equation that expresses y as a function of x1, x2, x3, x4,
x5, x6, x7, x8. Be sure to write the appropriate standard error below each coefficient in the
equation.
5. In the above regression it can be verified that the only statistically significant variable
at the 5% significance level is the variable TOTBILL. Drop the insignificant variables
and re-estimate the model:
(Save the residuals and the fits. You will need them later on.)
Write out the sample regression equation that expresses y as a function of x1. Be sure
to write the appropriate standard error below the coefficient in the equation.
6. Was the decision to drop the additional variables the right one? Perform a test on the
subset of regression parameters at a 5% significance level.
7. Based on the reduced model, perform a White’s test for the existence of
heteroskedasticity at the 5% significance level. In particular, assume that the
variance of the error terms is a function of the expected value of the dependent
variable (i.e., $V(\varepsilon_i) = f(y_i)$).
8. Based on the analysis performed so far are the estimators obtained from this model
BLUE? Explain.
9. If $V(\varepsilon_i) = \sigma^2 y_i^2$, describe how you would correct for the problem of
heteroskedasticity. Estimate the corrected model and report the estimates of the
parameters.
10. Summarize the results of your regression analysis. Are there any patterns of which
you think the foodserver or the restaurant manager should be aware of?
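The correction asked for in step 9 can be sketched as follows. This is a minimal illustration, not the course's prescribed procedure: it assumes the error variance is proportional to the square of the expected tip, uses the ordinary least squares fitted values as a stand-in for that expected value, and then divides every term of the reduced model by the fitted value before re-estimating. The function names and the small data set are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def weighted_fit(x, y):
    """Re-estimate y = a + b*x after dividing each term by the OLS fitted value.

    Assumption (not from the manual): the fitted values stand in for the
    expected value of y, so the transformed errors have constant variance.
    """
    a, b = fit_line(x, y)
    fits = [a + b * xi for xi in x]
    z1 = [1.0 / f for f in fits]               # transformed intercept column
    z2 = [xi / f for xi, f in zip(x, fits)]    # transformed x column
    w = [yi / f for yi, f in zip(y, fits)]     # transformed dependent variable
    # Normal equations for a regression through the origin on z1 and z2.
    s11 = sum(u * u for u in z1)
    s12 = sum(u * v for u, v in zip(z1, z2))
    s22 = sum(v * v for v in z2)
    t1 = sum(u * r for u, r in zip(z1, w))
    t2 = sum(v * r for v, r in zip(z2, w))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


# Made-up bills and tips, for illustration only.
bills = [10.0, 15.0, 20.0, 25.0, 30.0, 40.0]
tips = [1.5, 2.0, 3.5, 3.0, 5.5, 4.5]
print(weighted_fit(bills, tips))
```

The same transformation can be carried out in MINITAB or Excel by creating the three transformed columns and running a regression without a constant term.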
The overall violent crime rate dropped 7.3 percent to its lowest level since 1985. For the
fifth straight year, violent crimes and the far more numerous property crimes declined.
Crime reached all-time highs in 1991. Since then the crime rate has decreased at an
14
This data set was collected by Lisa Garrett.
increasing rate each year. The FBI reported recently that both murder and robbery rates
reached lows not seen in three decades.
The major decline in crime has occurred in rural areas and in cities with populations of
25,000 to 100,000. In metro areas, the crime rate is falling at a slower pace. In large
cities, as the population increases, the crime rate increases. However, the increase in the
percent of crimes is much lower than before. Officials credit the aging of the population
in large metro areas as one of the factors.
The consensus on the falling crime rate is that there is no singular event, policy
implementation, or social action that can account for the decrease during the last few
years. Individuals and organizations assessing the cause and implications of this decline
are arriving at a unified theory that attributes it to collective efforts and change.
In an effort to provide some input on this issue, and perhaps single out some factors that
may be contributing towards a reduction in the crime rate, we obtained the following
sample15 from 1996:
Definition of variables:
With this information in mind and the available data, perform the following analysis:
Plot the variable Total Crime (y) against % In Metro (x6), and Unemployment (x2).
Comment.
Write out the sample regression equation with the appropriate standard error below each
coefficient in the equation.
Interpret the coefficient of x2, x3, and x6 in the context of this problem.
15
Data collected from: US Census Bureau – Statistical Abstract of the US
In the above regression it can be verified that at the 5% significance level the variable
“Rate of Police” is statistically insignificant. Drop this variable and re-estimate the
model. (Save the residuals and fits. You will need them later on.)
Write out the new regression equation. Be sure to write the appropriate standard error
below the coefficient in the equation.
Based on the reduced model, perform a White’s test for the existence of
heteroskedasticity at the 5% significance level. In particular, assume that the variance of
the error terms is a function of the expected value of the dependent variable.
The fact that almost 42,000 people were killed last year, innocent victims in violent
deaths and millions more were injured, simply going about their daily business in or near
automobiles raises major concerns across the nation.
Forty-one thousand, nine hundred and seven people were killed in automobile accidents
in 1996, up from the year before, and 3,511,000 were injured. A disaster of this
magnitude needs immediate and drastic action. We need efficient and affordable public
transportation nationwide, safe automobiles, and laws that give law enforcement agencies
the ability to protect and control drivers and passengers on roads and highways.
Moving into the new millennium, it is imperative for governments to improve traffic
conditions. By researching possible factors that affect the occurrence of accidents and
fatalities, we hope to be able to provide some input on this issue. The data set included in
the appendix presents information on the total number of crashes and the determinant
factors of automobile accidents in 1996. The variables included are:
Globalization has become an important strategy in the business world. Prudent investors
are feeling less and less bound to invest in their home countries with the hope of realizing
16
This data set was collected by Natalie Nicole Johnson and Christopher M. Nedbalek.
17
The data were collected by Hui-Ming Ho and Khairul Anwar Moho Dewan.
an above average profit. Governments accept foreign investments to help boost their own
economies instead of waiting for investments from their respective citizens. With
investors making decisions on the amount of funds to invest and governments deciding
how much foreign funding is needed to attract business, it is essential for both sides to
identify the factors that really affect the amount of foreign direct investments (FDI) a
country makes. Since the U.S. has actively engaged in FDI for many years, its FDI abroad
could be a good starting point for all interested investors and governments. The following
is a list of variables that can be used to study this issue. The data set containing the
relevant information is in the appendix.
18
Sources from World Development Report 1999/2000
19
Sources from CIA World Fact Book 1999
20
Sources: Euromoney
4
Source: http://www.oanda.com/converter/cc_table
5
Source: CIA World Fact Book 1999
6
Source: www.wto.org
7
Source: International Financial Statistic Sept. 1999
4. Test the significance of each variable, qualitative or quantitative. State
the null and alternative hypotheses, the decision rule, and, for each case separately,
your decision and conclusion. Based on the results of this hypothesis testing, drop the
insignificant variables.
6. Write out the revised model with the standard errors placed underneath the estimated
coefficients in parentheses.
7. Redo steps 2 and 4 if your final model is different than the original.
9. Based on the analysis performed so far are the estimators obtained from this model
BLUE? Explain. Are your estimates reliable? If not what can you do to improve upon
your current estimates?
The design and conduct of auctions has occupied the attention of many people over
thousands of years. One of the earliest reports of an auction dates back to the fifth century
B.C. In recent years, auctions have accounted for an enormous volume of economic
activity in the United States. Every week the U.S. Treasury sells a large amount of bills
and notes. The Department of Interior sells mineral rights on federally-owned properties
at auction. Billions of dollars worth of spectrum licenses have been sold by the U.S.
government since 1994. Auctions are popular means of selling items not only by the
public but also by the private sector. Sellers auction antiques and artwork, flowers and
livestock, publishing rights and timber rights, property, stamps, wine, books and other
items. Recently, it has become very popular to auction off items over the Internet. eBay is
one of the most popular companies conducting such auctions and the value of its stock
has tripled in the past year.
Finally we included categorical variables that describe the type of project (Bridgework,
resurfacing etc.). Based on the information included in the appendix, perform the
following analysis:
Run a regression with the winning bid as the dependent variable and the number of
bidders, type of project and days for completion as the independent variables.
Run a regression with the variance of bids as the dependent variable and the estimate,
the number of bidders and the days for completion as the independent variables.
Run a regression with the difference between the lowest and the second lowest bids as
the dependent variable and the estimate, number of bidders and days for completion
as the independent variables.
R&D is a key element of a strong technology based economy. It is one of the major
driving forces for economic development in the United States. Through innovation and
technological development, the pharmaceutical, telecommunication, information
technology and aerospace industries have broken new ground. For almost two decades
the U.S. has ranked consistently higher in R&D expenditures than other countries (in
large part due to enormous defense budgets).
Definitions:
yi = R&D expenditures in 1989,
x1i = total sales of a firm in the same period,
x2i = total profit of a firm in 1989.
With this information in mind and the available data perform the following analysis:
Do you think a decision to drop the insignificant variables is the right one in this model?
Why? Why not?
Plot R&D expenditures as a function of sales and profit. What do you observe? What
problem could you be facing in your data?
For the original model, perform a formal test for the existence of heteroskedasticity
at the 5% significance level. In particular, assume that the variance of the error terms
is a function of the expected value of the dependent variable (i.e., $V(\varepsilon_i) = f(y_i)$).
Based on the analysis performed so far are the estimators obtained from this model
BLUE? Explain.
If we assume that $V(\varepsilon_i) = \sigma^2 y_i^2$, describe how you would correct for the problem of
heteroskedasticity.
Finally, it has been suggested in class that one way to get rid of the problem of
heteroscedasticity in your data is to transform the variables in the original model into
logarithmic form. Make this transformation and re-estimate the model. Write out the
sample regression equation that expresses ln y as a function of ln x1 and ln x2. Do you still
have the same problem? (Plot ln y against ln x1 and ln x2.) Is it as severe as in the original
model? Comment.
The idea that the aggregate economy does not climb along a steady trend but experiences
occasional booms of activity and recessions is very old. Virtually every economist
has recognized the existence of strong fluctuations in the general level of economic activity.
But the idea that it exhibits a regular cyclical pattern, that these fluctuations recur
in a precise periodic way, was only put forward late in the nineteenth century by William
Stanley Jevons and Clement Juglar.
W.S. Jevons (1884) related economic cycles to sunspots. He argued that sunspots
affected tangible things such as harvests and intangible things such as people’s moods.
These in turn created the fluctuations in economic activity. The cyclical nature of sunspots
could therefore be employed to explain the existence of economic cycles.
To analyze this theory, we will use data on U.S. industrial production (IP) and the
number of sunspots (SPOT) for 120 months (from 1971.1 to 1975.12).
Based on this information perform the following analysis:
2. How large is the t-statistic? Does this provide support for the sunspots theory?
3. Plot the residuals and produce a scatter diagram of IP vs. SPOT. Does it seem to you
that there might be a serial correlation problem in this case? Explain.
4. Test for the existence of first-order serial correlation using the Durbin-Watson statistic.
5. How would you correct for the problem of autocorrelation in this model?
6. One explanation for the high t-statistic in (1) and the low DW statistic is the
omission of variables. Let’s proxy factor inputs (Materials, Labor, and Capital) using a
21
We thank Sangeeta Bishop for providing this data set to us.
time trend and ask whether including this trend in the regression changes the results.
Estimate the following equation:
7. Has this eliminated the problem? Is the coefficient on SPOT still significant? Is serial
correlation still a problem?
8. Perform a hypothesis test in which the null hypothesis is that sunspots do not affect
industrial production.
Dell Computer Corporation, headquartered in Round Rock, Texas, near Austin, is the
world's leading direct computer systems company. Company revenue for the last four
quarters totaled $23.6 billion. Dell is the No. 2 and fastest growing among all major
computer systems companies worldwide, with more than 33,200 employees around the
globe. The company ranks No. 1 in the United States, where it is a leading supplier of
PCs to business customers, government agencies, educational institutions and consumers.
The company was founded in 1984 by Michael Dell, now the computer industry's
longest-tenured chief executive officer, on a simple concept: that by selling personal
computer systems directly to customers, Dell could best understand their needs, and
provide the most effective computing solutions to meet those needs. Today, Dell is
enhancing and broadening the fundamental competitive advantages of the direct model
by increasingly applying the efficiencies of the Internet to its entire business.
In this study, we will use the stock prices of DELL from January 16, 2001 to April 16,
2001. Based on this information perform the following analysis:
2. Arrange the observations in ascending order to find the median and the number of
runs R, in the data set.
3. Use the large-sample variant of the Runs test to test this series for randomness against
the alternative of non-randomness.
4. Compute a simple centered 5-point, 13-point, and 25-point moving average series for
the Dell stock price.
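The centered moving average in step 4 can be sketched as follows. This is a minimal illustration under the usual definition, in which a (2m+1)-point average replaces each observation by the mean of itself and its m neighbours on either side, so the first and last m observations have no smoothed value. The function name and the short price list are hypothetical.

```python
def centered_moving_average(x, points):
    """Simple centered moving average with an odd number of points (2m + 1)."""
    if points % 2 == 0:
        raise ValueError("use an odd number of points for a centered average")
    m = points // 2
    # One smoothed value for each position that has m neighbours on both sides.
    return [sum(x[t - m:t + m + 1]) / points for t in range(m, len(x) - m)]


# Made-up closing prices, for illustration only.
prices = [25.0, 26.5, 24.0, 27.5, 28.0, 26.0, 29.5, 30.0]
print(centered_moving_average(prices, 5))
```

A 13-point or 25-point series is obtained by changing the second argument; the longer the window, the smoother the series and the more observations are lost at each end.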
22
The data for this case came from Yahoo Finance. The data set was collected by Kimberly Maggi.
10.16. Exxon23.
Exxon Corporation, one of the world’s first multinational companies, traces its roots to
John D. Rockefeller. Exxon is engaged in the exploration, production, manufacture,
transportation and sale of crude oil, natural gas, and petroleum products. Exxon has a
business presence on every continent except Antarctica.
We will use Exxon’s monthly stock price over the past five years (April, 1996 to March,
2001) to illustrate the seasonal component found in the oil and gas industry. Basically, we
assume that seasonal activities such as the payment of dividends, earnings announcements,
and placement of orders affect the stock price. We will incorporate dummy variables to
represent the quarters in which seasonal components are present.
1. Plot the original data on a time series graph with the stock price as the dependent
variable (Y).
2. Run a regression with the stock price as the dependent variable (Yt) and time (t) as
the independent variable.
In order to incorporate seasonal fluctuations in the price of stock introduce the following
three dummy variables:
Q2 =1 if in 2nd quarter (i.e., months 4-6)
= 0 otherwise
Q3 =1 if in 3rd quarter (i.e., months 7-9)
= 0 otherwise
Q4 =1 if in 4th quarter (i.e., months 10-12)
=0 otherwise
3. Seasonal differences in the price of stock may be due to placement of orders in the
second quarter and earnings announcements as well as dividends payout in the
fourth quarter. Test the significance of seasonal fluctuations. State the null and
alternative hypotheses, the decision rule, and for each case separately state your
decision and conclusion. Based on the results of this hypothesis testing, what is your
conclusion about the effect of these factors upon the price of stock?
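The three quarterly dummy variables defined above can be built from the calendar month as in the following sketch. It assumes months are coded 1 to 12, so that the first quarter (months 1-3) is the baseline in which all three dummies equal zero. The function name is hypothetical.

```python
def quarter_dummies(month):
    """Return Q2, Q3 and Q4 dummies for a calendar month coded 1..12."""
    quarter = (month - 1) // 3 + 1   # months 1-3 -> 1, 4-6 -> 2, and so on
    return {
        "Q2": int(quarter == 2),
        "Q3": int(quarter == 3),
        "Q4": int(quarter == 4),
    }


# One row of dummies per month of a year.
for month in range(1, 13):
    print(month, quarter_dummies(month))
```

Only three dummies are needed for four quarters: including a fourth would make the dummies sum to the constant term and the regression could not be estimated.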
Starbucks Coffee brought new meaning to the phrase “Let’s go for coffee.” Starbucks is
the leading retailer, roaster and brand of specialty coffee in the world. They purchase,
roast and sell high quality whole bean, rich-brewed coffees, Italian-style beverages and a
variety of pastries. Starbucks has a presence in North America, the United Kingdom, the
Pacific Rim and the Middle East.
23
The data for this case came from Yahoo Finance. The data set was originally collected by Kimberly
Maggi.
24
The data for this case came from Yahoo Finance. The data set was collected by Kimberly Maggi.
We will use the Starbucks’ monthly stock price for the past five years (1996-2001) to
illustrate the seasonal index method. Basically, we assume that for any given month, in
each year, the effect of seasonality is to raise or lower the observations by a constant
proportionate amount, compared with what they would have been in the absence of
seasonal influences.
With this information in mind and the available data perform the following analysis:
1. Plot the original data using the time series plot with the stock price as the dependent
variable (Y).
2. Compute a simple centered 5-point moving average for the stock price and store the
moving averages as Xt*.
6. Using the original data, estimate the autoregressive models of orders 1 through 3:
$$X_t = \gamma + \phi_1 X_{t-1} + a_t$$
$$X_t = \gamma + \phi_1 X_{t-1} + \phi_2 X_{t-2} + a_t$$
$$X_t = \gamma + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + a_t$$
where $\gamma, \phi_1, \phi_2, \phi_3$ are autoregressive parameters and $a_t$ is a random variable
that has mean zero and constant variance for all t.
7. For each model test the hypothesis that the last autoregressive parameter is
insignificant ($H_0: \phi_p = 0$; $H_1: \phi_p \neq 0$), starting from the third order autoregressive
model.
CHAPTER 11
HYPOTHESIS TESTING
When you are testing your hypothesis, you have to be explicit with your procedures.
Your null and alternative hypotheses, decision rule, decision and conclusion have to be
clearly stated. In this chapter, we put together, for your convenience, the tests that are
most frequently performed in this course.
There are two ways you can test the significance of an independent variable: you can use
the t-test (as outlined below) or you can use the p-value. Remember that the p-value is the
smallest significance level at which you can reject your null hypothesis. MINITAB
automatically computes the p-value of a two-tailed test based on your data. One advantage
of using the p-value for your hypothesis testing is that you can make your decision
without using a table. The decision rule for a two-tailed test is very simple when you use the
p-value: reject the null hypothesis if the p-value is smaller than the chosen significance
level. The two ways to test the significance of an independent variable in the model are
described below:
To test H0: $\beta_i = 0$
H1: $\beta_i \neq 0$
MINITAB automatically computes the F-ratio and p-value associated with the hypothesis
testing on the significance of a group of independent variables. These numbers are reported
in the ANOVA table, so it is not necessary to calculate the F-ratio using the formula you
learned in class.
To test H0: $\beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
H1: At least one $\beta_i \neq 0$
D.R.: Reject H0 if $F = \dfrac{SSR/k}{SSE/(n-k-1)} > F_{k,\,n-k-1,\,\alpha}$
To test H0: $\beta_1 = \beta_2 = \beta_3 = \dots = \beta_k = 0$
H1: At least one $\beta_i \neq 0$
When you decide to drop a subset of independent variables from your model you have to
test and see if those variables were significant as a group. If it turns out that they had a
significant impact on the dependent variable as a group then you have to re-evaluate your
decision to drop them from the model.
Once again, there are two ways to perform the test. You can use either the SSE or the $R^2$. The
values $SSE^*$ and $R^{*2}$ are the error sum of squares and the coefficient of determination,
respectively, of the revised regression.
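$$F = \frac{SSE^* - SSE}{SSE} \cdot \frac{n - k - 1}{k_1}$$
where $k_1$ is the number of independent variables dropped from the model.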
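The SSE version of this test is simple arithmetic once both regressions have been run. The sketch below assumes the usual form of the statistic, F = [(SSE* - SSE) / k1] / [SSE / (n - k - 1)], where SSE* comes from the revised (reduced) regression, SSE from the original regression with k independent variables, and k1 is the number of variables dropped. The function name and the numbers are hypothetical.

```python
def subset_f(sse_reduced, sse_full, n, k, k1):
    """F statistic for dropping k1 of the k independent variables.

    sse_reduced : error sum of squares of the revised regression (SSE*)
    sse_full    : error sum of squares of the original regression (SSE)
    n           : number of observations
    """
    numerator = (sse_reduced - sse_full) / k1
    denominator = sse_full / (n - k - 1)
    return numerator / denominator


# Made-up values: 30 observations, 4 variables originally, 2 dropped.
f_stat = subset_f(sse_reduced=120.0, sse_full=100.0, n=30, k=4, k1=2)
print(f_stat)
```

The computed value is then compared with the tabulated F value with k1 and n - k - 1 degrees of freedom at the chosen significance level; a larger computed value means the dropped variables were jointly significant.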
You have a problem of multicollinearity when some of your independent variables are
related to one another. To test for the presence of multicollinearity we have to test how
significant the relationship is between pairs of independent variables. If the correlation
coefficient between any pair of independent variables is significantly different from zero,
then your model will have a multicollinearity problem.
There are two ways you can test for the presence of multicollinearity: (i) t-ratio or (ii) p-
value.
(i) t-ratio:
(ii) p-value:
To test H0: $\rho_{ij} = 0$
H1: $\rho_{ij} \neq 0$
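The t-ratio for a pairwise correlation can be sketched as follows. This assumes the standard statistic t = r / sqrt((1 - r^2) / (n - 2)), which is compared with the t table value with n - 2 degrees of freedom. The function names and the two short data columns are hypothetical.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient between two equally long lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t(r, n):
    """t-ratio for testing that the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Made-up values of two independent variables, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 3.0, 2.0, 4.0]
r = correlation(x1, x2)
print(r, correlation_t(r, len(x1)))
```

With more than two independent variables the same calculation is repeated for every pair.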
You need to run an auxiliary regression to obtain the appropriate $R^2$ to test for the
presence of heteroscedasticity.
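One common form of this auxiliary regression is sketched below. It is an assumption, not necessarily the exact procedure taught in class: the squared residuals of a simple regression are regressed on the fitted values, and n times the $R^2$ of that auxiliary regression is compared with the chi-square critical value with one degree of freedom (3.84 at the 5% level). Function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def n_r_squared(x, y):
    """n * R^2 from regressing squared residuals on the fitted values."""
    a, b = fit_line(x, y)
    fits = [a + b * xi for xi in x]
    squared_residuals = [(yi - fi) ** 2 for yi, fi in zip(y, fits)]
    return len(x) * r_squared(fits, squared_residuals)


CHI_SQUARE_1DF_5PCT = 3.84  # tabulated critical value, 1 degree of freedom

# Made-up data, for illustration only.
bills = [10.0, 15.0, 20.0, 25.0, 30.0, 40.0]
tips = [1.5, 2.0, 3.5, 3.0, 5.5, 4.5]
statistic = n_r_squared(bills, tips)
print(statistic, statistic > CHI_SQUARE_1DF_5PCT)
```

In MINITAB or Excel the same result is obtained by storing the residuals and fits, squaring the residuals in a new column, and regressing that column on the fits.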
To test for the presence of positive autocorrelation, we use the residuals from the
regression to estimate the Durbin-Watson statistic.
One sided:
Two sided:
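The test statistic
$$Z = \frac{R - (n/2) - 1}{\sqrt{\dfrac{n^2 - 2n}{4(n - 1)}}}$$
is compared with the critical value $z_{\alpha/2}$.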
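The large-sample runs test can be sketched as follows. The sketch assumes the statistic Z = (R - n/2 - 1) / sqrt((n^2 - 2n) / (4(n - 1))), where R is the number of runs above and below the median and n is an even number of observations. Dropping observations exactly equal to the median is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math
import statistics


def count_runs(x):
    """Return (number of runs, number of observations used).

    Observations equal to the median are dropped (an assumption of this sketch).
    """
    median = statistics.median(x)
    above = [xi > median for xi in x if xi != median]
    # A new run starts whenever the series crosses the median.
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    return runs, len(above)


def runs_z(runs, n):
    """Large-sample runs test statistic."""
    return (runs - n / 2 - 1) / math.sqrt((n * n - 2 * n) / (4 * (n - 1)))


# Made-up series, for illustration only.
series = [21.0, 23.5, 22.0, 25.0, 26.5, 24.0, 27.0, 28.5]
runs, n = count_runs(series)
print(runs, n, runs_z(runs, n))
```

A strongly negative value indicates too few runs, which is what a trending or positively autocorrelated series produces.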
If you construct a lot of confidence intervals in the same fashion, using different samples
with the same number of observations, 100(1-$\alpha$)% of the intervals will contain the true
population parameter with probability one.
CHAPTER 12
USEFUL FORMULAS
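$$r = \frac{\mathrm{cov}(x, y)}{S_x S_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$$

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \qquad t_i = \frac{b_i - \beta_i}{S_{b_i}}$$

$$S_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$

$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \qquad a = \bar{y} - b\bar{x}$$

$$SSE = \sum e_i^2 = \sum y_i^2 - a\sum y_i - b\sum x_i y_i$$

$$S_e^2 = \frac{SSE}{n - k - 1} \qquad S_b^2 = \frac{S_e^2}{\sum (x_i - \bar{x})^2} = \frac{S_e^2}{\sum x_i^2 - n\bar{x}^2}$$

$$b \pm t_{n-2,\,\alpha/2} \cdot S_b$$

$$\hat{y}_{n+1} = a + b x_{n+1}$$

$$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2} \sqrt{\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum x_i^2 - n\bar{x}^2}\right] S_e^2} \qquad \hat{y}_{n+1} \pm t_{n-2,\,\alpha/2} \sqrt{\left[\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum x_i^2 - n\bar{x}^2}\right] S_e^2}$$

$$d = \frac{\sum (e_t - e_{t-1})^2}{\sum e_t^2} \approx 2(1 - r) \qquad r = 1 - \frac{d}{2} \qquad h = r\sqrt{\frac{n}{1 - n s_c^2}}$$

Quantity indices:
$$100\left(\frac{\sum q_{1i} P_{0i}}{\sum q_{0i} P_{0i}}\right)$$

Moving Averages:
$$x_t^* = \frac{\sum_{j=-m}^{m} x_{t+j}}{2m + 1} \quad \text{where } t = m + 1, \dots, n - m$$

$$x_{t+0.5}^* = \frac{\sum_{j=-(S/2)+1}^{S/2} x_{t+j}}{S} \quad \text{where } t = \frac{S}{2}, \frac{S}{2} + 1, \dots, n - \frac{S}{2}$$

$$x_t^* = \frac{x_{t-0.5}^* + x_{t+0.5}^*}{2} \quad \text{where } t = \frac{S}{2} + 1, \dots, n - \frac{S}{2}$$

$$x_t = \gamma + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + a_t$$

$$Z = \frac{\hat{\phi}_p}{s_p}; \qquad \hat{x}_{n+h} = \hat{\gamma} + \hat{\phi}_1 \hat{x}_{n+h-1} + \dots + \hat{\phi}_p \hat{x}_{n+h-p} \quad (h = 1, 2, 3, \dots)$$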
CHAPTER 13
General guidelines
In order to perform regression analysis in Microsoft Excel, you have to use the “Data
Analysis” tool available in Microsoft Office 97 or Office 2000. The “Data Analysis” tool
is not installed by default in Microsoft Excel, so you need to activate it first. To check
whether you already have this tool in your Excel program, you can go to the
“Tools” main menu option and see whether you can find the “Data Analysis” option. If
you can find it go directly to step 2, otherwise, go to step 1 to activate the “Data
Analysis” tool in Excel.
Step 1.
Figure 1.
Go back to the “Tools” menu and check once more whether the “Data Analysis” option
has been installed.
Step 2
Select “Tools” and then “Data Analysis”. Select “Regression” from the list of “Analysis
Tools” and then click OK. You will get the Regression dialog box as it appears in Figure
2. In order to perform regression analysis you have to mark some important edit and
check boxes. Here are some of the things you need to check:
Input Y Range. You need to enter the range for the dependent variable in this edit box.
Another way to enter the data on the dependent variable is highlighting the column
where you have the dependent variable data.
Input X Range. You need to enter the range for the independent variable in this edit box.
Again, you can also enter the independent variable by highlighting the column where
you have the independent variable data. When you do multiple regression analysis
you need to highlight the columns that correspond to all your independent variables at
once. Highlight all your data including the name of the variable and check the box
“Labels” before you do further analysis.
New Worksheet Ply. You need to select the check box and enter some name in the edit
box to have the Regression analysis output in a different worksheet under a different
name.
Residuals and Residual Plots. If you need to test for the problems of heteroscedasticity
and autocorrelation you need to check these two boxes. Excel will automatically
produce the plot of residuals and report the residuals and estimated values of the
dependent variable in the regression.
Line Fit Plots. Select this option if you wish to obtain the scatter diagram with a fitted
regression line in a simple regression analysis.
Figure 2.
The easiest way to detect the problem of autocorrelation in a set of data is to plot the
residuals through time. Go back to step 2 to recall how to produce the plot of the
residuals. The formal way to test for autocorrelation is to use the Durbin-Watson
statistic. The Durbin-Watson statistic (D) is defined as follows:
$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Unfortunately, Excel does not produce this statistic automatically. To generate the
Durbin-Watson statistic you need to calculate its value from the column of residuals.
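The calculation from the residual column can be sketched as follows: the sum of squared differences between successive residuals is divided by the sum of squared residuals. The function name and the residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic computed from a list of residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residuals, for illustration only.
e = [0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.4]
print(durbin_watson(e))
```

Inside Excel itself, a formula of the form =SUMXMY2(C3:C61,C2:C60)/SUMSQ(C2:C61) gives the same number, assuming (hypothetically) that the residuals sit in cells C2 to C61. Values near 2 suggest no first-order autocorrelation, while values close to 0 point to positive autocorrelation.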
APPENDIX
Case Study 1.
Worldwide Fertility Rate.
Case Study 2.
Norman Housing Prices.
No | Asking Price | Price | Beds | Living Areas | Full Baths | Half Baths | AGE | Age 0 | Age 3 | Age 10
1 34950 34950 3 1 2 0 30 0 0 0
2 34950 34950 3 1 2 0 30 0 0 0
3 79900 72500 2 1 2 1 20 0 0 0
4 47500 41000 2 1 1 0 20 0 0 0
5 189900 185000 2 2 2 0 10 0 0 1
6 29000 29000 3 1 1 0 60 0 0 0
7 39500 37000 2 1 1 0 60 0 0 0
8 139900 130000 5 3 2 0 61 0 0 0
9 34500 34500 2 1 1 0 60 0 0 0
10 67000 68000 2 1 2 0 10 0 0 1
11 71000 71000 3 1 2 0 0 1 0 0
12 82500 79000 3 1 2 1 3 0 1 0
13 89900 89900 3 1 2 0 10 0 0 1
14 98500 93500 4 1 2 0 10 0 0 1
15 103900 101000 3 1 2 0 20 0 0 0
16 116000 114000 4 1 2 0 3 0 1 0
17 120000 120000 3 1 2 0 0 1 0 0
18 89900 89900 3 1 2 0 60 0 0 0
19 92500 91500 3 2 2 0 30 0 0 0
20 174500 169900 3 2 2 1 60 0 0 0
21 226000 219000 4 3 4 0 60 0 0 0
22 34900 34900 3 1 2 0 30 0 0 0
23 44500 44500 3 1 2 0 20 0 0 0
24 64950 58900 3 1 2 0 30 0 0 0
25 78900 78900 4 1 2 0 0 1 0 0
26 79900 79200 3 1 2 0 10 0 0 1
27 91900 87500 3 1 2 0 3 0 1 0
28 169000 165000 4 2 2 1 60 0 0 0
29 52000 48000 3 1 1 0 30 0 0 0
30 52500 50000 3 1 1 0 30 0 0 0
31 60000 60000 2 2 1 1 60 0 0 0
32 67500 67500 3 1 2 0 20 0 0 0
33 81500 79500 3 2 1 1 60 0 0 0
34 84500 81000 3 2 1 0 60 0 0 0
35 83000 83000 3 1 2 0 10 0 0 1
36 94900 94000 3 1 2 0 10 0 0 1
37 98585 97722 3 1 2 0 0 1 0 0
38 109000 105000 5 2 2 1 60 0 0 0
39 107500 107500 3 1 2 0 20 0 0 0
40 119900 117000 2 1 2 0 3 0 1 0
41 124900 124900 3 2 2 0 0 1 0 0
42 125000 125000 3 2 2 0 60 0 0 0
43 139900 135000 4 1 2 0 60 0 0 0
44 255000 248000 3 2 2 1 20 0 0 0
45 53900 49000 3 2 1 0 60 0 0 0
46 67000 65737 4 1 2 0 30 0 0 0
47 73900 73500 3 1 2 0 10 0 0 1
48 47950 45000 3 1 2 0 20 0 0 0
49 48500 47000 3 1 1 1 20 0 0 0
50 99000 95000 3 2 2 0 20 0 0 0
51 88500 88000 3 1 2 0 20 0 0 0
52 88500 88500 3 1 2 0 10 0 0 1
53 93500 93500 3 1 2 0 20 0 0 0
54 109500 106000 3 2 2 0 10 0 0 1
55 109900 107000 3 1 2 0 3 0 1 0
56 115000 111500 3 1 2 0 20 0 0 0
57 129900 126000 3 2 2 0 30 0 0 0
58 147900 143500 3 2 2 0 10 0 0 1
59 149900 149900 3 2 2 0 10 0 0 1
60 162900 161250 4 2 2 1 3 0 1 0
61 162000 162000 4 1 2 0 0 1 0 0
62 226500 217500 4 2 4 0 10 0 0 1
63 295000 281000 4 3 4 1 10 0 0 1
64 95997 95997 3 1 2 0 0 1 0 0
65 119637 119637 4 1 2 0 0 1 0 0
66 59400 52000 3 1 1 1 60 0 0 0
67 59900 60000 2 2 0 1 60 0 0 0
68 80400 76400 3 1 2 0 20 0 0 0
69 119900 117500 3 2 2 0 20 0 0 0
70 122900 118500 3 1 2 0 10 0 0 1
71 118500 119500 3 1 2 0 0 1 0 0
72 149900 144900 3 2 2 0 20 0 0 0
73 189900 188900 4 1 3 0 0 1 0 0
74 204900 207500 4 2 3 0 0 1 0 0
75 209900 209900 4 2 3 0 0 1 0 0
76 212000 210000 3 2 2 1 10 0 0 1
77 225000 223000 4 2 3 0 0 1 0 0
78 89900 87500 3 1 1 1 60 0 0 0
79 89900 89900 4 2 2 1 61 0 0 0
80 149500 147000 4 3 3 0 20 0 0 0
81 85000 75000 3 2 2 0 30 0 0 0
82 385000 375000 4 3 3 1 20 0 0 0
83 86900 85500 3 2 2 0 60 0 0 0
84 122650 114000 4 2 2 1 20 0 0 0
Continues…
No  Age 20  Age 30  Age 60  Garage  ZONE (ORIG)  ZONE  Zone 1  Zone 2  Zone 3  Zone 4  SQ FT  DAYS ON MARKET
1 0 1 0 0 CSE 1 1 0 0 0 1150 156
2 0 1 0 0 CSE 1 1 0 0 0 1150 137
3 1 0 0 2 NWI 2 0 1 0 0 1540 293
4 1 0 0 1 CSE 1 1 0 0 0 950 244
5 0 0 0 2 SWI 2 0 1 0 0 2250 27
6 0 0 1 1 CCE 1 1 0 0 0 1020 29
7 0 0 1 1 CCE 1 1 0 0 0 820 107
8 0 0 0 2 CCE 1 1 0 0 0 2980 54
9 0 0 1 0 CNE 1 1 0 0 0 720 1
10 0 0 0 2 CNE 1 1 0 0 0 1250 35
11 0 0 0 2 CNE 1 1 0 0 0 1130 260
12 0 0 0 2 CNE 1 1 0 0 0 1570 111
13 0 0 0 2 CNE 1 1 0 0 0 1770 308
14 0 0 0 2 CNE 1 1 0 0 0 1700 76
15 1 0 0 2 CNE 1 1 0 0 0 1940 55
16 0 0 0 3 CNE 1 1 0 0 0 1810 144
17 0 0 0 2 CNE 1 1 0 0 0 530 5
18 0 0 1 2 CNW 2 0 1 0 0 1600 3
19 0 1 0 2 CNW 2 0 1 0 0 1790 214
20 0 0 1 2 CNW 2 0 1 0 0 2800 109
21 0 0 1 0 CNW 2 0 1 0 0 4200 93
22 0 1 0 0 CSE 1 1 0 0 0 1150 118
23 1 0 0 2 CSE 1 1 0 0 0 1250 28
24 0 1 0 2 CSE 1 1 0 0 0 1700 84
25 0 0 0 2 CSE 1 1 0 0 0 1400 1
26 0 0 0 2 CSE 1 1 0 0 0 1650 10
27 0 0 0 2 CSE 1 1 0 0 0 1710 78
28 0 0 1 2 CSE 1 1 0 0 0 2630 41
29 0 1 0 1 CSW 2 0 1 0 0 990 5
30 0 1 0 1 CSW 2 0 1 0 0 850 32
31 0 0 1 2 CSW 2 0 1 0 0 1350 389
32 1 0 0 2 CSW 2 0 1 0 0 1230 29
33 0 0 1 2 CSW 2 0 1 0 0 1550 55
34 0 0 1 0 CSW 2 0 1 0 0 1750 7
35 0 0 0 2 CSW 2 0 1 0 0 1500 108
36 0 0 0 2 CSW 2 0 1 0 0 1620 122
37 0 0 0 2 CSW 2 0 1 0 0 1550 235
38 0 0 1 0 CSW 2 0 1 0 0 2280 56
39 1 0 0 2 CSW 2 0 1 0 0 2170 117
40 0 0 0 2 CSW 2 0 1 0 0 1720 603
41 0 0 0 2 CSW 2 0 1 0 0 1960 162
42 0 0 1 2 CSW 2 0 1 0 0 2000 3
43 0 0 1 0 CSW 2 0 1 0 0 2320 240
44 1 0 0 3 CSW 2 0 1 0 0 2930 33
45 0 0 1 0 MOR 3 0 0 1 0 1150 5
46 0 1 0 2 MOR 3 0 0 1 0 1500 93
47 0 0 0 2 MOR 3 0 0 1 0 1450 25
48 1 0 0 1 NOB 5 0 0 0 0 1600 15
Case Study 3.
Apartment Hunting.
Case Study 4.
Advertising and Sales.
Y X
1103 339
1266 562
1473 745
1423 749
1767 862
2161 1034
2336 1054
2602 1164
2518 1102
2637 1145
2177 1012
1920 836
1910 941
1984 981
1787 974
1689 766
1866 920
1896 964
1684 811
1633 789
1657 802
1569 770
1390 639
1387 644
1289 564
Case Study 5.
School Lunch Program.
Case Study 7.
Academic Dishonesty.
Times Cheated  Hrs Studied  Class Attended  Hrs Worked  Alcohol $  Hrs TV  Times Witnessed  Freshman  Sophomore  Junior
0 12 12 0 0 6 1 1 0 0
4 2 9 9 35 12 3 1 0 0
2 6 12 6 16 8 6 1 0 0
0 8 12 5 8 9 2 1 0 0
0 4 11 12 5 5 0 1 0 0
0 8 12 0 30 10 0 1 0 0
5 4 10 12 48 15 8 1 0 0
2 4 10 6 30 8 6 1 0 0
3 7 12 5 25 16 4 1 0 0
1 8 12 0 12 7 1 1 0 0
4 2 8 24 16 13 5 1 0 0
1 8 11 5 25 2 2 1 0 0
0 6 12 0 10 4 1 1 0 0
1 4 8 0 60 9 0 1 0 0
4 4 10 16 100 20 6 1 0 0
6 2 9 25 50 15 8 0 1 0
0 7 12 4 15 6 0 0 1 0
0 10 12 0 0 4 2 0 1 0
8 0 7 25 35 25 8 0 1 0
2 5 11 9 72 14 3 0 1 0
10 0 5 30 48 24 10 0 1 0
2 4 12 16 55 6 3 0 1 0
2 9 10 8 4 8 2 0 1 0
0 8 11 0 12 4 0 0 1 0
1 6 11 9 30 12 2 0 1 0
1 6 12 3 35 2 1 0 1 0
0 8 12 0 28 1 4 0 1 0
1 4 10 4 10 10 2 0 1 0
2 8 12 8 45 11 2 0 1 0
2 7 8 12 70 5 4 0 1 0
0 12 12 8 25 0 0 0 1 0
0 10 12 0 5 6 0 0 1 0
0 8 10 0 50 9 3 0 1 0
6 2 7 28 35 15 7 0 1 0
4 4 8 18 46 10 5 0 1 0
2 6 11 6 10 12 2 0 1 0
3 4 10 15 0 20 4 0 1 0
7 0 9 22 50 11 7 0 1 0
1 8 12 5 15 9 2 0 1 0
0 10 12 0 10 5 1 0 0 1
0 9 10 0 6 16 0 0 0 1
2 5 10 10 18 7 2 0 0 1
5 6 7 20 25 10 6 0 0 1
3 4 9 16 35 8 2 0 0 1
2 7 10 6 38 20 3 0 0 1
0 4 12 10 8 1 5 0 0 1
1 11 12 0 5 12 1 0 0 1
0 8 11 2 20 9 3 0 0 1
4 8 10 6 30 16 2 0 0 1
6 2 8 24 45 5 7 0 0 1
2 3 11 12 25 10 5 0 0 1
1 7 12 6 10 2 2 0 0 1
0 9 11 0 5 4 1 0 0 1
0 10 12 0 15 1 0 0 0 1
0 6 12 3 25 6 3 0 0 1
0 8 10 4 8 5 1 0 0 1
5 2 8 16 32 8 7 0 0 1
7 1 9 18 40 12 6 0 0 1
2 6 11 9 20 5 2 0 0 1
1 9 12 2 20 5 1 0 0 1
1 9 12 0 5 3 3 0 0 1
3 5 8 15 15 7 3 0 0 1
5 3 9 6 60 15 6 0 0 0
1 7 12 0 12 10 3 0 0 0
0 10 11 0 5 6 1 0 0 0
0 12 12 0 0 4 0 0 0 0
7 1 7 16 80 11 9 0 0 0
10 0 5 30 100 10 10 0 0 0
6 3 11 9 52 22 7 0 0 0
3 5 10 12 20 9 4 0 0 0
3 6 9 15 25 7 3 0 0 0
4 5 12 3 45 10 5 0 0 0
5 5 11 10 30 16 6 0 0 0
2 7 10 5 20 9 4 0 0 0
0 12 12 0 0 5 2 0 0 0
Case Study 8.
Restaurant Tips.
18.04 3 0 0 6 1 2
12.54 2.5 0 0 6 1 2
10.29 2.6 1 0 6 1 2
34.81 5.2 1 0 6 1 4
9.94 1.56 0 0 6 1 2
25.56 4.34 0 0 6 1 4
19.49 3.51 0 0 6 1 2
38.01 3 0 1 5 1 4
26.41 1.5 1 0 5 1 2
11.24 1.76 0 1 5 1 2
48.27 6.73 0 0 5 1 4
20.29 3.21 0 1 5 1 2
13.81 2 0 1 5 1 2
11.02 1.98 0 1 5 1 2
18.29 3.76 0 1 5 1 4
17.59 2.64 0 0 5 1 3
20.08 3.15 0 0 5 1 3
16.45 2.47 1 0 5 1 2
3.07 1 1 1 5 1 1
20.23 2.01 0 0 5 1 2
15.01 2.09 0 1 5 1 2
12.02 1.97 0 0 5 1 2
17.07 3 1 0 5 1 3
26.86 3.14 1 1 5 1 2
25.28 5 1 1 5 1 2
14.73 2.2 1 0 5 1 2
10.51 1.25 0 0 5 1 2
17.92 3.08 0 1 5 1 2
27.2 4 0 0 3 0 4
22.76 3 0 0 3 0 2
17.29 2.71 0 0 3 0 2
19.44 3 0 1 3 0 2
16.66 3.4 0 0 3 0 2
10.07 1.83 1 0 3 0 1
32.68 5 0 1 3 0 2
15.98 2.03 0 0 3 0 2
34.83 5.17 1 0 3 0 4
13.03 2 0 0 3 0 2
18.28 4 0 0 3 0 2
24.71 5.85 0 0 3 0 2
21.16 3 0 0 3 0 2
28.97 3 0 1 4 1 2
22.49 3.5 0 0 4 1 2
5.75 1 1 1 4 1 2
16.32 4.3 1 1 4 1 2
22.75 3.25 1 0 4 1 2
40.17 4.73 0 1 4 1 4
27.28 4 0 1 4 1 2
12.03 1.5 0 1 4 1 2
21.01 3 0 1 4 1 2
12.46 1.5 0 0 4 1 2
11.35 2.5 1 1 4 1 2
15.38 3 1 1 4 1 2
44.3 2.5 1 1 5 1 3
22.42 3.48 1 1 5 1 2
20.92 4.08 1 0 5 1 2
15.36 1.64 0 1 5 1 2
20.49 4.06 0 1 5 1 2
25.21 4.29 0 1 5 1 2
18.24 3.76 0 0 5 1 2
14.31 4 1 1 5 1 2
14 3 0 0 5 1 2
7.25 1 1 0 5 1 1
38.07 4 0 0 6 1 3
23.95 2.55 0 0 6 1 2
25.71 4 1 0 6 1 3
17.31 3.5 1 0 6 1 2
29.93 5.07 0 0 6 1 4
10.65 1.5 1 0 3 0 2
12.43 1.8 1 0 3 0 2
24.08 2.92 1 0 3 0 4
11.69 2.31 0 0 3 0 2
13.42 1.68 1 0 3 0 2
14.26 2.5 0 0 3 0 2
15.95 2 0 0 3 0 2
12.48 2.52 1 0 3 0 2
29.8 4.2 1 0 3 0 6
8.52 1.48 0 0 3 0 2
14.52 2 1 0 3 0 2
11.38 2 1 0 3 0 2
22.82 2.18 0 0 3 0 3
19.08 1.5 0 0 3 0 2
20.27 2.83 1 0 3 0 2
11.17 1.5 1 0 3 0 2
12.26 2 1 0 3 0 2
18.26 3.25 1 0 3 0 2
8.51 1.25 1 0 3 0 2
10.33 2 1 0 3 0 2
14.15 2 1 0 3 0 2
16 2 0 1 3 0 2
13.16 2.75 1 0 3 0 2
17.47 3.5 1 0 3 0 2
34.3 6.7 0 0 3 0 6
41.19 5 0 0 3 0 5
27.05 5 1 0 3 0 6
16.43 2.3 1 0 3 0 2
8.35 1.5 1 0 3 0 2
18.64 1.36 1 0 3 0 3
11.87 1.63 1 0 3 0 2
9.78 1.73 0 0 3 0 2
7.51 2 0 0 3 0 2
14.07 2.5 0 0 6 1 2
13.13 2 0 0 6 1 2
17.26 2.74 0 0 6 1 3
24.55 2 0 0 6 1 4
19.77 2 0 0 6 1 4
29.85 5.14 1 0 6 1 5
48.17 5 0 0 6 1 6
25 3.75 1 0 6 1 4
13.39 2.61 1 0 6 1 2
16.49 2 0 0 6 1 4
21.5 3.5 0 0 6 1 4
12.66 2.5 0 0 6 1 2
16.21 2 1 0 6 1 3
13.81 2 0 0 6 1 2
17.51 3 1 1 6 1 2
24.52 3.48 0 0 6 1 3
20.76 2.24 0 0 6 1 2
31.71 4.5 0 0 6 1 4
10.59 1.61 1 1 5 1 2
10.63 2 1 1 5 1 2
50.81 10 0 1 5 1 3
15.81 3.16 0 1 5 1 2
7.25 5.15 0 1 6 1 2
31.85 3.18 0 1 6 1 2
16.82 4 0 1 6 1 2
32.9 3.11 0 1 6 1 2
17.89 2 0 1 6 1 2
14.48 2 0 1 6 1 2
9.6 4 1 1 6 1 2
34.63 3.55 0 1 6 1 2
34.65 3.68 0 1 6 1 4
23.33 5.65 0 1 6 1 2
45.35 3.5 0 1 6 1 3
23.17 6.5 0 1 6 1 4
40.55 3 0 1 6 1 2
20.69 5 0 0 6 1 5
20.9 3.5 1 1 6 1 3
30.46 2 0 1 6 1 5
18.15 3.5 1 1 6 1 3
23.1 4 0 1 6 1 3
15.69 1.5 0 1 6 1 2
19.81 4.19 1 1 3 0 2
28.44 2.56 0 1 3 0 2
15.48 2.02 0 1 3 0 2
16.58 4 0 1 3 0 2
7.56 1.44 0 0 3 0 2
10.34 2 0 1 3 0 2
43.11 5 1 1 3 0 4
13 2 1 1 3 0 2
13.51 2 0 1 3 0 2
18.71 4 0 1 3 0 3
12.74 2.01 1 1 3 0 2
13 2 1 1 3 0 2
16.4 2.5 1 1 3 0 2
20.53 4 0 1 3 0 4
16.47 3.23 1 1 3 0 3
26.59 3.41 0 1 5 1 3
38.73 3 0 1 5 1 4
24.27 2.03 0 1 5 1 2
12.76 2.23 1 1 5 1 2
30.06 2 0 1 5 1 3
25.89 5.16 0 1 5 1 4
48.33 9 0 0 5 1 4
13.27 2.5 1 1 5 1 2
28.17 6.5 1 1 5 1 3
12.9 1.1 1 1 5 1 2
28.15 3 0 1 5 1 5
11.59 1.5 0 1 5 1 2
7.74 1.44 0 1 5 1 2
30.14 3.09 1 1 5 1 4
12.16 2.2 0 1 4 0 2
13.42 3.48 1 1 4 0 2
8.58 1.92 0 1 4 0 1
15.98 3 1 0 4 0 3
13.42 1.58 0 1 4 0 2
16.27 2.5 1 1 4 0 2
10.09 2 1 1 4 0 2
20.45 3 0 0 5 1 4
13.28 2.72 0 0 5 1 2
22.12 2.88 1 1 5 1 2
24.01 2 0 1 5 1 4
15.69 3 0 1 5 1 3
11.61 3.39 0 0 5 1 2
10.77 1.47 0 0 5 1 2
15.53 3 0 1 5 1 2
10.07 1.25 0 0 5 1 2
12.6 1 0 1 5 1 2
32.83 1.17 0 1 5 1 2
35.83 4.67 1 0 5 1 3
29.03 5.92 0 0 5 1 3
27.18 2 1 1 5 1 2
22.67 2 0 1 5 1 2
17.82 1.75 0 0 5 1 2
18.78 3 1 0 3 1 2
Case Study 9.
Research and Development Expenditures.
R&D expenses  Sales  Profits
62.5 6375 185.1
State  Officer/10K  BAC Limit  Seat Belt  Max Speed  Funding  Time to work  Pop/sqmi  %Metro Pop  Gas Tax Rate  Licensed Drivers  Total Crashes
AL 23 1 0 70 1064 21.2 85.1 67.7 18 3138 1022
AK 21 1 0 65 453 16.7 1.1 41.3 8 440 71
AZ 23 1 0 75 1532 21.6 40.1 87.6 18 2727 857
AR 23 1 0 65 755 19 48.4 48.3 19 1752 539
CA 22 0 1 70 531 24.6 207 96.6 18 20249 3576
CO 26 1 0 65 922 20.7 37.5 84 22 2757 555
CT 26 1 1 55 1202 21.1 675 95.6 34 2344 296
DE 23 1 0 65 452 20 374 81.9 23 529 105
DC 72 1 0 55 163 27.1 8615 100 20 333 58
FL 26 0 0 70 3472 21.8 272 92.9 12 11400 2496
GA 26 1 0 70 1675 22.7 129 68.5 7.5 4966 1403
HI 25 0 1 55 405 23.8 185 73.6 16 733 134
ID 21 1 0 75 359 17.3 14.6 37.8 21 820 228
IL 32 1 0 65 3097 25.1 214 84.1 19 7610 1312
IN 19 1 0 65 1444 20.4 164 71.7 15 3704 872
IA 18 1 1 65 1128 16.2 51.1 44.3 20 1956 411
KS 24 0 0 70 1162 17.2 31.7 55.4 18 1788 443
* Data collected from: US Census Bureau – Statistical Abstract of the US
Country  FDI  GNP/PPP  R  Tax rate  Political risk  Ex. Rate Vol.  Literacy Rate  Elec. Consumption  Railroad  Highway  Airport  WTO
Argentina 11489 8970 6.6775 33 13.73 0.000238347 96.2 1837.6 0.01382 0.07613 0.50207 1
Austria 3838 26850 1.745 34 24.19 0.466487267 99 6892.49 0.07069 1.55988 0.66475 1
Australia 33676 20300 3.68 36 22.6 0.07342276 100 8873.88 0.00506 0.11985 0.05356 1
Belgium 18920 25380 2.0125 39 23.72 1.359769267 99 6979.55 0.11181 4.73619 1.38935 0
Bolivia 328 1000 8.0825 25 11.26 0.07596122 83.1 369.292 0.0034 0.04815 1.04206 1
Botswana 21 3600 -0.385 15 15.9 0.343349442 69.8 1144 0.00166 0.03157 0.15717 1
Brazil 37802 4570 24.935 15 11.69 0.026770083 83.3 1880.76 0.00341 0.23414 0.38609 1
Cameroon 238 610 2.5 39 6.31 22.01135171 63.4 176.629 0.00235 0.07307 0.11077 1
Canada 10390 20020 4.195 38 22.98 0.049380827 97 16499.4 0.00735 0.09893 0.15129 1
8
Chile 9132 4810 10.9125 15 18.56 8.536221677 95.2 2391.5 0.00906 0.10657 0.50481 1
China 6348 750 13.6925 30 17.08 0.000891166 81.5 797.934 0.00696 0.12974 0.02209 0
Colombia 4317 2600 17.56 35 13.31 90.91734867 91.3 1370.08 0.00325 0.11126 1.07827 1
Costa Rica 2126 2780 0.8925 30 13.48 7.709686852 94.8 1341.95 0.01875 0.70266 3.07935 1
Cote D'Ivoire 229 700 -2.5 35 8.73 22.48792979 48.5 118.851 0.00208 0.15849 0.11321 1
Czech Republic 543 5040 -2.72 35 17.58 1.991887651 99 5852.24 0.12003 0.70556 0.87736 1
Denmark 2628 33260 1.345 34 23.74 0.257600836 99 6572.53 0.07838 1.68892 2.78341 1
Dominican Republic 535 1770 12.425 25 14.34 0.458878059 82.1 824.135 0.01565 0.26044 0.74411 1
Ecuador 952 1530 -1.6375 25 8.13 750.6349495 90.1 672.637 0.00349 0.15487 0.66103 1
Egypt 1955 1290 5.7025 40 14.83 0.007006364 51.4 683.772 0.00477 0.06429 0.08941 1
El Salvador 599 1850 7.5725 25 10.95 0.00336941 71.5 607.459 0.02905 0.48403 4.15058 1
France 39188 24940 2.435 33 24.22 0.220332823 99 6981.28 0.0587 1.63646 0.86872 1
Germany 42853 25850 8.415 30 24.89 0.06615193 99 6206.29 0.13247 1.87707 1.76814 1
Ghana 321 390 4.01 35 10.61 19.09120667 64.5 311.315 0.00414 0.17133 0.05217 1
Greece 660 11650 6.6825 35 18.03 12.59637463 95 3865.46 0.01948 0.8945 0.59633 1
Guatemala 429 1640 -0.8875 30 9.31 0.141230961 55.6 251.306 0.00815 0.12082 4.40837 1
Honduras 186 730 3.9525 15 8.37 0.200119515 72.7 455.87 0.00532 0.12667 1.09036 1
Hong Kong 20802 23670 3.635 17 19.32 0.003599807 92.2 4176.64 0.03263 1.7572 2.87908 1
Indonesia 6932 680 -34.343 30 8.89 2403.654346 83.8 309.104 0.00354 0.18763 0.24255 1
Ireland 15936 18340 -2.1 32 23.02 0.024349236 98 4883.92 0.02826 1.34272 0.6387 1
Italy 14638 20250 1.1025 37 21.66 62.56302439 97 4653.33 0.03154 1.07816 0.46255 1
Jamaica 2105 1680 5.8575 33 16.36 0.373295565 85 2309.19 0.03416 1.72669 3.3241 1
Japan 38153 32380 -0.64 38 23.39 8.864378691 99 7517.38 0.06316 3.09545 0.45364 1
Kenya 238 330 15.445 35 8.96 1.134899583 78.1 138.326 0.00466 0.11208 0.40755 1
Korea, Republic of 7365 7970 4.555 28 15.11 141.4675659 98 4141.28 0.06355 0.64671 1.04899 1
Latvia -32 2430 0.94 25 12.65 0.009878048 100 2625.46 0.03734 0.86612 0.77413 1
Lithuania 42 2440 0.825 29 11.6 0.000897902 98 2672.27 0.03203 1.09058 1.536 0
Malaysia 6193 3600 2.98 28 15.25 0.220977994 83.5 2244426 0.00547 0.28763 0.35002 1
Mexico 25877 3970 -4.5875 34 14.25 0.711567319 89.6 1539.95 0.01615 0.13104 0.93862 1
Namibia 2 1940 6.1425 35 15.25 0.518987106 38 673.433 0.00289 0.0785 0.16355 1
Netherlands 79386 24760 1.09 35 24.84 0.074086453 99 5716602 0.08301 0.37475 0.82623 1
New Zealand 6136 14700 5.3675 33 22.23 0.097091554 99 9702.74 0.01479 0.34317 0.41315 1
Norway 7609 34330 4.0125 28 22.91 0.117205284 99 25317.7 0.01303 0.29617 0.33457 1
Panama 26957 3080 5.3575 15 12.56 0 90.8 1255.34 0.00467 0.14607 1.44756 1
Papua New Guinea 120 890 2.4475 15 10.48 0.193867569 72.2 361.308 0 0.04328 1.08643 1
Paraguay 204 1760 1.9175 30 10.1 182.0772537 92.1 877.423 0.00244 0.07425 2.36849 1
Peru 2587 2460 8.72 30 10.89 0.124470621 88.7 608.873 0.00159 0.05636 0.19063 1
Philippines 3192 1050 1.9 34 13.69 1.956142523 94.7 405.819 0.00301 0.54101 0.25153 1
Poland 1698 3900 6.8325 36 17.6 0.075719325 99 34193.2 0.07984 1.23821 0.24301 1
Portugal 1474 10690 0.4925 37 21.76 6.694341299 85 3218.38 0.03341 0.74749 0.71777 1
Russia 1101 2300 -65.605 35 5.35 5.547440566 98 5383 0.00883 0.05578 0.1481 1
Senegal 67 530 1.7 35 7.74 22.48792979 33.1 72.6229 0.00471 0.07592 0.10417 1
Singapore 19783 30060 4.96 26 23.29 0.053503308 91.1 7928.42 0.06055 4.73255 14.1177 1
South Africa 2363 2880 8.1625 35 14.96 0.525834743 81.8 4177.28 0.01757 0.27155 0.11804 1
Spain 12807 14080 0.84 35 22.24 5.447225436 96 4201.62 0.03019 0.69455 0.19824 1
Sri Lanka 24 810 3.7 35 10.53 2.166877811 90.2 263.778 0.02319 1.53228 0.2008 1
Sweden 6053 25620 0.1575 28 23.25 0.141819826 99 15866.6 0.03265 0.33583 0.62055 1
Switzerland 37616 40080 0.915 45 25 0.061403186 99 7389.9 0.11262 1.78642 1.68469 1
Tanzania 26 210 -10 30 6.21 13.64270001 67.8 58.2012 0.00403 0.09954 0.14559 1
Thailand 5721 2200 6.0075 30 14 4.493464936 93.8 1362.19 0.00903 0.12623 0.20908 1
Turkey 1069 3160 9.465 25 13.48 27254.40765 82.3 1389.65 0.01348 0.49613 0.1518 0
Ukraine 92 850 3.155 30 5.63 0.786598652 98 3493.19 0.03868 0.28585 1.16946 0
United Kingdom 17864 21400 1.7925 31 25 0.008186231 99 5520.27 0.06986 1.5398 2.0572 1
8
Uruguay 567 6180 6.115 30 13.06 0.244762331 97.3 2485.4 0.01724 0.0485 0.37438 1
Zambia 36 330 -30.523 35 4.98 246.9949177 78.2 661.559 0.00292 0.0536 0.1512 1
Zimbabwe 103 610 -2.28 38 9.4 8.190968019 85 964.691 0.00714 0.04743 1.20775 1
Enhance Bridge H-Light t-Signals Erosion Resurf Resurf-a MRB Landsc Surf-a Guard Depot Grade
1 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
IP Spot
78.6 2576.96
78.4 2582.67
78.4 2439.17
78.8 2494.10
79.2 2446.63
79.5 2562.63
79.3 2378.22
78.8 2407.07
80.1 2480.48
80.7 2530.85
81.1 2618.55
82.0 2686.20
83.8 2699.74
84.4 2594.36
85.1 2758.82
86.5 2888.10
86.3 3028.40
86.5 3138.27
86.4 3116.82
87.6 3145.43
88.5 3059.57
89.8 3151.60
90.9 3248.15
91.8 3274.4
91.8 3338.81
93.1 3435.49
93.1 3482.76
93.4 3287.14
93.8 3432.44
94.8 3557.74
95.1 3587.15
95.1 3624.74
95.8 3657.14
96.1 3592.16
96.2 3495.45
94.7 3641.65
93.3 3713.79
93.0 3719.93
93.4 3972.32
93.2 3964.05
94.3 3973.36
94.6 4009.09
94.2 4050.24
93.9 4099.67
94.2 3932.73
93.6 3960.86
90.9 3842.59
87.1 3778.68
84.8 3731.44
83.5 3800.93
82.0 3675.79
82.7 3586.29
82.5 3625.46
83.6 3538.69
84.1 3574.25
85.6 3541.78
86.4 3484.80
86.9 3584.60
87.7 3534.74
88.4 3683.37
27-Mar-01 27.0000
28-Mar-01 26.4375
29-Mar-01 26.9375
30-Mar-01 25.6875
2-Apr-01 24.0625
3-Apr-01 23.4375
4-Apr-01 22.1875
5-Apr-01 25.1875
6-Apr-01 24.8125
9-Apr-01 24.8900
10-Apr-01 26.2600
11-Apr-01 26.7400
12-Apr-01 27.9200
a Adjusted for a 2:1 stock split by dividing the stock price before the split by 2.
b Adjusted for two different 2:1 stock splits by dividing stock prices prior to the first split by 2 and all stock prices before the second split by 2.
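The split adjustments described in the footnotes can be sketched in Python as follows. This is a minimal illustration, not the procedure used to build the tables above: the function name, the dates, and the prices are hypothetical, and it assumes the adjustments are cumulative, so that a price observed before two 2:1 splits is divided by 2 twice.

```python
from datetime import date


def adjust_for_splits(prices, split_dates, ratio=2.0):
    """Return split-adjusted (date, price) pairs.

    prices      -- list of (date, price) pairs (hypothetical raw quotes)
    split_dates -- dates on which a ratio:1 split took effect (assumed known)
    ratio       -- split ratio; 2.0 corresponds to a 2:1 split

    Each price is divided by `ratio` once for every split that
    occurred after the date on which the price was observed.
    """
    adjusted = []
    for day, price in prices:
        # Count the splits that took effect after this observation.
        later_splits = sum(1 for s in split_dates if day < s)
        adjusted.append((day, price / ratio ** later_splits))
    return adjusted


# Hypothetical example: two 2:1 splits, three raw prices.
raw = [
    (date(2000, 1, 3), 100.0),  # before both splits
    (date(2000, 6, 1), 60.0),   # between the two splits
    (date(2001, 1, 2), 30.0),   # after both splits
]
splits = [date(2000, 3, 1), date(2000, 9, 1)]

for day, price in adjust_for_splits(raw, splits):
    print(day.isoformat(), price)
```

With a single split date in `split_dates`, the function reproduces the adjustment in footnote a; with two split dates it follows footnote b under the cumulative assumption stated above.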