Data Analysis Using STATA Software

The document provides a comprehensive guide on using STATA for data analysis, detailing its interface, operating modes, and essential commands for data management and analysis. It covers how to enter and export data, manage variables, and perform descriptive and inferential statistics. Additionally, it includes examples of STATA codes and their explanations for various data operations.


Analysis of Data using STATA

- Prepared by

Fazlur Rahman
Lecturer
Department of Accounting and Information Systems,
Jashore University of Science and Technology,
Jashore-7408, Bangladesh.
E-mail: [email protected]
Introduction to STATA
Description of STATA interface: The STATA interface contains five different types of windows.

1. Results window: The Results window is the large main window of the STATA interface. All output from STATA commands is displayed here (excluding graphs, which are displayed in their own window).
2. Command window: This window, located below the Results window at the bottom middle of the STATA interface, is used to type and execute STATA commands.
3. Review window: All executed STATA commands are recorded in this window, which is located on the left side of the STATA interface. You can recall a previous command by double-clicking it in the Review window or by pressing the Page-Up key on your keyboard.
4. Variables window: When a dataset is loaded or imported into STATA, the variable names and their labels are displayed in the Variables window in the upper right corner of the interface. Clicking the arrow on the left side of a variable transfers that variable to the command line in the Command window.
5. Properties window: This window is located in the lower right corner of the STATA interface and displays the properties of the variable selected in the Variables window. The properties of the dataset are displayed in the lower portion of the Properties window.

[Figure: the STATA interface, with the Review, Results, Variables, Properties, and Command windows and the working directory labeled]
Ways of Operating STATA

There are three modes available for operating or running STATA software:

1. Interactive mode: Commands are typed directly in the Command window and executed by pressing the Enter key.
2. Batch mode: Commands are typed in a separate file called a do-file, and all commands are executed together in one step. The do-file can be saved for later use and can be executed on other computers as well. A do-file can be created by clicking the Do-file Editor icon in the toolbar of the STATA interface or by executing the doedit command in the Command window. The commands are executed by clicking the execute icon in the toolbar of the Do-file Editor.
3. Drop-down menu: STATA can also be operated through its drop-down menus.

Basic Information of Operating STATA Software

Directory: A directory is a location for storing files on the computer. In STATA, the working directory is the folder used to store data and other associated files. Before running further commands, the working directory should be identified as the first step. It is shown at the bottom left corner of the STATA interface (below the Review window). It is often necessary to change the working directory to the folder from which we want to import, or to which we want to export, data and other files. If we do not change the working directory to that folder, we must type the full path of the folder every time we import or export data or perform other file operations.

Log files: A log file is STATA's built-in recorder, which preserves all STATA commands and their results. In other words, everything that passes through the Results window can be recorded in the log file. It is therefore important to create a log file after setting the working directory, especially when we are not using the Do-file Editor to run STATA commands. In short, when we want to preserve our work (commands and results), we need to create a log file, and when we want to preserve only the STATA codes, we need to create and save a do-file.

STATA is case sensitive, so keywords and variable names in a STATA command must be typed exactly as they are defined (for example, a variable named Age is different from age). In the following table, placeholder text in the STATA commands (red italic in the original, for example filename or Folder path) should be changed according to the requirements of the user or procedure, while the remaining (black) text must be kept unchanged for the commands to run correctly.
STATA code    Explanation of the STATA codes
pwd    Shows the working directory that STATA is currently using, i.e., the folder from which files are imported and to which they are exported.
cd "Folder path"    Changes the working directory to Folder path, i.e., the folder we want to set as the working directory, for example cd "D:\AIS 4202".
log using filename.log    Creates a log file named filename in the working directory; all results of further work are stored automatically in this log file.
log close    Closes the open log file; results produced after this command are not saved to the log file.
log using filename.log, append    Reopens the existing log file (filename) in the working directory after it has been closed by the previous command and adds further output to it.
dir    Shows all the files in the working directory.
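
The commands in the table above can be combined into a short session. The following is a minimal sketch, best run from a do-file; the log file name session_example.log is a hypothetical placeholder, and the cd step is left as a comment because the folder path depends on your computer.

```stata
* Minimal session setup (the file name is a hypothetical placeholder)
* cd "D:\AIS 4202"
* Show the current working directory
pwd
* Close any log that may still be open, ignoring the error if none is
capture log close
* Start recording commands and results; replace overwrites an old copy
log using "session_example.log", replace
* List the files in the working directory
dir
display "This output is written to the log"
* Stop recording
log close
* Reopen the same log and add further output to it
log using "session_example.log", append
display "This output is appended to the log"
log close
```

The replace option is used here only so that the sketch can be rerun without an error about an existing file; omit it if an earlier log must not be overwritten.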

Data Analysis using STATA

To perform data analysis using STATA software, the following topics must be learned through practice:

1. Entering data and exporting data
2. Data management
3. Exploring data (Descriptive Statistics)
4. Analyzing data (Inferential Statistics)

1. Entering data and Exporting data


Entering data: The following table presents the main keywords of STATA syntax for entering data of different formats (along with their extensions) into STATA for analysis.

We should store the datasets in the working directory before importing them into STATA; otherwise the full path must be typed. Suppose our STATA data file is in the folder E:\AIS 4102\STATA Data; then we need to set the working directory to this folder using the code: cd "E:\AIS 4102\STATA Data"
File format    Extension    Keywords    Example of STATA codes
STATA format    .dta    use    use data.name.dta, clear
Website    .dta    use    use website link or hyperlink, clear
Excel format    .xls or .xlsx    import excel    import excel using data.name.xlsx, firstrow clear
Comma separated    .csv    insheet    insheet using data.name.csv, clear
Text data (Notepad)    .txt    insheet    insheet using data.name.txt, clear
Note: in STATA 13 and later, insheet is superseded by import delimited, although insheet still works.

Exporting data: The following table presents the main keywords of STATA syntax (along with examples) for exporting data into different formats.

Keywords    File format    Extension    Example of STATA codes
save    STATA format    .dta    save "data.name.dta"
export excel    Excel format    .xls or .xlsx    export excel using "data.name.xlsx", firstrow(variables)
export delimited    Comma separated    .csv    export delimited using "data.name.csv"
export delimited    Text data (Notepad)    .txt    export delimited using "data.name.txt"
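
As an illustration of the import and export commands, the sketch below writes a dataset to three formats and reads each one back. It assumes the auto example dataset that ships with STATA (loaded with sysuse) in place of your own data; the file names auto_copy.* are hypothetical placeholders, and import delimited is used as the current replacement for insheet.

```stata
* Round trip: export a dataset to three formats, then read each back
* (auto is STATA's shipped example data; file names are placeholders)
sysuse auto, clear

* Export to STATA, Excel, and comma-separated formats
save "auto_copy.dta", replace
export excel using "auto_copy.xlsx", firstrow(variables) replace
export delimited using "auto_copy.csv", replace

* Read each file back in; clear drops the data currently in memory
use "auto_copy.dta", clear
import excel using "auto_copy.xlsx", firstrow clear
import delimited using "auto_copy.csv", clear

* Confirm what was loaded
describe, short
```

All files are written to the working directory, which is why setting it with cd beforehand saves typing full paths.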

2. Data Management

Variable type: Some basic features of the imported data file should be checked before proceeding to data management. In particular, it is important to make sure that the variables are in their expected format. STATA uses a color-coded system for each type of variable: black for numbers, red for string or text, and blue for labeled variables. Missing values in a variable are shown as a dot (.) in STATA. A labeled variable is a qualitative or categorical variable for which STATA reads numbers (the value of each category) and shows text (the label or name of each category). If the values of a categorical variable are not labeled, STATA shows black numbers. Sometimes a continuous variable appears in red, meaning that STATA reads it as a string variable, and we cannot do any analysis with it other than a frequency distribution. That is why we first need to identify all the categorical and continuous variables of our dataset. The following table presents some standard codes along with their explanations:
STATA codes    Purposes of the codes
ds    Provides the list of all variables.
describe, short    Provides the size of the data file, i.e., the number of rows (observations) and the number of columns (variables).
codebook    Provides the detailed contents and summary statistics of the data file.
codebook var.name(s)    Provides details of the selected variable(s).
labelbook    Provides the information on value labels for each categorical variable.
sum    Summarizes all the variables by reporting the number of observations, mean, standard deviation, minimum, and maximum value. The number of observations is reported as zero for a string variable.
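
A quick way to try these inspection commands is shown below. This is a sketch that assumes the auto example dataset shipped with STATA; its variable make is a string and foreign is a labeled categorical variable, so both cases described above can be seen.

```stata
* Inspect a freshly loaded dataset (auto is STATA's shipped example data)
sysuse auto, clear

* List variable names, then show the number of observations and variables
ds
describe, short

* Detailed contents of one labeled categorical variable
codebook foreign

* Value labels defined in the dataset
labelbook

* Summary statistics; the string variable make shows 0 observations
summarize
```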

The data management part includes the following tasks:

i) Applying variable labels
ii) Renaming a variable
iii) Applying value labels to a categorical variable
iv) Sorting data files based on a variable
v) Recoding a categorical variable
vi) Creating a categorical variable by recoding a continuous variable
vii) Creating a new variable using some arithmetic operations
viii) Creating a standardized variable (Z-score)
ix) Creating a numeric variable from a string variable
x) Adding data values (append) to an existing dataset
xi) Adding variables (merge) to an existing dataset

Data management prepares the variables of the data file according to the requirements of the analysis. Some points should be kept in mind while naming variables: be consistent in the use of upper and lower case, use an underscore instead of a space between two words of a variable name, and prefer short variable names with descriptive variable labels over long names. The following table presents example STATA codes and their explanations. Placeholder text (red italic in the original, for example var.name) can be changed according to the requirements of the user, while the remaining (black) text must be kept unchanged.
Example of STATA codes    Explanation
label variable var.name "label name"    Applies a label (the text in double quotation marks) to the variable var.name.
rename var.old.name var.new.name    Renames the variable var.old.name to the new name var.new.name.
label define label.name num.code1 "category1" num.code2 "category2"    First of the two commands used to apply value labels to a categorical variable. It creates a value label (label.name) that can be applied to one or more categorical variables, e.g., two number codes for two categories, with 1 for Male and 2 for Female.
label values cate.var label.name    Second command: applies the created value label to the categorical variable named cate.var.
sort var.name    Sorts the full dataset in ascending order of the variable.
gsort -var.name    gsort with a minus sign in front of the variable sorts the dataset in descending order.
recode old.var (old.num1 = new.num1) (old.num2 = new.num2), gen(new.var)    Creates a new categorical variable (new.var) by recoding the number codes of the categories of the old categorical variable (old.var).
recode old.var (min/num1=1) (num2/num3=2) (num3/max=3), gen(new.var)    Creates a new categorical variable (new.var) with three categories (for example) by recoding a continuous variable (old.var): minimum value to num1, num2 to num3, and num3 to maximum value.
gen new.var = var1 + var2 + var3    Creates a new variable (new.var) by adding three variables.
gen new.var = ln(old.var)    Creates a new variable as the natural logarithm of another variable old.var.
gen new.var = exp(old.var)    Creates a new variable as the exponential of old.var.
gen new.var = sqrt(old.var)    Creates a new variable as the square root of old.var.
gen new.var = old.var^2    Creates a new variable as the square of old.var.
egen new.var = rowmean(var1 var2 var3)    Creates a new variable (new.var) using the average of three variables.
gen new.var = 1 if var1==1 & var2==1    Creates a new variable using a logical expression of two categorical variables. Only the cases that satisfy the condition after the if qualifier receive a number (1 here); all other values are missing.
replace new.var = 2 if var1==2 & var2==1    Assigns the next number (2 here) to the cases that satisfy the second condition. We can proceed with further replace commands for every expected case.
egen new.var = mean(cont.var), by(cate.var)    Creates a new variable (new.var) using the mean value of a continuous variable (cont.var) for each category/group of a categorical variable (cate.var).
egen new.var = std(old.var)    Creates a new variable (new.var) holding the standardized values (Z-score) of the variable old.var.
encode str.var, gen(num.var)    Creates a new numeric variable (num.var) from the string variable (str.var).
append using data.file1.dta data.file2.dta    Appends the observations of two data files that are not loaded in STATA. Because the append command adds rows, the column names, i.e., the variable names, in the two data files must be the same.
append using data.file2.dta    Adds the rows of datafile2 to the existing data file in STATA. The variable names of the two data files must be the same.
merge 1:1 id.var using data.file2.dta    Datafile2 from the current directory is combined or merged with datafile1 in STATA by the key variable or identification variable (id.var). The name and values of the key variable must match, and a 1:1 merge means that there are no repeated values in the key variable of either data file.
merge 1:m id.var using data.file2.dta    A similar task, used when the key variable of the existing data in STATA contains unique values and the data in the current directory (data.file2.dta) contain repeated values.
merge m:1 id.var using data.file2.dta    Used when the key variable of the existing data contains repeated values and data.file2.dta contains unique values.
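
The data management tasks above can be tried in one short do-file. The sketch below assumes the auto example dataset shipped with STATA; every new variable name (heavy, ln_price, z_price, mean_price, origin_str, origin_num, avg_mpg), the label name heavy_lbl, and the 3000 cut-off are hypothetical choices made only for illustration.

```stata
* Data management walk-through on STATA's shipped auto data
sysuse auto, clear

* Variable label and rename
label variable mpg "Mileage (miles per gallon)"
rename rep78 repair

* Categorical variable from a continuous one, then value labels
recode weight (min/3000 = 0) (3001/max = 1), gen(heavy)
label define heavy_lbl 0 "Light" 1 "Heavy"
label values heavy heavy_lbl

* Arithmetic transformation, Z-score, and group mean
gen ln_price = ln(price)
egen z_price = std(price)
egen mean_price = mean(price), by(foreign)

* Numeric variable from a string: decode makes a string copy to encode
decode foreign, gen(origin_str)
encode origin_str, gen(origin_num)

* Sort in descending order of price
gsort -price

* m:1 merge: attach group-level means (unique key) to car-level data
preserve
collapse (mean) avg_mpg = mpg, by(foreign)
tempfile group_means
save `group_means'
restore
merge m:1 foreign using `group_means'
tabulate _merge
```

The _merge variable created by merge reports, for each observation, whether it came from the data in memory, the using file, or both, which is a convenient check that the key variable matched as expected.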

3. Exploring Data (Descriptive Statistics)

Exploring data is the initial analysis of the data, covering summary statistics and descriptive statistics. Summary statistics present frequencies and percentages along with some graphical presentation of the data. Descriptive statistics present the measures of central tendency and the measures of dispersion. However, some experts treat the frequency distribution, the measures of central tendency, and the measures of dispersion, together with their graphical display, as descriptive statistics. The following table classifies the appropriate descriptives and graphical presentations of data according to the combination and types of the variables.

Categorical variable
  Descriptives: Frequencies, percentage.
  Graphical presentation: Bar diagram, pie diagram.
  Inferential: Test of single proportion.

Continuous variable
  Descriptives: Minimum, maximum, mean, median, percentile, decile, quartile, variance, standard deviation (SD), inter quartile range (IQR), coefficient of variation (CV), skewness, and kurtosis.
  Graphical presentation: Histogram with normal curve, Box-and-Whisker plot, normal probability plot, Quantile-Quantile (Q-Q) plot.
  Inferential: Test of single mean, test of normality (Shapiro-Wilk test).

Combination of two categorical variables
  Descriptives: Two-way table or cross-table for frequency and percentage.
  Graphical presentation: Composite bar diagram, i.e., bar diagram by groups.
  Inferential: Test of independence or Chi-square test.

Combination of two continuous variables
  Descriptives: Correlation coefficient.
  Graphical presentation: Scatter plot.
  Inferential: Test of correlation, paired t-test (special case of before-after comparison), simple linear regression.

Combination of categorical (independent variable) and continuous variable (dependent variable)
  Descriptives: Common descriptives (min, max, mean ± SD, median ± IQR, skewness, kurtosis, CV) by groups.
  Graphical presentation: Box-and-Whisker plot by groups, individual profile plot (mean plot).
  Inferential: Homogeneity of variance test, equality of two means test (independent sample t-test), one-way analysis of variance (ANOVA), simple linear regression analysis.

More than two continuous variables
  Descriptives: Common descriptive statistics, pairwise correlation matrix.
  Graphical presentation: Scatterplot matrix.
  Inferential: Multiple classical linear regression model*.

*Classical linear regression analysis includes model diagnostics (residual analysis), i.e., checking the assumptions theoretically and graphically; we will learn it in the Data Analysis part.

In this section, we are going to estimate the descriptive statistics and to construct their associated graphs.
The following table presents STATA codes and their explanation for exploring study variables.
Example of STATA codes    Explanation
table cate.var    Estimates the frequency distribution for the categorical variable.
graph bar, over(cate.var)    Draws a vertical bar diagram for a categorical variable.
graph hbar, over(cate.var)    Draws a horizontal bar diagram for a categorical variable.
ssc install catplot    Installs the catplot package, which is a user-written program.
catplot cate.var, percent() blabel(bar) recast(bar)    Draws a bar diagram with the percentage for each category; the recast option allows a bar, hbar, or dot diagram to be drawn.
graph pie, over(cate.var) plabel(_all percent)    Draws a pie diagram for a categorical variable; the plabel option accepts sum (count), percent (%), and name (category).
tabstat cont.var(s), stat(n min max mean median var sd q iqr semean cv skew kurt)    Estimates the descriptives for a continuous variable.
hist cont.var, normal    Draws a histogram with the normal probability curve.
graph box cont.var    Draws a Box-and-Whisker plot for a continuous variable.
pnorm cont.var    Draws a normal probability plot for a continuous variable.
qnorm cont.var    Draws a quantile-quantile plot for a continuous variable.
tab cate.var1 cate.var2, col row    Estimates a two-way table using two categorical variables along with their column and row percentages.
tab cate.var1 cate.var2, nofreq col    Estimates a two-way table with column percentages only.
graph bar, over(cate.var1) over(cate.var2)    Draws a composite vertical bar diagram.
catplot cate.var1 cate.var2, percent() blabel(bar) recast(bar)    Draws a composite bar diagram for two categorical variables; by(cate.var3) may be added at the end of the command to draw the composite bar for each category of cate.var3.
pwcorr cont.var1 cont.var2 . . . cont.varn    Estimates the parametric Pearson's correlation coefficient.
spearman cont.var1 cont.var2 . . . cont.varn    Estimates and tests the nonparametric Spearman correlation.
scatter cont.var1 cont.var2    Draws a scatter plot for the two continuous variables.
twoway scatter cont.var1 cont.var2 || lfitci cont.var1 cont.var2    Draws a scatter plot with a fitted line and its confidence interval.
tabstat cont.var(s), by(cate.var) stat(n min max mean sd cv median iqr) long    Estimates the descriptives of the continuous variable(s) for each category of the categorical variable. Adding the col(stat) option at the end produces an alternative display of the descriptives.
graph box cont.var, over(cate.var)    Draws a box plot for each category of the categorical variable.
graph bar cont.var(s), blabel(bar) over(cate.var)    Draws a mean plot of the continuous variable(s) for each category of the categorical variable.
pwcorr cont.var1 cont.var2 . . . cont.var(n)    Estimates a correlation matrix for all continuous variables.
graph matrix cont.var1 . . . cont.var(n), half    Draws a scatter plot matrix of all continuous variables.
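
To see these exploration commands on real data, the sketch below applies a selection of them to the auto example dataset shipped with STATA, treating foreign and rep78 as categorical variables and price, mpg, and weight as continuous variables. The choice of dataset and variables is an assumption made for illustration, and only built-in commands are used, so catplot is left out.

```stata
* Descriptive statistics and graphs on STATA's shipped auto data
sysuse auto, clear

* One categorical variable: frequencies and bar diagram
tabulate foreign
graph bar, over(foreign)

* One continuous variable: descriptives, histogram, box plot
tabstat price mpg, stat(n min max mean median sd iqr cv)
histogram price, normal
graph box price

* Two categorical variables: cross-table with percentages
tabulate rep78 foreign, col row

* Two or more continuous variables: correlation and scatter plots
pwcorr price mpg weight
twoway scatter price weight || lfit price weight
graph matrix price mpg weight, half

* Continuous variable by groups of a categorical variable
tabstat price, by(foreign) stat(n mean sd median iqr)
graph box price, over(foreign)
```

Each graph command replaces the previous graph in the Graph window, so run the lines one at a time when the plots need to be inspected.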
4. Analyzing Data (Inferential Statistics)

This part focuses on analyzing data to make informed decisions, using the inferential statistics suited to categorical and continuous variables. The following table presents the STATA codes and their explanations for estimating inferential statistics.

Example of STATA codes    Explanation
prtest cate.var == test.value    Tests the equality of a single proportion of a categorical variable to a specific proportion such as 0.60 (test.value).
prtesti sam.size sam.prop test.value    Tests the equality of a single proportion (P) using summary statistics: sample size (n), sample proportion (p), and hypothesized proportion (P0).
prtest cate.var1 == cate.var2    Tests the equality of proportions of two categorical variables (each having two groups coded 1 and 0); the proportions for group code 1 are compared.
prtesti sam.size1 sam.prop1 sam.size2 sam.prop2    Tests the equality of two proportions (P1 and P2) using summary statistics: 1st sample size (n1), 1st sample proportion (p1), 2nd sample size (n2), and 2nd sample proportion (p2).
tab cate.var1 cate.var2, exp chi2 exact    Tests (test of independence or Chi-square test) the significance of the association between two categorical variables. The exp option reports the expected frequency for each cell, which can be used to check the test assumption so that we can choose the appropriate procedure between the chi-square test and Fisher's exact test.
swilk cont.var(s)    Shapiro-Wilk test for the normality of a continuous variable.
boxcox cont.var    Estimates the parameter of the Box-Cox family of transformations, indicating a possible transformation that makes a non-normal continuous variable approximately normal.
gladder cont.var    Graphically identifies which transformation can be used for transforming a non-normal variable to a normal variable.
ttest cont.var == test.value    Tests the equality of a single mean of a continuous variable to a specific mean value such as 20 (test.value).
ttesti sam.size sam.mean sam.sd test.value    Tests the equality of a single mean (µ) using summary statistics: sample size (n), sample mean (x̅), sample standard deviation (s), and hypothesized mean (µ0).
signtest cont.var = test.value    Nonparametric alternative to the single-mean test (sign test); it tests whether the median equals test.value.
sdtest cont.var, by(cate.var)    Tests the homogeneity of two variances, i.e., the equality of the variance of a continuous variable between the two groups (there must be exactly two) of a categorical variable.
robvar cont.var, by(cate.var)    Tests the homogeneity of variances robustly. The robust test is considered more reliable than the classical test when the model assumption is slightly violated.
sdtesti sam.size1 sam.mean1 sam.sd1 sam.size2 sam.mean2 sam.sd2    Tests the homogeneity of two variances using summary statistics: sample sizes (n1 and n2), sample means (x̅1 and x̅2), and sample standard deviations (s1 and s2).
ttest cont.var, by(cate.var)    Tests the equality of means (independent sample t-test) of a continuous variable between the two groups (there must be exactly two) of a categorical variable, assuming equal variances between groups.
ttest cont.var, by(cate.var) unequal    Tests the equality of two means (independent sample t-test) assuming unequal variances between groups.
ttesti sam.size1 sam.mean1 sam.sd1 sam.size2 sam.mean2 sam.sd2    Tests the equality of two means using summary statistics: sample sizes (n1 and n2), sample means (x̅1 and x̅2), and sample standard deviations (s1 and s2).
ranksum cont.var, by(cate.var)    Nonparametric alternative to the independent sample t-test, known as the Mann-Whitney U test or Wilcoxon rank-sum test.
ttest cont.var.before == cont.var.after    Tests the equality of two paired means (paired t-test) of continuous variables in a before-after setting, such as blood pressure before and after taking a drug.
signrank cont.var.before = cont.var.after    Nonparametric alternative to the paired t-test, known as the Wilcoxon signed-rank test.
pwcorr cont.var1 . . . cont.var(n), sig    Tests the significance of Pearson's correlation coefficients for two or more continuous variables.
spearman cont.var1 . . . cont.var(n), stats(rho p)    Nonparametric test of Spearman rank correlation, where rho presents the correlation coefficients and p presents the p-values.
oneway cont.dep.var cate.ind.var    Tests (one-way ANOVA) the equality of more than two group means of a continuous variable (the groups are the categories of a categorical variable). If this test concludes that the group means differ significantly, a post-hoc or multiple comparison test should be used to identify the significantly different pairs of means.
oneway cont.dep.var cate.ind.var, t bon sid sch    One-way ANOVA with descriptive statistics (t option) and post-hoc tests using Bonferroni's test (bon), Sidak's test (sid), and Scheffe's test (sch).
kwallis cont.dep.var, by(cate.ind.var)    Nonparametric test of equality of more than two groups, known as the Kruskal-Wallis test (nonparametric one-way ANOVA).
anova cont.dep.var cate.ind.var1##cate.ind.var2    Two-way ANOVA model, used to test the equality of more than two group means of a continuous variable within the groups of another categorical variable.
ssc install emh    Installs the emh package, which is needed to run the nonparametric repeated-measures two-way ANOVA known as the Friedman test.
emh cont.dep.var cate.ind.var1, strata(cate.ind.var2) anova transformation(rank)    Runs the Friedman test. In this command, the 2nd categorical variable should be the repeated-measure variable, i.e., the one under which the 1st categorical variable is nested.
regress cont.dep.var cont.ind.var(s) i.cate.ind.var(s)    Fits a linear regression model using both continuous and categorical independent variables. When a categorical variable is included in a regression model, dummy variables must be created for it. STATA creates them internally when the i. prefix is used (i.cate.var): it takes the first category as the reference category and shows the results for the other categories of the categorical variable.
regress cont.dep.var cont.ind.var(s) i.cate.ind.var(s), vce(robust)    Fits a linear regression model with robust standard errors.
regress cont.dep.var cont.ind.var(s) c.cont.var1#c.cont.var2 i.cate.ind.var(s), vce(robust)    Fits a linear regression model with robust standard errors as well as the interaction effect of two continuous variables.
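
A selection of the tests above can be run end to end as follows. This is a sketch on the auto example dataset shipped with STATA, with foreign and rep78 standing in for categorical variables and mpg, price, and weight for continuous ones; the test values 0.30 and 20 are arbitrary illustrations, not recommendations.

```stata
* Inferential statistics on STATA's shipped auto data
sysuse auto, clear

* Single proportion: is the share of foreign cars equal to 0.30?
prtest foreign == 0.30

* Association of two categorical variables, with expected frequencies
tabulate rep78 foreign, expected chi2 exact

* Normality, then a single-mean test against the value 20
swilk mpg
ttest mpg == 20

* Two groups: variance homogeneity, t-test, and rank-sum test
sdtest mpg, by(foreign)
ttest mpg, by(foreign) unequal
ranksum mpg, by(foreign)

* More than two groups: one-way ANOVA with Bonferroni comparisons
oneway mpg rep78, tabulate bonferroni
kwallis mpg, by(rep78)

* Regression with a categorical predictor and robust standard errors
regress price mpg weight i.foreign, vce(robust)
```

Running sdtest before ttest follows the logic of the table: the variance test helps decide whether the unequal option is needed.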
Residual Analysis or Model Diagnostics

The main assumptions of classical regression analysis:

1. The observations are independent of each other (this is why classical regression analysis can be used).
2. The relationship between the dependent variable and the independent variables is linear.
3. The residual follows an approximately normal distribution.
4. The residual has homoscedastic variance, i.e., the variance of the residual is equal or constant across all levels of the independent variables.
5. The residual has no autocorrelation.
6. There is no or little multicollinearity among the independent variables of the model.
7. There is no endogeneity problem, i.e., there is no relationship between the residual and the independent variables.

The residual analysis is performed when the regression model has been fitted without robust standard errors. The following table summarizes the formal (theoretical) and informal (graphical) test procedures for each of the underlying assumptions of the classical linear regression model:

Linearity of model
  Graphical tests: 1. Residual versus fitted plot. 2. Augmented component-plus-residual plot.

Normality of residual
  Theoretical tests: Shapiro-Wilk test.
  Graphical tests: 1. Quantile-Quantile (Q-Q) plot. 2. Normal probability plot. 3. Histogram with normal curve. 4. Box and Whisker plot.

Homoscedastic variance of residual
  Theoretical tests: 1. Breusch-Pagan test. 2. White's test.
  Graphical tests: 1. Residual versus fitted plot. 2. Residual versus predictor plot.

No autocorrelation among residuals
  Theoretical tests: 1. Durbin-Watson test. 2. Breusch-Godfrey (B-G) test.

Little multicollinearity of predictors
  Theoretical tests: Values of variance inflation factor (VIF).
  Graphical tests: 1. Correlation matrix with significance. 2. Scatter plot matrix of the predictors.

Detection of high leverage and influential points
  Graphical tests: Leverage versus residual square plot.

Note that cross-sectional data consist of observations that are independent of each other, which is why classical regression analysis can be used. In business and economics, the endogeneity problem is frequent; it should be identified at an early stage of the analysis and must be handled with advanced econometric models such as the instrumental variable (IV) regression model, the two-stage Heckman model, two-stage least squares (2SLS), the linear mixed effect model (LMM), and the vector autoregressive (VAR) model. For these reasons, we will test only the five main underlying assumptions (2 to 6) of the classical linear regression model.

The following table presents STATA codes and their explanation for performing the residual analysis:

Example of STATA codes    Explanation
acprplot cont.ind.var, mspline msopts()    Augmented component-plus-residual plot to detect the linear relationship of the independent variable with the model.
rvfplot, yline(0)    Residual versus fitted plot to detect linearity and to identify the homoscedastic variance of the residual.
predict resid.name, resid    The tests of normality of the residual first require this command, which saves the residual under any name such as resid.name.
swilk resid.name    Shapiro-Wilk test of normality of the residual.
qnorm resid.name    Quantile-Quantile (Q-Q) plot of the residual for checking normality.
pnorm resid.name    Normal probability plot of the residual for checking normality.
hist resid.name, normal    Histogram of the residual for checking normality.
graph box resid.name    Box plot of the residual for checking normality and detecting outliers.
estat hettest    Breusch-Pagan test of homoscedasticity of the residual (by default it uses the fitted values of the model).
estat imtest, white    White's test of homoscedasticity of the residual.
rvfplot, yline(0)    Residual versus fitted plot to identify the homoscedastic variance of the residual.
rvpplot cont.ind.var, yline(0)    Residual versus predictor plot to identify the homoscedastic variance of the residual for the independent variable of the model.
tsset id.var    Serial correlation tests first require the data to be declared with tsset; using the unique id variable, we treat the id variable as the time variable.
estat dwatson    Durbin-Watson test of autocorrelation, assuming the independent variables are strictly exogenous.
estat bgodfrey    Breusch-Godfrey (B-G) test of autocorrelation, assuming that the independent variables of the model are weakly exogenous.
estat vif    Estimates the variance inflation factor to detect multicollinearity among the independent variables of the model.
pwcorr cont.ind.var(s), sig    Pairwise correlation matrix with significance levels to test for significant correlation among the independent variables of the model.
graph matrix cont.ind.var(s), half    Scatter plot matrix to identify linear relationships among the independent variables of the model.
lvr2plot    Leverage versus squared residuals plot to detect the high leverage points of the dataset.
lvr2plot, mlabel(id.var) mlabp(1)    Labels each point with the identification variable, so that high leverage observations can be identified and, if needed, omitted.
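
The diagnostic commands can be chained after a single model fit, as in the sketch below. It assumes the auto example dataset shipped with STATA and an illustrative model of price on mpg and weight; the names obs_id and res are hypothetical. The observation number is used as a stand-in time variable only so that the autocorrelation tests can run, and tsset is issued before regress here as a cautious ordering.

```stata
* Residual analysis after a regression fitted WITHOUT robust errors
sysuse auto, clear

* The autocorrelation tests need tsset; use the row number as the index
gen obs_id = _n
tsset obs_id

regress price mpg weight

* Save the residuals, then check normality
predict res, residuals
swilk res
qnorm res

* Linearity and homoscedasticity
rvfplot, yline(0)
estat hettest
estat imtest, white

* Autocorrelation
estat dwatson
estat bgodfrey

* Multicollinearity
estat vif

* High leverage points, labeled by car name
lvr2plot, mlabel(make)
```

With cross-sectional data such as these, the order of the rows carries no time meaning, so the autocorrelation results are shown only to demonstrate the syntax.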
