Data Analysis With STATA - Sample Chapter

Chapter No.1 Introduction to Stata and Data Analytics Explore the big data field and learn how to perform data analytics and predictive modelling in STATA For more information: http://bit.ly/1NLw4Qz

Uploaded by Packt Publishing

Data Analysis with Stata

Explore the big data field and learn how to perform data analytics
and predictive modeling in Stata

Prasad Kothari

Stata is an integrated software package that provides you with everything you need for data analysis, data management, and graphics. Stata also provides you with a platform to efficiently perform simulation, regression analysis (linear and multiple), and custom programming.

This book covers data management, graphic visualization, and programming in Stata. Starting with an introduction to Stata and data analysis, you'll move on to Stata programming and data management. Next, the book takes you through data visualization and all the important statistical tests in Stata. Linear and logistic regression in Stata is also covered.
As you progress through the book, you will explore a few analyses, including survey analysis, time series analysis, and survival analysis in Stata. You'll also discover different types of statistical modeling techniques and learn how to implement these techniques in Stata.

Who this book is written for

This book is for all the professionals and students who want to learn Stata programming and apply predictive modeling concepts. This book is also very helpful for experienced Stata programmers as it introduces advanced statistical modeling concepts and shows how to apply them.

What you will learn from this book

Perform important statistical tests to become a Stata data scientist

Be guided through how to program in Stata

Implement logistic and linear regression models

Visualize and program data in Stata

Analyze survey data, time series data, and survival data

Perform database management in Stata

$ 34.99 US
22.99 UK
Prices do not include local sales tax or VAT where applicable

Visit www.PacktPub.com for books, eBooks, code, downloads, and PacktLib.

In this package, you will find:

The author biography

A preview chapter from the book, Chapter 1 'Introduction to Stata
and Data Analytics'
A synopsis of the book's content
More information on Data Analysis with Stata

About the Author

Prasad Kothari is an analytics thought leader. He has worked extensively with organizations such as Merck, Sanofi Aventis, Freddie Mac, Fractal Analytics, and the National Institutes of Health on various analytics and big data projects. He has published various research papers in the American Journal of Drug and Alcohol Abuse and American Public Health Association.
Prasad is an industrial engineer from V.J.T.I. and holds an MS in management information systems from the University of Arizona. He works closely with different labs at MIT on digital analytics projects and research.
He has worked extensively with many statistical tools, such as R, Stata, SAS, SPSS, and Python. His leadership and analytics skills have been pivotal in setting up analytics practices for various organizations and helping them grow across the globe.
Prasad set up a world-renowned fraud investigation team at Freddie Mac and is known in the fraud-detection industry as a pioneer in cutting-edge analytical techniques. He also set up a sales forecasting team at Merck and Sanofi Aventis and helped these pharmaceutical companies discover new, groundbreaking analytical techniques for drug discovery and clinical trials. Prasad also worked with the US government (the healthcare department at NIH) and consulted for it on various healthcare analytics projects. He played a pivotal role in ObamaCare.
You can find out about healthcare social media management and analytics at
http://www.amazon.in/Healthcare-Social-Media-Management-Analytics-ebook/dp/B00VPZFOGE/ref=sr_1_1?s=digital-text&ie=UTF8&qid=1439376295&sr=1-1.

Preface
This book covers data management, visualization of graphs, and programming
in Stata. Starting with an introduction to Stata and data analytics, you'll move on
to Stata programming and data management. The book also takes you through
data visualization and all the important statistical tests in Stata. Linear and logistic
regression in Stata is covered as well. As you progress, you will explore a few
analyses, including survey analysis, time series analysis, and survival analysis in
Stata. You'll also discover different types of statistical modeling techniques and
learn how to implement these techniques in Stata. This book is provided with
a code bundle, but readers will have to build their own datasets as they
proceed through the chapters.

What this book covers

Chapter 1, An Introduction to Stata and Data Analytics, gives an overview of Stata
programming and the various statistical models that can be built in Stata.
Chapter 2, Stata Programming and Data Management, teaches you how to manage
data by changing labels, how to create new variables, and how to replace existing
variables and make them better from the modeling perspective. It also discusses
how to drop and keep important variables for the analysis, how to summarize the
data tables into report formats, and how to append or merge different data files.
Finally, it teaches you how to prepare reports and prepare the data for further
graphs and modeling assignments.
Chapter 3, Data Visualization, discusses scatter plots, histograms, and various
graphing techniques, and the nitty-gritty involved in the visualization of data
in Stata. It showcases how to perform visualization in Stata through code and
graphical interfaces. Both are equally effective ways to create graphs
and visualizations.

Chapter 4, Important Statistical Tests in Stata, discusses how statistical tests, such as
t-tests, chi-square tests, ANOVA, MANOVA, and Fisher's test, are important for
the model-building exercise. The more tests you conduct on the given data, the
better you will understand the data and how different variables interact with
each other.
Chapter 5, Linear Regression in Stata, teaches you linear regression methods and their
assumptions. You also get a review of all the nitty-gritty, such as multicollinearity,
homoscedasticity, and so on.
Chapter 6, Logistic Regression in Stata, covers how to build a logistic regression model
and the business situations in which such a model is best applied. It also teaches
you the theory and application aspects of logistic regression.
Chapter 7, Survey Analysis in Stata, teaches you different sampling concepts and
methods. You also learn how to implement these methods in Stata and how to apply
statistical modeling concepts, such as regression, to survey data.
Chapter 8, Time Series Analysis in Stata, covers time series concepts, such as seasonality,
cyclic behavior of the data, and autoregressive and moving average methods. You
also learn how to apply these concepts in Stata and how to conduct various statistical
tests to make sure that the time series analysis that you performed is correct.
Chapter 9, Survival Analysis in Stata, teaches survival analysis and different statistical
concepts associated with it in detail.

Introduction to Stata and Data Analytics
These days, many people use Stata for econometric and medical research, among
other purposes. Other people use different packages, such as the Statistical Package
for the Social Sciences (SPSS), EViews, Micro, RATS/CATS (used by time series
experts), and R, MATLAB, GAUSS, and Fortran (used for hardcore analysis). You
should learn how to use Stata and then apply it in your own field. Stata is a
command-driven language: there are over 500 different commands and menu
options, and each has a particular syntax required to invoke its various options.
Learning these commands takes time, but it is not hard. At the end of each class,
your do-file will contain all the commands that we have covered, but there is no
way we will cover all of these commands in this short introductory course.
Stata is an integrated statistical analysis tool intended for research scholars and
analytics practitioners. Stata has many strengths, but we are going to talk about the
most important one: managing, adjusting, and arranging large sets of data. Stata has
many versions, and it keeps improving with every version; for example, from Stata 11
to Stata 14, there have been improvements in computing speed, capabilities and
functionality, and graphics flexibility. Stata also keeps changing and updating its
features in response to users' suggestions. Finally, Stata is extensible: if you need a
nonstandard feature, such as an uncommon regression method, you can often get help
from the Web, because another user may already have written a program that
integrates with Stata for that analysis.
The following topics will be covered in this chapter:

Introducing data analytics

Introducing the Stata interface and basic techniques

Introducing data analytics

We analyze data every day for various reasons. Predicting an event or forecasting
key indicators, such as an organization's revenue, is fast becoming a major
requirement in the industry. There are various techniques and tools that can be
leveraged to analyze data. Here are the techniques that will be covered in this
book, using Stata as the tool:

Stata programming and data management: Before predicting anything,
we need to manage and massage the data until it is good enough to derive
insights from. The programming aspect helps in creating new variables and
treating the data in such a way that finding patterns in historical data or
predicting the outcome of a given event becomes much easier.

Data visualization: After the data preparation, we need to visualize the data
for the following reasons:

To view what patterns in the data look like

To check whether there are any outliers in the data

To understand the data better

To draw preliminary insights from the data

Important statistical tests in Stata: After data visualization, based on your
observations, you can come up with various hypotheses about the data.
We need to test these hypotheses on the datasets to check whether they
are statistically significant and whether we can rely on them and apply
them in future situations as well.

Linear regression in Stata: Once hypothesis testing is done, there is usually
a business need to predict one of the variables, such as the revenue of a
financial organization under specific conditions. Predictions of continuous
variables, such as revenue, the default amount on a credit card, and the
number of items sold in a given store, come from linear regression. Linear
regression is the most basic and widely used prediction methodology. We
will go into the details of linear regression in a later chapter.

Logistic regression in Stata: When you need to predict the outcome of a
particular event (for example, yes or no) along with its probability, logistic
regression is by far one of the most widely used and accepted methods.
Questions such as which team will win a football or cricket match, or whether
a customer will default on a loan payment, can be answered using the
probabilities given by logistic regression.

Survey analysis in Stata: Understanding customer sentiment and
consumer experience is one of the biggest requirements of the retail industry.
The research industry also needs data about people's opinions in order to
derive the effect of a certain event or the sentiments of the affected people.
All of these can be achieved by conducting surveys and analyzing the
resulting datasets. Survey analysis can involve various subtechniques, such as
factor analysis, principal component analysis, panel data analysis, and so on.

Time series analysis in Stata: When you try to forecast a time-dependent
variable with reasonably regular cyclic or seasonal behavior, time series
analysis comes in handy. There are many time series techniques, but we
will talk about a couple of them: Autoregressive Integrated Moving
Average (ARIMA) and the Box-Jenkins method. Forecasting the amount of
rainfall from the rainfall of the past 5 years is a classic time series
analysis problem.

Survival analysis in Stata: These days, many customers leave (attrite from)
telecom plans, healthcare plans, and so on, and join competitors. When you
need to develop a churn or attrition model to check who will attrite, and
when, survival analysis is well suited, because it models the time until an
event such as attrition occurs.
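To connect these techniques to concrete commands, here is a minimal sketch using the auto dataset that ships with Stata; the variables are chosen purely for illustration and are not taken from the later chapters. regress fits a linear regression for a continuous outcome, and logit fits a logistic regression for a binary outcome. Time series and survival work start instead from tsset and stset, respectively.

```stata
* Illustrative only: the auto dataset (74 cars) ships with Stata
sysuse auto, clear

* Linear regression: continuous outcome (price) on two predictors
regress price mpg weight

* Logistic regression: binary outcome (foreign: 1 = foreign car, 0 = domestic)
logit foreign mpg weight

* Predicted probability that each car is foreign, from the logit model
predict p_foreign, pr
summarize p_foreign
```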

The Stata interface

Let's discuss the location and layout of Stata. It is very easy to locate Stata on a
computer: after installing the software, open the Start menu, use the search box, and
type Stata. The installation path depends on which version has been installed. You
can also launch Stata through the Quick Launch button or through the Start menu's
list of programs.

The preceding diagram represents the Stata layout. Stata comes in four flavors:
Stata/MP (multiprocessor, for two or four cores), Stata/SE (special edition),
Intercooled Stata, and Small Stata. Stata/MP is the fastest of these. Although all
flavors work in a similar fashion, the larger flavors allow more variables in a dataset
and more regressors in a model. At the time of writing, Stata version 11 is in demand
and is used on many computers. Stata is command-driven software. Newer versions
of Stata also offer menus and dialog boxes; however, typing commands is the simplest
and quickest way to learn Stata. The more you type commands, the better your
understanding becomes, and typing makes programming for analytics easy and
simple. When it is difficult to remember the exact syntax of a command, use the
menus instead: Stata displays the equivalent command in the Results window, and
you can copy it for later use. There are three ways to enter commands, as follows:

Use a do-file: write the commands in a do-file (a text file of Stata commands,
usually with the .do extension) and tell Stata to run it with the do command.

Type the commands one at a time in the Command window.

Enter the commands interactively by choosing them from the menus.

Though all three approaches are used, the do-file is the most frequently used one.
First, for a longer sequence of commands, it is faster than typing each command
manually. Second, it stores the commands themselves, so the whole analysis can be
rerun in exactly the same way. Suppose you make a mistake and want to rectify it;
what would you do? In this case, the do-file is useful: you can correct it and run
the program again. Generally, interactive commands are used to find the problem,
and a do-file is then used to fix it. The following is an example of
an interactive command:
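A do-file holds exactly the commands you would otherwise type in the Command window. The following is a minimal sketch of one, assuming Stata 11 or later; the file names analysis.do and analysis_log.txt are hypothetical, and the auto dataset that ships with Stata stands in for your own data.

```stata
* analysis.do: hypothetical example do-file
version 11              // interpret commands as Stata 11 would, for reproducibility
clear all               // start from an empty memory
set more off            // do not pause when the Results window fills

capture log close       // close any log left open by an earlier run
log using analysis_log.txt, text replace

sysuse auto, clear      // example dataset shipped with Stata
describe
summarize price mpg

log close
```

Save the lines as analysis.do and run them with do analysis.do. If one command fails, correct that line and rerun the whole file, which is the workflow described above.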

Data-storing techniques in Stata

Stata is a multipurpose program: it can read not only its own data files but also
data stored in simple formats, such as ASCII (plain text). Data from almost any source,
such as Excel or another statistical package, can be exported to an ASCII file, which
means that nearly any data can be imported into Stata.
Data in Stata is stored in variables, which are like vectors with one individual
observation in every row; variables can hold numbers or strings (including strings
that consist of digits). Every row holds the observation for one individual, country,
firm, or whatever unit the data describes.
Storing data in variables is the most efficient way to store information in Stata.
Sometimes, however, it is better to store data in a different form, such as
the following:

Matrices

Macros

Matrices should be used carefully, as they consume more memory than variables,
so you may run short of memory before your work is done.
The other form is macros; these are similar to variables in other programming
languages: they are named containers that can hold information of any type.
There are two flavors of macros: local (temporary) and global. Global macros
are flexible and easy to manage; once defined, they are available to all commands
for the rest of the Stata session. Local macros, on the other hand, exist only within
the program or do-file in which they are defined and cannot be used elsewhere.
For example, a local macro defined in a do-file exists only while that do-file runs.
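A minimal sketch of both macro flavors, assuming the auto dataset that ships with Stata; the macro names keyvars and datadir and the folder E:/Stata1 are hypothetical. Note the quoting: a local macro is referenced as `name' (opening backtick, closing apostrophe), and a global macro as $name.

```stata
sysuse auto, clear

* Local macro: exists only while this do-file (or program) runs
local keyvars "price mpg weight"

* Global macro: available for the rest of the Stata session
global datadir "E:/Stata1"     // hypothetical folder; forward slashes work everywhere

summarize `keyvars'            // expands to: summarize price mpg weight

* Extended macro function: count the words stored in keyvars
local nvars : word count `keyvars'
display "Number of key variables: `nvars'"
display "Data folder: $datadir"
```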

Directories and folders in Stata

Stata uses the same tree-style structure of directories and folders as the operating
systems it runs on, such as Windows, Linux, Unix, and Mac OS. This keeps your work
organized, so files can easily be found later. For example, you might use a data folder
to save all your datasets, with a subfolder for each dataset, and so on. In Stata, you
can use directory commands familiar from the following systems:

Dos

Linux

Unix

For example, to change the working directory, use the cd command (Stata
commands are written in lowercase), as follows:
cd "C:\Statafolder"

You can also create a new directory inside the current working directory.
For example:
mkdir "newstata"

The dir command lists the contents of the current directory. To display the name
of the current working directory, use pwd (in Stata for Windows, cd with no
arguments also displays it).
Paths in Stata are of two types: absolute and relative. An absolute path contains the
full address of the folder, starting from the drive or root; the cd command in the
earlier example used an absolute path. A relative path, on the other hand, gives the
location of a file or folder relative to the current working directory. The following
mkdir example uses a relative path:
mkdir "E\Stata\Stata1"

Relative paths are especially useful when you work on different devices, such as a
PC at home, a library computer, or a server. To separate folders, Windows and DOS
use a backslash (\), whereas Linux and Unix use a forward slash (/). These differing
conventions can cause trouble when you move your work to a server where Stata
is installed. As a general rule, use forward slashes in paths, because Stata understands
a forward slash as a separator on every operating system. For example, the following
command creates a new Data folder inside Stata1 for your Stata work:
mkdir "Stata1/Data"
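The following sketch puts the directory commands together. The folder names Stata1 and Data are hypothetical, the paths use forward slashes as recommended above, and c(pwd) is Stata's built-in record of the current working directory.

```stata
* Show the current working directory and remember it
pwd
local startdir "`c(pwd)'"

* Create folders relative to the current directory
capture mkdir "Stata1"          // capture: no error if the folder already exists
capture mkdir "Stata1/Data"

cd "Stata1/Data"                // relative path
pwd
dir                             // list the contents of the folder

cd "`startdir'"                 // absolute path back to where we started
```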

Reading data in Stata

When you load data into Stata, it is copied into the computer's memory (RAM).
Changes you make to the data in memory are not permanent until you save them,
so unsaved changes are lost when you close the Stata session. You can load data
into Stata in various ways. The most common is the use command:
use "E:\Stata1\t1 less India pwt 80-2010.dta", clear

The clear option at the end of the command tells Stata to discard the data currently
in memory (even if it has unsaved changes) before opening the new data file.

To load only some of the variables in a dataset, list them before using:
use country year using "t1 less India pwt 80-2010.dta", clear
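A self-contained sketch of both forms of use. Because the India PWT file is not available here, the block first saves the auto dataset that ships with Stata under the hypothetical name demo_use.dta.

```stata
* Create an example file to read back (hypothetical file name)
sysuse auto, clear
save demo_use.dta, replace

* Load the whole file, discarding whatever is in memory
use demo_use.dta, clear
describe, short

* Load only selected variables: list them before "using"
use make price mpg using demo_use.dta, clear
describe, short
```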

Insheet
The insheet command cannot read Excel workbooks directly, so data in Excel must
first be saved in one of the following text formats:

CSV (comma-separated values)

Text (where the delimiter is a tab or a comma)

(Since Stata 12, the import excel command can read .xls and .xlsx files directly.)

Keep the following rules in mind when preparing a spreadsheet for Stata:

The first row of the sheet should contain the variable names (headers), such as
series, code, or names, and the data must start in the second row. Any title
above the header row must be removed before saving the file.

Stata reads every cell, so any additional lines below or to the right of the
data, for example, footnotes or endnotes, should be deleted before saving.
If necessary, delete the entire rows below the data or the columns to its right.

Variable names must not begin with a number. This causes problems when
a file is arranged with years (1980, 1985) in the top row. In such cases, placing
an underscore before the numbers helps; you can do this by selecting the row
and using the spreadsheet package's find-and-replace tool, so that, for example,
1980 becomes _1980, and so on.

Most importantly, delete commas from the data values (for example, thousands
separators), because in a comma-separated file Stata cannot tell where one
column ends and the next begins. You can do this with the spreadsheet's
find-and-replace option.

Notations such as double dots (..) or hyphens (-), which are often used to mark
missing values, confuse Stata: Stata recognizes only a single dot (.) as a missing
numeric value, so it reads a column containing .. or - as text.

After the data is saved in CSV or text format, it can be read into Stata, as shown in the
following code snippet:
insheet using "E:\Stata1\t1 less India pwt 80-2010.txt", clear

If you have already changed the working directory to E:\Stata1 with the cd command,
the file can be read with a relative path, as follows:
insheet using "t1 less India pwt 80-2010.txt", clear

The insheet command has several options. In Stata, options modify the behavior of a
command; they are added at the end of the command, after a comma. The following
are some of the options used with insheet:

The clear option: This loads the new file with insheet, replacing whatever data is
currently in memory: insheet using "E:\Stata1\t1 less India pwt 80-2010.txt", clear

The names option: This tells Stata that the first row of the file contains the variable
names. insheet usually detects this automatically; when it does not, specify names,
as in the following example:
insheet using "E:\Stata1 classes\t1 less India pwt 80-2010.txt", names clear

The delimiter option: This tells insheet which character separates the values.
Stata automatically recognizes tab- and comma-delimited data, but datasets
often use other delimiters, such as a semicolon (;). Here is an example:
insheet using "E:\Ind-samp.txt", delimiter(";")
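A self-contained sketch of reading a semicolon-delimited file. The file name and values are hypothetical, and the block writes the file first with Stata's file commands. It shows insheet as used in the text and, for comparison, import delimited, which superseded insheet in Stata 13.

```stata
* Write a small semicolon-delimited text file (hypothetical name and values)
file open fh using "demo_semicolon.txt", write text replace
file write fh "country;countrycode;yr;pops" _n
file write fh "India;IND;2010;23452.9" _n
file write fh "Pakistan;PAK;2010;11111.2" _n
file close fh

* Older syntax, as in the text: insheet with an explicit delimiter
insheet using "demo_semicolon.txt", delimiter(";") names clear
list

* Since Stata 13: import delimited; varnames(1) reads names from the first row
import delimited using "demo_semicolon.txt", delimiters(";") varnames(1) clear
list
```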

Infix
Along with insheet, you can use the infix command, as shown later.
Most data today comes as CSV or tab-delimited files, but the fixed-format ASCII layout
is still used to store older data. Take, for example, a survey conducted by the
government. The following are two lines from 2010:
10862226023331 06 022 3 02220155500666600777000003331
10001222228332 06 022 3 02555553006666000000000044441

A codebook or data dictionary, usually supplied as a PDF or text file, explains the
layout of the data. For example, it tells us that the first two digits are the record
type (row ID), the next two digits identify the survey year, and the fifth digit is the
quarter of the interview, among other things. infix reads this kind of data, using the
column positions given in the codebook. The following is an example, where [...]
stands for the remaining variables listed in the codebook:
infix rowtype 1-2 yr 3-4 quart 5 [...] using "E:\Stata1\Survey2010.dat", clear

When you work with many such files, it is convenient to keep the codebook information
in a separate dictionary file. The dictionary looks as follows:
infix dictionary using Survey2010.dat {
rowtype 1-2
yr 3-4
quart 5
[...]
}

After saving the dictionary as Survey2010.dct, you read the data by passing the
dictionary to infix. Because the dictionary itself names the raw data file
(Survey2010.dat), you do not need to name it again; with a relative path like this
one, keep the raw data file in the same folder as the dictionary. The command looks
like this:
infix using "H:\ECStata\NHIS1986.dct", clear

Writing a dictionary file correctly is a tedious job. However, the National Health
Interview Survey (NHIS) data comes with a dictionary that can be read by SAS, and the
resulting data can be converted into Stata format using the Stat/Transfer program.
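The following sketch shows both ways of reading fixed-format data. The two records and the file names survey_demo.dat and survey_demo.dct are hypothetical; only the column layout (rowtype in columns 1-2, yr in 3-4, quart in 5) follows the description above.

```stata
* Write two hypothetical fixed-width records (not real survey data)
file open fh using "survey_demo.dat", write text replace
file write fh "1110123" _n
file write fh "1210234" _n
file close fh

* Read the columns directly on the command line
infix rowtype 1-2 yr 3-4 quart 5 using "survey_demo.dat", clear
list

* Store the same layout in a dictionary file
file open dct using "survey_demo.dct", write text replace
file write dct "infix dictionary using survey_demo.dat {" _n
file write dct "    rowtype 1-2" _n
file write dct "    yr      3-4" _n
file write dct "    quart   5" _n
file write dct "}" _n
file close dct

* Read the data through the dictionary
infix using "survey_demo.dct", clear
list
```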

The Stat/Transfer program

This program is used to convert datasets between various formats, including
well-established industry formats such as SAS, R, SPSS, Excel, and so on. Before
converting, the data should be examined thoroughly. Because it is an extremely
user-friendly tool, it can be used to move data between many packages and formats.
This is shown as follows:

Manual typing or copy and paste

Typing or copying and pasting works much as in other programs, but in Stata it is
done through the Data Editor. Just select the required data columns in Excel and
paste them into the Data Editor. However, this has some drawbacks: there is no fixed
procedure for handling inaccurate or missing values, and regional settings can cause
problems. For example, in some countries, a comma is used instead of a decimal point.
Typing data by hand is tedious, but it is unavoidable when no electronic version of
the data exists. The edit command makes this job easier: it opens the Data Editor,
a spreadsheet-like window in which you can enter new data and edit existing values.
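Besides the Data Editor, small datasets can be typed directly into a do-file with the input command, which keeps the typed values reproducible. A minimal sketch, using a few rows from this chapter's country example with lowercase variable names (an assumption for illustration):

```stata
* Enter data by typing it into a do-file instead of the Data Editor
clear
input str10 country str3 countrycode int yr double pops
"India"    "IND" 2010 23452.9
"U.S."     "USA" 2010 22222.1
"Pakistan" "PAK" 2010 11111.2
end

list
```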

Variables and data types

There are different types of variables and data types, which we are going to see in
this section.

Indicators or data variables

To explore the data and draw conclusions from it, the browse and edit commands are
helpful. Data variables store the fundamental data. In the following table, the income
data for the different nations is stored in the Cccgdp variable and the population data
in the Pops variable. To know what each observation refers to, we also need indicator
(identifier) variables: here, Country, Countrycode, and yr tell us which country and
year the GDP (Cccgdp) and population (Pops) values belong to. The data might look
as follows:
Country    Countrycode  yr    Pops      Cccgdp     Openss
India      IND          2010  23452.9   10897.23   23.11111
U.S.       USA          2010  22222.1   23987.23   90.42231
Pakistan   PAK          2010  11111.2   23675.21   10.22291
China      CHN          2010  98765     97654.94   30.98765
Russia     RUS          2010  19876     65745.11   43.34343
Germany    GER          2010  23467     23874.35   23.74747

After importing the data in Stata, it is always a good practice to examine the data.
It gives you an advantage in any modeling or visualization exercise.
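One useful first check is whether the indicator variables really identify each observation. A minimal sketch, assuming the country table above is typed in with lowercase variable names: isid exits with an error if the listed variables do not uniquely identify the observations.

```stata
clear
input str10 country str3 countrycode int yr double pops double cccgdp double openss
"India"    "IND" 2010 23452.9 10897.23 23.11111
"U.S."     "USA" 2010 22222.1 23987.23 90.42231
"Pakistan" "PAK" 2010 11111.2 23675.21 10.22291
"China"    "CHN" 2010 98765   97654.94 30.98765
"Russia"   "RUS" 2010 19876   65745.11 43.34343
"Germany"  "GER" 2010 23467   23874.35 23.74747
end

* Country code and year together identify each observation
isid countrycode yr

* With a single year, countrycode alone is also unique here
isid countrycode
describe
```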

Examining the data

Examining the data is always recommended. It is a good idea to examine your data
when you first read it into Stata; you should check whether all the variables and
observations are present and are in the correct format.
While browse and edit let you examine the raw data in a spreadsheet-like window, the
list command displays the data in the Results window. This works well for small
datasets; for bigger datasets, restrict the output by naming the variables you want
and by using qualifiers. An example is shown as follows:
list country* yr pops

country    countrycode  yr    pops
India      IND          2010  23452.9
U.S.       USA          2010  22222.1
Pakistan   PAK          2010  11111.2
China      CHN          2010  98765
Russia     RUS          2010  19876
Germany    GER          2010  23467

In the preceding command, the asterisk (*) is a wildcard: country* matches every
variable whose name begins with country (here, country and countrycode).
Alternatively, we could list all variables but only a limited number of observations,
for example, observations 14 to 19 (list in 14/19). The output looks like the
following table:
Country    Countrycode  yr    Popscon   Cccgdps    kOpenss
India      IND          2010  23452.9   10897.23   23.11111
U.S.       USA          2010  22222.1   23987.23   90.42231
Pakistan   PAK          2010  11111.2   23675.21   10.22291
China      CHN          2010  98765     97654.94   30.98765
Russia     RUS          2010  19876     65745.11   43.34343
Germany    GER          2010  23467     23874.35   23.74747

How to subset the data file using in and if

In the previous example, the in qualifier was used; it restricts a command to a range
of observations. Here are more examples:

list in 14/19

list in 90/l

list in 30/l

In these commands, the lowercase letter l stands for the last observation:

The first command lists observations 14 to 19

The second command lists observations from 90 to the last observation

The third command lists observations from 30 to the last observation

The if qualifier is the other way to subset data: it takes an expression that is either
true or false, and the command is applied only to the observations for which the
expression is true. For example, to list only the observations for the year 2010, where
the year variable is yr:
list if yr == 2010

To examine the raw data, use the Data Browser (the browse command). In big
datasets, you often want to view only selected variables; in that case, name the
variables you want to examine after the command:
browse country yr popscon
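The in and if qualifiers can be tried on the auto dataset that ships with Stata (74 observations); the variables and the price threshold are chosen only for illustration.

```stata
sysuse auto, clear                  // 74 observations shipped with Stata

list make price in 14/19            // observations 14 through 19
list make price in 70/l             // observation 70 through the last (l = last)
list make price if price > 13000    // only observations where the condition is true

* Qualifiers work with most commands, not only list
count if foreign == 1
```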
Note that, unlike browse, the edit command also lets you change the dataset
manually. The assert command checks whether a statement about the data is true for
every observation. This matters because with bigger data (or big data, as it is called
today), checking individual values through the browse or edit commands becomes
impractical. For example, the following statements check whether the population of
each country (popscon) is positive or negative:
assert popscon>0
assert popscon<0

If the statement is true for every observation, assert produces no output. If it is false
for any observation, assert reports how many contradictions it found and stops with
an error message. With positive population data, the first statement passes and the
second fails.
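A small sketch of assert with hypothetical values, including one missing population. It also shows a subtlety worth knowing: Stata treats missing values (.) as larger than any number, so assert popscon > 0 does not catch them. capture lets the do-file continue after a failed assertion, and 9 is the return code for a false assertion.

```stata
clear
input str3 countrycode double popscon
"IND" 23452.9
"USA" 22222.1
"PAK" .
end

* Passes silently: the missing value counts as larger than any number
assert popscon > 0

* Stricter check that also rules out missing values
capture noisily assert popscon > 0 & !missing(popscon)
display "return code: " _rc

* A false statement stops a do-file with error 9 unless it is captured
capture assert popscon < 0
display "return code: " _rc
```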
The describe command reports basic information about the dataset and its variables,
such as the number of observations and variables, the size of the dataset, and the
storage type and display format of each variable. Combined with using, it describes a
dataset saved on disk without loading it into memory. An example is given as follows:
describe using "E:\Ind-Health-sample.dta"

The codebook command describes the contents of variables (storage type, range,
number of unique values, and missing values) without listing the data; for example,
codebook country.
The summarize command reports summary statistics: the number of observations,
mean, standard deviation, minimum, and maximum. The following table shows the
output of summarize:
Variable      Obs   Mean        Std. Dev.   Min          Max
Cntry
countrycode
yr                  2000        2.156       1990         2010
Popscon             87634.46    8374.33     29383.9      93830
ccCgdps             67544.23    4100.682    15890.71     98739.67
kOpenss
Chi-ppl             23.6        3.56        10.456       40.8796
Fdhsa               19.56       9.567       12.456       34.98765
Gdkliyu             1.987456    1.2         -3.238917    6.46896

As the preceding table shows, string variables such as Cntry and countrycode do not hold numbers, so no summary statistics are reported for them. The variable yr is numeric, so its summary statistics are reported. For more detail, such as percentiles, variance, skewness, and kurtosis, add the detail option (for example, summarize popscon, detail).
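The behavior described above can be reproduced with a short sketch, assuming the bundled auto dataset: make is a string variable, so it shows no statistics, just like Cntry and countrycode in the table.

```stata
sysuse auto, clear

* make is a string variable: summarize reports 0 observations and no statistics
summarize make price mpg

* The detail option adds percentiles, variance, skewness, and kurtosis
summarize price, detail

* summarize leaves its results in r(), which later commands can reuse
display "mean price = " r(mean) "  sd = " r(sd)
```

Reusing r(mean) and r(sd) this way avoids copying numbers by hand from the output into later calculations.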
Stata's wide range of graphics capabilities is one of its distinctive strengths, and help on any command is available by typing help followed by the command name (for example, help graph). A histogram can be created with the following command:
graph twoway histogram cccgdps
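A short sketch of the two common ways to draw a histogram, assuming the bundled auto dataset (mpg stands in for cccgdps): the twoway form used above, and the stand-alone histogram command, which accepts its own options such as frequency.

```stata
sysuse auto, clear

* Same form as in the text: a histogram drawn through the twoway family
graph twoway histogram mpg

* Stand-alone command; frequency puts counts rather than densities on the y axis
histogram mpg, frequency title("Mileage (mpg)")
```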

For a scatter plot, use the following command:

graph twoway scatter cccgdps popscon
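A minimal sketch of scatter plots, assuming the bundled auto dataset with price and weight standing in for cccgdps and popscon. It shows that scatter also accepts an if condition and axis titles, and that twoway can overlay several plots.

```stata
sysuse auto, clear

* The first variable goes on the y axis, the second on the x axis
graph twoway scatter price weight

* The same plot restricted to foreign cars, with axis titles
scatter price weight if foreign == 1, ytitle("Price") xtitle("Weight (lbs.)")

* Two overlaid plots: the scatter plus a fitted regression line
twoway (scatter price weight) (lfit price weight)
```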

Although Stata's modern graphics are powerful, they can be slow to draw. In certain cases, such as a quick look at the data that is not meant for a paper or presentation, the older version 7 graphics may be preferable. This can be done as follows:
graph7 cccgdps popscon

Saving the dataset takes a single command:

save "E:\Stata1\t1 less India pwt 80-2010.dta", replace

The replace option allows Stata to overwrite an existing file of the same name; without it, save refuses to overwrite the file and stops with an error. If the old version must be kept for some reason, save the revised data under a different name. Keep in mind that saving revised data under the original filename permanently changes the original file's content. Conversely, to discard changes that have not yet been saved and start over, simply reopen the saved file.
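The following sketch walks through these cases, assuming the bundled auto dataset and a temporary file in place of the chapter's E:\Stata1 path. Note that after the revised data is saved with replace, reopening the file brings back the revised version, not the original.

```stata
sysuse auto, clear

* A temporary file name stands in for a path such as "E:\Stata1\...dta"
tempfile mydata

* First save: the file does not exist yet, so replace is not needed
save "`mydata'"

* Saving again without replace fails with r(602): file already exists
capture save "`mydata'"
display "return code: " _rc

* With replace, the file on disk is overwritten by the revised data in memory
drop if foreign == 1
save "`mydata'", replace

* Reopening the file discards unsaved changes; clear drops the data in memory
use "`mydata'", clear
count
```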
There are two ways to keep a copy of the data before making changes. One option is to save the current data, revise it, and later, if you don't want to keep the changes, reopen the saved version. The other option is to use the preserve and restore commands: preserve takes a snapshot of the data in memory, and restore brings that snapshot back.
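A minimal sketch of preserve and restore, assuming the bundled auto dataset (74 observations, 22 of them foreign cars): the data is cut down temporarily for one analysis and then brought back intact.

```stata
sysuse auto, clear

* Take a snapshot of the data in memory
preserve

* Make a destructive change: keep only foreign cars
keep if foreign == 1
summarize price

* Bring back the snapshot: all 74 observations return
restore
count
```

Unlike the save-and-reopen approach, preserve does not require choosing a filename, which makes it convenient for short, temporary changes inside a do-file.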


Summary
In this chapter, we discussed many basic commands that are useful when programming in Stata. The chapter covers the basics and should help any beginner start working in Stata. The next chapter discusses data management techniques and programming in detail.
As you learn more about Stata, you will come to understand its various commands and functions and their business applications.


Get more information Data Analysis with Stata

Where to buy this book

You can buy Data Analysis with Stata from the Packt Publishing website.
Alternatively, you can buy the book from Amazon, BN.com, Computer Manuals and most internet
book retailers.

www.PacktPub.com
