Unit One Software Application in Economics
1. Introduction
People everywhere face problems in their daily lives that arise from the dynamic nature of the
world. To address these problems, the collection, organization, analysis and interpretation of
data are critical. Data are the information that you collect to learn, draw conclusions, and test
hypotheses. Data can be collected and stored in numerous ways, depending on the type of data,
the source and context, the study design, the data volume and turnaround time, and data
security.
The field of economic statistics and econometrics is changing rapidly. Increasing data
availability, combined with powerful computing and advanced software, allows researchers to
address issues of statistical inference and analysis in innovative ways. Statistical skills enable
you to intelligently collect, analyze and interpret data relevant to your decision-making.
Statistical concepts enable us to solve problems in a diversity of contexts, and statistical
thinking adds substance to your decisions. We will apply the basic concepts and methods of
statistics that you learned in the previous statistics and econometrics courses to real-world
problems. This course is tailored to meet your needs in research data collection using
electronic data collection systems such as Excel, Epi Info and EpiData, and in data analysis using
widely available statistical packages such as Stata, SPSS, R, EViews and SAS. This chapter
mainly introduces Excel for data collection and Stata and SPSS for data analysis. The steps of
data analysis in software can be depicted as follows.
Some of the software packages for the collection and analysis of data are depicted below.
2. Introduction to Stata
Stata is a general-purpose statistical software package, created in 1985, with a graphical
interface for managing, analyzing, and graphing data. The word Stata is a combination of the
words 'statistics' and 'data'. Stata is not an acronym and should not appear with all letters
capitalized. It is an integrated statistical analysis package designed for research professionals
and for handling and manipulating large datasets; it helps you explore, summarize and analyze
them. A dataset is a collection of several pieces of information called variables (usually
arranged in columns). A variable can have one or several values (information for one or several
cases). Stata has ever-growing capabilities for handling panel and time-series regression
analysis. Stata has a command-line interface, so users can type commands to perform specific
tasks. Users can also run commands in batch using a do-file. In addition, Stata has menus and
dialog boxes that give the user access to nearly all built-in commands. User-written commands
can be added to Stata using ado-files. Stata is case-sensitive: it distinguishes between lower-
and upper-case letters. Most built-in Stata commands are lower case, a convention most
programmers follow. Built-in commands allow the user to carry out statistical analyses such as
cross-tabulation and regression on datasets.
There are four flavors of Stata: Stata MP (multi-processor), which is the most powerful, Stata SE
(special edition), Intercooled Stata, and Small Stata. Note: Most features are shared by all
flavors of Stata; the main difference between them is the maximum number of variables,
regressors, and observations that can be handled.
Looking at Stata, you'll see four principal windows: Results, Command, Variables, and Review.
The Results window displays your input and output, including the output of procedures. If a
command generates lengthy output that you do not want displayed, type quietly in front of that
command. The Command window is where the user enters a command; to run it, press Enter.
Stata understands most abbreviations for commands and variable names, as long as the
abbreviation is unique. For example, the user can abbreviate the command regress to reg.
However, imagine two variables named perseat and percabinet: Stata would be unable to
distinguish between them if the abbreviation per was used. The user could, however, refer to
both variables at once using an asterisk, per*. The Variables window displays the variables in
the dataset; it is blank when there is no data in Stata's memory. The user can click on a
variable to include it on the command line. The Review window records all previously entered
commands. The user can click on any past command to include it on the command line, or use
Page Up and Page Down to step through past commands in the Command window (Figure 1).
[Figure 1: the Stata interface, with the History (Review), Variables and Results windows and the toolbar buttons labeled]
Stata has a Graphical User Interface (GUI) that allows almost all commands to be accessed via
point-and-click. Simply start by clicking into the Data, Graphics, or Statistics menus, make
the relevant selections, fill in a dialog box, and click OK. Stata then behaves exactly as if the
corresponding command had been typed with the command appearing in the Stata Results and
Review windows and being accessible via PgUp and PgDn.
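The command-line behaviour described above (unique abbreviations, the asterisk wildcard, and quietly) can be tried with a short sketch such as the following. It assumes the auto example dataset that ships with Stata, which the text does not mention, and it is meant to be run from a do-file, because the // comments are not accepted when typed in the Command window.

```stata
* Load the auto dataset shipped with Stata (an assumption, used only for illustration)
sysuse auto, clear

* Abbreviated command and variable names work as long as they are unique
reg price mpg weight        // same as: regress price mpg weight
summarize pr                // "pr" can only mean price in this dataset

* "m" could be make or mpg, so Stata reports an ambiguous abbreviation;
* capture noisily shows the error message but lets the do-file continue
capture noisily summarize m

* The wildcard selects every variable whose name starts with m
describe m*

* quietly runs the command but suppresses its output in the Results window
quietly regress price mpg weight
```

Each of these commands also appears in the Review window after it runs, whether it was typed or generated from a dialog box.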
Datasets
Stata datasets have the .dta extension and can be loaded into Stata in the usual way through the
File menu.
Data are a set of numbers and/or text describing specific phenomena, such as mortality, drug
effectiveness, the economy, weather, traffic, or pollution levels.
Stata uses and creates many types of files, which are distinguished by extensions at the end
of the filename. The extensions used by Stata include:
.ado   Programs that add commands to Stata, such as the SPost commands.
.do    Batch files that execute a set of Stata commands.
.hlp   The text displayed when you use the help command. For example, fitstat.hlp has help for fitstat.
.smcl  Output saved in the SMCL format by the log using command.
The dataset may be viewed as a spreadsheet by opening the Data Browser with its toolbar
button, and edited by clicking the toolbar button that opens the Data Editor.
Stata command:
A command is typed in the Stata Command window and executed by pressing the Return (or
Enter) key.
Do-files
The crucial advantage of using the command line instead of point-and-click menus is that it
allows for the replication of results. However, all typed commands are lost once Stata is closed
(unless you manually start a command log). This can be avoided by using so-called "do-files",
in which Stata commands are saved as a script in a simple text file with the ending ".do". When
the do-file is run from the do-file editor, all commands are executed in sequence. If all steps
of a project have been documented in one or more do-files, all analyses and results can be
reproduced and the whole process can be retraced by third parties. However, saving all
commands for a (bigger) project in a single do-file should be avoided. Rather, it is
recommended to split the commands into several do-files named according to the respective step
in the process (e.g., data import, data management, data analysis).
A do-file is built to contain the commands necessary to carry out a particular data analysis.
[Figure: the Do-file Editor window, opened by double-clicking]
In the Do-file Editor, a subset of commands can be highlighted and executed on its own. The
do-file can be saved for use in a future Stata session.
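As a minimal sketch of what such a do-file might contain, consider the following. The file name analysis.do, the log name and the use of the auto example dataset are assumptions made for illustration, not part of the course material.

```stata
* analysis.do : a hypothetical do-file covering one step of a project
clear all
capture log close                  // close any log left open by an earlier run
log using analysis, replace text   // keep a plain-text record of this run

sysuse auto, clear                 // example data shipped with Stata
summarize price mpg weight         // descriptive statistics
regress price mpg weight           // a simple regression

log close
```

Typing do analysis in the Command window runs the whole file from the current working directory. Highlighting a few lines in the Do-file Editor and executing them runs only that selection.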
Log files
A log allows you to make a full record of your Stata session. A log is a file containing what
you type and Stata's output. At the beginning of a Stata session, press the Log button on the
toolbar, type a filename into the dialog box, and choose Save. By default, this produces a SMCL
(Stata Markup and Control Language, pronounced 'smicle') file with extension .smcl, but an
ordinary ASCII text file can be produced by selecting the .log extension.
Log files can also be opened, viewed, and closed by selecting Log from the File menu, followed
by Begin..., View..., or Close.
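* start an unnamed log, overwriting mylog.smcl if it already exists
log using mylog, replace
* named logs allow several logs to be open at the same time
log using mylog2, name(mylog2)
log using firstfile, name(log1) text
log using secondfile, name(log2) smcl
* closes the unnamed log only; use log close log1 or log close _all for named logs
log close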
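Getting help
From the Help menu, select Stata Command..., type the name of the command and press OK, or
select Search..., type keywords and press OK. Frequently Asked Questions (FAQs) are also
available.
The same can be done from the Command window:
search keywords
help commandname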
Data input and output
Stata has its own data format with default extension .dta. Reading and saving a Stata file are
straightforward.
There are essentially two kinds of variables in Stata: string and numeric. The storage types
are byte, int, long, float, and double for numeric variables, and str1 to str80 for string
variables of different lengths (newer releases allow longer strings). Besides the storage type,
each variable has a name, a label, and a format associated with it. Data in other formats can
be converted into Stata's format using Stat/Transfer. Data can also be entered by hand using a
spreadsheet-style editor.
Entering Data
insheet: Read ASCII (text) data created by a spreadsheet (.csv files only)
infile: Read unformatted ASCII (text) data (space-delimited files)
input: Enter data from the keyboard
describe: Describe contents of data in memory or on disk
compress: Compress data in memory
save: Store the dataset currently in memory on disk in Stata data format
count: Show the number of observations
list: List values of variables
clear: Clear the entire dataset and everything else
memory: Display a report on memory usage
set memory: Set the size of memory
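Several of the commands listed above can be combined in a short session such as the sketch below. The variable names, the values and the file name mydata are made up for illustration.

```stata
clear
* input: type a small, made-up dataset from the keyboard
input id age income
1 25 1200
2 31 1850
3 47 2600
end

describe                   // variable names, storage types, labels
compress                   // store each variable in the smallest type that fits
count                      // number of observations
list                       // show every value
save mydata, replace       // writes mydata.dta in the working directory
```

Reading a spreadsheet export works in the same spirit with insheet using followed by the name of the .csv file; in newer releases import delimited plays that role.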
Exploring data
describe: Describe a dataset
list: List the contents of a dataset
codebook: Detailed contents of a dataset
log: Create a log file
summarize: Descriptive statistics
tabstat: Table of descriptive statistics
table: Create a table of statistics
stem: Stem-and-leaf plot
graph: High-resolution graphs
kdensity: Kernel density plot
sort: Sort observations in a dataset
histogram: Histogram for continuous and categorical variables
tabulate: One- and two-way frequency tables
correlate: Correlations
pwcorr: Pairwise correlations
type: Display an ASCII file
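The exploration commands listed above might be used together as in the following sketch. It assumes the auto example dataset shipped with Stata; the variable names price, mpg, weight, rep78, foreign and make belong to that dataset, not to the course data. Run it from a do-file so that the // comments are accepted.

```stata
sysuse auto, clear

describe                         // structure of the dataset
codebook rep78                   // detailed contents of one variable
summarize price mpg, detail      // descriptive statistics with percentiles
tabstat price mpg, statistics(mean sd min max) by(foreign)
tabulate foreign                 // one-way frequency table
tabulate rep78 foreign           // two-way frequency table
correlate price mpg weight       // correlations on complete cases
pwcorr price mpg rep78, obs      // pairwise correlations with sample sizes
sort price                       // ascending order of price
list make price in 1/5           // the five cheapest cars
stem mpg                         // stem-and-leaf plot
histogram mpg, frequency         // histogram
kdensity mpg                     // kernel density plot
```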
Modifying Data
Analyzing Data
ttest: t-test
regress: Regression
Must-Know Commands
System: clear, exit, log, set, #delimit, net, search, help
Data Management: use, sysuse, infile, infix, list, describe, keep, drop, generate, replace, rename, save, outfile
Data Analysis: summarize, correlate, graph (twoway, scatter, …), hist
Statistical Analysis: regress, predict, test, dwstat, hettest
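A minimal sketch of how the statistical analysis commands above fit together, again assuming the auto example dataset shipped with Stata. In current releases the heteroskedasticity test is called as estat hettest after regress, which is the form used here. dwstat is left out because it requires time-series data declared with tsset.

```stata
sysuse auto, clear

ttest mpg, by(foreign)          // two-sample t test of mean mpg by origin
regress price mpg weight        // OLS regression
estat hettest                   // heteroskedasticity test (hettest in older releases)
test mpg weight                 // joint test that both coefficients are zero
predict price_hat               // fitted values (the default after regress)
twoway scatter price price_hat  // observed against fitted prices
```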
Stata treats lines that begin with an asterisk (*), and any text located between /* and */, as
comments; they are echoed to the output but not executed.
If a command continues over two lines, we can put /* at the end of the first line and */ at the
beginning of the second line to make Stata ignore the line break.
An alternative is to put /// at the end of the first line.
Variable names are case-sensitive.
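The comment and line-continuation rules above can be sketched in a do-file as follows. The auto example dataset is an assumption used for illustration, and the /// form works in do-files rather than in the Command window.

```stata
* A whole-line comment starts with an asterisk
sysuse auto, clear   /* a comment can also sit between these delimiters */

* Old style: hide the line break inside a comment
regress price mpg weight /*
    */ length foreign

* Alternative: three slashes at the end of the line
regress price mpg weight ///
    length foreign

* Names are case-sensitive: price exists in this dataset, Price does not
capture noisily summarize Price
summarize price
```

The two regress commands are identical; only the way the line break is hidden differs.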
Missing value
A missing value in a numeric variable is represented by a period '.' (the system missing value),
or by a period followed by a letter, such as .a, .b, etc. (extended missing values).
Missing values are interpreted as very large positive numbers, with . < .a < .b, etc.
Note that this can lead to mistakes in logical expressions: a condition such as x > 100 is also
true when x is missing.
Numerical missing value codes (such as -99) may be converted to missing values using the command
mvdecode, and back again using mvencode.
mvdecode x, mv(-99)
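The pitfall with logical expressions can be seen in a small sketch. The dataset below is made up for illustration: income is a hypothetical variable in which -99 was used as the missing value code.

```stata
clear
input id income
1 1200
2 -99
3 800
4 2500
end

mvdecode income, mv(-99)      // -99 becomes the system missing value .
list

* Missing counts as larger than any number, so observation 2 is included
count if income > 1000

* Excluding missing values explicitly gives the intended answer
count if income > 1000 & !missing(income)
```

With these four observations the first count reports 3 and the second reports 2, because the missing income of observation 2 satisfies income > 1000.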
3. Overview of SPSS
The Statistical Package for the Social Sciences (SPSS) is a Windows-based program that can be
used to perform data entry and analysis. It has evolved a lot since its first release and is now
widely used in many areas. It is a straightforward package with a friendly environment. There is
a lot of easily accessible documentation and the tutorials are very good. However, unlike some
other statistical packages, SPSS does not hold your hand all the way through your analysis. You
have to make your own decisions, and for that you need a basic knowledge of statistics. The
downside is that you can make mistakes, but the upside is that you actually understand what you
are doing. You are not just answering questions by clicking on window after window; you are
doing your analysis for real, which means that you understand the analytical process and, when
it comes to writing up your results, you will know exactly what to say. SPSS is updated often.
This document was written for version 20, but the differences from other versions should not
cause any problems. SPSS performs a wide range of statistical tasks, from data management to
advanced modeling. It is also possible to use SPSS for data entry, though there are limitations.
To perform any of these activities, you must first install the software.
Windows in SPSS
1. DATA EDITOR
SPSS data files are organized by cases (rows) and variables (columns). The Data Editor displays
the contents of the active data file. The information in the Data Editor consists of variables
and cases. The employee data is located under the directory C:\Program Files\SPSSEVAL. If we
open this dataset, the Data Editor window looks like the following.
Column: a single variable's values across all study subjects
Row: a single study subject's information
Displaying values
You can display the values of your categorical variables either as the numeric codes entered
(e.g. 1's and 2's for gender), or as the value labels which you have defined in Variable View
(e.g. male and female; see 1.3).
To switch, go to View on the menu bar and choose Value Labels, or use the luggage label button
on the toolbar.
B. Variable view
1. Name of variables
Variable names are usually codes.
They consist of continuous characters without interruption (no spaces between characters).
Example: educational status (as a variable name it must be written without the space)
2. Type of variables
• There are different types of variables.
• The list of types is displayed when you click the upper right corner of the cell in the
Type column.
1. Numeric: for countable (quantitative) data; it accepts only numbers (qualitative variables
can be entered as numeric codes).
3. Width/ Decimals
The width and decimals set the number of characters allowed for a value of a single variable.
For a numeric variable, you are asked to choose the width and the number of decimals (by
default the width is 8 and the number of decimals is 2).
For a date variable, you may be asked to choose the number of characters for the date format.
For qualitative data entered as words, you are asked to choose the number of characters you
want to allow.
Decimals
Number of decimals (for example 3.14159265…)
It has to be less than or equal to 16.
For a date or string variable, you are not asked for decimals.
4. Label of the variables
The label of a variable is a detailed description of the variable name.
Labeling value
The Syntax Editor is the window in which SPSS commands can be typed and submitted for
processing. Commands saved in files can be opened in a Syntax Editor window for processing.
A syntax file can be executed as a whole, or in part by selecting part of the syntax.
To run the whole syntax, select 'Run' from the pull-down menu of the Syntax Editor and then
select 'All'.
The Viewer window (SPSS Viewer) displays all statistical results, tables, and charts.
The second important feature is the use of pull-down menu items and toolbars.
The toolbar provides a quick, easy method of accessing commonly required tasks.
The pull-down menu and toolbar items change from one type of window to another.
In the introduction, we noted that data collection will be done using electronic data collection
systems such as Excel, Epi Info and EpiData. Of course, we can also enter data directly into
SPSS and Stata. In this course an overview of Excel is included as a data entry tool, and prior
basic knowledge of Excel is expected from the students.
Microsoft Excel is a spreadsheet application program offered in the Microsoft Office software
package; it lets the user add information to a spreadsheet (worksheet). Each workbook starts
with three worksheets, but you can add and delete sheets depending on what you need. Excel has
the following interface.
Worksheets are the pages that you work with. A worksheet is also called a spreadsheet. You can
see the sheets named on tabs at the bottom of the page. By default the worksheets are called
Sheet 1, Sheet 2, and Sheet 3.
To work with a sheet, click the tab of the sheet you want to work with. This activates the
worksheet so that you can work with that particular sheet.
To add another sheet, click on the last sheet tab in the row and then click on the icon shown
below. Once you click on this icon, another sheet will open.
The Worksheet Cell
Excel sheets are divided into a grid of boxes called "cells", where you can enter data. A cell
is a box in which you can enter text or other information. It is the intersection of a column
and a row.
The columns are indicated by letters and the rows by numbers. The name of a cell starts with the
capital letter of its column, followed by the number of its row. Observe in the following that
cell A2 is made up of column A and row 2. Whenever you select a particular cell, its location is
displayed in the Name box next to the Formula Bar.
Then add your data to the worksheet.
Enter data manually in worksheet cells
You can enter numbers (with or without fixed decimal points), text, dates, or times in one cell, in
several cells at once, or on more than one worksheet.
On the worksheet, click a cell.
Type the numbers or text that you want, and then press Enter or Tab key.
To start data on a new line within a cell, enter a line break by pressing ALT+ENTER.
By default, pressing ENTER moves the selection down one cell, and pressing Tab moves the
selection one cell to the right.
Filter the Data in Cells
Select the cells to be filtered.
Click on the selected cells and point to the Filter command.
Now click on the funnel icon and deselect the data you want to filter out.
Then click on the down arrow in the first row and deselect the values you wish to filter out.
Borders are similar to gridlines. The main difference is that gridlines always stay the same,
whereas you can change the way your borders look. Gridlines show up around every cell, but you
can choose which cells to put borders around and how you want them to look. Borders print
automatically, without your having to check a box. To add a border to a worksheet:
Click on the Home tab.
In the Font group, select Draw Border Grid from the drop-down list.
Drag to draw the border around the area where you wish to display grid lines on paper, OR
Select the cells to be bordered.
Right-click on the selected cells.
As the following figure shows, the border icon pops up as soon as you right-click.
Filtering a List
When you use the Data, Filter, AutoFilter command, drop-down list arrows are displayed next to
each of the column labels in the list. When you open a drop-down list, a list of all the unique
entries for that column is displayed. By selecting one of the entries from the drop-down list,
called a filter criterion, you instruct Excel what to search for. Excel then filters the list so
that only the sets of data that contain the entry you selected are displayed. When Filter mode
is active, the arrows for the columns with a filter criterion selected appear in blue on the
worksheet, row numbers appear in blue, and the status bar displays either the number of rows
that meet the criteria or the text "Filter mode." The sets of data that do not meet the criteria
remain in the list, but they are hidden.
If you select a single cell in the list before choosing Filter, drop-down list arrows are
applied to all of the column labels in your list. If you select multiple column labels before
choosing Filter, drop-down list arrows are displayed only for the selected columns, thus
restricting which columns you can apply filters to. In either case, the entire list is filtered.
Also, you can filter only one list at a time on a worksheet.
1. Choose the FILTER button from the DATA ribbon, SORT & FILTER group.
2. Your list column labels will appear with drop-down list arrows to the right.
3. When you select the drop-down arrow at the top of a particular column you will have
(depending on the data type) a box at the bottom of the menu with all unique values. Make
sure the values you wish to see are ticked, that is, select the values you are filtering
for (see the following pictures).
4. When all the values you wish to see are ticked (this creates OR conditions for that column),
click OK to apply the filter for that column
OR
use the sort order options at the top of the menu, which work in the same manner as previously
discussed: if you select a sort order, this closes the menu and applies the filter.
5. Repeat step 3 until you have set filter criteria for all columns that you wish to filter by.
6. The list will show only those rows that match your criteria.
Each time you apply criteria to another column, you create AND conditions across columns, which
reduce the number of records displayed. With the simple AutoFilter, OR conditions cannot be
applied across columns (see advanced filter). More AND conditions mean fewer records. While a
filter is active, if you print the worksheet only the visible rows are output, so you can print
multiple views of your data from a single list.
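For readers who move on to Stata, the same AND/OR filtering logic can be sketched with the if qualifier. This is an analogy rather than an Excel feature, and it assumes the auto example dataset shipped with Stata.

```stata
sysuse auto, clear

* Several ticked values in one column: OR within that column
list make rep78 if inlist(rep78, 4, 5)

* Criteria on two columns: AND across columns, so fewer rows remain
list make rep78 foreign if inlist(rep78, 4, 5) & foreign == 1

* Rows that fail the criteria are only left out of the listing;
* the dataset in memory still holds every observation
count
```

As with AutoFilter, nothing is deleted: the if qualifier only restricts which rows a command acts on.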
When you create a table in Microsoft Office Excel, you can manage and analyze the data in that
table independently of data outside the table. A table makes it easy to sort, filter, and format
data within a sheet.
Create an Excel Table
To insert a table in a sheet:
On a worksheet, select the range of empty cells or data that you want to make into a table.
On the Insert tab, in the Tables group, click Table.
If the selected range contains data that you want to display as table headers, select the My
table has headers check box.
If you don't select the My table has headers check box, the table headers display default names
that you can change.
After you create a table, the Table Tools become available, and a Design tab is displayed. You
can use the tools on the Design tab to customize or edit the table. OR
Select the cells and then press Ctrl + T.
Overview of PivotTable
Use a PivotTable to summarize, analyze, explore, and present summary data. Use a PivotChart
to visualize this summary data in a PivotTable report, and to easily see comparisons, patterns,
and trends. Both a PivotTable report and a PivotChart report enable you to make informed
decisions about critical data in your enterprise. The following sections provide an overview of
PivotTable reports.
Pivot Table
A PivotTable is an interactive way to quickly summarize, analyze, explore, and present large
amounts of data. It is used to analyze numerical data in depth and to answer unanticipated
questions about your data. A PivotTable is especially designed for:
Querying large amounts of data in many user-friendly ways.
Subtotaling and aggregating numeric data, summarizing data by categories and subcategories,
and creating custom calculations and formulas.
Expanding and collapsing levels of data to focus your results, and drilling down to details
from the summary data for areas of interest.
Moving rows to columns or columns to rows (or "pivoting") to see different summaries of the
source data.
Filtering, sorting, grouping, and conditionally formatting the most useful and interesting
subset of data to enable you to focus on the information that you want.
Presenting concise, attractive, and annotated online or printed reports.
You often use a PivotTable report when you want to analyze related totals, especially when you
have a long list of figures to sum and you want to compare several facts about each figure.
Example: To summarize the number of Male, Female and Total trainees that have taken basic
computer in OICTDA from 2000 to 2004, the steps will be the following.