Stata Slides
Stata
What is Stata?
• Stata is a comprehensive statistical software package used for data analysis, data management, and graphics.
What is a Do-File?
• A do-file is a plain-text script (saved with the .do extension) containing Stata commands, so that an analysis can be rerun and checked.
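As a minimal sketch of what a do-file can look like, the script below uses the auto dataset that ships with Stata. The file name first_steps.do and the choice of variables are assumptions made purely for illustration.

```stata
* first_steps.do  (hypothetical file name)
* Run the whole file from the Command window with:  do first_steps.do

clear all               // start from an empty session

sysuse auto, clear      // load the example dataset shipped with Stata
describe                // list the variables and their storage types
summarize price mpg     // basic descriptive statistics
regress price mpg       // a simple linear regression
```

Lines starting with * and text after // are comments, which is what makes a do-file self-explanatory.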
Agenda
Introduction to Stata
Introduction to the assignment
Simple stepwise guidelines for carrying out the assigned tasks
Input of data in Stata
Creating a self-explanatory do-file
Carrying out analysis
Interpretation of results
1. Dataset: You can use real data from publicly available sources (e.g., World Bank, UCI Machine Learning Repository).
2. For simplicity, let’s use a small hypothetical dataset for this example.
input y x
1
2
3
end
Hypothetical Data:
Assignment Tasks:
Task 1: Run the Regression in Stata
Model: Y = β0 + β1X + u
where Y is household consumption and X is household income.
Command:
regress consumption income
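The command above can be tried end to end with a short do-file. This is only a sketch: it reuses the first five rows of the hypothetical Y and X values listed later in these slides, and it assumes that Y is consumption and X is income.

```stata
clear

* Hypothetical data: first five rows of the Y/X listing in these slides
input consumption income
15000 25000
18000 30000
30000 50000
35000 60000
40000 70000
end

label variable consumption "Household consumption (hypothetical)"
label variable income      "Household income (hypothetical)"

* Dependent variable first, then the regressor
regress consumption income
```

In the output table, the row labelled income holds the estimate of β1 and the row labelled _cons holds the estimate of β0.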
Task 2: Interpret the Coefficients
• Explain what the slope (β1) means.
• For instance, if β1 = 0.5, then each 1-unit increase in household income is associated with a 0.5-unit increase in consumption, on average.
• Interpret the intercept (β0), the predicted consumption when income is zero, and the significance levels (p-values) of both coefficients.
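To support the interpretation, the stored estimates can be displayed one by one after the regression. The sketch below assumes the same five hypothetical rows and the variable names consumption and income; the scalar name t_slope is made up for this example.

```stata
clear
input consumption income
15000 25000
18000 30000
30000 50000
35000 60000
40000 70000
end

quietly regress consumption income

* Point estimates kept by regress
display "slope (b1)     = " _b[income]
display "intercept (b0) = " _b[_cons]

* Two-sided p-value for the slope, rebuilt from its t statistic
scalar t_slope = _b[income] / _se[income]
display "p-value (b1)   = " 2 * ttail(e(df_r), abs(t_slope))
```

The p-value computed this way should agree with the P>|t| column that regress prints for income.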
Conclusion:
In this assignment, students will:
• Learn to run a simple linear regression.
• Understand the interpretation of regression coefficients.
Thank You
Regards,
Fatima.
Practice 2: Hands-On Activity
TWO SMALL DATA SETS
Agenda:
• clear
• input Y X
• end
The 2 data sets are given on page 65.
Entering the 1st Data Set:
• . clear
• . input Y X
• Y X
• 1. 70 80
• 2. 65 100
• 3. 90 120
• 4. 95 140
• 5. 110 160
• 6. 120 180
• 7. 130 200
• 8. 140 220
• 9. 155 240
• 10. 150 260
• 11. end
• . gen sample = 1
• . browse
Browsing the 1st Sample Data Set
Now run clear and enter sample 2:
• clear
• input Y X
• 55 80
• 60 88
• 70 100
• 80 120
• 95 140
• 110 160
• 118 180
• 145 220
• 150 240
• 175 260
• end
Generating sample 2 and saving it:
• gen sample = 2
• save temp_sample2, replace
• list
• browse
Y X sample
70 80 1
65 100 1
90 120 1
95 140 1
110 160 1
120 180 1
130 200 1
140 220 1
155 240 1
150 260 1
55 80 2
60 88 2
70 100 2
80 120 2
95 140 2
110 160 2
118 180 2
145 220 2
150 240 2
175 260 2
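One way to arrive at a combined table like the one above is to save the first sample and append it to the second. The sketch below is an assumption about the workflow, not necessarily the method used in the session; it uses a temporary file (the macro name s1 is made up) so that it runs as a single do-file.

```stata
* Sample 1
clear
input Y X
70 80
65 100
90 120
95 140
110 160
120 180
130 200
140 220
155 240
150 260
end
gen sample = 1
tempfile s1            // temporary file, deleted when the do-file ends
save `s1'

* Sample 2
clear
input Y X
55 80
60 88
70 100
80 120
95 140
110 160
118 180
145 220
150 240
175 260
end
gen sample = 2

* Stack the two samples into one dataset of 20 observations
append using `s1'
sort sample X
list, sepby(sample)
```

The sample variable then identifies which data set each observation came from.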
Using the list Command
• Command syntax:
• list

     +------------------+
     |      Y         X |
     |------------------|
  1. |  15000     25000 |
  2. |  18000     30000 |
  3. |  30000     50000 |
  4. |  35000     60000 |
  5. |  40000     70000 |
     |------------------|
  6. |  50000     80000 |
  7. |  55000    100000 |
  8. | 600000    110000 |
  9. |  70000    115000 |
 10. |  80000    125000 |
     |------------------|
 11. |      .         . |
 12. |      .         . |
     +------------------+
Next set of commands to be executed:
• Now run a separate regression on each sample to obtain the estimates and the fitted values (yhat):
• reg Y X
• predict yhat1
• predict res, residuals
• gen residuals_square = res^2
• scatter Y X
• use temp_sample2, clear
• reg Y X
• predict yhat2
• scatter Y X
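The steps for one sample can be sketched as a self-contained do-file. It uses the sample 1 values from these slides; the variable names res and res_sq are assumptions chosen for this example.

```stata
clear
input Y X
70 80
65 100
90 120
95 140
110 160
120 180
130 200
140 220
155 240
150 260
end

regress Y X

predict yhat1              // fitted values (predict's default after regress)
predict res, residuals     // residuals: Y minus the fitted value
gen res_sq = res^2         // squared residuals

* The sum of squared residuals should match the residual SS kept by regress
quietly summarize res_sq
display "sum of squared residuals = " r(sum)
display "e(rss) from regress      = " e(rss)

* Scatter plot with the fitted regression line overlaid
twoway (scatter Y X) (lfit Y X)
```

Note that the residuals have to be created with predict before they can be squared.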
• input y x
• 1
• 2
• 3
• 4. .
• end
Using the list if missing(X) Command:
. list if missing(X)

     +-------+
     | Y   X |
     |-------|
 11. | .   . |
 12. | .   . |
     +-------+
Drop if X is missing:
• edit
• The Data Editor will open. We can either type in the missing values manually, confirming each value before entering the next, or use the drop command to remove the observations with missing values:
• drop if missing(X)
• list if missing(X)
• summarize X
• summarize Y
• browse
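The drop route can be sketched as follows. The data are a shortened, hypothetical version of the listing in these slides, with two empty observations entered as "." (Stata's numeric missing value).

```stata
clear
input Y X
15000 25000
18000 30000
30000 50000
. .
. .
end

list if missing(X)      // show the observations with no X value
count if missing(X)     // how many there are

drop if missing(X)      // remove them from the data in memory

list if missing(X)      // now lists nothing
summarize Y X           // statistics based on complete observations only
```

Dropping only changes the data in memory; a saved file is unaffected until the data are saved again.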