stata commands

Uploaded by

Tilahun Wegene

Use the dataset CEOSAL1.dta to answer the following questions.

1. How many variables are there in the dataset? [Stata has a command that reports the number
of variables, and the count is also shown in the interface. Find the command and use it.]
2. List all the variables. [Stata has a command to list variables. Find the command and
use it.]
3. List salary and sales for the first 5 observations only
4. List salary and sales for the last 5 observations only
5. Report the mean and the number of observations for each variable
6. How many financial firms are included in the dataset?
7. How many non-financial firms are included in the dataset?
8. Which industry has the highest average sales value? Which command did you use?
9. Are there any missing observations in the dataset? How many? How do you know that?
10. What is the average salary of CEOs for those working in finance or utility firms?
11. What is the average salary of CEOs for those working in finance and utility firms?
12. Make a table with the number of financial and non-financial firms on the one hand and
the CEO salary on the other hand.
13. Plot the relationship between salary and sales (find and use the Stata command for
plotting a graph)
14. Draw a histogram of salary
15. Make a table of correlations for all variables (find and use the Stata command for
correlation)
16. What is the correlation between salary and sales?
17. What is the correlation between salary and finance?
18. Generate a dummy variable called “large” which is 1 if the CEO works in a firm whose
sales exceeds the average sales value in the dataset and 0 otherwise.
19. Label the variable “large” as “=1 if the firm is large”
20. How many financial firms are “large”?
21. What is the percentage of financial firms among the “large” firms?
22. Find the percentage of financial firms and utility firms in the data-set
23. Generate a new variable called “benefit” which is 0 for those CEOs who earn a salary
below 4000, 1 for those who earn a salary between 4,000 and 10,000 (including 4,000),
2 for those who earn a salary above 10,000 (including 10,000)
24. Generate a variable called “logsalary” which is the natural logarithm of the variable
“salary” and label it “log of salary”
25. Generate a variable called “logsalary” which is the natural logarithm of the variable
“pcsalary” and label it “log of pcsalary”. How many missing observations do you have?
Why?
26. Generate a variable called “roesqr” which is the square of the variable “roe”
27. Generate a variable called “ssratio” which is salary divided by sales
28. In the dataset the first variable is “salary”. Move this variable as last in the list.
29. Move the variable “ros” above the variable “roe”
30. Save your dataset with name CEOSAL1NEW.dta
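
Questions 10 and 11 differ only in the logical operator. As a minimal sketch, the toy dataset below (four made-up firms, not values from CEOSAL1) shows how `|` ("or") and `&` ("and") select different observations when each firm belongs to a single industry:

```stata
// Toy data: illustrative values only, not taken from CEOSAL1
clear
input salary finance utility
1000 1 0
2000 0 1
3000 0 0
4000 1 0
end

// "or": firms that are in finance or in utilities
count if finance == 1 | utility == 1
display r(N)

// "and": firms that are in finance and in utilities at the same time
count if finance == 1 & utility == 1
display r(N)
```

In this toy data the "or" condition matches three firms and the "and" condition matches none, so a mean computed under the "and" condition would rest on zero observations.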

use CEOSAL1.dta, clear // Load the dataset, replacing any data already in memory

// 1. Count the number of variables in the dataset

describe, short // Reports the number of observations and the number of variables

// 2. List all the variables

ds // Lists the names of all variables in the dataset

// 3. List salary and sales for the first 5 observations only


list salary sales in 1/5

// 4. List salary and sales for the last 5 observations only

list salary sales in -5/l

// 5. Report the mean and the number of observations for each variable

summarize

// 6. Count how many financial firms are included in the dataset

count if finance == 1 // finance is the 0/1 dummy referred to in question 17

// 7. Count how many non-financial firms are included in the dataset

count if finance == 0

// 8. Find the industry with the highest average sales value

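// Assumes the industry dummies indus, finance, consprod and utility of the Wooldridge CEOSAL1 file.
// Unlike collapse, this keeps the data in memory for the later questions.
foreach v of varlist indus finance consprod utility {
    display "Industry dummy: `v'"
    summarize sales if `v' == 1 // Mean sales within this industry
}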

// 9. Check for missing observations in the dataset


misstable summarize // Reports missing-value counts for every variable that has any

// 10. Calculate the average salary of CEOs for those working in finance or utility firms

summarize salary if finance == 1 | utility == 1

// 11. Calculate the average salary of CEOs for those working in finance and utility firms

summarize salary if finance == 1 & utility == 1 // Matches no observations when each firm belongs to a single industry

// 12. Make a table with the number of financial and non-financial firms along with CEO salary

tabulate finance, summarize(salary) // Firm counts and mean salary for finance == 0 and finance == 1

// 13. Plot a relationship between salary and sales

scatter salary sales

// 14. Histogram graph for salary

histogram salary

// 15. Make a table of correlations for all variables


pwcorr

// 16. Calculate the correlation between salary and sales

pwcorr salary sales

// 17. Calculate the correlation between salary and finance

pwcorr salary finance // finance is already a 0/1 dummy, so no new variable is needed

// 18. Generate a dummy variable "large" based on sales exceeding the average sales value

summarize sales

gen large = (sales > r(mean)) if !missing(sales) // Missing sales stay missing instead of being coded 1

// 19. Label the variable "large"

label variable large "=1 if the firm is large"

// 20. Count how many financial firms are "large"

count if finance == 1 & large == 1


// 21. Calculate the percentage of financial firms in the "large" category

tabulate large finance, row // In the large == 1 row, the finance == 1 cell gives the percentage

// 22. Calculate the percentage of financial firms and utility firms in the dataset

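tabulate finance // The Percent column gives the share of financial firms

tabulate utility // The Percent column gives the share of utility firms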

// 23. Generate a new variable "benefit" based on CEO salary ranges

gen benefit = cond(salary < 4000, 0, cond(salary < 10000, 1, 2)) if !missing(salary) // A salary of exactly 10,000 falls in category 2

// 24. Generate a variable "logsalary" which is the natural log of "salary" and label it

gen logsalary = log(salary)

label variable logsalary "log of salary"

// 25. Generate a variable "logsalary" for "pcsalary" and identify missing observations

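// logsalary already exists from question 24, so a different name is used here
gen logsalary_pcsalary = log(pcsalary)

label variable logsalary_pcsalary "log of pcsalary"

// log() is undefined for zero or negative values, so those observations become missing
count if missing(logsalary_pcsalary)

// 26. Generate a variable "roesqr" which is the square of "roe"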

gen roesqr = roe^2

// 27. Generate a variable "ssratio" which is salary divided by sales

gen ssratio = salary / sales

// 28. Move the variable "salary" to the last position in the dataset

order salary, last

// 29. Move the variable "ros" above the variable "roe"

order ros, before(roe)

// 30. Save the modified dataset

save CEOSAL1NEW.dta, replace
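
Questions 18, 23 and 25 all depend on how Stata treats missing values: a numeric missing value compares as larger than any number, and log() of a zero or negative value returns missing. The following is a minimal sketch on a made-up variable x (the names x, logx, big_naive and big_safe are hypothetical and not part of CEOSAL1):

```stata
// Toy data: illustrative values only, not taken from CEOSAL1
clear
input x
-5
0
10
.
end

// log() returns missing for x <= 0 and for missing x
gen logx = log(x)
count if missing(logx)

// Without a guard, the missing x satisfies x > 2 and is coded 1
gen big_naive = (x > 2)

// With the guard, the missing x stays missing
gen big_safe = (x > 2) if !missing(x)

list
```

Here three of the four values of logx are missing (for -5, 0 and the missing x), and only big_safe leaves the last observation missing.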
