Stata commands
1. How many variables are there in the dataset? [Stata has a command to count variables.
It also displays the count in its interface, for example in the Properties window. Search for the command and use it]
2. List all the variables [Stata has a command to list variables. Search for the command and
use it]
3. List salary and sales for the first 5 observations only
4. List salary and sales for the last 5 observations only
5. Report the mean and the number of observations for each variable
6. How many financial firms are included in the dataset?
7. How many non-financial firms are included in the dataset?
8. Which industry has the highest average sales value? Which command did you use?
9. Are there any missing observations in the dataset? How many? How do you know that?
10. What is the average salary of CEOs for those working in finance or utility firms?
11. What is the average salary of CEOs for those working in finance and utility firms?
12. Make a table showing the number of financial and non-financial firms together with
the CEO salary in each group.
13. Plot the relationship between salary and sales (search for and use the Stata command for
plotting graphs)
14. Draw a histogram of salary
15. Make a table of correlations for all variables (search for and use the Stata command for
correlation)
16. What is the correlation between salary and sales?
17. What is the correlation between salary and finance?
18. Generate a dummy variable called “large” which is 1 if the CEO works in a firm whose
sales exceed the average sales value in the dataset and 0 otherwise.
19. Label the variable “large” as “=1 if the firm is large”
20. How many financial firms are “large”?
21. What percentage of the “large” firms are financial firms?
22. Find the percentage of financial firms and of utility firms in the dataset
23. Generate a new variable called “benefit” which is 0 for CEOs who earn a salary
below 4,000, 1 for those who earn a salary of at least 4,000 but below 10,000,
and 2 for those who earn a salary of 10,000 or more
24. Generate a variable called “logsalary” which is the natural logarithm of the variable
“salary” and label it “log of salary”
25. Generate a variable called “logsalary_pcsalary” which is the natural logarithm of the variable
“pcsalary” and label it “log of pcsalary”. How many missing observations do you have?
Why?
26. Generate a variable called “roesqr” which is the square of the variable “roe”
27. Generate a variable called “ssratio” which is salary divided by sales.
28. In the dataset the first variable is “salary”. Move this variable to the end of the variable list.
29. Move the variable “ros” above the variable “roe”
30. Save your dataset with name CEOSAL1NEW.dta
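// 1-4. A sketch, assuming the CEO salary data (presumably CEOSAL1.dta) are already
//      in memory, e.g. loaded with: use CEOSAL1.dta, clear
describe, short               // 1. the header reports the number of variables
display c(k)                  // 1. c(k) also holds the number of variables
ds                            // 2. lists every variable name (describe works too)
list salary sales in 1/5      // 3. first 5 observations
list salary sales in -5/l     // 4. last 5 observations; "l" stands for last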
// 5. Report the mean and the number of observations for each variable
summarize
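// 6-7. Sketch, assuming "finance" is a 0/1 dummy (1 = financial firm)
count if finance == 1         // 6. financial firms
count if finance == 0         // 7. non-financial firms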
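// 8. Mean sales by industry, assuming industry is coded by the 0/1 dummies
//    indus, finance, consprod, utility (collapse would overwrite the data in memory)
mean sales, over(indus finance consprod utility)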
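// 9. misstable summarize lists only the variables that contain missing values,
//    with their counts; the Obs column of summarize (question 5) is another check
misstable summarize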
// 10. Calculate the average salary of CEOs for those working in finance or utility firms
summarize salary if finance == 1 | utility == 1
// 11. Average salary of CEOs in firms coded as both finance and utility (this may match no firms)
summarize salary if finance == 1 & utility == 1
// 12. Make a table with the number of financial and non-financial firms along with CEO salary
tabstat salary, by(finance) statistics(n mean)
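// 13. Scatter plot of salary (vertical axis) against sales (horizontal axis)
twoway scatter salary sales
// 14. Histogram of salary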
histogram salary
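// 15. correlate with no variable list reports the correlation matrix of all
//     variables (casewise: it drops any observation with a missing value)
correlate
// 16. Correlation between salary and sales
correlate salary sales
// 17. Correlation between salary and finance
correlate salary finance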
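// 18. Generate a dummy variable "large" based on sales exceeding the average sales value
summarize sales
generate byte large = (sales > r(mean)) if !missing(sales)
// 19. Label the variable "large"
label variable large "=1 if the firm is large"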
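// 20-21. Sketch: "large" is written out as sales above the mean, which matches
//        large == 1 from question 18, so these lines also run on their own
quietly summarize sales
local avgsales = r(mean)
count if finance == 1 & sales > `avgsales' & !missing(sales)    // 20.
tabulate finance if sales > `avgsales' & !missing(sales)       // 21. read Percent for finance == 1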
// 22. Calculate the percentage of financial firms and utility firms in the dataset
tab1 finance utility
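// 23. "benefit": 0 if salary < 4,000; 1 if 4,000 <= salary < 10,000; 2 if salary >= 10,000.
//     Stata treats missing as larger than any number, hence the !missing() check.
generate byte benefit = 0 if salary < 4000
replace benefit = 1 if salary >= 4000 & salary < 10000
replace benefit = 2 if salary >= 10000 & !missing(salary)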
// 24. Generate "logsalary", the natural log of "salary", and label it "log of salary"
generate logsalary = ln(salary)
label variable logsalary "log of salary"
// 25. Log of "pcsalary"; ln() returns missing wherever pcsalary is zero or negative
generate logsalary_pcsalary = ln(pcsalary)
label variable logsalary_pcsalary "log of pcsalary"
count if missing(logsalary_pcsalary)
// 26. Generate a variable "roesqr" which is the square of "roe"
generate roesqr = roe^2
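// 27. Salary-to-sales ratio; mind the units: in CEOSAL1, salary is in thousands
//     of dollars and sales in millions of dollars
generate ssratio = salary / sales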
// 28. Move the variable "salary" to the last position in the dataset
order salary, last
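// 29. Place "ros" immediately before "roe"
order ros, before(roe)
// 30. Save under the new name; replace overwrites an existing CEOSAL1NEW.dta
save CEOSAL1NEW.dta, replace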