Stata Session 1 KA (Class)

The document discusses basic data management tasks in Stata including describing datasets, inspecting missing values, computing new variables, recoding and labeling variables. It also discusses importing data, opening datasets, listing variables, dropping and keeping observations, and defining value labels.

Uploaded by

jmn 06


Stata Session 1

The objective of this session is to introduce Stata’s basic data management commands, which will enable you to explore your data set. By the end of the session you will be able to: describe your dataset, inspect missing values, compute new variables, recode and label variables.

The Dirty Data Theorem states that “real world” data tends to come from bizarre and unspecifiable distributions of highly correlated variables and have unequal sample sizes, missing data points, non-independent observations and an indeterminate number of inaccurately recorded values.
-Unknown

Basic Stata code structure:
command varlist, options

Use Stata help files:
help stata_command

Importing data sets into Stata:
The safest way is to import .csv files, as the .csv format is recognized by most statistical platforms.
import delimited C:\filepath.csv

Opening a regular .dta (Stata format):
use C:\filepath.dta, clear

Data management and manipulation:

Describe all variables of the data set in memory:
describe

Describe specific variables:
describe [varlist]

An alternative to the command describe is codebook. codebook examines the variable names, labels, and data to produce a codebook describing the dataset.
codebook [varlist]

Inputting data into Stata:
input patient_id age weight
1 12 45
2 18 67
3 13 50
4 26 77
5 25 75
6 13 56
7 36 85
8 19 52
end

List all variables for all observations:
list

List all variables for a specific number of observations:
list in 1/6

List specific variables for a specific number of observations:
list [varlist] in 1/6

Delete the first 6 observations:
drop in 1/6

Keep the first 10 observations:
keep in 1/10

The same logic applies to variables:
drop [varlist]

Defining labels and values for variables:
label define age_cat 1 "less than 20" 2 "20-24" 3 "25-29" 4 "30-34" 5 "35-39" 6 "40-44" 7 "45+"

encode and destring functions

encode creates a new variable named newvar based on the string variable varname, creating, adding to, or just using (as necessary) the value label newvar or, if specified, name.

destring converts variables in varlist from string to numeric.

===================================
webuse hbp2, clear
codebook
encode sex, gen(gender)
codebook
===================================
webuse destring1, clear
destring id, replace
replace total = "toto" in 2
destring total, replace
===================================
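The label define command shown earlier only creates the value label; nothing is displayed differently until the label is attached to a variable with label values. A minimal sketch of the full sequence, reusing the patient data entered above. The variable name age_cat and the use of recode to build it are assumptions for illustration, not part of the handout.

```stata
clear
input patient_id age weight
1 12 45
2 18 67
3 13 50
4 26 77
5 25 75
6 13 56
7 36 85
8 19 52
end

* Group age into the bands named in the label
* (age_cat is a hypothetical variable name chosen to match the label)
recode age (min/19 = 1) (20/24 = 2) (25/29 = 3) (30/34 = 4) ///
           (35/39 = 5) (40/44 = 6) (45/max = 7), generate(age_cat)

* Step 1: define the value label
label define age_cat 1 "less than 20" 2 "20-24" 3 "25-29" 4 "30-34" ///
                     5 "35-39" 6 "40-44" 7 "45+"

* Step 2: attach the label to the variable
label values age_cat age_cat

* The table now shows the text labels instead of the codes 1 to 7
tabulate age_cat
```

The /// line continuation works in a do-file; when typing in the Command window, put each command on a single line instead.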

Application 0

Open the wws data set. However, before starting to work on any Stata data set we need to open a log file to save our output and, more importantly, use a do-file editor, which is the equivalent of the SPSS syntax file.

log using "C:\filepath\wws.log"

use "C:\filepath.dta", clear

Use the describe command to inspect your data set. How many variables are there? How many observations?

Check the variable collgrad. What is this variable supposed to tell us? Does it have any missing values?

Check the variable race. It should consist of only 3 levels. What is the idcode of the erroneous entry?

Inspect the variable wage, which contains info about the hourly wage in dollars of the previous week. Prior knowledge tells us that the hourly wage would range from $0 to $300. What can you tell? Hint: you can use the option detail.

Inspect the variables married and nevermarried. What can you note?
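The checks asked for in Application 0 could be approached along the following lines. This is only a sketch: the four rows below are made up so that the commands can be run without the wws file, and the assumption that valid race codes are 1, 2 and 3 is ours, not something the handout states.

```stata
clear
* Made-up rows for illustration only (not the real wws data)
input idcode race wage collgrad
1 1 12.5 0
2 2 8.0 1
3 4 350 0
4 3 . 1
end

* Number of variables and observations
describe

* Type, range and number of missing values of collgrad
codebook collgrad

* Frequency table of race, including missing values
tabulate race, missing

* idcode of any entry outside the assumed valid codes 1, 2, 3
list idcode race if !inlist(race, 1, 2, 3)

* Percentiles and extreme values of wage
summarize wage, detail

* Entries above the expected upper limit of 300
list idcode wage if wage > 300 & !missing(wage)
```

The !missing(wage) condition matters because Stata treats a missing value as larger than any number, so wage > 300 on its own would also list observations with missing wage.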
Application 1

Open the bmi data set. However, before starting to work on any Stata data set we need to open a log file to save our output and, more importantly, use a do-file editor, which is the equivalent of the SPSS syntax file.

log using "C:\filepath\bmi.log"

use "C:\filepath.dta", clear

What is the storage type of the variable “name”?

What’s the number of observations? Number of variables?

Give the variable energy the label “total energy expenditure” and the variable bmi the label “body mass index”.

Are there any missing observations?

Using the inspect command, provide a quick summary of your data. Can you pinpoint any abnormal entry?

Suppose we want to give a unique id number to each respondent to be able to track them more easily.

gen id=_n

Replace the bmi of the respondent with id # 17 with a positive bmi value.

replace bmi=26 if id==17

Another way to do it:

replace bmi=25 in 17

How to find the average (mean), the median and the standard deviation of the variable bmi?

Hint: help summarize

Now let’s calculate the average bmi separately for males and females, but first we have to assign value labels for gender.

A faster way to do it is by using the command bysort:

help bysort

bysort sex: sum bmi

What do you notice? Any extreme observations?

Try to replace the extreme observations by a missing value.

Let’s recalculate the bmi for males and females separately.

We need to categorize bmi into 4 categories: underweight, normal, overweight and obese, using the following ranges for categorization:
<20 (underweight), 20-25 (normal), 26-30 (overweight), >30 (obese)

gen bmi_cat=.
replace bmi_cat=1 if bmi >=0 & bmi<20
replace bmi_cat=2 if bmi >=20 & bmi<=25
replace bmi_cat=3 if bmi >25 & bmi<=30
replace bmi_cat=4 if bmi >30 & bmi!=.

label define bmi_cat 1 "underweight" 2 "normal" 3 "overweight" 4 "obese"

Let’s give the label “bmi categorized” to the variable bmi_cat:

label var bmi_cat "bmi categorized"

Let us check the newly created variable:

tab bmi_cat

Suppose we want to merge the 2 categories obese and overweight together:

recode bmi_cat (4=3), gen(bmi_cat1)

label define bmi_cat1 1 "underweight" 2 "normal" 3 "overweight"

Give a label to the newly created variable.

Check the distribution of bmi_cat1.

Now recreate bmi_cat1, but let’s call it bmi_cat2, using the “generate” method.

gen bmi_cat2=bmi_cat
replace bmi_cat2=3 if bmi_cat2==4

label define bmi_cat2 1 "underweight" 2 "normal" 3 "overweight"

Application 2

Open dataset inmates.dta.

Start by describing and inspecting the data set:
- How many respondents are there?
- What’s the number of variables used in this data set?
- Provide appropriate labels for the following variables: age gender marital_stat education height weight
- What are the values assigned for the variables gender and marital status?
- Assign the following values for gender: 1 (males), 2 (females); marital status: 1 (never married), 2 (married), 3 (divorced)
- How many missing values do we have for the following variables: age gender marital_stat education height weight?
- Categorize age into 4 groups: 14 to 29, 30 to 49, 50 to 69 and >69
- Appropriately label each category.
- Using the formula weight/height in meters squared, calculate the BMI of inmates. However, first we have to transform height from cm to meters. Hint: help gen
- Knowing that the condition “and” is denoted as “&” and the condition “or” is denoted as “|”, categorize the newly created bmi into 4 categories as follows:
FOR FEMALE INMATES: <18.5 (underweight), 18.5-25 (normal), 26-30 (overweight), >30 (obese)
FOR MALE INMATES: <20 (underweight), 20-25 (normal), 26-30 (overweight), >30 (obese)
- Produce the mean bmi for male and female inmates separately in two different ways.
- Generate a variable summarizing whether a respondent has at least 1 chronic disease (diabetes hyperlipidemia anemia asthma migraine)
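One possible way to approach the BMI and chronic disease steps of Application 2 is sketched below. The five rows are made up so the code runs without inmates.dta. Three things are assumptions: the disease variables are coded 0/1 with 1 meaning the condition is present, weight is in kg, and values between 25 and 26 are treated as overweight (as the Application 1 code does). The names height_m and chronic are hypothetical.

```stata
clear
* Made-up rows for illustration only (not the real inmates data)
* gender: 1 = male, 2 = female; height in cm; weight assumed in kg
input id gender height weight diabetes hyperlipidemia anemia asthma migraine
1 1 180 75 0 0 0 0 0
2 2 160 45 1 0 0 0 0
3 2 170 55 0 0 0 1 0
4 1 175 95 0 1 0 0 1
5 1 170 55 0 0 0 0 0
end

* Convert height from cm to meters, then compute BMI
gen height_m = height / 100
gen bmi = weight / height_m^2

* Gender-specific cut-off for underweight, combined with & and |
gen bmi_cat = .
replace bmi_cat = 1 if (gender == 2 & bmi < 18.5) | (gender == 1 & bmi < 20)
replace bmi_cat = 2 if ((gender == 2 & bmi >= 18.5) | (gender == 1 & bmi >= 20)) & bmi <= 25
replace bmi_cat = 3 if bmi > 25 & bmi <= 30
replace bmi_cat = 4 if bmi > 30 & !missing(bmi)

* Mean bmi by gender, two different ways
bysort gender: summarize bmi
tabulate gender, summarize(bmi)

* 1 if at least one of the listed diseases equals 1, otherwise 0
egen chronic = anymatch(diabetes hyperlipidemia anemia asthma migraine), values(1)
tabulate chronic
```

Respondents 3 and 5 have the same height and weight but land in different categories (normal for the female, underweight for the male), which is the point of the gender-specific cut-off.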
