A Comprehensive Guide to Coding and Programming in Stata, 1st Edition

A Comprehensive Guide to Coding and Programming in Stata, authored by Rafael Gafoor, provides an introduction to essential Stata commands and programming techniques for data analysis. The book covers various topics including variable management, loops, data manipulation, and automated reporting, aimed at helping users from diverse fields such as medical statistics and epidemiology. It emphasizes the importance of organized programming environments and offers practical tips for effective data analysis using Stata.

A Comprehensive Guide to Coding and Programming in Stata

1st Edition


A Comprehensive Guide
to Coding and
Programming in Stata

Rafael Gafoor
Designed cover image: © Shutterstock, Stock vector ID 1714491562, Vector Contributor
Iurii Motov

First edition published 2024
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Rafael Gafoor

Reasonable efforts have been made to publish reliable data and information, but the
author and publisher cannot assume responsibility for the validity of all materials or the
consequences of their use. The author and publisher have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other
means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission
from the publishers.

For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC
please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered
trademarks and are used only for identification and explanation without intent to
infringe.

ISBN: 978-1-032-77485-5 (hbk)
ISBN: 978-1-032-77565-4 (pbk)
ISBN: 978-1-003-48377-9 (ebk)
DOI: 10.1201/9781003483779

Typeset in Minion
by MPS Limited, Dehradun

Contents

Foreword
About the Author

CHAPTER 1 ■ Introduction
CHAPTER 2 ■ Temporary Names, Variables and Files
CHAPTER 3 ■ Macros and Other Data Storage Mechanisms Used by Stata
CHAPTER 4 ■ Variables, Variable Names, Value Label Names, Value Labels and Values
CHAPTER 5 ■ Loops
CHAPTER 6 ■ Append, Merge and Collapse
CHAPTER 7 ■ Reshape
CHAPTER 8 ■ Dates
CHAPTER 9 ■ Bits and Bobs
CHAPTER 10 ■ Helpful Hints When Doing Regression
CHAPTER 11 ■ Time Series Operators and Survival Analyses
CHAPTER 12 ■ Exporting Output
CHAPTER 13 ■ Stata Programming
CHAPTER 14 ■ Tables of Baseline Characteristics
CHAPTER 15 ■ Automated Reporting
CHAPTER 16 ■ pretty_suite Packages for Easy Tables
CHAPTER 17 ■ Tables with Output from Statistical Tests

Index
Foreword

This book was written at a time when I had just changed roles from
being a medical statistician (primarily concerned with the analysis
of data from randomised controlled trials) to being an analyst for large
datasets. I discovered that I had to learn new suites of commands in Stata
and that there were advanced functions that I had never previously
explored. This book grew out of that experience and introduces the
reader to the commands that I found most useful. The selection is not
exhaustive in any sense of the word.
I wanted the book to be a gentle introduction to the most commonly
used commands for the analysis of data obtained from both
experimental and observational studies. I have relied on my own
personal experience for guidance. No doubt you will find the book
incomplete; however, I hope that this guide will provide a sound platform
from which you can explore further as you become an advanced Stata
programmer.
This book is suitable for a wide range of professions involved in data
analysis (medical statisticians, epidemiologists, data analysts, etc.).
About the Author

Rafael Gafoor is a Chartered Statistician at the
Comprehensive Clinical Trials Unit at University
College London (UCL). Dr. Gafoor obtained both
his master’s degrees, in Epidemiology and in Medical
Statistics, from the London School of Hygiene and
Tropical Medicine and his PhD from the Institute of
Psychiatry (King’s College London). He has worked
as a medical statistician as well as an epidemiologist.
His research interests include epidemiology, clinical sciences, public
health and health services and systems. Dr. Gafoor is a consultant
psychiatrist in the NHS and has a keen interest in the analysis of datasets
with mental health outcomes (from both observational and
experimental studies).
CHAPTER 1

Introduction

THE PROGRAMMING ENVIRONMENT

It’s important to keep a very organised programming environment and
to make sure you do not EVER overwrite your primary dataset.
Therefore, you should, from the outset, create a directory named
something along the lines of “Analysis_Folder” and then place three
folders within it.

01_Data_In
02_Programming
03_Data_Out

You can place additional folders within these folders, but this is the main
structure.
NOTE: Use underscores rather than blank spaces in folder and file names.
While blank spaces are not necessarily an issue for Stata users in Windows,
they become a major issue for some R commands and in other programming
environments.
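The three folders can also be created from within Stata. The sketch below assumes the analysis folder lives at C:/Analysis_Folder, the location used in the cd example later in this chapter, so change that path to suit your own machine. It borrows a local macro and a foreach loop, both of which are covered in later chapters. Note that capture silences every error from mkdir, including a genuine permission problem, so check the dir listing at the end.

```stata
* Assumption: the analysis folder lives at C:/Analysis_Folder.
* Change this one path to suit your own machine.
local root "C:/Analysis_Folder"

* capture stops Stata halting if the folder already exists
capture mkdir "`root'"
cd "`root'"

* create the three main subfolders inside the analysis folder
foreach folder in 01_Data_In 02_Programming 03_Data_Out {
    capture mkdir "`folder'"
}

* list the contents to check that the three subfolders are there
dir
```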

DATA DOWNLOAD
You should now place the files for analysis in your folder named
“01_Data_In”. The files in this folder should never be overwritten. Any
temporary files, or files which you make in the interim, should be placed in
the folder named “03_Data_Out”.
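A minimal sketch of this discipline, assuming the working directory is already the analysis folder and the three subfolders exist. The auto dataset that ships with Stata stands in for your raw data, and the file names auto_raw.dta and auto_clean.dta are hypothetical. The key point is that the raw file is only ever read, while every derived file is written to 03_Data_Out.

```stata
* auto.dta ships with Stata and stands in for your raw data here
sysuse auto, clear

* Without the replace option, save refuses to overwrite an existing file
* (error r(602)), which protects the raw copy; capture lets the script
* carry on when the raw file is already there
capture save "01_Data_In/auto_raw.dta"

* always start from the untouched raw file ...
use "01_Data_In/auto_raw.dta", clear
generate price_k = price / 1000

* ... and write every derived dataset to 03_Data_Out, where replace is safe
save "03_Data_Out/auto_clean.dta", replace
```

Using replace only on files in 03_Data_Out means a slip of the keyboard can, at worst, overwrite a file that can be regenerated from the raw data.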

DOI: 10.1201/9781003483779-1

R AND STATA (FILEPATHS)

Many statisticians and data analysts program in both R and Stata. It is
possible to program entirely in R, and in the future this may be the
preferable programming language. However, if you are new to analysis
(and to programming), you may find it easier to start in Stata. Depending
on where you study or work, you may have little choice in the program
you use to analyse data. You may be working on a project that someone
else has started, in which case it wouldn’t make much sense to start all
over again and reinvent the wheel. Alternatively, you may be required to
program in a given language in your workplace to enable more
efficient working across projects.
The most pressing issue at this stage is how to write filepaths so that
they can be understood in both computing environments. The character “\”
is a backslash, so named because its top leans back (to the left); the
character “/” is a forward slash.
NOTE: The direction of the slashes matters. Stata doesn’t really care which
one you use (on Windows it accepts both backslashes and forward slashes),
but R is much stricter: inside an R string a single backslash starts an
escape sequence, so a path such as "C:\Analysis_Folder" fails unless every
backslash is doubled. The Windows operating system produces file paths
using backslashes.
However, if you are going to use R and Stata simultaneously, it’s better
to change them to forward slashes as you go along. You can write R code
that copes with backslashes, but at the beginning it’s easier to just change
the slashes so that they point in the forward direction.
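The short sketch below shows one practical reason to prefer forward slashes in Stata as well. The local macro folder is purely illustrative. In Stata, a backslash placed immediately before the opening backtick of a macro makes Stata treat the backtick literally, so the macro is not expanded; a forward slash has no such side effect.

```stata
* an illustrative local macro holding a folder name
local folder "01_Data_In"

* forward slashes: the macro expands as expected
display "C:/Analysis_Folder/`folder'"

* a backslash directly before the backtick stops the macro from
* expanding, so this path does NOT end in 01_Data_In
display "C:\Analysis_Folder\`folder'"
```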

CHANGING WORKING DIRECTORY

The command pwd will tell you where your current working directory is
located.
You now need to change the working directory to the folder you
created named “Analysis_Folder” (or equivalent). To change the path
that Stata recognises as the current working directory to match your
folder location, use the command cd with the filepath placed in
double quotes. In the example below, the folder “Analysis_Folder” is at
the root level of the C drive. You can, of course, place your folder
wherever you wish and change the path in the cd command to
inform Stata of the new location.

cd "C:/Analysis_Folder"

. cd "C:/Analysis_Folder"
C:\Analysis_Folder
Everything is now coded in relation to this home directory. This means that
if you move the folder or send it to a colleague for further work, they
only need to change one filepath, and the analysis files should work to
replicate your work. Writing a machine-specific value such as this filepath
directly into your code is called hardcoding. This leads to a very important
rule: NEVER hardcode except at the beginning of your code, and make the
hardcoding explicitly obvious so that others can easily identify where code
has to be amended for your analysis files to work. This hardcoding is usually
placed at the top of the master file. One obvious hard code is your working
directory, but there may be other occasions on which you want to use
hardcoding. Do so very sparingly.
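One way to make the single hardcoded line obvious, and to stop with a clear message when it is wrong on a colleague's machine, is sketched below. The banner comments and the local name root are only a suggested convention, not something Stata requires; direxists() and st_local() are Mata functions.

```stata
*==== HARDCODING: the only line a colleague should need to edit ====
local root "C:/Analysis_Folder"
*===================================================================

* stop early, with a clear message, if the hardcoded folder is missing
mata: st_local("root_ok", strofreal(direxists("`root'")))
if "`root_ok'" != "1" {
    display as error "Folder `root' not found: edit the hardcoded path above"
    exit 601
}
cd "`root'"
```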

WORKING FROM YOUR HOME DIRECTORY


This home directory will become your base, and you will very rarely, if
ever, move outside of this location.
One of the few instances in which you will produce output outside
of the home directory is if you are the unblinded statistician in a
study and you are producing output from an analysis which you
do not want the blinded statistician to see. In this instance, you will
hard code for the results to appear elsewhere (outside the home
directory).
If, for example, you wish to create files and/or folders within your
home directory, you will sometimes need to tell Stata where the home
directory is. This location is stored in a macro within Stata. A macro is a
piece of information which Stata stores in its memory. The different
types of macros and the implications for programming will be discussed
in subsequent chapters.
For example, to find a list of the files and folders within the
01_Data_In folder, you can issue the following command:

dir "`c(pwd)'/01_Data_In/"

<dir> 4/30/19 15:15 .
<dir> 4/30/19 15:15 ..
<dir> 4/30/19 15:15 British_Election_Survey
<dir> 1/27/19 20:07 Regional_Data
<dir> 4/30/19 15:15 Stata_Data
<dir> 4/05/19 10:07 Station_Data
<dir> 4/30/19 15:15 Tim's datasets

The value `c(pwd)' holds the path of your current working directory,
and you can use it as a shortcut in filepaths so that you do not
have to hard code again (once you have set the working directory).
Once you set the working directory, `c(pwd)' will always point
to the correct location.
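Because c(pwd) always holds the current location, it can be combined with other commands to work relative to the home directory. The sketch below assumes the working directory has already been set with cd and that 01_Data_In exists; it uses the dir extended macro function to collect the subfolder names into a local macro (macros are covered in later chapters). On Windows, c(pwd) itself is reported with backslashes, which is fine as long as no backslash ends up directly before a backtick.

```stata
* assumes the working directory has already been set with cd
display "`c(pwd)'"

* the dir extended macro function stores the names of the
* subfolders of 01_Data_In in a local macro
local subfolders : dir "`c(pwd)'/01_Data_In" dirs "*", respectcase

* display each subfolder name in turn
foreach f of local subfolders {
    display "`f'"
}
```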

FOLDER STRUCTURE
Your coding for every project encompasses several steps in data processing:
production of interim datasets, graphs, tables, etc. If you code all of your
programs for an assignment in one file, it can become very long, and you
can’t easily distinguish the stages in your analysis.
It is essential that you create a master file from which you can call the
subprograms. The master file sits in the top level of the folder
structure, and the subdirectories and files for your programming steps
sit in the folder named “02_Programming”. You may wish to split the work in
your programming folder into additional files such as
“01_Data_Input”, “02_Data_Processing”, “03_Tables”, etc.

CREATING A “MASTER FILE”

At the root level of the analysis folder, place a Stata do-file named
Master.do. This file will call all your subprograms and contains three
additional crucial pieces of information.
At the top of the Master file, you should include some preliminary
information: the date the file was created, the name of the
programmer, the purpose of the file, the version of Stata under which it
was written, the organisation with which the programmer is associated, and
the date of each amendment to the program together with the reason
for it.
The next step is to create a section where you place all the hardcoding
in your analysis. This should be the only place where hardcoding is
present, which allows the analysis to port very easily across computing
environments.
The next step is to place all of your global macros in a section clearly
defined for this purpose. You will learn more about global macros later,
along with the reasons why this step is so important.
The final step is to set out all of the sub-routines for your analysis so
that they can be called individually and sequentially as needed.
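A skeleton of such a master file might look like the sketch below. Everything in square brackets is a placeholder, version 17 is an assumption to be replaced by the version you actually use, and the three sub-routine files are hypothetical do-files named after the examples above and stored in 02_Programming. The global macro names data_in, prog and data_out are likewise only a suggestion.

```stata
*=====================================================================
* Master.do
* Project    : [project name]
* Programmer : [your name], [organisation]
* Created    : [date]
* Stata      : [version used]
* Purpose    : runs every step of the analysis in order
* Amendments : [date] [reason]
*=====================================================================

version 17          // assumption: replace with the version you are using
clear all
set more off

*---- 1. HARDCODING: the only machine-specific line ------------------
cd "C:/Analysis_Folder"

*---- 2. GLOBAL MACROS -----------------------------------------------
global data_in  "`c(pwd)'/01_Data_In"
global prog     "`c(pwd)'/02_Programming"
global data_out "`c(pwd)'/03_Data_Out"

*---- 3. SUB-ROUTINES (comment out any step you do not want to rerun)
do "$prog/01_Data_Input.do"
do "$prog/02_Data_Processing.do"
do "$prog/03_Tables.do"
```

Because each step is a separate do call, a reviewer can rerun a single stage by commenting out the others.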
Don’t worry if this all sounds a bit complicated. It will, with time,
become second nature. There will be examples of folder layouts and
master files later on in the book for you to copy if you wish. Choose any
file structure you wish (or adopt any you come across). The most
important characteristic of a good file structure is that a reviewer should
be able to find the code, the data and the outputs without too much trouble.

SPECIAL CHARACTERS
Be sure you can identify the backtick (`), apostrophe ('), backslash (\),
forward slash (/) and curly bracket ({ }) characters. These will be used
extensively in this book.
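A quick way to check that you can type each of these characters is to run the few lines below, which show the backtick and apostrophe around a local macro name, curly brackets around a global macro name, and a forward-slash filepath. The macro names greeting and who are purely illustrative.

```stata
* backtick and apostrophe wrap the name of a local macro
local greeting "hello"
display "`greeting'"

* curly brackets can delimit the name of a global macro
global who "world"
display "${who}"

* forward slashes separate folders in a filepath
display "C:/Analysis_Folder/01_Data_In"
```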
CHAPTER 2

Temporary Names,
Variables and Files

Sometimes we want to create a temporary variable to store some
information that we will use later in the program we are writing.
Sometimes, we may want to keep this variable for quite a while: we may
create the variable at the beginning of the analysis but not use it until the
end. In other cases, we may create a temporary variable and use the
contents almost immediately (or at least within the same program that
created it). One significant issue that arises is that, in creating a variable,
we might accidentally overwrite another variable with the same name.
For this reason, it is good programming practice to create temporary
variables with temporary names. If you keep track of these and delete
them as soon as they are no longer needed, you are less likely to cause
errors in your analysis. Stata allows you to create three types of temporary
entities, namely temporary variables, names and files. All three are deleted
automatically at the end of the program (or do-file) in which they were
created, so you don’t have to keep track of them once that program has ended.
Let’s imagine that I am programming an analysis for which I need to
use a number which I shall call pi (persistent indigestion). This is a
number to which I have assigned a value of 5.543234. Since I wish to use
this number repeatedly, I wish to store it in memory so that I can easily
call it into the programming environment without having to repeatedly
declare it in the programming script.

DOI: 10.1201/9781003483779-2

Now, if I create a storage entity within Stata named pi, there is a
chance I could start confusing it with the value of pi that is already stored
within Stata. Let’s start with an example. Stata stores the value of π as one
of its built-in system values, c(pi), which is among the values listed by the
creturn list command. Don’t worry about this for the moment other than to
observe that Stata has a value stored within its programming environment.
We will be covering creturn list later in the book. Now, let’s create a value
for pi (persistent indigestion), which I wish Stata to remember so that I can
use it several times later in my analysis (this will reduce my workload of
having to find and type in the value repeatedly, reduce the chances of errors
and increase my programming efficiency). We are using a new entity (a scalar)
to hold this value. Don’t worry too much about this entity for the moment; we
cover it later in the book. A scalar is simply an entity that holds a single
value (as opposed to a vector or a matrix). Note that the display commands use
a potentially bewildering combination of backticks, apostrophes, brackets and
quotation marks. These are a source of considerable irritation and confusion
to statisticians and programmers of all levels. We will be covering this
syntax later in the book. For the moment, just accept that they are correct.
display "`c(pi)'"

. display "`c(pi)'"
3.141592653589793

Now, create a scalar named pi with a value of 5.543234 and a variable named pi
with an assigned value of 87, and finally display all three values of “pi” that
you have stored.
clear
scalar pi = 5.543234
set obs 20
gen pi = 87
scalar list
display scalar(pi)
display "`c(pi)'"
display pi
clear

. clear
. scalar pi = 5.543234
. set obs 20
number of observations (_N) was 0, now 20
. gen pi = 87
. scalar list
pi = 5.543234
. display scalar(pi)
5.543234
. display "`c(pi)'"
3.141592653589793
. display pi
87
. clear
The main take-home message is that it might be quite easy to confuse
these three entities and, in some environments, to overwrite them. The scalar
is still in memory and will persist until you drop it.

scalar drop pi

scalar list

I. TEMPORARY NAMES
One way of making sure you don’t overwrite existing scalars is to use a
tempname. Stata keeps track of these and never hands out the same name
twice, so you needn’t worry about confusing names and values. Once the
program in which it was created comes to an end, the temporary entity
disappears. Don’t worry too much about this at this time. Just remember that
you should use tempnames whenever you create a scalar. In the example below,
the object is saved as “__000001”, a temporary object in Stata, which is
deleted once the program has finished.

. tempname pi

. scalar `pi' = 5.543234

. scalar list

. tempname pi

. scalar `pi' = 5.543234

. scalar list
__000001 = 5.543234
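The disappearing act is easiest to see inside a program. In the sketch below, show_tempname is a hypothetical program that creates a temporary scalar, displays its generated name and value, and returns the name in r(name) so that we can check, after the program has ended, that Stata has dropped the scalar. The exact generated name (such as __000000) may differ on your machine.

```stata
capture program drop show_tempname
program define show_tempname, rclass
    * tempname stores a fresh, unused name (such as __000000) in the local pi
    tempname pi
    scalar `pi' = 5.543234
    display "temporary name: `pi'   value: " scalar(`pi')
    * hand the generated name back so that we can check on it afterwards
    return local name "`pi'"
end

show_tempname

* the scalar was dropped automatically when the program ended,
* so confirm now fails with a nonzero return code
capture confirm scalar `r(name)'
display "return code from confirm: " _rc
```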

Another commonly used method of creating a temporary entity that
holds a single value and disappears at the end of the program is to create a
local. But here again you run the risk of overwriting the local (and locals
are also much less efficient than scalars). I would recommend against
them. We cover locals later in this book, so there is no need to panic.
The best solution is to use tempname in conjunction with scalars when
you need to create a temporary entity that holds a single value. When
you close Stata, the scalar will disappear.
A note on the naming of temporary names. You may wish to think of
these as local macros. The backtick (`) and apostrophe (') that surround each
macro name tell Stata to substitute the contents of the macro, which here is
the temporary name of the scalar. You will learn more about the syntax of
macros later, but it’s important to realise that the names of temporary
entities are, in effect, held in local macros. Not only do they carry the
syntax of a local macro, but they also disappear when the program that
created them ends (another feature of local macros).
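The sketch below contrasts the two approaches, using the persistent indigestion value from earlier. The names indigestion and pi are illustrative. The second assignment to the local replaces the first without any warning, whereas the scalar behind the tempname is untouched when another piece of code creates a scalar that is literally called pi.

```stata
* a local can be silently overwritten by any later line reusing its name
local pi = 5.543234
local pi = 3
display `pi'

* a scalar reached through a tempname cannot collide with a scalar
* that some other part of the code happens to call pi
tempname indigestion
scalar `indigestion' = 5.543234
scalar pi = 3
display scalar(`indigestion')

* tidy up the scalar that really is named pi
scalar drop pi
```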

II. TEMPORARY VARIABLES

Now, we move on to temporary variables. If, for example, you needed to
multiply every observation by a coefficient, then you could create a temporary
variable. The command tempvar assigns names to the specified local macros,
which may then be used as temporary variable names in a dataset.

tempvar coefficient

generate `coefficient' = 5.543234

display "`coefficient'"

. tempvar coefficient

. generate `coefficient' = 5.543234

. display "`coefficient'"
__000003

Unlike with a scalar, display cannot show the contents of a temporary
variable: it shows only the name Stata has assigned to it. To inspect the
values, use a command such as list or summarize with `coefficient'.
Even if you had another variable called coefficient in your dataset, this
would not interfere with your other variable. Stata has assigned this
variable a name that uniquely identifies it and makes it impossible to
confuse.
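To tie this back to the coefficient example, here is a minimal sketch of a hypothetical program, scale_price, that multiplies every observation of price in the auto dataset (shipped with Stata) by the persistent indigestion value. The temporary variable holds the working result and is dropped when the program ends, so anything you want to keep must be copied to an ordinary variable, here the hypothetical price_scaled.

```stata
sysuse auto, clear

capture program drop scale_price
program define scale_price
    * the coefficient is held in a temporary scalar
    tempname coef
    scalar `coef' = 5.543234
    * tempvar reserves a variable name that cannot clash with existing ones
    tempvar scaled
    generate double `scaled' = price * scalar(`coef')
    summarize `scaled'
    * copy the result to a permanent variable before the program ends;
    * the temporary variable itself is dropped automatically
    generate double price_scaled = `scaled'
end

scale_price
describe, short
```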

III. TEMPORARY FILES

The last of these temporary constructions is tempfile, and we will
encounter this entity in more detail later in this book. Suffice it to note
that this creates a temporary file which Stata controls in such a