An Introduction to Stata Programming 2nd Edition


Christopher F. Baum

Second Edition
CHRISTOPHER F. BAUM
Department of Economics and School of Social Work
Boston College

A Stata Press Publication
StataCorp LP
College Station, Texas

Copyright © 2009, 2016 by StataCorp LP
All rights reserved.
First edition 2009
Second edition 2016

Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845

Typeset in LaTeX 2ε

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Print ISBN-10: 1-59718-150-1

Print ISBN-13: 978-1-59718-150-1

ePub ISBN-10: 1-59718-219-2

ePub ISBN-13: 978-1-59718-219-5

Mobi ISBN-10: 1-59718-220-6

Mobi ISBN-13: 978-1-59718-220-1

Library of Congress Control Number: 2015955595

No part of this book may be reproduced, stored in a retrieval system, or
transcribed, in any form or by any means—electronic, mechanical, photocopy,
recording, or otherwise—without the prior written permission of StataCorp LP.

Contents
Figures

Tables

Preface
Acknowledgments

Notation and typography


1 Why should you become a Stata programmer?
Do-file programming
Ado-file programming
Mata programming for ado-files
1.1 Plan of the book
1.2 Installing the necessary software
2 Some elementary concepts and tools
2.1 Introduction
2.1.1 What you should learn from this chapter
2.2 Navigational and organizational issues
2.2.1 The current working directory and profile.do
2.2.2 Locating important directories: sysdir and adopath
2.2.3 Organization of do-files, ado-files, and data files
2.3 Editing Stata do- and ado-files
2.4 Data types
2.4.1 Storing data efficiently: The compress command
2.4.2 Date and time handling
2.4.3 Time-series operators
2.4.4 Factor variables and operators
2.5 Handling errors: The capture command
2.6 Protecting the data in memory: The preserve and restore commands
2.7 Getting your data into Stata
2.7.1 Inputting and importing data
Handling text files
Free format versus fixed format
The import delimited command
Accessing data stored in spreadsheets
Fixed-format data files
2.7.2 Importing data from other package formats
2.8 Guidelines for Stata do-file programming style
2.8.1 Basic guidelines for do-file writers
2.8.2 Enhancing speed and efficiency
2.9 How to seek help for Stata programming
3 Do-file programming: Functions, macros, scalars, and matrices
3.1 Introduction
3.1.1 What you should learn from this chapter
3.2 Some general programming details
3.2.1 The varlist
3.2.2 The numlist
3.2.3 The if exp and in range qualifiers
3.2.4 Missing-data handling
Recoding missing values: The mvdecode and mvencode commands
3.2.5 String-to-numeric conversion and vice versa
Numeric-to-string conversion
Working with quoted strings
3.3 Functions for the generate command
3.3.1 Using if exp with indicator variables
3.3.2 The cond() function
3.3.3 Recoding discrete and continuous variables
3.4 Functions for the egen command
Official egen functions
egen functions from the user community
3.5 Computation for by-groups
3.5.1 Observation numbering: _n and _N
3.6 Local macros
3.7 Global macros
3.8 Extended macro functions and macro list functions
3.8.1 System parameters, settings, and constants: creturn
3.9 Scalars
3.10 Matrices
4 Cookbook: Do-file programming I
4.1 Tabulating a logical condition across a set of variables
4.2 Computing summary statistics over groups
4.3 Computing the extreme values of a sequence
4.4 Computing the length of spells
4.5 Summarizing group characteristics over observations
4.6 Using global macros to set up your environment
4.7 List manipulation with extended macro functions
4.8 Using creturn values to document your work
5 Do-file programming: Validation, results, and data management
5.1 Introduction
5.1.1 What you should learn from this chapter
5.2 Data validation: The assert, count, and duplicates commands
5.3 Reusing computed results: The return and ereturn commands
5.3.1 The ereturn list command
5.4 Storing, saving, and using estimated results
5.4.1 Generating publication-quality tables from stored estimates
5.5 Reorganizing datasets with the reshape command
5.6 Combining datasets
5.7 Combining datasets with the append command
5.8 Combining datasets with the merge command
5.8.1 The one-to-one match-merge
5.8.2 The dangers of many-to-many merges
5.9 Other data management commands
5.9.1 The fillin command
5.9.2 The cross command
5.9.3 The stack command
5.9.4 The separate command
5.9.5 The joinby command
5.9.6 The xpose command
6 Cookbook: Do-file programming II
6.1 Efficiently defining group characteristics and subsets
6.1.1 Using a complicated criterion to select a subset of observations
6.2 Applying reshape repeatedly
6.3 Handling time-series data effectively
6.3.1 Working with a business-daily calendar
6.4 reshape to perform rowwise computation
6.5 Adding computed statistics to presentation-quality tables
6.6 Presenting marginal effects rather than coefficients
6.6.1 Graphing marginal effects with marginsplot
6.7 Generating time-series data at a lower frequency
6.8 Using suest and gsem to compare estimates from nonoverlapping samples
6.9 Using reshape to produce forecasts from a VAR or VECM
6.10 Working with IRF files
7 Do-file programming: Prefixes, loops, and lists
7.1 Introduction
7.1.1 What you should learn from this chapter
7.2 Prefix commands
7.2.1 The by prefix
7.2.2 The statsby prefix
7.2.3 The xi prefix and factor-variable notation
7.2.4 The rolling prefix
7.2.5 The simulate and permute prefixes
7.2.6 The bootstrap and jackknife prefixes
7.2.7 Other prefix commands
7.3 The forvalues and foreach commands
8 Cookbook: Do-file programming III
8.1 Handling parallel lists
8.2 Calculating moving-window summary statistics
8.2.1 Producing summary statistics with rolling and merge
8.2.2 Calculating moving-window correlations
8.3 Computing monthly statistics from daily data
8.4 Requiring at least n observations per panel unit
8.5 Counting the number of distinct values per individual
8.6 Importing multiple spreadsheet pages
9 Do-file programming: Other topics
9.1 Introduction
9.1.1 What you should learn from this chapter
9.2 Storing results in Stata matrices
9.3 The post and postfile commands
9.4 Output: The export delimited, outfile, and file commands
9.5 Automating estimation output
9.6 Automating graphics
9.7 Characteristics
10 Cookbook: Do-file programming IV
10.1 Computing firm-level correlations with multiple indices
10.2 Computing marginal effects for graphical presentation
10.3 Automating the production of LaTeX tables
10.4 Extracting data from graph files’ sersets
10.5 Constructing continuous price and returns series
11 Ado-file programming
11.1 Introduction
11.1.1 What you should learn from this chapter
11.2 The structure of a Stata program
11.3 The program statement
11.4 The syntax and return statements
11.5 Implementing program options
11.6 Including a subset of observations
11.7 Generalizing the command to handle multiple variables
11.8 Making commands byable
Program properties
11.9 Documenting your program
11.10 egen function programs
11.11 Writing an e-class program
11.11.1 Defining subprograms
11.12 Certifying your program
11.13 Programs for ml, nl, and nlsur
Maximum likelihood estimation of distributions’ parameters
11.13.1 Writing an ml-based command
11.13.2 Programs for the nl and nlsur commands
11.14 Programs for gmm
11.15 Programs for the simulate, bootstrap, and jackknife prefixes
11.16 Guidelines for Stata ado-file programming style
11.16.1 Presentation
11.16.2 Helpful Stata features
11.16.3 Respect for datasets
11.16.4 Speed and efficiency
11.16.5 Reminders
11.16.6 Style in the large
11.16.7 Use the best tools
12 Cookbook: Ado-file programming
12.1 Retrieving results from rolling
12.2 Generalization of egen function pct9010() to support all pairs of quantiles
12.3 Constructing a certification script
12.4 Using the ml command to estimate means and variances
12.4.1 Applying equality constraints in ml estimation
12.5 Applying inequality constraints in ml estimation
12.6 Generating a dataset containing the longest spell
12.7 Using suest on a fixed-effects model
13 Mata functions for do-file and ado-file programming
13.1 Mata: First principles
13.1.1 What you should learn from this chapter
13.2 Mata fundamentals
13.2.1 Operators
13.2.2 Relational and logical operators
13.2.3 Subscripts
13.2.4 Populating matrix elements
13.2.5 Mata loop commands
13.2.6 Conditional statements
13.3 Mata’s st_ interface functions
13.3.1 Data access
13.3.2 Access to locals, globals, scalars, and matrices
13.3.3 Access to Stata variables’ attributes
13.4 Calling Mata with a single command line
13.5 Components of a Mata function
13.5.1 Arguments
13.5.2 Variables
13.5.3 Stored results
13.6 Calling Mata functions
13.7 Example: st_ interface function usage
13.8 Example: Matrix operations
13.8.1 Extending the command
13.9 Mata-based likelihood function evaluators
13.10 Creating arrays of temporary objects with pointers
13.11 Structures
13.12 Additional Mata features
13.12.1 Macros in Mata functions
13.12.2 Associative arrays in Mata functions
13.12.3 Compiling Mata functions
13.12.4 Building and maintaining an object library
13.12.5 A useful collection of Mata routines
14 Cookbook: Mata function programming
14.1 Reversing the rows or columns of a Stata matrix
14.2 Shuffling the elements of a string variable
14.3 Firm-level correlations with multiple indices with Mata
14.4 Passing a function to a Mata function
14.5 Using subviews in Mata
14.6 Storing and retrieving country-level data with Mata structures
14.7 Locating nearest neighbors with Mata
14.8 Using a permutation vector to reorder results
14.9 Producing LaTeX tables from svy results
14.10 Computing marginal effects for quantile regression
14.11 Computing the seemingly unrelated regression estimator
14.12 A GMM-CUE estimator using Mata’s optimize() functions
References

Author index

Subject index

Figures
5.1 Superimposed scatterplots
6.1 Change in Treasury bill rate
6.2 Average marginal effects in a probit model
6.3 Predictions of real exchange rates
7.1 Rolling quantile regression coefficients
7.2 Distribution of the sample median via Monte Carlo simulation
7.3 Q–Q plot of the distribution of the sample median
8.1 Moving-average growth rates
8.2 Estimated monthly volatility from daily data
9.1 Automated graphics
10.1 Point and interval elasticities computed with margins
10.2 Air quality in U.S. cities
12.1 Rolling lincom estimates

Tables
2.1 Numeric data types
5.1 Models of sulphur dioxide concentration
9.1 Grunfeld company statistics
9.2 Grunfeld company estimates
9.3 Wage equations for 1984
10.1 Director-level variables
11.1 MCAS percentile ranges
14.1 Demographics: Full sample

Preface
This book is a concise introduction to the art of Stata programming. It
covers three types of programming that can be used in working with Stata:
do-file programming, ado-file programming, and Mata functions that work
in conjunction with do- and ado-files. Its emphasis is on the automation of
your work with Stata and how programming on one or more of these levels
can help you use Stata more effectively.

In the development of these concepts, I do not assume that you have
prior experience with Stata programming, although familiarity with the
command-line interface is helpful. While examples are drawn from several
disciplines, my background as an applied econometrician is evident in the
selection of some sample problems. The introductory first chapter motivates
the why: why should you invest time and effort into learning Stata
programming? In chapter 2, I discuss elementary concepts of the command-
line interface and describe some commonly used tools for working with
programs and datasets.

The format of the book may be unfamiliar to readers who have some
familiarity with other books that help you learn how to use Stata. Beginning
with chapter 4, each even-numbered chapter is a “cookbook” chapter
containing several “recipes”, 47 in total. Each recipe poses a problem: how
can I perform a certain task with Stata programming? The recipe then
provides a complete worked solution to the problem and describes how the
features presented in the previous chapter can be put to good use. You may
not want to follow a recipe exactly from the cookbook; just as in cuisine, a
minor variation on the recipe may meet your needs, or the techniques
presented in that recipe may help you see how Stata programming applies to
your specific problem.

Most Stata users who delve into programming use do-files to automate
and document their work. Consequently, the major focus of the book is do-
file programming, covered in chapters 3, 5, 7, and 9. Some users will find
that writing formal Stata programs, or ado-files, meets their needs.
Chapter 11 is a concise summary of ado-file programming, with the
cookbook chapter that follows presenting several recipes that contain
developed ado-files. Stata’s matrix programming language, Mata, can also
be helpful in automating certain tasks. Chapter 13 presents a summary of
Mata concepts and the key features that allow interchange of variables,
scalars, macros, and matrices. The last chapter, cookbook chapter 14,
presents several examples of Mata functions developed to work with ado-
files. All the do-files, ado-files, Mata functions, and datasets used in the
book’s examples and recipes are available from the Stata Press website, as
discussed in Notation and typography.

The second edition of this book contains several new recipes illustrating
how do-files, ado-files, and Mata functions can be used to solve
programming problems. Several recipes have also been updated to reflect
new features in Stata added between versions 10 and 14. The discussion of
maximum-likelihood function evaluators has been significantly expanded in
this edition. The new topics covered in this edition include factor variables
and operators; use of margins, marginsplot, and suest; Mata-based
likelihood function evaluators; and associative arrays.

Acknowledgments
I must acknowledge many intellectual debts that have been incurred during
the creation of the first and second editions of this book. I am most indebted
to Nicholas J. Cox, who served as a technical reviewer of the original
manuscript, both for his specific contributions to this project and for his
willingness to share his extensive understanding of Stata with all of us in the
Stata user community. His Speaking Stata columns alone are worth the cost
of a subscription to the Stata Journal. Studying Nick’s many routines and
working with him on developing several Stata commands has taught me a
great deal about how to program Stata effectively.

My collaboration with Mark E. Schaffer on the development of ivreg2
and other routines has led to fruitful insights into programming techniques
and to the coauthored section 14.12. At StataCorp, Bill Gould, David
Drukker, Alan Riley, and Vince Wiggins have been most helpful and
encouraging. In the Stata user community, David Roodman and Ben Jann
have contributed greatly to my understanding of Stata’s and Mata’s
programming languages.

Several of the new recipes in this edition have been inspired by
participants in Stata Users Group meetings and seminars at the IMF Institute
for Capacity Development, the Civil Service College Singapore, DIW
Berlin, and several academic venues. I thank Stan Hurn for encouraging a
more detailed discussion of maximum-likelihood estimation techniques.

Christopher F. Baum
Oak Square School
Brighton, Massachusetts
November 2015

Notation and typography
In this book, I assume that you are somewhat familiar with Stata, that you
know how to input data, and that you know how to use previously created
datasets, create new variables, run regressions, and the like.

I designed this book for you to learn by doing, so I picture you reading
this book while sitting at a computer and using the sequences of commands
contained in the book to replicate my results. In this way, you will be able to
generalize these sequences to suit your own needs.

Generally, I use the typewriter font to refer to Stata commands,
syntax, and variables. A “dot” prompt followed by a command indicates that
you can type verbatim what is displayed after the dot (in context) to
replicate the results in the book.

I use the italic font for words that are not supposed to be typed; instead,
you are to substitute another word or words for them. For example, if I said
to type by(groupvar), you should replace “groupvar” with the actual name
of the group variable.

All the datasets and do-files for this book are freely available for you to
download. You can also download all the user-written commands described
in this book. See http://www.stata-press.com/data/itsp2.html for
instructions.

In a net-aware Stata, you can also load the dataset by specifying the
complete URL of the dataset. For example,
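A command of this form is sketched below. The filename mydata is a placeholder rather than a dataset confirmed to be on the site, and the itsp2 directory is an assumption based on the address given above. The capture noisily prefix is used only so that the sketch keeps running if the file cannot be reached.

```stata
* Hypothetical dataset name: substitute a file listed on the book's web page
local url "http://www.stata-press.com/data/itsp2/mydata.dta"

* capture noisily shows any error message but does not abort the do-file
capture noisily use "`url'", clear
display "return code from use: " _rc
```

A return code of 0 would indicate that the dataset was loaded; any other value means the file was not found or the site was unreachable.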

This text complements the material in the Stata manuals but does not
replace it, so I often refer to the Stata manuals by using [R] , [P] , etc. For
example, [R] summarize refers to the Stata Base Reference Manual entry
for summarize, and [P] syntax refers to the entry for syntax in the Stata
Programming Reference Manual.

Chapter 1
Why should you become a Stata programmer?
This book provides an introduction to several contexts of Stata
programming. I must first define what I mean by “programming”. You can
consider yourself a Stata programmer if you write do-files, which are text
files of sequences of Stata commands that you can execute with the do
([R] do) command, by double-clicking on the file, or by running them in the
Do-file Editor ([R] doedit). You might also write what Stata formally
defines as a program, which is a set of Stata commands that includes the
program ([P] program) command. A Stata program, stored in an ado-file,
defines a new Stata command. You can also use Stata’s matrix programming
language, Mata, to write routines in that language that are called by ado-
files. Any of these tasks involves Stata programming.¹
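As a minimal sketch of the first of these forms, a do-file is nothing more than a text file of commands. The filename example.do is hypothetical, and the auto dataset shipped with Stata stands in for your own data.

```stata
* example.do (hypothetical name): a short sequence of Stata commands
* Load a dataset installed with Stata
sysuse auto, clear
* Describe two variables, then fit a linear regression
summarize price mpg
regress price mpg weight
```

Saved under that name in the current directory, the file could be run by typing do example.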

With that set of definitions in mind, we must deal with the why: why
should you become a Stata programmer? After answering that essential
question, this text takes up the how: how you can become a more efficient
user of Stata by using programming techniques, be they simple or complex.

Using any computer program or language is all about efficiency: getting
the computer to do the work that can be routinely automated, reducing human
errors, and allowing you to more efficiently use your time. Computers are
excellent at performing repetitive tasks; humans are not. One of the strongest
rationales for learning how to use programming techniques in Stata is the
potential to shift more of the repetitive burden of data management,
statistical analysis, and production of graphics to the computer. Let’s
consider several specific advantages of using Stata programming techniques
in the three contexts listed above.

Do-file programming

Using a do-file to automate a specific data-management or statistical task
leads to reproducible research and the ability to document the empirical
research process. This reduces the effort needed to perform a similar task at
a later point or to document for your coworkers or supervisor the specific
steps you followed. Ideally, your entire research project should be defined
by a set of do-files that execute every step, from the input of the raw data to
the production of the final tables and graphs. Because a do-file can call
another do-file (and so on), a hierarchy of do-files can be used to handle a
complex project.
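A hierarchy of this kind might be sketched as follows. So that the example runs on its own, it first writes two tiny step files (step1.do and step2.do, both hypothetical names) into the current directory. In a real project those files would already exist, and the master do-file would contain little more than the do commands.

```stata
* Create two small step files so the sketch is self-contained
capture file close fh
file open fh using "step1.do", write text replace
file write fh "sysuse auto, clear" _n
file close fh

file open fh using "step2.do", write text replace
file write fh "summarize price mpg" _n
file close fh

* The master do-file proper: run each step in order
do step1
do step2

* Remove the scratch files created above
erase step1.do
erase step2.do
```

Because each step lives in its own file, correcting an early step means editing that one file and rerunning the master.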

The beauty of this approach is its flexibility. If you find an error in an
earlier stage of the project, you need only to modify the code and then rerun
that do-file and those following to bring the project up to date. For instance,
a researcher may need to respond to a review of her paper—submitted
months ago to an academic journal—by revising the specification of
variables in a set of estimated models and estimating new statistical results.
If all the steps that produce the final results are documented by a set of do-
files, her task is straightforward. I argue that all serious users of Stata
should gain some facility with do-files and the Stata commands that support
repetitive use of commands.

That advice does not imply that Stata’s interactive capabilities should
be shunned. Stata is a powerful and effective tool for exploratory data
analysis and ad hoc queries about your data. But data-management tasks and
the statistical analyses leading to tabulated results should not be performed
with “point-and-click” tools that leave you without an audit trail of the steps
you have taken.

Ado-file programming

On a second level, you may find that despite the breadth of Stata’s official
and user-written commands, there are tasks you must repeatedly perform
that involve variations on the same do-file. You would like Stata to have a
command to perform those tasks. At that point, you should consider Stata’s
ado-file programming capabilities. Stata has great flexibility: a Stata
command need be no more than a few lines of Stata code. Once defined, that
command becomes a “first-class citizen”. You can easily write a Stata
program, stored in an ado-file, that handles all the features of official Stata
commands such as if exp, in range, and command options. You can (and
should) write a help file that documents the program’s operation for your
benefit and for those with whom you share the code. Although ado-file
programming requires that you learn how to use some additional commands
used in that context, it can help you become more efficient in performing the
data-management, statistical, or graphical tasks that you face.
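A minimal sketch of such a command follows. The name meanof and its detail option are invented for illustration; syntax and marksample are the official tools that give a program its if exp, in range, and option handling. In practice the program would be stored in a file named meanof.ado on the ado-path rather than defined inside a do-file.

```stata
* Hypothetical command: report the mean of one numeric variable
capture program drop meanof
program define meanof, rclass
    * Parse one numeric variable, optional if/in, and an optional detail flag
    syntax varname(numeric) [if] [in] [, Detail]
    * touse marks the observations selected by if/in with nonmissing values
    marksample touse
    if "`detail'" != "" {
        summarize `varlist' if `touse', detail
    }
    else {
        summarize `varlist' if `touse', meanonly
    }
    display as text "mean of `varlist' = " as result r(mean)
    * Leave the result behind in r(mean) for the caller
    return scalar mean = r(mean)
end

sysuse auto, clear
meanof price if foreign == 1
meanof mpg in 1/20, detail
```

Even at this length, the command accepts qualifiers and an option in the same way an official command does.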

Mata programming for ado-files

On a third level, your ado-files can perform some complicated tasks that
involve many invocations of the same commands. Stata’s ado-file language
is easy to read and write, but it is interpreted. Stata must evaluate each
statement and translate it into machine code. The Mata programming
language (help mata) creates compiled code, which can run much faster
than ado-file code. Your ado-file can call a Mata routine to carry out a
computationally intensive task and return the results in the form of Stata
variables, scalars, or matrices. Although you may think of Mata solely as a
matrix language, it is actually a general-purpose programming language,
suitable for many nonmatrix-oriented tasks, such as text processing and list
management.

The level of Stata programming that you choose to attain and master
depends on your needs and skills. As I have argued, the vast majority of
interactive Stata users can and should take the next step of learning how to
use do-files efficiently to take full advantage of Stata’s capabilities and to
save time. A few hours of investment in understanding the rudiments of do-
file programming—as covered in the chapters to follow—will save you
days or weeks over the course of a sizable research project.

A smaller fraction of users may choose to develop ado-files. Many users
find that those features lacking in official Stata are adequately provided by
the work of members of the Stata user community who have developed and
documented ado-files, sharing them via the Stata Journal, the Statistical
Software Components (SSC) archive,² or their own user site. However,
developing a reading knowledge of ado-file code is highly useful for many
Stata users. It permits you to scrutinize ado-file code—either that of official
Stata or user-written code—and more fully understand how it performs its
function. In many cases, minor modifications to existing code may meet your
needs.

Mata has been embraced by programmers wishing to take advantage of
its many features and its speed. Although this book does not discuss
interactive use of Mata, I present two ways in which Mata can be used in
ado-files: in “one-liners” to fulfill a single, specific task, and as functions to
be called from ado-files.
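Both styles can be sketched briefly. The scalar names mu and rng and the function rangeof() are hypothetical; st_data() and st_numscalar() are Mata's interface functions for reading Stata variables and writing Stata scalars.

```stata
sysuse auto, clear

* One-liner: compute a mean in Mata and hand it back as a Stata scalar
mata: st_numscalar("mu", mean(st_data(., "price")))
display scalar(mu)

* A Mata function; drop any earlier copy so the definition can be rerun
capture mata: mata drop rangeof()
mata:
real scalar rangeof(string scalar vname)
{
    real colvector x
    // read the named Stata variable into a Mata column vector
    x = st_data(., vname)
    return(max(x) - min(x))
}
end

* Call the function and store its result as a Stata scalar
mata: st_numscalar("rng", rangeof("mpg"))
display scalar(rng)
```

In an ado-file, the one-liner would appear inside the program, while the function definition would follow the program's end statement or be kept in a compiled library.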

1.1 Plan of the book

The chapters of this book present the details of the three types of Stata
programming discussed above, placing the greatest emphasis on effective
use of do-file programming. Each fairly brief chapter on the structure of
programming techniques is followed by a “cookbook” chapter. These
chapters contain several “recipes” for the solution of a particular,
commonly encountered problem, illustrating the necessary programming
techniques to compose a solution. As in a literal cookbook, the recipes
here are illustrative examples; you are free to modify the ingredients to
produce a somewhat different dish. The recipes as presented may not
address your precise problem, but they should prove helpful in devising a
solution as a variation on the same theme.

1.2 Installing the necessary software

This book uses Stata to illustrate many aspects of programming. Stata’s
capabilities are not limited to the commands of official Stata documented in
the manuals and in online help. The capabilities include a wealth of
commands documented in the Stata Journal, the Stata Technical Bulletin,
and the SSC archive.³ Those commands will not be available in your copy of
Stata unless you have already located and installed them. To locate a user-
written command (such as thatcmd), use search thatcmd (see [R] search).

Newer versions of the user-written commands that you install today may
become available. The official Stata command adoupdate
([R] adoupdate), which you can use at any time, will check to see whether
newer versions of any user-written commands are available. Just as the
command update query will determine whether your Stata executable and
official ado-files are up to date, adoupdate will determine whether any
user-written commands installed in your copy of Stata are up to date.

1. There are also specialized forms of Stata programming, such as dialog programming,
scheme programming, and class programming. A user-written program can present a
dialog, like any official Stata command, if its author writes a dialog file. The command
can also be added to the User menu of Stata’s graphical interface. For more information,
see [P] dialog programming and [P] window programming. Graphics users can write
their own schemes to set graphic defaults. See [G-4] schemes intro for details. Class
programming allows you to write object-oriented programs in Stata. As [P] class
indicates, this has primarily been used in Stata’s graphics subsystem and graphical user
interface. I do not consider these specialized forms of programming in this book.
2. For details on the SSC (Boston College) archive of user-contributed routines, type
help ssc.

Chapter 2
Some elementary concepts and tools
2.1 Introduction
This chapter lays out some of the basics that you will need to be an effective
Stata programmer. The first section discusses navigational and
organizational issues: How should you organize your files? How will Stata
find a do-file or an ado-file? The following sections describe how to edit
files, appropriate data types, several useful commands for programmers,
and some guidelines for Stata programming style. The last section suggests
how you can seek help for your programming problems.

2.1.1 What you should learn from this chapter

Know where your files are: master the current working directory and the ado-path
Learn how to edit do- and ado-files effectively
Use appropriate data types for your variables
Use compress when useful
Use time-series operators effectively
Use factor variables and operators effectively
Use capture, preserve, and restore to work efficiently
Use Stata’s data input commands effectively
Adopt a good style for do-file programming and internal documentation
Know where (and when) to seek help with your Stata programming
Know how to trace your do-file’s execution to diagnose errors

2.2 Navigational and organizational issues


We are all familiar with the colleague whose computer screen resembles a
physical desk in a cluttered office: a computer with icons covering the
screen in seemingly random order. If you use Stata only interactively, storing
data files on the desktop might seem like a reasonable idea. But when you
start using do-files to manage a project or writing your own ado-files, those
files’ locations become crucial; the files will not work if they are in the
wrong place on your computer. This section describes some navigational
and organizational issues to help ensure that your files are in the best places.

2.2.1 The current working directory and profile.do

Like most programs, Stata has a concept of the current working directory
(CWD). At any point in time, Stata is referencing a specific directory or
folder accessible from your computer. It may be a directory on your own
hard disk, or one located on a network drive or removable disk. In
interactive mode, Stata displays the CWD in its status bar. Why is the CWD
important? If you save a file—a .dta file or a log file—it will be placed in
the CWD unless you provide a full file specification directing it to another
directory. That is, save myfile, replace will save myfile in the CWD.
Likewise, if you attempt to use a file with the syntax use myfile, clear,
it will search for myfile in the CWD, returning an error if the file is not
located.

One of the most common problems that beginning users of Stata face is
saving a data file and not knowing where it was saved. Of course, if you
never change the CWD, all of your materials will be in the same place, but
do you really want everything related to your research to be located in one
directory? On my Mac OS X computer, the directory is
/Users/baum/Documents, and I would rather not commingle documents
with Stata data files, log files, and graph files. Therefore, I would probably
change the CWD to a directory, or folder, dedicated to my research project
and set up multiple directories for separate projects. You can change your
CWD with the cd command; for example, cd /data/city, cd
d:/project, or cd "My Documents/project1".1 You can use the pwd
command to display the CWD at any time. Both cd and pwd are described in
[D] cd.
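
As a minimal sketch of these commands, the following do-file fragment creates a project folder under the CWD, moves into it, and saves and reloads a file there. The folder name project1 and the use of auto.dta are assumptions made only for illustration.

```stata
pwd                          // display the current working directory
capture mkdir "project1"     // hypothetical folder; capture ignores the error if it already exists
cd "project1"                // make it the CWD
sysuse auto, clear           // example dataset shipped with Stata
save myfile, replace         // written to the CWD as myfile.dta
use myfile, clear            // found because it is in the CWD
cd ..                        // return to the parent directory
```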

You may want Stata to automatically change the CWD to your preferred
location when you start Stata. You can accomplish this with profile.do.
This file, placed in your home directory,2 will execute a set of commands
when you invoke Stata. You might place in profile.do the command cd
c:/data/NIHproject to direct Stata to automatically change the CWD to
that location.
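
A profile.do along these lines might look as follows. This is only a sketch: the path is the one used as an example in the text, and the display line is an optional addition that echoes the resulting CWD.

```stata
* profile.do: hypothetical contents, executed automatically when Stata starts
cd c:/data/NIHproject               // path taken from the example in the text
display "CWD set to: `c(pwd)'"      // confirm where files will be saved
```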

2.2.2 Locating important directories: sysdir and adopath

The sysdir ([P] sysdir) command provides a list of six directories or
folders on your computer that are important to Stata. The BASE directory
contains the Stata program itself and the official ado-files that make up most
of Stata. You should not tamper with the files in this directory. Stata’s
update ([R] update) command will automatically modify the contents of
the BASE directory. The SITE directory may reference a network drive in a
university or corporate setting where a system administrator places ado-
files to be shared by many users.

The PERSONAL directory is, as its name suggests, personal. You can
place your own ado-files in that directory. If you want to modify an official
Stata ado-file, you should make a copy of it, change its name (for instance,
rename sureg.ado to sureg2.ado), and place it in your PERSONAL
directory.

The PLUS directory is automatically created when you download any
user-written materials. If you use search ([R] search) to locate and install
user-written programs from the Stata Journal or Stata Technical Bulletin,
their ado-files and help files will be located in a subdirectory of the PLUS
directory.3 If you use the ssc ([R] ssc) command to download user-written
materials from the Statistical Software Components (SSC) (Boston College)
archive or net install to download materials from a user’s site, these
materials will also be placed in a subdirectory of the PLUS directory.

Why are there all these different places for Stata’s ado-files? The
answer lies in the information provided by the adopath ([P] sysdir)
command.

Like sysdir, this command lists six directories. The order of these
directories is important because it defines how Stata will search for a
command. It will first attempt to find foo.ado in BASE, the location of
Stata’s official ado-files, and then in SITE. The third directory4 is “.”,
that is, the CWD. The fourth is PERSONAL, while the fifth is PLUS.5 This
pecking order implies that if
foo.ado is not to be found among Stata’s official ado-files or the SITE
directory, Stata will examine the CWD. If that fails, it will look for foo.ado
in PERSONAL (and its subdirectories). If that fails, it will look in PLUS (and
its subdirectories) and as a last resort in OLDPLACE. If foo.ado is nowhere
to be found, Stata will generate an unrecognized command error.

This search hierarchy indicates that you can locate an ado-file in one of
several places. In the next section, I discuss how you might choose to
organize ado-files, as well as do-files and data files related to your
research project.
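
The search order described above can be inspected directly. The sketch below uses only standard commands; foo is a hypothetical command name, and the directories reported will differ from one installation to another.

```stata
sysdir                       // list the system directories (BASE, SITE, PLUS, PERSONAL, ...)
adopath                      // list the ado-path in the order in which it is searched
which sureg                  // report the location of the sureg.ado that Stata would run
capture which foo            // foo is hypothetical; capture traps the error if it is not found
display _rc                  // nonzero when foo.ado is nowhere on the ado-path
```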

2.2.3 Organization of do-files, ado-files, and data files

It is crucially important that you place ado-files on the ado-path. You can
place them in your CWD ([3] above in the ado-path listing), but that is
generally a bad idea because if you work in any other directory, those ado-
files will not be found. If the ado-files are your own or have been written by
a coworker, place them in PERSONAL. If you download ado-files from the
SSC archive, please heed the advice that you should always use Stata—not a
web browser—to perform the download and locate the files in the correct
directory (in PLUS).

What about your do-files, data files, and log files? It makes sense to
create a directory, or folder, in your home directory for each separate
project and to store all project-related files in that directory. You can
always fully qualify a data file when you use it in a do-file, but if you move
that do-file to another computer, the do-file will fail to find the data file.
Referencing files in the same directory simplifies making a copy of that
directory for a coworker or collaborator and makes it possible to run the
do-files from an external drive, such as a flash disk, or from a shared
storage location, such as Dropbox or Google Drive.

It is also a good idea to place a cd command at the top of each do-file
that sets the CWD to the project directory. Although this command would have to be altered if
you moved the directory to a different computer, it will prevent a common
mistake: saving data files or log files to a directory other than the project
directory.

You might also have several projects that depend on one or two key data
files. Rather than duplicating possibly large data files in each project
directory, you can refer to them with a relative file specification. Say that
your research directory is d:/data/research with subdirectories
d:/data/research/project1 and d:/data/research/project2. Place
the key data file master.dta in the research directory, and refer to it in
the project directories with use ../master, clear. The double dot
indicates that the file is to be found in the parent (enclosing) directory,
while allowing you to move your research directory to a different drive (or
even to a Mac OS X or a Linux computer) without having to alter the use
statement.
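
Putting these suggestions together, the top of a project do-file might be sketched as follows. The file name analysis.do and the log name are hypothetical; the directory and master.dta are those of the example above, so the fragment runs only where that layout exists.

```stata
* analysis.do: hypothetical do-file stored in d:/data/research/project1
cd d:/data/research/project1        // set the CWD to the project directory
capture log close                   // close any log left open by an earlier run
log using analysis, replace text    // the log is written to the project directory
use ../master, clear                // shared data file in the parent directory
describe
log close
```

Only the cd line needs to change if the research directory is moved to another drive or computer.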

2.3 Editing Stata do- and ado-files

The Do-file Editor has an advantage over most external editors: it allows
you to execute only a section of the file by selecting those lines and hitting
the Do icon. You should recognize that do- and ado-files are merely text
files with file types of .do or .ado rather than .txt. As such, it is a very
poor idea to edit them in a word processor, such as Microsoft Word. A
word processor must read the text file and convert it into its own binary
format; when the file is saved, it must reverse the process.6 Furthermore, a
word processor will usually present the file in a variable-width character
format, which is harder to read. But the biggest objection to word
processing a do-file or ado-file is the waste of your time: it is considerably
faster to edit a file in the Do-file Editor and execute it immediately without
the need to translate it back into text.

You can use other text editors (but not word processors) to edit do-files.
However, Stata’s Do-file Editor supports syntax highlighting, automatic
indentation, line numbering, bookmarks, and collapsible nodes, so there are
few features provided by external editors that are not readily available in
the Do-file Editor.7

If you need to work on a collection of several files as part of a larger
project, you can do so within Stata’s Project Manager. The Project Manager
is integrated with Stata’s Do-file Editor and allows you to manage do-files,
ado-files, and any other type of file in a convenient manner. See [P] Project
Manager.

2.4 Data types

Stata, as a programming language, supports more data types than do many
statistical packages. The major distinction to consider is between numeric
and string data types. Data-management tasks often involve conversions
between numeric and string variables. For instance, data read from a text
file (such as a .csv or tab-delimited file created by a spreadsheet) are
often considered to be a string variable by Stata even though most of the
contents are numeric. The commands destring and tostring (see
[D] destring for both) are helpful in this regard, as are encode and decode
(see [D] encode for both).
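
A small sketch of these conversions, using made-up data: the variable names and values below are hypothetical and serve only to show the four commands at work.

```stata
clear
input str8 price_s str6 region_s
"1200" "north"
"n/a"  "south"
"950"  "north"
end

destring price_s, generate(price) force   // "n/a" becomes a numeric missing value
encode region_s, generate(region)         // numeric variable with value labels
tostring price, generate(price_txt)       // numeric back to string
decode region, generate(region_txt)       // value labels back to string
describe
list
```

The force option is used here because one value is nonnumeric; without it, destring leaves such a variable unchanged.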

Fixed-width string variables can hold values up to 2,045 bytes in length.
As a rough guide, one byte is required to store basic punctuation marks and
unadorned Latin letters (those characters known as lower ASCII); two bytes
are required to store letters such as é. Some Unicode characters require up
to four bytes to store (see [U] 12.4.2 Handling Unicode strings for details).
You usually need not declare the length of string variables because Stata’s
string functions (see [FN] String functions) will generate a string variable
long enough to hold the contents of any generate ([D] generate)
operation. Fixed-width string variables require as many bytes of storage per
observation as their declaration; for instance, a str20 variable requires 20
bytes per observation. If you require longer strings, you can use the strL
data type, which can hold up to two billion bytes.

Stata’s numeric data types include byte, int, long, float, and double.
The byte, int, and long data types can only hold integer contents. See
table 2.1 for a summary of the numeric data types.

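Table 2.1: Numeric data types

Storage type   Minimum              Maximum             Bytes

byte           -127                 100                 1
int            -32,767              32,740              2
long           -2,147,483,647       2,147,483,620       4
float          -1.701 × 10^38       1.701 × 10^38       4
double         -8.988 × 10^307      8.988 × 10^307      8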

The long integer data type can hold all signed nine-digit integers but
only some ten-digit integers. Integers are held in their exact representation
by Stata so that you can store a nine-digit integer (such as a U.S. Social
Security number) as a long data type. However, lengthy identification
numbers also can be stored as a double data type or as a string variable.
That will often be a wise choice, because then you need not worry about
possible truncation of values. You also will find it useful to use string
variables when a particular identification code could contain characters.
For instance, the CUSIP (Committee on Uniform Security Identification
Procedures) code used to identify U.S. security issues used to be wholly
numeric but now may contain one or more nonnumeric characters. Storing
these values as strings avoids later problems with numeric missing values.

As displayed above, the two floating-point data types, float and
double, can hold very large numbers. But many users encounter problems
with much smaller floating-point values if they mistakenly assume that
floating-point arithmetic operations are exact. Floating-point numbers (those
held as mantissa and exponent) expressed in base 10
must be stored as base 2 (binary) numbers. Although 1/10 has an exact,
finite representation in base 10, it does not in the binary number system
used in a computer, where it is a repeating fraction that must be rounded.

Further details of this issue can be found in [U] 12.2.2 Numeric storage
types, Gould (2006b), and Cox (2006b). The implications are clear: an if
condition that tests some floating-point value for equality, such as if diff
== 0.01, is likely to fail when you expect that it would succeed.8 A float
contains approximately 7 significant digits in its mantissa. This implies that
if you read a set of nine-digit U.S. Social Security numbers into a float,
they will not be held exactly. A double contains approximately 15
significant digits. We know that residuals computed from a linear regression
with a constant term, using regress and predict eps, residual, should sum
to exactly 0. In Stata’s finite-precision computer arithmetic using the
default float data type, residuals from such a regression will sum to a
small nonzero value rather than 0.0. Thus, discussions of the predict
([R] predict) command often advise using predict double eps, residual to
compute more accurate residuals.
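
These points can be checked with a few lines of Stata. The sketch below is illustrative only: the regression on auto.dta is an arbitrary choice, and the exact residual sums it prints depend on the model.

```stata
display %21x 0.1                  // 0.1 as actually stored, in hexadecimal (binary) format
display (0.1 == float(0.1))       // 0: the float and double approximations of 0.1 differ

clear
set obs 1
generate x = 0.1                  // stored as float by default
count if x == 0.1                 // counts 0: float x is compared with a double literal
count if x == float(0.1)          // counts 1: both sides rounded to float

sysuse auto, clear
quietly regress price mpg weight
predict eps, residual             // float residuals
predict double eps_d, residual    // double-precision residuals
quietly summarize eps
display "sum of float residuals:  " r(sum)
quietly summarize eps_d
display "sum of double residuals: " r(sum)
```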

What are the implications of finite-precision arithmetic for Stata
programming?

• You should store ID numbers with many digits as string variables, not
  as integers, floats, or doubles.
• You should not rely on exact tests of a floating-point value against a
  constant, not even 0. The reldif() function ([FN] Mathematical
  functions) can be used to test for approximate equality.
• As suggested above, use double floating-point values for any
  generated series where a loss of precision might be problematic, such
  as residuals, predicted values, scores, and the like.
• You should be wary of variables’ values having very different scales,
  particularly when a nonlinear estimation method is used. Any
  regression of price from the venerable auto.dta reference dataset
  on a set of regressors will display extremely large sums of squares in
  the analysis of variance table. Scaling price from dollars to
  thousands of dollars obviates this problem. The scale of this variable
  does not affect the precision of linear regression, but it could be
  problematic for nonlinear estimation techniques.
• Use integer data types where it is appropriate to do so. Storing values
  as byte or int data types when feasible saves disk space and
  memory.
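
The first two recommendations above can be illustrated with a short sketch. The tolerance of 1e-12 and the nine-digit identifier are arbitrary values chosen for the example.

```stata
display (0.1 + 0.2 == 0.3)                 // 0: the exact test fails
display (reldif(0.1 + 0.2, 0.3) < 1e-12)   // 1: the values are approximately equal
display %12.0f float(123456789)            // 123456792: nine digits exceed float precision

clear
set obs 1
generate str9  id_str   = "123456789"      // held exactly as a string
generate long  id_long  = 123456789        // held exactly as a long integer
generate float id_float = 123456789        // rounded when stored as a float
format id_long id_float %12.0f
list
```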

2.4.1 Storing data efficiently: The compress command

compress ([D] compress) is a useful command, particularly when working
with datasets acquired from other statistical packages. This command will
examine each variable in memory and determine whether it can be stored
more efficiently. It is guaranteed never to lose information or reduce the
precision of measurements. The advantage of storing indicator (0/1)
variables as a byte data type rather than as a four-byte long data type is
substantial for survey datasets with many indicator variables. It is an
excellent idea to apply compress when performing the initial analysis of a
new dataset. Alternatively, if you are using the third-party Stat/Transfer
application to convert data from a SAS or an SPSS format, use the
Stat/Transfer “Optimize” option.
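
A minimal sketch of compress at work, assuming a deliberately oversized indicator variable (the name expensive and the cutoff of 6,000 are hypothetical):

```stata
sysuse auto, clear
generate long expensive = price > 6000   // a 0/1 indicator stored wastefully as long
describe expensive                       // storage type is long
compress                                 // demote variables wherever no information is lost
describe expensive                       // storage type is now byte
```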

2.4.2 Date and time handling

Stata does not have a separate data type for calendar dates. Dates are
represented, as they are in a spreadsheet program, by numbers known as %t
values measuring the time interval from a reference date or “epoch”. For
example, the epoch for Stata and for SAS is midnight on 1 January 1960.
Days following that date have positive integer values, while days prior to it
have negative integer values. Dates represented in days are known as %td
values. Other calendar frequencies are represented by the number of weeks,
months, quarters, or half-years since that reference date: %tw, %tm, %tq, and
%th values, respectively. The year is represented as a %ty value, ranging
from AD 100 to AD 9999. You can also use consecutive integers with the
generic format, %tg.

Stata also supports business-daily calendars ([D] bcal), which are
custom calendars specifying the dates to be included and excluded in the
calendar specification. For instance, most financial markets are closed on
weekends and are also closed on holidays. A business-daily calendar
allows you to set up the data in “trading time”, which is crucial for many
time-series commands that require no gaps in the sequence of daily values.
Stata can construct a business-daily calendar from the dates represented in a
variable.

Stata provides support for accurate intradaily measurements of time,
down to the millisecond. A date-and-time variable is known as a %tc
(clock) value, and it can be defined to any intraday granularity: hours,
minutes, seconds, or milliseconds.9 For more information, see
[U] 12.3 Dates and times. The tsset ([TS] tsset) command has a delta()
option, by which you can specify the frequency of data collection. For
instance, you can have annual data collected only at five-year intervals, or
you can have high-frequency financial markets transactions data,
timestamped by day, hour, minute, and second, collected every five or ten
minutes.

When working with variables containing dates and times, you must
ensure that the proper Stata data type is used for their storage. Weekly and
lower-frequency values (including generic values) can be stored as data
type int or as data type float. Daily (%td) values should be stored as data
type long or as data type float. If the int data type is used, dates more
than 32,740 days from 1 January 1960 (that is, beyond 21 August 2049)
cannot be stored.
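
The epoch and the storage limit just described can be verified with a brief sketch; the variable names are hypothetical.

```stata
display mdy(1, 1, 1960)          // 0: the epoch
display td(31dec1959)            // -1: the day before the epoch
display %td 0                    // 01jan1960
display tm(1960m2)               // 1: months elapsed since 1960m1
display mdy(8, 21, 2049)         // 32740: the largest value an int can hold

clear
set obs 1
generate int  d_int  = mdy(8, 22, 2049)   // 32741 does not fit in an int: stored as missing
generate long d_long = mdy(8, 22, 2049)   // stored correctly
format d_int d_long %td
list
```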

More stringent requirements apply to clock (date-and-time) values.
These values must be stored as data type double to avoid overflow
conditions. Clock values, like other time values, are integers, and there are
86,400,000 milliseconds in a day. The double data type is capable of
precisely storing date-and-time measurements within the range of years
defined in Stata (AD 100–9999).
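
As a sketch of why double is required, the fragment below stores the same timestamp in a double and in a float. The timestamp and variable names are arbitrary choices for illustration.

```stata
display %12.0f msofhours(24)     // 86400000 milliseconds in a day

clear
set obs 1
generate double t_ok  = clock("08nov2006 14:30:05", "DMYhms")
generate float  t_bad = clock("08nov2006 14:30:05", "DMYhms")   // float rounds the value
format t_ok t_bad %tc
list                             // t_bad no longer shows the time that was entered
```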

Although it is important to use the appropriate data type for date-and-time
values, you should avoid using a larger data type than needed. The int
data type requires only two bytes per observation, the long and float
data types require four bytes, and the double data type requires eight bytes.
Although every date and time value could be stored as a double, that would
be very wasteful of memory and disk storage, particularly in a dataset with
many observations.

A suite of functions (see [FN] Date and time functions) is available to
handle the definition of date variables and date/time arithmetic. Display of
date variables in calendar formats (such as 08 Nov 2006) and date and
time variables with the desired intraday precision is handled by the
definition of proper formats. Because dates and times are numeric
variables, you should distinguish between the content or value of a date/time
variable and the format in which it will be displayed.

If you are preparing to move data from a spreadsheet into Stata with the
import delimited ([D] import delimited) command, make sure that any
date variables in the spreadsheet display as four-digit years. It is possible to
deal with two-digit years (for example, 11/08/06) in Stata, but it is easier
to format the dates with four-digit years (for example, 11/08/2006) before
reading those data into Stata.
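
Once date strings have been read, they can be converted with the date() function. The sketch below assumes month/day/year ordering and uses made-up values; the last two lines show two ways to handle two-digit years.

```stata
clear
input str10 sdate
"11/08/2006"
"12/25/2006"
end

generate long d = date(sdate, "MDY")        // string to %td value
format d %td
list

display %td date("11/08/06", "MD20Y")       // 08nov2006: mask states the century
display %td date("11/08/06", "MDY", 2050)   // 08nov2006: third argument caps the year
```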

2.4.3 Time-series operators

Stata provides the time-series operators L., F., D., and S., which allow the
specification of lags, leads (forward values), differences, and seasonal
differences, respectively.10 The time-series operators make it unnecessary
to create a new variable to use a lag, difference, or lead. When combined
with a numlist, they allow the specification of a set of these constructs in a
single expression.

Consider the lag operator, L., which when added to the beginning of a
variable name refers to the (first-)lagged value of that variable: L.x. A
number can follow the operator: L4.x refers to the fourth lag of x. More
generally, a numlist can be used: L(1/4).x refers to the first through fourth
lags of x, and L(1/4).(x y z) defines a list of four lagged values of each
of the variables x, y, and z. Similarly to the lag operator, the lead operator,
F., allows the specification of future values of one or more variables.
Strictly speaking, the lead operator is unnecessary because a lead is a
negative lag, and an expression such as L(-4/4).x will work, labeling the
negative lags as leads.

The difference operator, D., can be used to generate differences of any
order. The first difference, D.x, is $\Delta x_t$ or $x_t - x_{t-1}$. The second
difference, D2.x, is not $x_t - x_{t-2}$ but rather $\Delta(\Delta x_t)$; that is,
$\Delta(x_t - x_{t-1})$, or $x_t - 2x_{t-1} + x_{t-2}$. The seasonal difference
operator, S., is used to
compute the difference between the value in the current period and the
period one year ago. For quarterly data, you might type S4.x to generate
$x_t - x_{t-4}$ and S8.x to generate $x_t - x_{t-8}$.

You can also combine the time-series operators: LD.x is the lag of the
first difference of x (that is, $x_{t-1} - x_{t-2}$) and refers to the same expression
as DL.x. Any of the above expressions can be used almost anywhere that a
varlist is required.
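
The operators described in this section can be tried on simulated quarterly data. Everything in the sketch below (the variable names, the seed, and the random series) is an assumption made for illustration.

```stata
clear
set obs 40
generate t = tq(2010q1) + _n - 1      // quarterly dates starting in 2010q1
format t %tq
tsset t

set seed 12345
generate x = rnormal()
generate y = rnormal()

regress y L(1/4).x                    // first through fourth lags of x, no new variables needed
summarize L(1/2).(x y)                // two lags of each of x and y
generate double dx  = D.x             // first difference
generate double d2x = D2.x            // second difference: x - 2*L.x + L2.x
generate double s4x = S4.x            // seasonal difference: x - L4.x
generate double ldx = LD.x            // lag of the first difference, same as DL.x
list t x dx d2x s4x ldx in 1/8
```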

In addition to being easy to use, time-series operators will never
misclassify an observation. You could refer to a lagged value as x[_n-1]
or a first difference as x[_n] - x[_n-1]; however, that construction is not
only cumbersome but also dangerous. Consider an annual time-series
dataset in which the 1981 and 1982 data are followed by the data for 1984,
1985, …, with the 1983 data not appearing in the dataset (that is, they are
physically absent, not simply recorded as missing values). The observation-
number constructs above will misinterpret the lagged value of 1984 to be
1982, and the first difference for 1984 will incorrectly span the two-year
gap. The time-series operators will not make this mistake. Because tsset
([TS] tsset) has been used to define year as the time-series calendar
variable, the lagged value or first difference for 1984 will be properly
coded as missing whether or not the 1983 data are stored as missing in the
dataset. Thus, you should always use the time-series operators when
referring to past or future values or computing differences in a time-series
dataset.
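
The gap problem is easy to reproduce. In this sketch, with made-up values and 1983 physically absent, the observation-number construct picks up the 1982 value as the lag for 1984, while L.x and D.x are missing for that year.

```stata
clear
input year x
1981 10
1982 12
1984 15
1985 19
end
tsset year

generate lag_bad  = x[_n-1]           // for 1984 this is the 1982 value
generate lag_good = L.x               // for 1984 this is missing
generate dif_bad  = x[_n] - x[_n-1]   // spans the two-year gap
generate dif_good = D.x               // missing for 1984
list
```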

The time-series operators also provide an important benefit in the
context of longitudinal or panel datasets ([XT] xt), where each observation,
$x_{it}$, is identified with both an $i$ and a $t$ subscript. If those data are xtset
([XT] xtset) or tsset ([TS] tsset), using the time-series operators will
ensure that references will not span panels. For instance, z[_n-1] in a
panel context will allow you to reference $z_{1,T}$ (the last observation of panel
1) as the prior value of $z_{2,1}$ (the first observation of panel 2). In contrast,
L.z (or D.z) will never span panel boundaries. Panel data should always
be xtset or tsset, and any time-series references should use the time-
series operators.
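
A two-panel sketch with made-up values shows the difference. The variable names id, year, and z are hypothetical.

```stata
clear
input id year z
1 2001 5
1 2002 6
1 2003 7
2 2001 50
2 2002 60
2 2003 70
end
xtset id year

generate z_prev_bad = z[_n-1]    // first row of panel 2 picks up 7 from panel 1
generate z_prev     = L.z        // missing at the start of each panel
list, sepby(id)
```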

2.4.4 Factor variables and operators

Factor variables allow you to create indicator (0/1) variables, interaction
variables, and powers of continuous variables. Factor-variable operators
create temporary variables “on the fly” and may be used with most
estimation and postestimation commands. Factor variables are a
convenience at estimation time, but they are essential if you plan to calculate
marginal means, predictive margins, or marginal effects with the margins
([R] margins) command. Programmers who plan to allow margins as a
postestimation command should ensure that their estimation command
supports factor variables.

To create an indicator variable, you use the i. operator with any
integer-valued variable with values in the range [0, 32740]. The values
need not be consecutive integers and need not start at 0. For instance, we
may have the variables gender, taking on two values (1, 2), and region,
taking on five values (1, 2, 3, 4, 9). By default, the lowest value is treated as
the base level, and an indicator variable is produced for all levels except
the base level. Thus, including i.gender and i.region in a varlist
is equivalent to including one indicator variable for level 2 of gender and
four indicator variables for levels 2, 3, 4, and 9 of region.

You can force inclusion of the base level or exclusion of other levels by
specifying variations on the i. operator. For instance, if we wanted to
specify 2 as the base level for gender, we could do this using the notation
ib2.gender. Or, we could include both values of gender in our model and
exclude the constant by specifying ibn.gender and the noconstant
option. For more information about selecting levels for indicator variables,
see [U] 11.4.3 Factor variables.
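
The same operators can be tried on auto.dta, which ships with Stata. Here rep78 (with levels 1 through 5) merely stands in for a categorical variable such as region; the choice of model is arbitrary.

```stata
sysuse auto, clear
regress price i.rep78                    // base level is the lowest value, 1
regress price ib3.rep78                  // make level 3 the base level
regress price ibn.rep78, noconstant      // all levels included, no constant term
```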

Another common use of factor variables is the construction of
interaction terms and powers. We can create interactions between indicator
variables, between indicator variables and continuous variables, or
between continuous variables. We can obtain powers of a variable by
interacting a continuous variable with itself. To specify only an interaction,
you use the # operator. To specify main effects and an interaction, you use
the ## operator. Stata assumes that variables specified in an interaction are
indicator variables. This means that you do not need to prefix a variable
name with i. unless you wish to customize the levels. To specify that the
variable should be treated as continuous, you must use the c. operator.

For instance, we could include main effects for gender and region
and create an interaction between them in our model by specifying
gender##region in the varlist. Or, we could include main effects and an
interaction between region and income by specifying region##c.income.

If a variable appears in multiple interactions, we want to specify the
main effect separately and use the interaction operator, #, for each
interaction; for instance, i.gender i.region c.income gender#region
region#c.income. If we only wanted to fit the model, this would be
equivalent to generating the indicator and interaction variables by hand
and listing them as regressors.

The shorter factor-variables notation gives us access to many more
postestimation commands, such as margins ([R] margins), contrast
([R] contrast), and marginsplot ([R] marginsplot), for a more sophisticated
analysis.
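
As a sketch of the interaction operators, the following uses auto.dta, with foreign and rep78 standing in for categorical variables and weight for a continuous one. These models are illustrative assumptions, not the ones discussed in the text.

```stata
sysuse auto, clear
regress price foreign##rep78          // main effects plus their interaction (both treated as indicators)
regress price i.foreign##c.weight     // indicator interacted with a continuous variable
regress price c.weight##c.weight      // weight and its square
margins, dydx(weight)                 // average marginal effect, accounting for the squared term
```

Because the squared term is specified with c.weight##c.weight rather than a hand-made variable, margins knows that both terms move together when weight changes.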

For example, suppose you are analyzing the bpress dataset containing
blood pressure measurements of individuals categorized by gender (with
indicator variable sex, 0 = male) and one of three agegrp codes (1 = 30–45,
2 = 46–59, 3 = 60+).

If you are using a command such as regress that supports factor
variables, you can use the factorial-interaction operator, ##, to estimate this
regression with several clear advantages. The labeling of the output is more
comprehensible, and you can then easily obtain marginal effects for each of
the covariates with margins.
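
A regression of this kind might be sketched as follows. The fragment assumes that the bplong example dataset available through webuse (with variables bp, sex, and agegrp) is a reasonable stand-in for bpress; it requires an Internet connection, and its results need not match those of the dataset described above.

```stata
webuse bplong, clear             // assumption: stand-in for the bpress dataset
regress bp i.sex##i.agegrp       // main effects and interaction, with labeled output
margins sex agegrp               // predictive margins for each level of each factor
margins, dydx(sex agegrp)        // average marginal effects of the covariates
```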

2.5 Handling errors: The capture command
When an error is encountered in an interactive Stata session, it is displayed
on the screen. When a do-file is being executed, however, Stata’s default
behavior causes the do-file to abort when an error occurs.11 There are
circumstances when a Stata error should be ignored, for example, when
calculating a measure from each by-group that can be computed only if there
are more than 10 observations in the by-group.

Rather than programming conditional logic that prevents that calculation
from taking place with insufficient observations, you could use capture
([P] capture) as a prefix on that command. For instance, capture regress
y x1-x12 will prevent the failure of one regression from aborting the do-
file. If you still would like to see the regression results for those regressions
that are feasible, use capture noisily .... The capture command can
also be used to surround a block of statements enclosed in braces,
rather than having to repeat capture on each regress command.
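
A sketch of both uses of capture, run on auto.dta; the model and the by-group variable are arbitrary choices. Some rep78 groups contain too few observations for the regression, so those iterations fail without stopping the loop.

```stata
sysuse auto, clear
levelsof rep78, local(levels)
foreach r of local levels {
    * show results where the regression is feasible, and trap the error where it is not
    capture noisily regress price mpg weight if rep78 == `r'
    if _rc != 0 {
        display "regression skipped for rep78 = `r' (return code " _rc ")"
    }
}

* one capture around a block instead of one per command
capture noisily {
    regress price mpg
    regress price weight
}
```

Note that when a command inside a captured block fails, the remaining commands in that block are skipped.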

2.6 Protecting the data in memory: The preserve and restore commands

Several Stata commands replace the data in memory with a new dataset. For
instance, the collapse ([D] collapse) command makes a dataset of
summary statistics, whereas contract ([D] contract) makes a dataset of
frequencies or percentages. In a program, you may want to invoke one of
these commands, but you may want to retain the existing contents of memory
for further use in the do-file. You need the preserve and restore (see
[P] preserve for both) commands, which will allow you to set aside the
current contents of memory in a temporary file and bring them back when
needed. Consider an example.

We use and modify auto.dta, and then preserve the modified file. The
collapse command creates a dataset with one observation for each value
of rep78, the by() variable. We sort that dataset of summary statistics and
save it as repstats.dta.

We are now ready to return to our main dataset.

The restore command brings the preserved dataset back into memory.
We sort by rep78 and use merge to combine the individual automobile
data in memory with the summary statistics from repstats.dta. Although
these computations could have been performed without collapse,12 the
convenience of that command is clear. The ability to set the current dataset
aside (without having to explicitly save it) and bring it back into memory
when needed is a useful feature.
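
The sequence just described might be sketched as follows. The particular modification (dropping observations with missing rep78) and the summary statistics computed are assumptions; only the file name repstats.dta comes from the text.

```stata
sysuse auto, clear
drop if missing(rep78)                    // modify the data in memory
preserve                                  // set the modified dataset aside

collapse (mean) mean_price=price (sd) sd_price=price, by(rep78)
sort rep78
save repstats, replace                    // one observation per value of rep78

restore                                   // bring the preserved dataset back
sort rep78
merge m:1 rep78 using repstats            // attach the group statistics to each automobile
tabulate _merge
```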

2.7 Getting your data into Stata

This section details data input and manipulation issues.13 Source data can
be downloaded from a website, acquired in spreadsheet format, or made
available in the format of some other statistical package. The following two
subsections deal with those variations.

2.7.1 Inputting and importing data

Before carrying out statistical analysis with Stata, many researchers must
face several thorny issues in converting their foreign data into a Stata-usable
form. These issues range from the mundane (for example, a text-file dataset
may have coded missing values as 99) to the challenging (for example, a
text-file dataset may be in a hierarchical format, with master records and
detail records). Although a brief guide to these issues cannot possibly cover
all the ways in which external data can be organized and transformed for
use in Stata, several rules apply:

• Familiarize yourself with the various Stata commands for data input.
  Each has its use, and in the spirit of “do not pound nails with a
  screwdriver”, data handling is much simpler if you use the correct tool.
  Reading [U] 21 Entering and importing data is well worth the
  investment of your time.
• When you need to manipulate a text file, use a text editor, not a word
  processor or a spreadsheet.
• Get the data into Stata as early in the process as you can, and perform
  all manipulations via well-documented do-files that can be edited and
  re-executed if need be (or if a similar dataset is encountered). Given
  this exhortation, I will not discuss input ([D] input) or the Data
  Editor, which allows interactive entry of data, or various copy-and-
  paste strategies involving simultaneous use of a spreadsheet and Stata.
  Such strategies are not reproducible and should be avoided.
• Keeping track of multiple steps of the data input and manipulation
  process requires good documentation. If you ever need to replicate or
  audit the data manipulation process, you will regret it if your
  documentation did not receive the proper attention.
• Working with anything but a simple rectangular data array will almost
  always require the use of append ([D] append), merge ([D] merge),
  or reshape ([D] reshape). You should review the information about
  those commands in chapter 5 and understand their capabilities.

Handling text files

Text files, often described as ASCII files, are the most common source of
raw data in microeconomic research. Text files can have any file extension:
they can be labeled .raw or .csv (as Stata would prefer) or .txt or
.asc. A text file is just that: text. Word processing programs, like Microsoft
Word or OpenOffice, are inappropriate tools for working with text files
because they have their own native, binary format and generally use features
such as proportional spacing, which causes columns to be misaligned. A
word processor uses a considerable amount of computing power to translate
a text file into its own native format before it can display it on the screen.
The inverse of that translation must be used to create a text file that can be
subsequently read by Stata.

Stata does not read binary files other than those in its own .dta
format.14 The second rule above counsels the use of a text editor rather than
a word processor or spreadsheet when manipulating text files. Every
operating system supports a variety of text editors, many of which are freely
available.15 You will find that a good text editor is much faster than a word
processor when scrolling through a large data file. Many text editors
colorize Stata commands, making them useful for Stata program
development. Text editors are also extremely useful when working with
large survey datasets that are accompanied with machine-readable
codebooks, often many megabytes in size. Searching those codebooks for
particular keywords with a powerful text editor is efficient.

Free format versus fixed format

Text files can be free format or fixed format. A free-format file contains
several fields per record, separated by delimiters, characters that are not to
be found within the fields. A purely numeric file (or one with simple string
variables, such as U.S. state codes) can be whitespace-delimited; that is,
successive fields in the record are separated by one or more whitespace
characters:

The columns in the file need not be aligned. These data can be read from a
text file (by default with extension .raw) with Stata’s infile ([D] infile
(free format)) command, which must assign names (and if necessary, data
types) to the variables:
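
As a minimal sketch of such a read, the do-file fragment below first writes a small whitespace-delimited file and then reads it with infile. The file name states_demo.raw, the variable names, and the values are all hypothetical and serve only to make the fragment self-contained.

```stata
* Hypothetical data: file name, variable names, and values are illustrative only
clear
tempname fh
file open `fh' using "states_demo.raw", write text replace
* columns are deliberately misaligned; whitespace is the only delimiter
file write `fh' "MA 6349 1" _n
file write `fh' "NH    1236   1" _n
file write `fh' "CT 3406     1" _n
file close `fh'

* str2 declares state as a two-character string;
* pop and region take the default numeric type
infile str2 state pop region using "states_demo.raw", clear
list
```

Dropping str2 from the infile command is an easy way to see the error behavior described next.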

Here we must indicate that the first variable is a string variable of maximum
length two characters (str2); otherwise, every record will generate an
error that says state cannot be read as a number.

We may even have a string variable with contents of varying length in
the record:

However, this scheme will break down as soon as we hit New Hampshire.
The space within the state name will be taken as a delimiter, and Stata will
become quite befuddled. If string variables with embedded spaces are to be
used in a space-delimited file, they themselves must be delimited (usually
with quotation marks in the text file):
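
A small sketch of this quoting convention follows, again with a hypothetical file name and made-up values. Only the name containing an embedded space is quoted in the text file, and free-format infile then reads it as a single field.

```stata
* Hypothetical data: file name, variable names, and values are illustrative only
clear
tempname fh
file open `fh' using "statenames_demo.raw", write text replace
file write `fh' `"Massachusetts 6349 1"' _n
* the embedded space is protected by quotation marks in the text file
file write `fh' `""New Hampshire" 1236 1"' _n
file write `fh' `"Vermont 609 1"' _n
file close `fh'

* str20 leaves room for the longest name
infile str20 statename pop region using "statenames_demo.raw", clear
list
```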

So what should you do if your text file is space-delimited and contains
string variables with embedded spaces? That is a difficult question because
no mechanical transformation will generally solve this problem. For
instance, using a text editor to change multiple spaces to a single space and
then to change each single space to a tab character will not help, because it
will then place a tab between New and Hampshire.

If the data are downloadable from a web page that offers formatting
choices, you should choose a tab-delimited rather than a space-delimited
format. The other option, comma-delimited text or comma-separated values
(.csv), has its own difficulties. Consider field contents (without quotation
marks) such as “College Station, TX”, “J. Arthur Jones, Jr.”, “F. Lee Bailey,
Esq.”, or “T. Frank Kennedy, S.J.”. If every city name is followed by a
comma, then no problem, because the city and state can then be read as
separate variables. But if some are written without commas (“Brighton
MA”), the problem returns. In any case, parsing proper names with
embedded commas is problematic. Tab-delimited text avoids most of these
problems.

The import delimited command

If we are to read tab-delimited text files, the infile ([D] infile (free
format)) command is no longer the right tool for the job; we should now use
import delimited ([D] import delimited). The import delimited
command reads a tab-delimited or comma-delimited (.csv) text file
regardless of whether a spreadsheet program was involved in its creation.
For instance, most database programs contain an option to generate a
tab-delimited or comma-delimited export file, and many datasets available
for web download are in one of these formats.

The import delimited command is handy. This is the command to use
as long as one observation in your target Stata dataset is contained on a
single record with tab or comma delimiters. Stata will automatically try to
determine which delimiter is in use (but options tab and comma are
available), or any valid Stata character can be specified as a delimiter with
the delimiter(char) option. For instance, some European database
exports use semicolon (;) delimiters because standard European numeric
formats use the comma as the decimal separator. If the first line (or a
specified line) of the .raw file contains variable names, they will be used
and translated into valid Stata variable names if they contain invalid
characters, such as spaces. This is useful because if data are being extracted
from a spreadsheet, they will often have that format. To use the sample
dataset above, now tab-delimited with a header record of variable names,
type
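
A minimal sketch of such a call is given here. It builds its own tab-delimited file so that it can be run as is; the file name states_tab_demo.raw and its contents are hypothetical. The delimiter is left to automatic detection, and varnames(1) states explicitly that the first line holds the variable names.

```stata
* Hypothetical data: file name, variable names, and values are illustrative only
clear
tempname fh
file open `fh' using "states_tab_demo.raw", write text replace
* header record of variable names, then tab-delimited data records
file write `fh' "state" _tab "pop" _tab "region" _n
file write `fh' "New Hampshire" _tab "1236" _tab "1" _n
file write `fh' "Massachusetts" _tab "6349" _tab "1" _n
file close `fh'

import delimited using "states_tab_demo.raw", varnames(1) clear
describe
list
```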

The issue of embedded spaces or commas no longer arises in tab-delimited
data, and you can rely on the first line of the file to define the variable
names.

It is particularly important to heed any information or error messages
produced by the data input commands. If you know how many observations
are present in the text file, check whether the number Stata reports is
correct. Likewise, the summarize ([R] summarize) command should be
used to check whether the number of observations, the minimum, and the
maximum of each numeric variable are sensible. Data-entry errors often can
be detected by noting that a particular variable takes on nonsensical values,
usually denoting the omission of one or more fields on that record. Such an
omission can also trigger one or more error messages. For instance, leaving
out a numeric field on a particular record will move an adjacent string field
into that variable. Stata will then complain that it cannot read the string as a
number. A distinct advantage of the tab- and comma-delimited formats is
that missing values can be coded with two successive delimiters. As will be
discussed in chapter 5, assert ([D] assert) can be used advantageously to
ensure that reasonable values appear in the data.
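
The checks just described might look like the following sketch. The auto dataset shipped with Stata stands in for freshly imported data, and the bounds in the assert commands are assumptions chosen for illustration, not rules from any codebook.

```stata
* the auto dataset shipped with Stata stands in for newly imported data
sysuse auto, clear

* compare the number of observations with what the source file should contain
count

* inspect N, minimum, and maximum of each numeric variable of interest
summarize price mpg weight

* assert stops a do-file with an error if any observation violates the condition
* (the bounds below are illustrative assumptions)
assert !missing(price) & price > 0
assert inrange(mpg, 5, 60)
```

Because assert is silent when the condition holds for every observation, such checks can be left in a do-file permanently.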

An additional distinction exists between infile and import
delimited: the former command can be used with if exp and in range
qualifiers to selectively input data. For instance, with a large text-file
dataset, you could use in 1/1000 to read only the first 1,000 observations
and verify that the input process is working properly. By using if
gender=="M", we could read only the male observations; or by using if
runiform() <= 0.15, we could read each observation with probability
0.15. These qualifiers cannot be used with import delimited, but you can
specify a range of rows to read with the rowrange() option to read less
than the entire file.
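
A sketch of these three approaches follows, using a tiny comma-delimited file written by the fragment itself. The file name people_demo.csv and its contents are hypothetical; free-format infile accepts commas as well as whitespace between fields, so the same file serves both commands.

```stata
* Hypothetical data: file name, variable names, and values are illustrative only
clear
tempname fh
file open `fh' using "people_demo.csv", write text replace
file write `fh' "M,34" _n "F,29" _n "M,51" _n "F,42" _n
file close `fh'

* infile with an in qualifier: read only the first two records
infile str1 gender age using "people_demo.csv" in 1/2, clear
list

* import delimited has no if or in qualifiers; rowrange() limits the rows read
import delimited using "people_demo.csv", varnames(nonames) rowrange(1:2) clear
list

* infile with an if qualifier: keep only the records for males
infile str1 gender age using "people_demo.csv" if gender == "M", clear
list
```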

Accessing data stored in spreadsheets

In the third rule in section 2.7.1, I counseled that copy-and-paste techniques
should not be used to transfer data from another application directly to
Stata. Such a technique cannot be reliably replicated. How do you know that
the first and last rows or columns of a spreadsheet were selected and
copied to the clipboard, without any loss of data or extraneous inclusion of
unwanted data? If the data are presently in a spreadsheet, the appropriate
portion of that spreadsheet should be copied and pasted (in Excel, Paste
Special to ensure that only values are stored) into a new blank sheet. If Stata
variable names are to be added, leave the first row blank so that they can be
filled in. Save that sheet, and that sheet alone, as Text Only – Tab delimited
to a new filename. If you use the file extension .raw, it will simplify
reading the file into Stata with the import delimited command.

You can also use the import excel ([D] import excel) command to
read the contents of an Excel or Excel-compatible worksheet, in either .xls
or .xlsx format, into Stata. You can specify the worksheet to be input and
optionally provide a cell range from which data are to be read. The first
row can be used to provide Stata variable names. Note that if a column of
the worksheet contains at least one cell with nonnumerical text (such as NA),
the entire column is imported as a string variable. Therefore, you should be
familiar with Stata’s string-to-numeric conversion capabilities, as discussed
in section 3.2.5.
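
The following sketch shows the worksheet, cell-range, and first-row options together. To stay self-contained it first exports a few rows of the auto dataset shipped with Stata to a workbook; the file name auto_demo.xlsx and the sheet name demo are hypothetical.

```stata
* create a small workbook to read back (file and sheet names are hypothetical)
sysuse auto, clear
keep make price mpg
keep in 1/10
export excel using "auto_demo.xlsx", sheet("demo") firstrow(variables) replace

* read one worksheet, restricted to a cell range;
* firstrow takes the variable names from the first row of that range
import excel using "auto_demo.xlsx", sheet("demo") cellrange(A1:C6) firstrow clear
describe
list
```

Here the range A1:C6 holds one header row and five data rows, so five observations are read.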

Two caveats regarding dates: First, both Excel and Stata work with the
notion that calendar dates are successive integers from an arbitrary starting
point. To read the dates into a Stata date variable, they must be formatted
with a four-digit year, preferably in a format with delimiters (for example,
12/6/2004 or 6-Dec-2004). It is much easier to make these changes in the
spreadsheet program before reading the data into Stata. Second, Mac OS X
users of Excel should note that Excel’s default is the 1904 date system. If the
spreadsheet was produced in Excel for Windows and the steps above are
used to create a new sheet with the desired data, the dates will be off by
four years (the difference between Excel for Mac and Excel for Windows
defaults). Uncheck the preference Use the 1904 date system before saving
the file as text.
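
When dates do arrive in Stata as strings in one of the delimited formats mentioned above, they can be converted with the date() function. The sketch below is illustrative only: the variable names are hypothetical, and the month-day-year reading of 12/6/2004 is an assumption that must be checked against the source of the data.

```stata
clear
input str11 rawdate
"12/6/2004"
"6-Dec-2004"
end

* the second argument tells date() the order of the date components
generate long d = date(rawdate, "MDY") in 1
replace d = date(rawdate, "DMY") in 2

* display the underlying integer as a calendar date
format d %td
list
```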

Mac OS X and Unix users of import excel who work in non-English
languages may need to make adjustments to the system locale used for
importing their workbook. See [D] import excel for details.

Fixed-format data files

Many text-file datasets are composed of fixed-format records: those obeying
a strict column format in which a variable appears in a specific location in
each record of the dataset. Such datasets are accompanied by codebooks,
which define each variable’s name, data type, location in the record, and
possibly other information, such as missing values, value labels, or
frequencies for integer variables. Below is a fragment of the codebook for
the study “National Survey of Hispanic Elderly People, 1988”, available
from the Inter-University Consortium for Political and Social Research.

The codebook specifies the column in which each variable starts (LOC)
and the number of columns it spans (WIDTH). In this fragment of the
codebook, only integer numeric variables appear. The missing-data codes
(MD) for each variable are also specified. The listing above provides the full
codebook details for variable 13, marital status, quoting the question posed
by the interviewer, coding of the six possible responses, and the frequency
counts of each response.

In a fixed-format data file, fields need not be separated, as we see
above, where the single-column fields of variables 0019, 0020, and 0021
are stored as three successive integers. Stata must be instructed to interpret
each of those digits as a separate variable. This is done with a data
dictionary, which is a separate Stata file, with file extension .dct,
specifying the necessary information to read a fixed-format data file. The
information in the codebook can be translated, line for line, into the Stata
data dictionary. The data dictionary need not be comprehensive. You might
not want to read certain variables from the raw data file, so you would
merely ignore those columns. This might be particularly important when
working with Stata/IC and its limit of 2,047 variables. Many survey datasets
contain many more than 2,000 variables. By judiciously specifying only the
subset of variables that are of interest in your research, you can read such a
text file with Stata/IC.

Stata supports two different formats of data dictionaries. The simpler
format, used by infix ([D] infix (fixed format)), requires only that the
starting and ending columns of each variable are given along with any
needed data-type information. To illustrate, we specify the information
needed to read a subset of fields in this codebook into Stata variables, using
the description of the data dictionary in infix:
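
A minimal sketch of an infix dictionary is shown here. The record layout is hypothetical (it is not taken from the codebook above): an identifier in columns 1-4, marital status in column 5, and three adjacent one-column items in columns 6-8. The fragment writes both the data file and the dictionary so that it can be run as is.

```stata
* Hypothetical layout and data; not the layout of the actual study
clear
tempname fh

* fixed-format data: no separators between fields
file open `fh' using "survey_demo.raw", write text replace
file write `fh' "00011123" _n "00023201" _n "00035312" _n
file close `fh'

* infix dictionary: storage type, variable name, and column range for each field
file open `fh' using "survey_demo.dct", write text replace
file write `fh' "infix dictionary using survey_demo.raw {" _n
file write `fh' "    int  id      1-4" _n
file write `fh' "    byte marstat 5" _n
file write `fh' "    byte v19     6" _n
file write `fh' "    byte v20     7" _n
file write `fh' "    byte v21     8" _n
file write `fh' "}" _n
file close `fh'

infix using "survey_demo.dct", clear
list
```

The three digits in columns 6-8 are read as three separate variables because each is given its own column.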

Alternatively, we could set up a dictionary file for the fixed-format
version of infile ([D] infile (fixed format)). This is the more powerful
option because it allows us to attach variable labels and specify value
labels. However, rather than specifying the column range of each field that
you want to read, you must indicate where it starts and its field width, given
as the %infmt for that variable. With a codebook like the one displayed
above, we have the field widths available. We could also calculate the field
widths from the starting and ending column numbers. We must not only
specify which are string variables but also give their data storage type. The
storage type could differ from the %infmt for that variable. You might read
a six-character code into a ten-character field knowing that other data use
the latter width for that variable.
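
The same hypothetical layout can be sketched as an infile dictionary. Each line gives the starting column with _column(), the storage type, the variable name (optionally followed by a colon and the name of a value label), the %infmt giving the field width, and a variable label. The label name marlbl and all label text are assumptions made for illustration.

```stata
* Hypothetical layout and data; not the layout of the actual study
clear
tempname fh

file open `fh' using "survey_demo2.raw", write text replace
file write `fh' "00011123" _n "00023201" _n "00035312" _n
file close `fh'

* infile dictionary: start column, type, name[:value label], %infmt, "label"
file open `fh' using "survey_demo2.dct", write text replace
file write `fh' "infile dictionary using survey_demo2.raw {" _n
file write `fh' `"    _column(1) int  id             %4f "Respondent ID""' _n
file write `fh' `"    _column(5) byte marstat:marlbl %1f "Marital status""' _n
file write `fh' `"    _column(8) byte v21            %1f "Item 21""' _n
file write `fh' "}" _n
file close `fh'

infile using "survey_demo2.dct", clear
describe
list
```

Columns 6 and 7 are simply skipped here, which is how unwanted fields are left out of the dataset.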

The _column() directives in this dictionary are used where dictionary
fields are not adjacent. You could skip back and forth along the input record
because the columns read need not be in ascending order. But then we could
achieve the same thing with the order ([D] order) command after data
input. We are able to define variable labels by using infile.

In both examples above, the dictionary file specifies the name of the
data file, which need not be the same as that of the dictionary file. For
example, highway.dct could read highway.raw, and if that were the case,
the latter filename need not be specified. But we might want to use the same
dictionary to read more than one .raw file. To do so, leave the filename out
of the dictionary file, and use the using modifier to specify the name of the
.raw file. After loading the data, we can describe ([D] describe) its
contents:

The dictionary indicates that value labels are associated with the variables,
but it does not define those labels. Commands such as label define
([D] label) must be given to create those labels.
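
As a sketch of that last step, the fragment below attaches a value-label name to a variable before the label exists, as a dictionary would leave it, and then defines the label. The codes and their meanings are hypothetical and are not the codings used in the study.

```stata
clear
input byte marstat
1
2
5
end

* the association alone: marlbl is named but not yet defined
label values marstat marlbl
describe marstat

* define the label contents (codes and text are illustrative assumptions)
label define marlbl 1 "Married" 2 "Widowed" 5 "Never married"
label list marlbl
tabulate marstat
```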

One other advantage of the more elaborate infile data-dictionary
format should be noted. Many large survey datasets contain several
variables that are real or floating-point values, such as a wage rate in
dollars and cents or a percentage interest rate (such as 6.125%). To save
space, the decimal points are excluded from the text file, and the codebook
indicates how many decimal digits are included in the field. You could read
these data as integer values and perform the appropriate division in Stata,
but a simpler solution would be to build this information into the data
dictionary. By specifying that a variable has a %infmt of, for example,
%6.2f, a value such as 123456 can be read properly as daily sales of
$1,234.56.
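
A sketch of a dictionary that relies on this behavior follows. The file names, the variable name sales, and the values are hypothetical; the fragment assumes, as described above, that a %6.2f input format places the decimal point two digits from the right when the field itself contains none.

```stata
* Hypothetical data: six-digit fields with two implied decimal places
clear
tempname fh

file open `fh' using "sales_demo.raw", write text replace
file write `fh' "123456" _n "000995" _n
file close `fh'

file open `fh' using "sales_demo.dct", write text replace
file write `fh' "infile dictionary using sales_demo.raw {" _n
file write `fh' `"    _column(1) double sales %6.2f "Daily sales""' _n
file write `fh' "}" _n
file close `fh'

infile using "sales_demo.dct", clear
format sales %9.2f
list
```

Summarizing the variable after the read is a quick way to confirm that the scaling matches the codebook.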

Stata’s data-dictionary syntax can handle many more-complicated text
datasets, including those with multiple records per observation or those
with header records that are to be ignored. See [D] infile (fixed format) for
full details.

2.7.2 Importing data from other package formats

The previous section discussed how foreign data files could be brought into
Stata. Often, the foreign data are already in the format of some other
statistical package or application. For instance, several economic- and
financial-data providers make SAS-formatted datasets readily available,
while socioeconomic datasets are often provided in SPSS format. The most
straightforward and inexpensive way to deal with these package formats
involves the third-party application Stat/Transfer, a product of Circle
Systems, Inc. Stat/Transfer has the advantage of a comarketing relationship
with StataCorp, so you can acquire a copy of Stat/Transfer at an
advantageous price from StataCorp.

The alternative to Stat/Transfer usually involves having access to a
working copy of the other statistical package and having enough familiarity
with the syntax of that package to understand how a dataset can be exported
from its own proprietary format to a text-file format (for example, .csv).
Even for those researchers who have that familiarity and copies of another
package, this is a rather cumbersome solution because (like Stata) packages
such as SAS and SPSS have their own conventions for missing-data formats,
value labels, data types, etc. Although the raw data can be readily exported
to a text format, these attributes of the data will have to be re-created in
Stata. For a large survey dataset with many hundred (or thousand) variables,
that is unpalatable. A transformation utility like Stat/Transfer performs all of
those housekeeping chores, ensuring that any attributes attached to the data
(extended missing-value codes, value labels, etc.) are placed in the Stata-
format file. Of course, the mapping between packages is not always one to
one. In Stata, a value label stands alone and can be attached to any variable
or set of variables, whereas in other packages it is generally an attribute of
a variable and must be duplicated for similar variables.

An important distinction between Stata and SAS or SPSS is Stata’s
flexible set of data types. Stata, like the C language in which its core code is
written, offers five numeric data types ([D] data types): the integer types
byte, int, and long, and the floating-point types float and double. Stata
also offers the string types str1-str2045 and strL. Most other packages
do not have this broad array of data types and instead resort to storing all
numeric data in a single data type: “Raw data come in many different forms,
but SAS simplifies this. In SAS there are just two data types: numeric and
character” (Delwiche and Slaughter 1998, 4). This simplicity bears a
sizable cost because an indicator variable requires only one byte of storage,
whereas a double-precision floating-point variable requires eight bytes to
hold up to 15 decimal digits of accuracy.

Stata allows the user to specify the data type based on the contents of
each variable, which can result in considerable savings of both disk space
and execution time when reading or writing those variables to disk.
Stat/Transfer can be instructed to optimize a target Stata-format file in the
transfer process, or you can use Stata’s compress ([D] compress) command
to automatically perform that optimization. In any case, you should always
take advantage of this optimization because it will reduce the size of files
and require less of your computer’s memory to work with them.
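
The effect of compress can be seen in a small sketch. The variable name indicator is hypothetical; it holds only 0 and 1 but is deliberately created as a double so that there is something to optimize.

```stata
clear
set obs 1000

* a 0/1 indicator stored wastefully in eight bytes per observation
generate double indicator = mod(_n, 2)
describe indicator

* compress demotes each variable to the smallest type that loses no information
compress
describe indicator
```

After compress, describe reports the indicator as a byte variable.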

A useful feature of Stat/Transfer is the ability to generate a subset of a
large file while transferring it from SAS or SPSS format. I spoke above of the
possibility of reading only certain variables from a text file to avoid
Stata/IC’s limitation of 2,047 variables. You can always use Stat/Transfer to
translate a sizable survey data file from SAS to Stata format, but if there are
more than 2,047 variables in the file, the target file must be specified as a
Stata/SE or Stata/MP file. If you do not have access to Stata/SE or
Stata/MP, the transfer will be problematic. The solution is to use
Stat/Transfer’s ability to read a list of variables that you would like to keep
(or drop), which will generate a subset file “on the fly”. Because
Stat/Transfer can generate a machine-readable list of variable names, that
list can be edited to produce the keep list or drop list.

Although I have spoken mostly of SAS and SPSS, Stat/Transfer is capable
of exchanging datasets with a wide variety of additional packages, including
GAUSS, Excel, MATLAB, and more; see http://stattransfer.com for details.
Versions of Stat/Transfer for Windows, Mac OS X, and Linux/Unix are
available.

I must also mention an alternative solution for data transfer between
databases supporting some flavor of structured query language (SQL). Stata
can perform Open Database Connectivity (ODBC) operations with databases
accessible via that protocol (see [D] odbc for details). Because most SQL
databases and non-SQL data structures, such as Excel and Microsoft Access,
support ODBC, this is often suggested as a workable solution to dealing with
foreign data. It does require that the computer system on which you are
running Stata is equipped with ODBC drivers. These are installed by default
on Windows systems with Microsoft Office but may require the purchase of
a third-party product for Mac OS X or Linux systems. If the necessary
database connectivity is available, Stata’s odbc is a full-featured solution.
It allows for both the query of external databases and the insertion or update
of records in those databases.

2.8 Guidelines for Stata do-file programming style

As you move away from interactive use of Stata and make greater use of do-
files and ado-files in your research, the style of the contents of those files
becomes more important. One of the reasons for using do-files is the audit
trail that they provide. Are your do-files readable and comprehensible—not
only today but also in several months? To highlight the importance of good
programming style practices, I present an edited excerpt from Nicholas
J. Cox’s excellent essay “Suggestions on Stata programming style”
(Cox 2005f). The rest of this section is quoted from that essay.

Programming in Stata, like programming in any other computer language,
is partly a matter of syntax, as Stata has rules that must be obeyed. It is also
partly a matter of style. Good style includes, but is not limited to, writing
programs that are, above all else, clear. They are clear to the programmer,
who may revisit them repeatedly, and they are clear to other programmers,
who may wish to understand them, to debug them, to extend them, to speed
them up, to imitate them, or to borrow from them.

People who program a great deal know this: setting rules for yourself
and then obeying them ultimately yields better programs and saves time.
They also know that programmers may differ in style and even argue
passionately about many matters of style, both large and small. In this
morass of varying standards and tastes, I suggest one overriding rule: Set
and obey programming style rules for yourself. Moreover, obey each of the
rules I suggest unless you can make a case that your own rule is as good or

at the town hall, where he handed the remonstrance over to the
burgomaster for safekeeping.
An hour later Lodewijk left his friends, on the pretext of going home;
but it was in order to wander through the city, full of sorrow and alone;
it was in order to give himself over entirely to the grief that these
scenes of terror caused him. Desperate and beside himself, he walked
slowly through the streets and seemed scarcely to care any longer about
what was happening. A feeling of shame kept him from going to
Godmaert’s house. Was he to say that all this had taken place before his
eyes, without his having been able to do anything to prevent it?
Now that the impotence of the government had assured the stormers of
impunity, they went on hacking everything in the city to pieces. They
left not a single statue unscathed on gate or wall. And whenever a
peaceful citizen tried to resist their violence, these villains cruelly
mistreated him and heaped abuse on him. A countless number of
inhabitants, alarmed at the consequences of this godlessness and
destruction, fell away from the side of the reformers.
The sun had shed its clouds. Glorious and splendid, it sent its rays over
the heaps of rubble that had been piled up everywhere in the public
squares. Alternating throngs of countless people streamed through the
city with joyful cheering.
“Hail! Hail!” they screamed, as if a raging joy had driven them mad.
Axes, ladders, ropes, and other tools were carried about by them in
triumph. Whenever, running along in this way, they noticed yet another
statue on the gable of some building, however high it might be, they
climbed up, cheered on by the rabble, and the statue then fell, to the
cry of “Hail! Hail!”, clattering and shattered to the ground.
All shops were closed, all churches robbed, the fronts of all houses and
public buildings defaced. Heaps of costly marble rubble blocked the
crossroads. It seemed that the people of Antwerp, blinded by frenzy, no
longer wished to live in their houses and were stubbornly destroying
their own city.
Many of these atrocities took place in the markets and streets through
which Lodewijk passed. Thus, in front of the St. Jakobskerk, he saw a
great heap of statues, crosses, and many other consecrated objects
burning to ashes in a large fire that the stormers had lit.
In the afternoon he passed the Minderbroedersklooster (the Franciscan
friary), which was being plundered. The brothers and priests were driven
out and pursued with mockery and abuse. Seeing this, Lodewijk was seized
with alarm, for he thought of Father Franciscus; only then did he awaken
from the despair that had made him an unfeeling man all that day. He
raised his head; a new fire glittered in his eyes, and he turned with
hurried steps toward the Veemarkt, to find Father Franciscus and save
him from mistreatment, if that were possible.
Arriving there, he found in front of the Predikheerenklooster (the
Dominican friary) a countless crowd of iconoclasts who barred his way.
With great difficulty, after much pushing and shoving, he at last got
inside the friary, which was filled with villains and thieves. He saw
them fighting over the silver candlesticks, heard the most shameful
curses echo against the vaults, and found the refectory full of drunken
people amusing themselves with indecent songs and blasphemous mockery.
Lodewijk walked straight through these godless men and paid no heed to
their jeering; he climbed the stairs to go to Father Franciscus’s cell,
and soon reached the first floor, where he met few people.
The cells stood open; everything inside them was deathly still; some
doors had been smashed to pieces, a sign of the outrages that had been
committed here. Already the young man’s heart was beating more slowly;
his head drooped in despondency, and little hope was left in him, though
he still walked on down the corridor, when all at once he heard some
voices in the distance shout in triumph:
“Here we have another papist! Throw him into the street, that dog!”
Lodewijk sprang forward, flung three or four men away from the cell
door, and took a step into the small room, while the astonished stormers
looked at one another with questioning glances.
Father Franciscus lay stretched out at full length, his face to the
floor, before a crucifix; his silver hair touched the floor on either
side. From time to time he made a movement as if to raise his hands
heavenward, and a few fervent words that escaped his lips showed that he
was praying.
The thought arose in Lodewijk’s mind of killing all the mockers who
stood at the door; he could have done so, for they were few in number
and unarmed; but he soon abandoned this intention and threw himself on
his knees beside Father Franciscus, one of whose hands he took in his
own. Then he said:
“Father, here I am, your beloved son Lodewijk. I have come to save you.”
The priest raised himself on his knees, looked at Lodewijk with a
grateful glance, and answered, keeping his eyes fixed on the figure of
Christ:
“Lodewijk, my good son, I thank you for your affection; but I will not
follow you. Here, in this cell, I wish to die, if God has so disposed of
my life. Let me pray; do not disturb me. I wish to leave the world with
the name of the Lord on my lips. Go, and do not think of me.”
Lodewijk, as if distraught, threw both arms around the priest’s head;
tears burst from his eyes, and he sobbed:
“You, die! You, my good father! Oh, Geertruid would curse me if I left
you here! Come, the godless will mistreat you; they will murder you....
There is still time.... I will defend you or die with you.”
“Lodewijk, my good son, be calm.... See, the crown of martyrdom is
offered to me; should I refuse it? The Lord has granted me seventy
years; I am not ungrateful.”
The young man placed his hand over the priest’s mouth.
“Your words are holy,” he cried, “but they burn on my heart like fire!
Oh, see my tears; think of Geertruid, of Godmaert. You alone can comfort
us: your death would cost your friend Godmaert his life; for now I dare
to say it, and you know it, he would have a share in the murder; your
blood would fall back upon his head,... he has stirred up your
enemies.... Will you be cruel enough, O good father, to burden him with
that eternal remorse, to cast your own blood upon him, and to make his
daughter accuse her father? No, surely not; you will come with me? You
are too generous, too good to bring this misfortune upon your neighbor,
your friend!”
During these words Lodewijk had forced the priest to his feet, and now
pulled at his hand like a madman to make him leave the cell.
“I will follow you,” the father said at last, “but listen well to these
words, my son; for I want you to carry them out as an inviolable
command.... Perhaps they will mock and mistreat you and me; you will
suffer with me, without complaint, without resistance.... Whatever may
happen, even if they were to take my life, it is my will that you do
nothing to defend or avenge me,... I forbid it. Will you have courage
enough for that?”
“Yes, yes, father, come; I will endure everything.”
They then went out through the cell door, amid the insults of those who
were in the corridor, and soon came into the refectory, where they had
to pass through a crowd of drunken men. These raised a confused cheer as
soon as they saw the priest.
“A papist! A papist!” they screamed.
In an instant Father Franciscus was surrounded by the wicked rabble; all
manner of slanders were snarled at him: one pulled at the hood of his
habit, another spat beer in his face; but the priest walked slowly on,
his eyes cast down, and seemed insensible to all these outrages; his
habit was torn to shreds, and beer dripped from his stately head.
Lodewijk’s face was terrible. One could read plainly enough in it what
lion-like fury consumed him: the whites of his eyes showed above and
below, his teeth were clenched together, and without knowing it he
squeezed the priest’s hands almost to crushing. Yet he remembered the
command he had received and made no sign that he meant to offer
resistance.
After much mistreatment they finally reached the Veemarkt, but here
their plight grew worse still. A countless multitude followed them; many
came up to speak the most disgusting words, the bloodiest blasphemies,
into the priest’s ears; others threw mud and filth, so that Father
Franciscus’s silver hair was shamefully smeared with sand and mire.
Several times already Lodewijk had begged and cried:
“Oh, father, let me kill them, or my veins will burst! I cannot....
cannot keep still any longer. For God’s sake, let me avenge you and
die!”
But the priest answered:
“How beautiful it is, Lodewijk, to suffer because one is faithful to
one’s God. Think of the Christian heroes of ancient times: they were
tortured, burned, crushed, but amid the boiling oil, under the claws of
the lions, not a single complaint, not a single vengeful word came from
their holy lips; they only lifted their hands to God to ask forgiveness
for their executioners. Let us follow their example, my son; perhaps we
shall appear before the Lord this very day with the shining crown of
martyrdom!”
At the corner of the Zwartzusterstraat, by the Koepoort, stood a
half-built house, beside which lay a heap of broken roofing slates.
Lodewijk had barely gone a few steps past it when he heard a piece of
slate whistle past his ear. Soon more slates flew toward them, until at
last one of them struck Father Franciscus’s bare forehead and gave him a
gaping wound.... Lodewijk saw the blood stream down his face....
Now he knew no more caution; now he forgot the father’s command and,
without looking back at him, ran to the man he had seen throw the slate
and thrust his sword through his body with such force that it came out
at the back; he looked around for more victims, but all the mockers had
run off to a fair distance.
Meanwhile Father Franciscus had fallen down in the street; the blow of
the cutting slate had struck him so cruelly that he had sunk powerless
to the ground.
Lodewijk approached him with an anguished cry and, half lifting him,
dragged him to the wall of a house, where he let him sink down into a
sitting position. Meanwhile the ruffians had drawn near with greater
fury and were throwing more and more stones, slates, and filth.
Full of despair, at his wits’ end and not knowing what to do to free the
priest, Lodewijk crouched down in front of him and so covered him with
his own body. Stones flew incessantly against his limbs, and many a cry
of pain escaped him. Perhaps he would have remained long in this
position, but part of the rabble came to throw from another side, so
that they often hit the priest. The latter, roused from his faint, tried
with all his strength to make Lodewijk leave him.
“Let me die,” he said, “let me be a martyr, expose yourself no
longer for me.... I shall pray for you in heaven. Come, my good, my
brave son, give me a farewell kiss....”
But Lodewijk did not answer; all his attention was fixed on the
flying stones; his whole care was to shield the priest's body with
his arms or shoulders, as with a shield. Then at last the number of
their enemies grew so great that Lodewijk could no longer protect
the priest. He threw both arms around the neck of Father Franciscus
and pressed himself tightly against his breast.
“There is the kiss you asked for, father,” he cried, “but it is no
farewell kiss.... No, let us die together for our God. Oh, I too
shall be a martyr.... How beautiful is that certainty!....”
His voice failed, and he hid his head against the bosom of Father
Franciscus.
He would surely have let himself be killed in this position; but a
heavy stone, which struck the body of Father Franciscus, drew a loud
cry from his breast. Lodewijk tore himself loose, sprang up in
bewilderment and peered through a hail of stones down the streets,
to see whether any help was to be had. All at once he saw, far off in
the Koepoortstraat, some people he knew approaching.
An expression of joy passed over his face, and he cried out as if
with a supernatural voice:
“Wolfangh! Wolfangh!”
And then he covered the priest with his body once more.
At the name of Wolfangh the stones seemed to stick fast in the
throwers' hands; they stared at one another questioningly and looked
around to see whether they would really behold the man who bore the
universally dreaded name of Wolfangh.
Soon some ten men came up to Lodewijk: they were his friends, whom
he had left at the town hall.
“Wolfangh! Schuermans!” cried Lodewijk, stepping away from in front
of Father Franciscus, “see, this is how they treat the best of all
men, a seventy-year-old priest!”
“Ha!” cried Wolfangh as if with joy, “there are men more wicked than
I! The blood of the murderers is about to flow!”
Then he cast a pitying glance at Father Franciscus and a measuring
glance at those who had mistreated him: he took a dagger in each
hand and drew his head down between his shoulders.... a bellow came
from his chest as from the throat of a wild bull.... and, like a
battering ram, he hurled himself forward....
Before Schuermans and the others could follow him, many a villain
already lay writhing in his blood; and a moment later not a single
person was to be seen in all the adjoining streets. Only, in the
distance, the cry “Wolfangh! Wolfangh!” could be heard rising like a
call of terror.
Then Wolfangh came back to Father Franciscus; he looked with deep
indignation at the noble face of the priest, now made unrecognizable
under a mask of mud and blood, but, after staring for a while as if
stunned at this scene, he left Lodewijk and his friends and ran to
the door of the house opposite. Despite his knocking and shouting,
no one opened.
Wolfangh flared up in rage; in desperation he wrenched the iron
knocker of the door crooked, but suddenly his indomitable spirit
regained the upper hand: a moment later he stood before the door
with a bluestone sill that he had fetched from the demolished house.
Lock and bolt burst off.... The door fell with a crash.
Shortly afterwards Wolfangh came running out of the house; in one
hand he held a bowl of water and in the other some linen cloths. He
knelt down beside the priest, washed his head and face, and dressed
his wound with such skill that one could have taken him for a
surgeon.
Now one could see what a terrible change had come over Father
Franciscus. The lost blood had taken all his strength from him; his
sunken face was more than pale, it was ashen and translucent; his
lips were of the same color as the murderous slates that lay around
him. And yet there shone on the priest's face a heavenly expression
of submission to the will of the Lord, a smile like that of the
angels.
Lodewijk likewise knelt beside Father Franciscus and helped
Wolfangh dress the wound. It was mostly on Lodewijk that the priest
kept his fading eye fixed.
“Oh, you will be saved, good father,” said the young man tenderly,
“your wound will heal. You will be able to be our guardian angel for
a long time yet.”
“Lodewijk, my dear son,” sighed the priest, “the Lord has disposed
of me. He has granted me the crown of martyrdom. I shall die. Not
from the wound you are dressing; but a stone, the last one, has
crushed my chest. I feel it in my body: my soul is struggling to
tear itself free; it longs for heaven.... but do not weep for me; my
lot is too beautiful.”
To these words Lodewijk answered nothing; he only stared with a
fixed gaze at the priest's face.
“You love me dearly, then?” said Father Franciscus, pressing
Lodewijk's hand.
Those words made the tears stream like brooks from the young man's
eyes.
“Oh yes, you love me dearly!” repeated the priest. “I shall pray for
you, Lodewijk.”
Now Father Franciscus was carefully lifted by Wolfangh and
Schuermans, supported with every precaution and slowly led toward
the Keizerstraat, while Van Halen and Lodewijk's other friends held
themselves ready to take the life of the first mocker.
They arrived at last at Godmaert's house and were let in by
Theresia.
X
Gloria in altissimis Deo, et in terra pax hominibus bonae
voluntatis.
Luc. Cap. ii. v. 14.
Glory to God in the highest, and peace on earth to men of good
will.

Godmaert and his daughter Geertruid sat side by side in the
library; they were doing nothing and were in that state of anguish
and expectation which concentrates all of a person's power of
thought on a single point. For half an hour they had not spoken;
they seemed to be sleeping with open eyes. They already knew how
all the temples had been robbed, plundered and desecrated, how the
clergy had been driven out and mistreated. Godmaert wept in his
inmost heart over the help he had formerly given the heretics; he
thought with a shudder of Father Franciscus, whose fate he did not
know.
Geertruid was no less tormented by terrible thoughts. Since the
previous night she had not seen Lodewijk. No one had been able to
give her any news of him. Father Franciscus had not appeared at her
home, he who otherwise stood at her side like a guardian angel in
every sad or dangerous event! Her anxious fear, her oppressed
musings resolved themselves into this sigh, which often rose to her
lips: oh, they are dead! they are dead!
All at once Theresia came running into the library, crying out as
if distraught:
“There they are! There they are! Lodewijk with Father Franciscus!”
A joyful cry from Geertruid answered Theresia's announcement. The
young lady rose with her arms in the air and sprang forward to the
door.
But when she saw Lodewijk's mud-stained clothes, when she noticed
how his hands were all but covered with bloody scratches and, above
all, when she looked upon the priest, she was struck a violent blow.
She stood trembling in the middle of the room, sent a piercing
scream through the hall and sank down like a lifeless body.
Godmaert covered his face with both hands and so tore himself away
from this painful scene.
The priest was almost dead; he was carried and dragged along by
Wolfangh and Schuermans rather than supported; his slackened legs
trailed over the ground, they no longer had the strength to form
steps. Yet his heart was not yet broken, his spirit not yet dimmed.
He was placed with care in an armchair; he sank into it heavily and
motionless.
Geertruid had undoubtedly not lost consciousness entirely, for she
came to by herself and stood up. In this circumstance she alone kept
the presence of mind that was needed. While all those present stared
silently at the priest or lamented aloud, Geertruid called the
servants of the house to her. One she sent for a surgeon, another
for a physician; the rest had to fetch cushions and linen cloths or
bring wine and restorative drinks. She gave these orders trembling
and as if seized by fever. Then, without looking at Lodewijk or
anyone else, she went to the priest and wanted to have him laid on a
good bed, but he resisted and, taking the young lady's hand, said,
while a bright smile played on his ashen lips:
“My dear daughter, spare yourself the trouble; your good father is
going to God. Father Franciscus is leaving the world,... but why
should you grieve over me, while an unknown joy fills me? I have
lived long, my child; the Lord has showered me with His favors, and
now, now He shows me, unworthy man, the greatest grace.... I die for
His holy name!”
These words made quite a different impression on the young lady's
mind than one might have expected. Instead of bursting into tears,
her face brightened, something resembling a smile came over her
cheeks, and she kept her eyes fixed on the priest as if in heavenly
contemplation. This change came about because she saw something
holy, something divine shining on the Father's pale features;
because his words, full of heavenly joy, had made her feel that such
a death, if it had to come, would truly be a happiness and a grace
from God. This exaltation of spirit carried her so far that the
sorrow in her heart vanished entirely and was replaced by calm. To
the priest's words she answered without sadness:
“Oh, I understand you, good father. Yes, you may die! You may leave
the world! And your Geertruid will not weep, will not lament; for a
more beautiful life awaits you, heaven is opening to receive you.”
At that moment a physician entered the room. Without addressing
anyone he went to the priest, took his hand and examined him
attentively.
All those present seemed suddenly to rise from their sorrow and
drew near the physician together; Godmaert himself had the chair in
which he sat rolled up to the priest.
After a long while of general anxiety Lodewijk asked the physician:
“There is still hope, is there not, Master Wallensius?”
The physician did not answer; but when Lodewijk soon repeated his
question, he let the priest's hand sink gently and said in a dry
voice:
“Half an hour more, at the most!”
A deathly silence followed those dismal words. Godmaert, who was
now seated at the side of Father Franciscus, put his arm around the
neck of his suffering friend and hid his face on his breast. Amid a
flood of tears that dripped unseen from his eyes onto the priest's
clothes, he sighed:
“O father, friend, tell me again that you forgive me; for remorse
is tearing my breast. I know it, a part of your innocent blood must
fall back on me, if your prayers do not turn it from my head.
Forgive me! I too have violated the temples of my God; I have helped
to destroy the old faith, and in all the sacrileges committed I have
a terrible share; for I urged my fellow citizens on to the outrages
that are costing you your life. Oh, forgiveness!”
At that moment Godmaert looked at the priest's face; an angelic
smile shone back at him, an expression so touching and so sweet that
he brought the cold hand of Father Franciscus to his lips and laid a
grateful kiss upon it.
“Oh, you have forgiven me!” he cried with joy.
The priest's eyes began to glaze over; this was plain to see. At
first he did not answer Godmaert's laments, but gathered all the
strength that remained to him, as though he were about to speak for
the last time. Then, with a slight movement of the head, he beckoned
Lodewijk and Geertruid, and said in a faint voice as soon as they
stood by him:
“Now, my children.... now I am going to die.... I feel it!”
The manner in which those words were spoken left not the slightest
doubt of their truth. Geertruid sank to her knees before the priest
and made Lodewijk kneel beside her in the same posture.
The dying Father continued:
“Godmaert, yes, you have erred.... and sinned.... but your
repentance is sincere.... In the name of the God.... whose servant I
am.... I forgive you!... Do not grieve for fear that the enemies of
our faith.... will triumph.... The church of Jesus Christ is
indestructible.... from persecution she draws her splendor.... from
assault her strength.... Wolfangh, the abbot of St. Bernard.... will
tell you what you must do.... The monastic life will tame your
passions,... you will find mercy with the Lord!... Dearest children,
thank you for your affection for me.... Never waver in your warm
love of God, in your steadfast loyalty to the one saving faith....
Geertruid, Lodewijk.... you will be united,... when the church....
has cast off.... her mourning garb.... From heaven.... my soul....
will watch.... over your children.... Be happy!.... love one
ano.... ther.... and....”
His voice failed and became unintelligible. With a last exertion of
his vital strength he raised his right hand above the heads of the
kneeling lovers and seemed to bless them in prayer. His hand soon
fell back, nerveless. Once more he lifted his eyes to heaven and,
like a light that, going out, casts one more bright spark, he spoke
in a clear voice these beautiful, these sublime words:
“Gloria in altissimis Deo.... et.... in terra pax hominibus!...”
*** END OF THE PROJECT GUTENBERG EBOOK HET
WONDERJAAR: EENE GEKKENWERELD ***

Updated editions will replace the previous one; the old editions
will be renamed.

Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States copyright
in these works, so the Foundation (and you!) can copy and
distribute it in the United States without permission and without
paying copyright royalties. Special rules, set forth in the General
Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the
free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and Redistributing Project Gutenberg™ electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree to
abide by all the terms of this agreement, you must cease using
and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only
be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project
Gutenberg™ works in compliance with the terms of this
agreement for keeping the Project Gutenberg™ name
associated with the work. You can easily comply with the terms
of this agreement by keeping this work in the same format with
its attached full Project Gutenberg™ License when you share it
without charge with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project
Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it
away or re-use it under the terms of the Project Gutenberg
License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country where
you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of the
copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must, at
no additional cost, fee or expense to the user, provide a copy, a
means of exporting a copy, or a means of obtaining a copy upon
request, of the work in its original “Plain Vanilla ASCII” or other
form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
