Making The Case - Investigating Large-Scale Human Rights Violations Using Information Systems and Data Analysis
ISBN: 0-87168-652-X
This report is a product of the American Association for the Advancement of Science (AAAS) Science and
Human Rights Program which operates under the oversight of the AAAS Committee on Scientific Freedom and
Responsibility (CSFR). The CSFR, in accordance with its mandate and Association policy, supports publication
of this report as a scientific contribution to human rights. The interpretations and conclusions are those of the
authors and do not purport to represent the views of the AAAS Board, the AAAS Council, the CSFR, or the
members of the Association.
Finally, we would like to acknowledge the United Nations missions, truth commissions and
non-governmental organizations with whom we have worked: in El Salvador, the non-governmental
Human Rights Commission (CDHES); in Guatemala, the Commission for Historical Clarification
(CEH), the International Center for Human Rights Research (CIIDH), the UN Verification Mission
for Guatemala (MINUGUA), and the Catholic Church’s Interdiocesan Project for the Recuperation
of Historical Memory (REMHI); in Haiti, the National Commission for Truth and Justice (CNVJ);
and in South Africa, the Truth and Reconciliation Commission (TRC). On behalf of the experts, we
would like to say that we have felt honored to have had the opportunity to contribute to these
projects, and we wish our future colleagues in human rights information management all the best.
Author Biographies
Patrick Ball, Ph.D., is Deputy Director of the American Association for the Advancement of Sci-
ence (AAAS) Science and Human Rights Program. Since 1991, he has designed information man-
agement systems and conducted quantitative analysis for large-scale human rights data projects
for truth commissions, non-governmental organizations, tribunals and United Nations missions in
El Salvador, Ethiopia, Guatemala, Haiti, South Africa, and Kosovo. His 1997 Ph.D. dissertation
“Liberal Hypocrisy and Totalitarian Sincerity” examined the roots of the non-governmental human
rights movements in Ethiopia, Pakistan and El Salvador. AAAS has published three previous
books by Dr. Ball: Policy or Panic? The Flight of Ethnic Albanians from Kosovo, March-May
1999 (2000), Who Did What to Whom? Planning and Implementing a Large Scale Human Rights
Data Project (1996), and State Violence in Guatemala, 1960-1996: a Quantitative Reflection
(1999, with Paul Kobrak and Herbert F. Spirer).
Louise Spirer is an independent scholar, editor, and author in the field of human rights. Co-author
of articles on human rights, she is the editor of the newsletter of the American Statistical Association's
Committee on Scientific Freedom and Human Rights, a member of the Board of Directors and
Treasurer of the Institute for the Study of Genocide, and a co-author of the AAAS publication
Data Analysis for Monitoring Human Rights.
Sonia L. Zambrano Gómez is a Colombian anthropologist and lawyer. She has worked in that
country as a human rights researcher and has written publications on the subject. She also
worked for the Historical Clarification Commission of Guatemala, as Director of the Database.
Themba Kubheka is Deputy Director in the Information Technology of the Department of Land
Affairs in the South African Government. His main function is to empower regional management to
participate in the broader Information Technology plan. Themba has worked for Macro Interna-
tional Inc – a US based multinational funded by USAID - as their Management Information System
Specialist. From April 1996 to February 1998, Themba worked for the South African Truth and
Reconciliation Commission (TRC) as its Information Coordinator. Later he also served in the posi-
tion of the Documentation Officer. In his 15 years in IT, Themba has conceptualized, designed and
written numerous computer applications. In his most recent experience with the TRC, he assisted
in the development of the database and the processing of the Human Rights Violations statements.
Lic. Oliver Mazariegos was the programmer and systems administrator for the Guatemalan
Archbishop’s Human Rights Office "Proyecto Interdiocesano Recuperación de la memoria Histórica"
(REMHI).
Rocio Mezquita, B.A., has worked on human rights projects as a data processing professional at
REMHI as well as at the Guatemalan Truth Commission. She was previously an intern with Amnesty
International/USA and worked as an election observer in the former Yugoslavia with the
Organization for Security and Cooperation in Europe (OSCE). She is presently working as a re-
searcher in Guatemala, in a human rights project at the Center for Legal Action in Human Rights
(CALDH).
Gerald O'Sullivan was the National Information Systems Manager for the South African Truth and
Reconciliation Commission. He has been in the IT industry since 1981, working primarily on finan-
cial and management information systems in South Africa and abroad as an exiled war resister. He
is currently the Director of Information Systems in the Department of Land Affairs, implementing
GIS technology to facilitate the redistribution of land.
Humberto Sequeira is the senior software programmer and database designer at Solo Software
Development in Panamá, where he specializes in Point of Sale software for Hospitality environments.
The most challenging and rewarding job he has been part of is the Truth Commission (CEH) in
Guatemala. He is very interested in how new technologies in software, communications and database
development will fill the technical gap between data and Human Rights researchers.
Ken Ward, B.S., is a computer database consultant. He has designed Human Rights Violations
database systems for the United Nations Mission in Guatemala and various non-governmental
organizations in Cambodia. He has designed several systems related to the Central American Peace
Process in El Salvador and Guatemala and has also worked as a Human Rights investigator in
Guatemala.
Introduction
Patrick Ball and Herbert F. Spirer
Overview
Telling the truth in such a way that it cannot be denied is the first need of a truth commission
established in the aftermath of gross human violations. The magnitude of violations is often so
great that individual researchers cannot apprehend the complex nature and multiple patterns of
such crimes; building an official history from a collective memory is therefore essential to truth
telling. This is our concern in these proceedings: building such a collective memory, and the
analysis of the past through examination of that memory.
[Box: To the reader: This introduction summarizes our concept of the relationship of the
information management system issues to the truth telling process. In the course of this summary,
we frequently reference sections and chapters in these proceedings. To facilitate your use of this
introduction as a guide, we have given the relevant references in boxes such as this, associated
with the related text.]
While the primary goal of truth telling is to
provide massive and objective support for his-
torical facts and patterns that cannot be denied, it also serves an “internal” role for those who
analyze the past to make the official record. Without an accurate and precise collective memory that
can be readily accessed, they will not be able to check their assumptions about the process of vio-
lations, or provide credible analyses.
The official record is derived from the collective memory, and the collective memory is based
on information and data. The systematic arrangement of the information and data is the basis of
the information management system.
These proceedings are about all aspects of how to build, manage, and generate analyses from
such a system. They provide an accessible handbook to guide truth tellers who want to build on
the lessons learned in these several information systems.
In this introduction, we discuss the conceptual issues pertaining to the use of information
management systems in the truth telling process. The discussion is grounded in the theory and
application presented in the papers in these proceedings.
[Box: Fundamental to our concept of truth telling in human rights is determination of who did what
to whom and how. You will find this concept discussed in detail in Chapters 1, 3, 4, 6, and 9.]
Purposes
When an organization concerned with truth telling in cases of gross crimes against humanity –
an official truth commission or a non-governmental organization – sets out to write official
histories, it often undertakes massive research projects.
These projects may use hundreds of people working in thousands of communities to acquire
information. The organization may be charged with gaining an overall understanding by
generalization based on the entire body of evidence in addition to reporting on individual cases.
[Box: Chapters 3 and 4 discuss the South African Truth and Reconciliation Commission (TRC), the
largest human rights data project ever conducted.]
These tasks require bringing all the collected information together and analyzing it. By so do-
ing, what all the many individuals in the organization have discovered becomes the organization’s
understanding of the truth.
Through these projects, the organizations that document large-scale human rights violations
collect much more information than any one investigator can remember or fully encompass. Further,
they may perform general analyses or correlate information from geographically dispersed sources.
Information about a given case could be given to any member of the teams of investigators, who
may number in the hundreds. In a given case, partial information could be given by people in the
southwest of the country (where the case happened), while other information about the case is
given to investigators in the northeast (where survivors fled after the incident). An investigator
working on this case may not know that other investigators in a different part of the country have
found complementary information.
The information management system provides a collective memory and the ability to relate in-
formation from different sources. By so doing, it allows anyone in the organization to access infor-
mation collected by any investigator, without restriction. An information management system used
for these purposes is a process by which information is collected, standardized, represented in a
database, and then analyzed by a variety of methods. The database – the computers and software
in which the data reside and by which they are processed – is not “the system”; it is a major component
of that system. The human rights narratives collected by the organization are complex, as are the
legal and social science processes used to classify components of human rights stories. The com-
plexity of the information management system and in particular, the database, reflects the complex-
ity of the narratives and the legal and scientific concepts necessary to serve the cause of truth
telling.
[Box: Standardization, classification, and categorization are discussed in all the chapters.
Particularly detailed examples of both the technical and managerial issues involved appear in
Chapters 3, 4, 6, 9, and 12.]
To effectively make information widely available with precision and consistency, the information
management system must standardize the classification and categorization of information. For
example, if a witness reports to the commission that a person was tortured, the appropriate
information system
personnel decide whether the acts described by the
deponent fit the organization’s definition of torture. When witnesses and victims describe where
events occurred, they often describe the location in casual terms. To convert this narrative informa-
tion to data that will represent the truth in the database, the data processors must, for example, de-
cide where on a map the events happened, and classify the events by suitable location designa-
tions. Painstaking and precise classification is necessary to assure that the data are of high quality,
but not sufficient to do so. The entire system must also be of high quality for the system outputs
to be credible and valid.
Second, critics may argue that the chosen interview subjects are not representative of the
population of all victims. Even if a group has taken testimonies from many thousands of subjects,
there are probably many others who were victims or witnesses of human rights violations but were
not interviewed. The data might therefore be biased, reflecting only the knowledge of those who
were subjects. In this context, bias means that in some way the patterns shown by the data are a
systematically distorted reflection of the historical reality. We discuss bias in more detail later in
this introduction.
Third, critics may argue that the data are inadequate substantiation for the organization’s
arguments. For example, an organization might find in its data that 100 killings were reported for
the year 1978, yet only 10 killings were reported for the prior period from 1960 to 1977. On
this basis, the organization might want to argue that 1978 was a watershed year of dramatically in-
creased violence. A critic might respond that showing only 10 killings in the prior seventeen-year
period reveals that the organization failed to adequately investigate that period. If the critic is able
to show even a few killings from the 1960-1977 period that were excluded from the original analysis,
the entire argument might be doubted.
If interview subjects have been chosen by appropriate probability sampling methods, all three
criticisms may be rigorously evaluated (and hopefully rejected). The use of probability sampling
allows the analyst to scientifically determine that the results are valid within a measurable margin of
error (the confidence interval). In practice, few human rights projects can use probability sampling.
Such sampling can be technically complex and is time-consuming, costly to administer, and difficult
to carry out in the chaotic conditions that follow gross human violations.
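To illustrate what a "measurable margin of error" means, the following sketch computes a normal-approximation interval for a proportion estimated from a simple random sample. It is only an illustration: the function name and the figures are invented for the example, and a real project would need an estimator suited to its actual sampling design.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to roughly 95%.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical figures: 300 of 1,000 sampled deponents report a detention.
estimate, low, high = proportion_confidence_interval(300, 1000)
print(f"estimate={estimate:.3f}, interval=({low:.3f}, {high:.3f})")
```

The interval narrows as the sample grows, which is the sense in which probability sampling lets an analyst state how far the estimate may plausibly be from the population value.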
Some human rights projects assume that conducting an interview with a witness may help that
witness come to terms (psychologically) with what happened. Thus, those projects invest resources
in taking more interviews rather than in obtaining fewer interviews by scientifically rigorous
methods. Also, in the event of large numbers of deaths – many of which were not witnessed
by any survivor – the sampled population is not the same as the target population.
Some human rights projects claim that their data are valid because they collected “very large”
numbers of interviews. On the surface, “very large” is scientifically meaningless, for who is to
decide what is “very large”? Should this term refer to an absolute number, such as “several
thousand” interviews, or “more than 5,000”? The numbers of testimonies collected for three of the
projects described in these proceedings are 7,000 for the CEH and the Haitian National Commission
for Truth and Justice, and 21,000 for the TRC. Or should it be based on a relative amount, some
percentage of the estimated total number of witnesses, survivors, or victims? And once again, who
sets a satisfactory threshold for a “sufficiently high” percentage? And furthermore, how does the
project estimate the total number of witnesses, survivors, or victims?
It is possible to answer the question of how large is large enough. The critical assumption is
that the project has collected enough interviews to merit the statistical findings if it is
unlikely that an equal or larger number of additional interviews would tell systematically
different stories. It is certain that there are some interviews that tell different stories, but
if enough interviews have been collected, it may be implausible that there are enough potential
(but omitted) witnesses whose stories are so different that the findings would change
substantially if the omitted witnesses were included. After collecting thousands of testimonies,
and if other kinds of data are available about the
patterns of gross human rights violations, we can test for bias using certain analytical methods.
We describe some of these methods in the analytical objectives and bias sections below.
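The "how large is large enough" reasoning can be made concrete with a simple worst-case calculation: given a finding supported by some share of the collected interviews, how many omitted witnesses, all telling the contrary story, would be needed to pull that share down to a chosen threshold? The sketch below is an illustration with invented numbers and a hypothetical function name. It is not one of the bias tests referred to in this introduction.

```python
import math
from fractions import Fraction

def omitted_needed_to_overturn(n_supporting, n_collected, threshold=Fraction(1, 2)):
    """Smallest number of omitted witnesses, every one contradicting the
    finding, that brings the combined share to `threshold` or below.

    Worst-case assumption: all omitted witnesses tell the contrary story.
    """
    threshold = Fraction(threshold)
    if not 0 < n_supporting <= n_collected:
        raise ValueError("need 0 < n_supporting <= n_collected")
    if not 0 < threshold < Fraction(n_supporting, n_collected):
        raise ValueError("threshold must lie below the observed share")
    # Solve n_supporting / (n_collected + m) <= threshold for m.
    return math.ceil(Fraction(n_supporting) / threshold - n_collected)

# Hypothetical: 5,600 of 7,000 interviews support a finding (80%).
print(omitted_needed_to_overturn(5600, 7000))
```

If the resulting number of uniformly contrary omitted witnesses is implausible given what else is known, the finding is unlikely to be an artifact of who happened to be interviewed.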
It is basic to the process that in practice a human rights organization cannot document every
violation that may have occurred, if for no other reason than the fact that many victims may have
been killed without witnesses and without any remains. Thus, the truth-telling human rights or-
ganization must define its broad analytic objectives explicitly and with attention to the needs and
resources. Despite resource limits on the depth and scope of the work of the organization, the or-
ganization’s sponsoring bodies may mandate that it gets a "complete" picture. To the non-
scientific personnel on the body that makes this mandate, this might mean that the organization is
to document every violation. Even recognizing the above limitation on collecting complete data,
this is enormously expensive. With limitations of time, of availability of skilled personnel, and of
jurisdiction, it is undoubtedly impossible. In their negotiations concerning their objectives and in
their final report, the organization must clearly explain these limitations. The organization may only
be able to ascertain patterns and trends, and cannot enumerate every possible violation. Given a
general mandate, the organization must be prepared to explicitly state its analytical objectives.
Typical objectives are listed in the next section.
meaningful questions in terms that can be answered by analysis of the data, and analysts who can
implement the relevant statistical methods.
Collecting Information
The first step in information management is data collection, the process of getting information
to manage. For most truth telling organizations, the primary source of information is interviews
with victims and witnesses of gross human rights abuses. Other sources are documentary records of
non-governmental organizations and reports in the various forms of public media.
[Box: The successive steps involved in an information management system are Collecting
Information, Data Processing (Classification and Coding), Database Representation, and Generating
Analytical Reports. All chapter titles reflect this structure.]
Assuming that the dominant source is interviews, the first priority is to design an interview
process (forms, approaches to the subjects, training programs for data collectors, and so forth). A
primary goal of this design is to assure that the person giving testimony (the deponent) will feel
that his suffering has been acknowledged and made a part of the public record. As mentioned
earlier, many people in truth telling organizations believe that giving the deponent an
opportunity to be heard is a cathartic process. Although recent research has questioned these
premises, it is still clear that a conversational interview mode, in which power is shared between
the interviewer and the statement giver, is much less likely to re-traumatize people than an
interrogation using closed-ended questions and an aggressive or police-style manner. In addition,
the quality of data obtained by interrogation methods is not as good as that obtained by
conversational methods. While researchers have questioned these premises as general principles, in
any given case they may apply.
[Box: You will find this issue discussed in Chapters 3 and 6, with reference to the collection of
information in South Africa and Guatemala.]
[Box: For a flow chart of a data model that reflects these relationships, see Figure 4 in the
section The Data Model of Chapter 4.]
However the interview is structured, the information must be gathered so that the data processors
can determine who did what to whom from the interview notes. The interview process must be
designed to manage even the most complex stories. The narrative is often complex because each
narrative can contain from one to many victims, violations, and perpetrators, and they may be
related to each other
through complicated relationships. Because individuals remember in different ways, important
questions should be asked several times in different ways, via direct questions and in open narra-
tives.
The basic elements of a human rights narrative are:
Many victims
A deponent may speak about gross or associated violations that happened to one victim, or
that happened to many victims. Her story, for example, may discuss only her own detention and
subsequent torture. However, in addition to her own story, she may speak about her son’s killing
and her husband’s disappearance. The witness may or may not herself be a victim.
Many violations
Each of the victims described in the statement may have suffered one or more gross violations.
For example, the witness’s son may have been detained and tortured on several separate occasions
before he was killed. These violations may have been connected to other violations that occurred
at the same time and place (e.g., several different people who were detained and tortured together),
or they may have been isolated incidents.
Many perpetrators
Each of the violations described in the narrative may have been committed by one or more
identifiable perpetrators, or by one or more unidentifiable perpetrators. The witness may or may
not have seen the violation occur. For example, she may have been notified that her son’s body had
been found. In such a case, she might be unable to identify any perpetrators. If the witness was
herself a victim, she may be able to describe the organization to which the perpetrators of her
violations belonged. She may also have personally recognized one or more of the perpetrators or
the identity of the perpetrator’s organization. Furthermore, each of the identified perpetrators
in the narrative may have been responsible for one or more violations. For example, the witness
may identify the individual responsible for both her torture and her son’s killing.
[Box: The UN Verification Mission in Guatemala (MINUGUA) used method 1) in reports prior to 1996,
but then reformulated their system (see Chapter 5). TRC statements after August 1996 were based on
method 2). The data processors used qualitative information to recover uncoded additional
violations (see Chapter 4). The TRC statistics probably underestimate violations that occur more
than once to the ...]
In the interview process and all subsequent steps of data processing and representation, the
information system must maintain the identity of who did what to whom, without simplifying the
witness’s story in ways that distort it or systematically conceal certain kinds of information. The
decision either in the design of the system or in the implementation of the interviewing process to
accept a reduced version of a complex story is a frequent cause of this kind of distortion. For
example, a system might choose 1) to represent only one of the violations that happened to a
particular victim, or 2) to represent only one of each kind of violation. Both of these choices
distort the data,
and quantitative analyses based on these simplifications are not reliable. Fortunately, if there is
sufficient narrative information in the form of qualitative descriptions of what happened, data proc-
essors usually can recover good information from distorted interview forms, but at considerable
effort.
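The effect of the two simplifications can be seen in a small sketch. The coded acts below are invented for illustration, loosely following the example of a mother and her son; the function names are hypothetical, and perpetrator links are left out to keep the example short.

```python
from collections import Counter

# Hypothetical coded acts from one statement: (victim, violation type).
# A victim may appear many times, and the same type may recur.
acts = [
    ("mother", "detention"),
    ("mother", "torture"),
    ("son", "detention"),
    ("son", "torture"),
    ("son", "detention"),
    ("son", "torture"),
    ("son", "killing"),
]

def count_full(acts):
    """Count every coded violation."""
    return Counter(violation for _, violation in acts)

def count_one_per_victim(acts):
    """Simplification 1): keep only the first violation coded per victim."""
    first = {}
    for victim, violation in acts:
        first.setdefault(victim, violation)
    return Counter(first.values())

def count_one_per_kind(acts):
    """Simplification 2): keep one violation of each kind per victim."""
    return Counter(violation for _, violation in set(acts))

print(count_full(acts))
print(count_one_per_victim(acts))
print(count_one_per_kind(acts))
```

In this invented case, the first simplification loses the killing and all torture entirely, and the second hides the fact that the son was detained and tortured more than once.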
Data Processing
[Box: Detailed descriptions of how data processing worked at the CEH and at the TRC, respectively,
can be found in Chapters 3, 8, and 12.]
Data processors receive the essentially raw data from the interview narratives and prepare it to
be entered into the database. In so doing, they extract the names of victims, perpetrators, and
organizations, and apply standard definitions of types of violations and geographic locations. For
example, consider the following narrative:
Two days ago, heavily armed men in green uniforms came to my house and de-
manded to see my son. I asked if they had a warrant and I didn’t want to call my
son but they ignored my questions and threatened to fire their weapons into the
house if I didn’t open the door. My son heard them and came near the door.
They broke through the door, grabbed my son and were hitting him. Then they
took him outside and put him on a truck and drove away. I am pretty sure I rec-
ognized some of the guys from the local police station, but when I went there,
they claimed not to know anything about it. But a neighbor of mine heard from
his cousin who is a police officer that they had my son and they took him to the
military detachment over by the highway.
Data processors may take the information above and put it in a structured form as in the tables
of Figures 1a and 1b, below. Of course, the exact nature of the tables used depends on the design
of the particular information management system.
P002   11 Sep 1999   Victim’s house, Nebaj, El Quiché   Threat               Local police                        P002
P001   11 Sep 1999   Victim’s house, Nebaj, El Quiché   Abuse of authority   Local police                        P002
P001   11 Sep 1999   Victim’s house, Nebaj, El Quiché   Illegal detention    Local police                        P002
P001   13 Sep 1999   Police station, Nebaj, El Quiché   Disappearance        Local police, military detachment   P002
Figure 1 reveals several characteristics of the structuring of the data. First, as discussed earlier,
each victim can suffer one or many violations. Catarina (P002) suffered one violation (threat), while
Jaime (P001) suffered three violations (abuse of authority, illegal detention, and disappearance).
One perpetrator may commit some violations (such as the threat against Catarina), while more than
one perpetrator may commit other violations (such as Jaime's disappearance).
Second, the data processors are the people in the organization who take each story and decide
whether the evidence is sufficient to classify the acts described in the story as violations
according to the agreed definitions of the organization. Was the beating the perpetrators gave to
Jaime Raimundo sufficient to be considered an abuse of authority? The data processors apply the
organization’s rules and classifications to make this decision. By applying these rules and
standardizing the disparate information, the data processors create an organizational memory that
can be accessed by any member or part of the organization. The classification rules determine what
the commission will be able to analyze. Thus, “What constitutes a violation?” is a question the
commission should address at the earliest possible moment.
[Box: See Chapter 3 for a detailed discussion of a creative approach to the process of defining
categories and the resulting tables of definitions.]
[Box: Chapters 2, 3, 6, 8, and 12 give extensive listings of human rights violation categories and
associated definitions.]
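The one-to-many structure visible in Figure 1 can be sketched in a few lines. The rows are transcribed from the violations table; the field names are assumptions made for this example, since the table's column headings are not reproduced here.

```python
from collections import defaultdict

# Rows transcribed from the violations table; field names are assumed.
violations = [
    {"victim": "P002", "date": "11 Sep 1999", "violation": "Threat",
     "perpetrators": ["Local police"]},
    {"victim": "P001", "date": "11 Sep 1999", "violation": "Abuse of authority",
     "perpetrators": ["Local police"]},
    {"victim": "P001", "date": "11 Sep 1999", "violation": "Illegal detention",
     "perpetrators": ["Local police"]},
    {"victim": "P001", "date": "13 Sep 1999", "violation": "Disappearance",
     "perpetrators": ["Local police", "Military detachment"]},
]

# One victim, many violations.
by_victim = defaultdict(list)
for row in violations:
    by_victim[row["victim"]].append(row["violation"])

# One violation, possibly many perpetrators.
multi_perpetrator = [
    row["violation"] for row in violations if len(row["perpetrators"]) > 1
]

print(dict(by_victim))
print(multi_perpetrator)
```

Keeping each violation as its own row, with its own list of perpetrators, is what allows counts by victim, by violation type, or by perpetrator to be produced from the same data.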
Many of the concepts about human rights violations are hard to define, such as severe ill
treatment or massacre. These two concepts were central to the work of the South African and
Guatemalan commissions. In the Haitian National Commission of Truth and Justice, extortion emerged
as one of the primary human rights abuses committed under the de facto regime. After all of the
data had been processed once, the data processors had to revisit every case to re-code for extor-
tion.
If after all the data processing has been done, a category turns out to be important, the data
must be re-coded. Although this is time-consuming, re-coding is much faster the second time.
However, neither the South African nor the Guatemalan commission had a clear definition of these
concepts (severe ill treatment and massacre) until several months after data processing work had
started. The data processors' work is to apply definitions. Hence, when definitions are unclear,
the data processors are the first to initiate demands that the organization establish clear
working concepts. Unfortunately, such determinations involve many actors and are often influenced
by political factors. When the organization cannot obtain consensus on the definitions of key
concepts, the data processors must develop provisional working definitions in such a way that
they can later re-code the data when the debates are finally settled.
[Box: See Chapters 6, 9 and 12 for discussions of the development of the concept of massacre in
the CEH information management system.]
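One way to keep re-coding cheap is to store the descriptive facts separately from the classification, so that a revised definition can be re-applied without re-reading every case. The sketch below assumes this design; the attributes, thresholds, and names are placeholders invented for illustration and are not any commission's actual definition of massacre.

```python
# Hypothetical descriptive attributes recorded once by data processors.
cases = [
    {"case_id": "C1", "victims_killed": 3, "same_place_and_time": True},
    {"case_id": "C2", "victims_killed": 5, "same_place_and_time": True},
    {"case_id": "C3", "victims_killed": 7, "same_place_and_time": False},
]

def make_massacre_rule(min_victims):
    """Build a classification rule; the threshold is a placeholder."""
    def rule(case):
        return case["same_place_and_time"] and case["victims_killed"] >= min_victims
    return rule

def code_cases(cases, rule):
    """Apply a rule to the stored attributes without modifying them."""
    return {case["case_id"]: rule(case) for case in cases}

# Provisional working definition, then a revised one after debate settles.
provisional = code_cases(cases, make_massacre_rule(min_victims=3))
revised = code_cases(cases, make_massacre_rule(min_victims=5))
print(provisional)
print(revised)
```

This only works when the attributes a future definition might depend on were captured in the first pass. A category that was never recorded, as with extortion in the Haitian case, still requires returning to the narratives.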
The data processors' work prepares the information to be represented in a computer-based da-
tabase, usually in a relational structure.
Database Representation
There is a common tendency to conceive of the total process in terms of the computer hard-
ware and software components. However, specifying the hardware and writing the software are the
easiest parts of the work. A qualified database programmer can implement and test a human rights
database in about one month. In our experience, human rights projects are so different from each
other that it is ineffective and inefficient to develop a standard software program that must be
customized for each project. In the six projects we personally have worked on in the last eight
years, none could have shared its database software with the others. This is the case even though
they all shared certain design characteristics. Today, all that is needed is that the software
supports relational structures; the computer language in which it is written does not matter. Good
human rights databases have been written in Paradox (in 1991-1993), Oracle, Access, and FoxPro.1
[Box: The need for customization of the database representation and its implementation is
discussed in Chapters 4, 5, 6, 9, and 12. It is a primary concern in system design.]
However, it is important for organizations to recognize that they will need a full-time staff pro-
grammer to write and maintain the software and to use queries to extract data in formats appropriate
for the analysts. Organizations too small to hire a programmer should contract with a private-sector
firm to write and maintain the software they need, or they may be unable to carry out their essential
functions in a timely manner.
When making decisions about software, decision-makers often think in terms of compatibility.
In human rights data projects, compatibility depends on the classification structures used by the
data processors much more than on the computer software used to store the data. If two systems
share the concepts and definitions about what human rights violations are, then a programmer can
transform the data from one software package to another no matter what software was used origi-
nally to implement the systems. In fact, analysts may transform the data into three or more different
formats to use different packages that offer different tools. If the systems have differences in their
concepts and definitions, then even if the databases are both written in the same program, the data
are incompatible.
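The distinction between format compatibility and conceptual compatibility can be illustrated with a short sketch. The records, field names, and the second system's category list are all invented for the example.

```python
import csv
import io
import json

# Hypothetical coded records from one system.
records = [
    {"victim": "P001", "violation": "Illegal detention"},
    {"victim": "P002", "violation": "Threat"},
]

def to_csv(rows):
    """Serialize records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["victim", "violation"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv(text):
    """Read records back from CSV text."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

# Same definitions, different storage formats: nothing is lost.
round_trip = from_csv(to_csv(records))
as_json = json.loads(json.dumps(records))

# Different definitions: a hypothetical second system has no "Threat"
# category, so that record has no valid counterpart whatever the format.
other_system_categories = {"Illegal detention", "Torture", "Killing"}
unmappable = [r for r in records if r["violation"] not in other_system_categories]

print(round_trip == records, as_json == records)
print(unmappable)
```

Changing the file format is mechanical, but a record coded under a category the other system does not define cannot be converted by a programmer alone; it needs a decision about definitions.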
Thus, from the perspective of an organization's leadership, the critical questions about the da-
tabase are: What does the database contain? What is the meaning of the information contained
there? We discuss these issues in the next section.
1 Ball, Patrick, Ricardo Cifuentes, Judith Dueck, Romilly Gregory, Daniel Salcedo, and Carlos
Saldarriaga. 1994. A Definition of Database Design Standards for Human Rights Agencies. Washington,
DC: American Association for the Advancement of Science and Human Rights Information and
Documentation Systems International. This discussion of human rights database design is available
at http://shr.aaas.org/dbstandards/cover.html.
neous press from which we have two clippings. When all the evidence has been collected, the or-
ganization must decide how to save the information about the killing. If the evidence comes to the
organization in independent streams, the researchers may not recognize until later that all of these
pieces of evidence relate to the same incident. Confounding the issue is that the facts are often
slightly different among different sources. But if we save all the different pieces of evidence docu-
menting Mr. Perez’s killing, we will have six distinct representations of this one incident. Simple
statistics done on this information would count Mr. Perez six times, which is obviously an error.
Groups that choose to keep all the accounts simultaneously are deciding that the database is pri-
marily serving the first principal database function, as a faithful representation of the sources, and
not the second function, establishing the “true” event.
In the above example, an organization might try to eliminate the duplication by choosing one
of the sources and deleting the others. By keeping only one reference to Perez’s killing, the organi-
zation can make sure that their statistics are correct and clear – Mr. Perez will only be counted once.
Cleaning the data in this way is deciding that the database is to be a true representation of the his-
torical events, and thus deciding not to represent all the data that has been collected. This is a use
of the database in its second principal function, representing what is believed to have really hap-
pened. In effect, the database that has been created looks like what is shown in Table 3, below.
Note that although six records were created in this database, five of them have been deleted
(displayed by the crossed-out lines). These records are effectively lost, and are not available for
any organizational use.
This strategy has several drawbacks. First, the audit trail from analysis to Mr. Perez and back
to the source information will be broken. If a statistical finding that included this killing were chal-
lenged (for example, by attorneys for the alleged perpetrators), the database must be able to link the
statistic in question with all the source information that provided evidence for the statistic. Sup-
pose that the human rights organization has reported that there were six killings in County Y in
May 1983. One of the six reported killings is Mr. Perez, and so the database must now show how
the group knows that Mr. Perez was killed by connecting the statistic with all the source material.
Mr. Perez’s killing was quite widely documented, and the argument that this killing really happened
is relatively strong. However, if five of the six sources were deleted, we are now faced with a mas-
sive paper search for the original sources, and having to do a paper search indicates that the com-
puterized system has failed.
A second problem is that by deleting five of the six representations of the killing, we lose the
ability to look at exactly what was coded from each source. If we want to check the data processing
by reviewing the exact data that was coded and entered from Mr. Perez’s son’s testimony, we may
not be able to see the data because it was deleted in the data cleaning. Losing the connections
between the sources and the information an organization plans to report can seriously reduce the
organization’s effectiveness.
For example, at the CEH, there is no stable count of how many interviews actually were con-
ducted. Field investigators took information from various interviews and composed “cases” which
were passed to the database team – the interviews were therefore merely raw material used by the
Introduction
field investigators to make cases. But from the point of view of the database, the interviews had
been hidden behind the cases, and so it was impossible to count the interviews or to measure
which violations appeared in many interviews compared to violations that appeared in only one
interview. This limitation eliminated several additional layers of analysis that might have strength-
ened the projection of the total number of killings.
The third and most serious problem with deleting multiple points of information about the
same violations is that we also destroy the information that certain violations are more frequently
reported than others are. Perhaps Mr. Perez’s neighbor, Mr. Raimundo, was killed with Mr. Perez,
yet appeared only in one of the press clippings. But Mr. Raimundo was not mentioned in any other
source. What was different about Mr. Raimundo that led to his being nearly missed by this proc-
ess? Perhaps Mr. Raimundo was of a different ethnic group than Mr. Perez, and people of Mr.
Raimundo’s group have less access to the media. If we can identify what kinds of victims are less
frequently reported, then we may be able to assume that we have not documented many more victims
of this kind. If people of Mr. Raimundo’s group appear in our database with a clear pattern
of less systematic reporting than people of Mr. Perez’s group, we may suspect that there are other
people in Mr. Raimundo’s group who are being missed by our investigation. The numbers of such
people might be quite large. We might therefore direct investigative resources to Mr. Raimundo’s
group, or we might use a statistical correction to increase the number of killings projected to have
occurred to people of Mr. Raimundo's group relative to Mr. Perez's group.
The right way to handle multiple reports is to create two databases: the first includes all the in-
formation faithfully from the sources, and the second encodes the organization’s judgements about
what is true. Computer hard disks are inexpensive, and most of this work can be done by appropri-
ate software. Keeping the database in two different forms involves no more work than doing it once
and then deleting all the multiply reported violations. But instead of deleting the violations that are
judged to be the same, the user creates one record in the second database for this violation; and
this step can be automated to be a single mouse click for each new record. This new record is
linked to all the constituent original records in the first database that in the “delete the extras”
method would have been deleted. The resulting form of the records in the source and judgment
datasets is shown below in Tables 4a and 4b.
Table 4a. Sample source database of multiple reports of the killing of Juan Perez
Name Date Place Violation Source Link to judgement ID
Note that it takes no more work to link the records (by creating the records in the judgement
database and linking them to the source data via the Judgement ID field) than it did to delete them.
For statistical analysis, we use the second database to check coding and audit trails. We use the
first database to measure reporting density (the relative frequency with which certain categories of
data are reported). Both structures serve important purposes.
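The two-database design can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the field names (`judgement_id`, `group`), the group labels, and all records are invented for this sketch, loosely modeled on the Perez and Raimundo example, and do not reproduce the schema of any project described here.

```python
from collections import Counter, defaultdict

# Source database: one record per report; nothing is ever deleted.
# All records are invented placeholders.
source_records = [
    {"id": 1, "name": "Juan Perez", "source": "source A", "judgement_id": "J1"},
    {"id": 2, "name": "J. Perez", "source": "source B", "judgement_id": "J1"},
    {"id": 3, "name": "Juan Peres", "source": "source C", "judgement_id": "J1"},
    {"id": 4, "name": "Raimundo", "source": "source C", "judgement_id": "J2"},
]

# Judgement database: one record per violation believed to have happened.
judgements = {
    "J1": {"victim": "Juan Perez", "group": "A"},
    "J2": {"victim": "Mr. Raimundo", "group": "B"},
}

def killings_count(judgements):
    """Statistics count judged violations, so each victim is counted once."""
    return len(judgements)

def audit_trail(judgement_id, source_records):
    """Follow the link from one judged violation back to every source record."""
    return [r["source"] for r in source_records
            if r["judgement_id"] == judgement_id]

def reporting_density(judgements, source_records):
    """Mean number of source records per judged violation, by victim group."""
    reports = Counter(r["judgement_id"] for r in source_records)
    per_group = defaultdict(list)
    for jid, judgement in judgements.items():
        per_group[judgement["group"]].append(reports[jid])
    return {group: sum(n) / len(n) for group, n in per_group.items()}
```

In this sketch the count of killings comes from the judgement records, while the audit trail and the reporting density both depend on the source records having been kept.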
Bias
In the statistical sense we are using here, bias does not imply that data have been chosen to
support an ideology, or that the data reflect implicit prejudice against ethnic or political groups. In
the statistical sense, bias refers to an effect that deprives a statistical result of accuracy by systematically
distorting it. This is different from a random error, which may distort on any one occasion
but balances out on the average. Random errors affect precision, but not accuracy. There
could be many sources of bias, including systematic technical errors or strategic misdirection that
led the organization to miss some parts of the reality they purported to study.
Oversimplification is the most common cause of bias introduced by technical errors. For example,
the South African Truth and Reconciliation Commission (TRC) decided to represent only one
of each kind of violation that happened to each victim. The system, for example, recorded only that
each victim suffered one act of torture, one act of severe ill treatment, etc. For killing, this is not a
problem, since a person can only be killed once. But victims who are persecuted by their political
opponents may be detained and tortured on multiple occasions, or suffer repeated acts of severe ill
treatment. In the TRC’s representation, the count of the number of violations that could have hap-
pened to each victim on multiple occasions (severe ill-treatment, torture) was biased downward
relative to the count of killings. That is, the statistics on killings were a better representation of the
real patterns and trends in killings than the statistics on non-fatal violations. This bias is hard to
detect after the fact, but it is relatively common.
Often, when a critic charges that a human rights study is biased, s/he means that the study is too
intently focused on violations committed by one perpetrating group. This is taken to imply that the
analysis has ignored or undercounted violations committed by some other perpetrating group.2 For
example, in Guatemala some critics claimed that the various large-scale human rights data projects
had overstated the proportion of violations for which the state was responsible relative to the
proportion for which the guerrillas were responsible. Because there were three independent projects
surveying the same human rights situation, it was possible to test the hypothesis that the data were
biased in this way. The data in each of the three projects were divided into the cases attributed to
the state and those attributed to the guerrilla. The overlap rates among the three projects were
measured for the state cases and the guerrilla cases. If overall the projects had focused more on the
state cases than on the guerrilla cases, then there should have been a higher overlap rate among
cases attributed to the state because the investigations would have covered a higher proportion of
the universe of cases. However, there was no significant difference in the overlap rates of state
cases and guerrilla cases, which implies that the coverage rate was roughly the same over both
perpetrators. In this example, it was possible to say that taken together, the proportions of
responsibility attributed to the state and to the guerrillas were not biased relative to the
proportion of violations in the universe of all violations. (These applications of overlap are
discussed in Chapter 11.)
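The overlap comparison can be illustrated with a small sketch. The text does not define the overlap rate precisely, so the definition used here (the share of distinct cases that appear in at least two projects) is an assumption, as are the function name `overlap_rate` and the invented case identifiers. The sketch also assumes that cases have already been matched across projects.

```python
from collections import Counter

def overlap_rate(*projects):
    """Share of distinct cases documented by at least two projects.

    This definition is an assumption made for illustration only.
    """
    counts = Counter(case for project in projects for case in set(project))
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= 2)
    return shared / len(counts)

# Invented, already-matched case identifiers for three hypothetical projects.
state_cases = [{"s1", "s2", "s3", "s4"}, {"s2", "s3", "s5"}, {"s3", "s6"}]
guerrilla_cases = [{"g1", "g2"}, {"g2", "g3"}, {"g4"}]

state_rate = overlap_rate(*state_cases)
guerrilla_rate = overlap_rate(*guerrilla_cases)
```

With data this small, no conclusion could be drawn from the two rates; a real comparison would need a significance test on the full matched data.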
There is generally no way to argue that data are completely unbiased in every way. The best
defense against the charge of bias is to take scientific samples of people who will be interviewed. If
this is not feasible, and if the organization has access to different kinds of data from different
sources, comparisons can be made between analyses from different sources. If the sources agree,
then either they share the same biases or they are all roughly unbiased. If the sources disagree,
additional research would be required to explain how one or more of the sources might be biased.
2
A related form of this bias results when a critic challenges the objectivity of an organization’s work arguing
that “violations were committed on both sides” when in truth nearly all violations were committed by one
side. Such claims are based on the attribution of moral equivalence, and are often made by diplomats, the
press, commissions of inquiry, and other quasi-official processes professing objectivity.
Conclusion
The sum total of our experts’ experiences is that if an organization effectively uses a well-designed
and properly supported information management system, the organization will find that
the credibility of its report’s conclusions is high enough that critics will prefer not to challenge
the scientific conclusions. This was the case for the final report of the CEH.
Clearly, the information management system is the critical element in achieving the ultimate
goal of a truth telling organization: To produce accounts of crimes against humanity that cannot be
denied.
Chapter 1
The Salvadoran Human Rights Commission:
Data Processing, Data Representation, and Generating Analytical
Reports
Patrick Ball
Introduction
In this paper, I describe the work I did while working for the Salvadoran Human Rights
Commission (Comisión de Derechos Humanos de El Salvador, CDHES).1 Between 1979 and 1991,
the CDHES took more than 9,000 interviews that were recorded in written form as testimonies. We
planned to begin work in May 1992, in conjunction with other organizations which, like the CDHES,
were part of the Coalition of Non-governmental Human Rights Groups (Coordinadora de Organismos
de Derechos Humanos). The organizations included, among others, Legal Aid (Socorro Jurídico),
the Human Rights Institute of the University of Central America, and the Human Rights Office
of the Lutheran Church. For a variety of reasons, among which were political issues and the
perceived lack of adequate data, all except the CDHES withdrew from the group in June 1992.
This was one of the earliest large-scale human rights information systems projects. By and
large, the other projects discussed in this handbook, which came later, had fewer of the problems
experienced in this project. However, this project is important to gaining an understanding of the
issues involved in planning and implementing large-scale data projects for human rights violations.
Even today, there are many organizations which do not have database and analytical expertise
and which may be working through similar problems. They may find the discussion in this paper
helpful in their current work.
The goal of this project was to target individual perpetrator responsibility. Only a modest fraction
– about 125 – of the total of 9,000 testimonies were entered into the full database and used to
provide reports targeting individual perpetrator accountability. Note that the fully processed testi-
monies were thoroughly documented and were the most important cases identifying individual
accountability. These cases were presented in their entirety to the truth commission by the CDHES.
Because it proved impossible to follow the planned process and enter very many cases into
the full system, we developed a parallel process into which we entered the entire set of CDHES
testimonies. This second process formed the basis for the analysis that gave this project its impact.
1
This was before I worked at AAAS. In fact, I first met then-AAAS Senior Program Associate Dan Salcedo
when he visited the CDHES in September 1992.
2
The database was written in the fourth normal form, which enabled a number of powerful search methods.
See (Ball et al. 1994). Note that several earlier databases were implemented in El Salvador
Chapter One: The Salvadoran Human Rights Commission
Packard laser printer which required us to write inline escape sequences to define font selection,
bold, italics, etc.
The second problem, slowness of entry, was not so easily resolved and was linked to the data
processing. When data processors were set to the task of coding the testimonies it was apparent
that we had greatly underestimated the time needed for data processors to extract the relevant data
elements from the testimonies (victim identifications, perpetrator identifications, violation types,
locations, etc.) and subsequently to enter the data.
A typical output from the automated report process is shown in Appendix 1.3 This case, num-
ber 85 from the set of 110 cases presented to the Truth Commission for El Salvador in October 1992,
is identified by the date of the complaint and the date of the event. The complexity of preparing this
report is concealed by the apparent simplicity of its presentation. Although it appears to be a
document that a user could type while reviewing the data manually it is, in fact, structured output
generated by a database. Since each case has a different number of victims, violations, etc., a com-
plex process is needed to generate this report. Among the tasks that a database can do for an or-
ganization, this kind of reporting can be very helpful to synthesize repetitive, detailed information
in easy-to-digest reports. The final presentation to the Truth Commission included about 600 pages
of text generated in this manner.
In case 85, shown in Appendix 1, the three victims are named in the “VICTIMS” section. Note
that there might be any number of victims, from one to several hundred. In the next block (“AGE,”
“SEX,” and “OCCUPATION”), personal data about each victim are reported.
The “TYPES OF VIOLATION” section lists all violations that were reported as being commit-
ted against each victim. Each victim could have suffered one or several violations, and different
victims might suffer different combinations of violations. The violation type is listed, followed by
the identification of the perpetrator(s) alleged to have committed it. Torture was listed separately
by type of torture and notes about each torture act were reported.
The database provided links to the officials alleged to have had command responsibility for
the units that committed the violations. These individuals are listed in the “PERPETRATOR” sec-
tion. Note that the number of perpetrators can vary according to the number of units alleged to
have participated in the event.
Lawyers who worked on the case drafted a narrative describing each event. Their legal work is
presented in the “LEGAL ACTIONS TAKEN” and “AVAILABLE DOCUMENTATION” sections.
Those witnesses willing to be identified appear in the final section, “WITNESSES.” The objective
of the CDHES for this presentation was to show the Truth Commission that the Salvadoran judici-
ary had taken essentially no action despite nearly 15 years of continuous legal activities on the part
of the human rights NGO community.
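The variable-length structure of such a report can be sketched as follows. This is not the original report program, which is not reproduced in the text; it is a minimal illustration in Python. The section headings come from the description above, while the function name `render_case`, the dictionary layout, and all names in the example case are hypothetical.

```python
def render_case(case):
    """Render one case as plain text; each section may hold any number of items."""
    lines = [f"CASE {case['number']}", "VICTIMS"]
    for victim in case["victims"]:
        lines.append(
            f"  {victim['name']} (AGE: {victim['age']}, SEX: {victim['sex']}, "
            f"OCCUPATION: {victim['occupation']})"
        )
    lines.append("TYPES OF VIOLATION")
    for victim in case["victims"]:
        for violation in victim["violations"]:
            units = ", ".join(violation["perpetrators"])
            lines.append(f"  {victim['name']}: {violation['type']} [{units}]")
    lines.append("PERPETRATOR")
    lines.extend(f"  {name}" for name in case["commanders"])
    lines.append("WITNESSES")
    lines.extend(f"  {name}" for name in case["witnesses"])
    return "\n".join(lines)

# Invented example case: two victims, three violations, no named witnesses.
example_case = {
    "number": 1,
    "victims": [
        {"name": "Victim A", "age": 30, "sex": "M", "occupation": "farmer",
         "violations": [
             {"type": "Illegal detention", "perpetrators": ["Unit X"]},
             {"type": "Torture", "perpetrators": ["Unit X", "Unit Y"]},
         ]},
        {"name": "Victim B", "age": 25, "sex": "F", "occupation": "teacher",
         "violations": [
             {"type": "Arbitrary execution", "perpetrators": ["Unit Y"]},
         ]},
    ],
    "commanders": ["Officer 1"],
    "witnesses": [],
}

report = render_case(example_case)
```

The same function handles a case with one victim or with several hundred, which is the property that makes database-generated reports practical for hundreds of pages of output.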
Coincident with this work, we had entered the entire command structure of the Salvadoran mili-
tary and security forces into a database structure like that defined in Ball et al. (1994).
3
This presentation format has been used subsequently by other NGOs. In July 1997, the International Cen-
ter for Human Rights Research (CIIDH) presented to the Commission for Historical Clarification (CEH) in
Guatemala about 140 of their 17,000 cases, along with lists of the people the CIIDH had registered as killed
or disappeared. The volume containing this information was more than 700 pages long.
The military career structure database showed which officers held which jobs in this unit at the time
the violation was committed. Then they typed the commander’s name into an eighth column of the
table showing his command responsibility for this violation.
CDHES had tried to save time by avoiding entry into my FoxPro database and putting the data
into the Word Perfect table. Now, they were paying the price for that decision and investing a large
amount of time because they could not use a database program to perform this next phase of the
process. The magnitude of the problem was roughly this: They were able to enter about 15 cases
per hour. For the 7,000 cases we had identified as within the mandate of the various commissions
who wanted the results, this amounted to about 470 hours; the estimated total effort amounted to
almost ten person weeks with six 10 hour days.
At this rate, we would not complete the analysis in time to present the results to the Comisión
Ad-Hoc. I realized that the Word Perfect tables could be parsed and wrote a program that read an
ASCII version of the Word Perfect document. The program then broke the data down into fields and
tables. This was not a simple process because there could be any number of values in each cell of
the table, and the victim values had to be matched to changing date, violation, and perpetrator
values by counting lines within each cell. This parsing program created as its output a database
whose structure included three related tables (case, victim, and perpetrator). The victim table
included a field for each of the 15 violation types we coded, and the value in each field indicated
whether or not the victim suffered that violation. Table 1 shows the 15 violation types and their
codes.

Table 1. Violation types and codes.
Violation type                      Code
Arbitrary execution                 EA
Forcible disappearance              DF
Torture                             Tt
Massacre                            Mc
Illegal detention                   DI
Sexual violation                    VS
Threatening                         Az
Persecution                         Ps
Allanamiento                        Am
Destruction or theft of property    Db
Displacement of population          Dp
Disappeared                         Dd
Stabbing or wounding                Hd
Robbery                             Ro
Other violations                    Ot

With this structure each victim can suffer each violation type only once in the context of each
“incident,” or time by place combination; with repeated incidents within a case, other violations
against the same victim could be repeated. Note that this does not mean that each victim suffered
only one violation in each case. Rather, for example, the victim could only be recorded as having
suffered detention and torture in a given incident, rather than detention, torture, torture, and
torture if there were three torture types employed.
This limitation is not realistic and may distort the data.4 However, it is much less severe than
other distortions due to simplification, such as one victim-one violation, as discussed in Ball (1996).
Quick checks of the testimonies showed that it was rare for witnesses to report more than a single
instance of the same violation against the same victim (e.g., multiple illegal detention incidents).
Appendix 3 shows the summary statistics drawn down from this database.
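The parsing step and the one-flag-per-violation structure can be sketched as follows. This is not the original parsing program, which is not reproduced in the text. The violation codes are those of Table 1; everything else is an assumption made for illustration: the function names, the comma-separated layout of the violation cell, and in particular the rule that a cell with fewer lines than there are victims carries its last value forward. Only the victim table is built here.

```python
VIOLATION_CODES = ["EA", "DF", "Tt", "Mc", "DI", "VS", "Az", "Ps",
                   "Am", "Db", "Dp", "Dd", "Hd", "Ro", "Ot"]

def cell_lines(cell):
    """Split one table cell into its non-empty lines."""
    return [line.strip() for line in cell.splitlines() if line.strip()]

def value_for(values, i):
    """Value for victim i; assumed rule: the last line carries forward."""
    if not values:
        return ""
    return values[min(i, len(values) - 1)]

def parse_row(case_id, victims, dates, violations, perpetrators):
    """Turn one row of multi-line cells into one record per victim."""
    date_lines = cell_lines(dates)
    violation_lines = cell_lines(violations)
    perpetrator_lines = cell_lines(perpetrators)
    rows = []
    for i, name in enumerate(cell_lines(victims)):
        # One yes/no field per violation type, so repeats collapse to one flag.
        flags = {code: False for code in VIOLATION_CODES}
        for code in value_for(violation_lines, i).split(","):
            code = code.strip()
            if code:
                if code not in flags:
                    raise ValueError(f"unknown violation code: {code}")
                flags[code] = True
        rows.append({"case_id": case_id, "victim": name,
                     "date": value_for(date_lines, i),
                     "perpetrator": value_for(perpetrator_lines, i), **flags})
    return rows

# Invented row: the first victim has one detention and three torture entries.
rows = parse_row("C-001", "Victim A\nVictim B", "1982-03-01",
                 "DI, Tt, Tt, Tt\nEA", "NG")
```

The first victim ends up with exactly two flags set (DI and Tt), which shows the simplification discussed above: three torture entries are recorded as a single torture violation.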
Now we faced the need for standardization of the non-standard spellings and other references
to perpetrators. To resolve this problem, I made a list of all the non-standard perpetrator names
from the original data and matched all names (by a combination of computer and manual methods)
to a set of standard codes. I created tables that translated between all the possible non-standard
spellings of the perpetrator names (e.g., “National Guard,” “NG,” “Guard,” “Nat.Gua.” and so forth)
to a desired standard code (e.g., in this case, “NG”). With standard perpetrator codes applied to
each violation, I could use the dates (which were also non-standard and had to be extensively ed-
ited) and the codes to match to the perpetrators’ career histories.
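A translation table and a date-based match against career histories might look like the sketch below. The four spellings and the code “NG” come from the example in the text; the lowercase matching rule, the function names, and the career records (officers, dates) are hypothetical and serve only to illustrate the technique.

```python
from datetime import date

# Translation table: non-standard spelling -> standard code.
# Matching on lowercased, whitespace-normalized text is an assumption.
PERPETRATOR_CODES = {
    "national guard": "NG",
    "ng": "NG",
    "guard": "NG",
    "nat.gua.": "NG",
}

def standardize(raw_name):
    """Return the standard code, or None if the name needs manual review."""
    key = " ".join(raw_name.lower().split())
    return PERPETRATOR_CODES.get(key)

# Hypothetical career histories: who held command of which unit, and when.
CAREERS = [
    {"officer": "Officer A", "unit": "NG",
     "start": date(1980, 1, 1), "end": date(1981, 6, 30)},
    {"officer": "Officer B", "unit": "NG",
     "start": date(1981, 7, 1), "end": date(1983, 12, 31)},
]

def commanders(unit_code, event_date, careers=CAREERS):
    """Officers holding command of the unit on the date of the violation."""
    return [c["officer"] for c in careers
            if c["unit"] == unit_code and c["start"] <= event_date <= c["end"]]
```

Because the original spellings are translated rather than overwritten, the table can be corrected and re-applied at any time without touching the source data.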
4
Many other systems suffer from this oversimplification, notably that of the South African Truth and Rec-
onciliation Commission (TRC), although the TRC data processors used narrative data recorded by the inter-
viewers to recover from the error. See Chapter 4 for details
The results of the parsing were 7,150 cases, including 9,346 corporate perpetrators involved in
11,940 incidents. More than 17,000 victims who suffered 29,000 violations were documented by
these data.
Appendix 4 shows the results of the matching, titled “Responsible Military Individual.”5 It is
unfortunate that the full set of testimonies was not fully captured in the format shown in Appendix 1
as a result of resource limitations. However, the political impact of the Indices of Individual
Accountability and the more limited system was great. The overall lesson is that if the analytic
and political objectives are clear, the systems designer should build a system that is just adequate
for those objectives. More complexity can cause many problems while not adding much value from
the additional capability.
Lessons Learned
Conversion of     Editing is never done; users     Use a two-way table to           Table must be set up and
non-standard      are always working on data.      translate changes from the       used at the initiation of
input to          If you change the original       original data to a cleaned       work on data, although it
standard codes    data, and users come up          output. Do not make changes      will be modified constantly
                  with a new version, all of the   to original data. Learn how a)   throughout the project.
                  changes must be redone           to parse raw text files into     Establishing the rule that
                  from start.                      structured data, and b) to       all changes to source data
                                                   standardize uncontrolled         come from users and
                                                   entry into controlled            automated processing
                                                   structures                       must be robust enough to
                                                                                    deal with uncontrolled
                                                                                    entries.

Functionality     Complex methods and              Simpler is better; less is       Self-discipline, planning.
                  procedures difficult to          often more.
                  execute.

Replacement       Manual procedures can fail       Be prepared for late-term        Flexibility in response.
of manual         late in project.                 rush projects to automate        High level of skill required
methods by                                         manual procedures.               of system and program
automated                                                                           designers.
methods.
5
The strategic aspects of this project are described in more detail in (Ball, 1996).
Appendix 1
6
CIDH is the acronym for the Interamerican Commission for Human Rights (Comision Interamericana para Derechos Hu-
manos) of the Organization of American States.
Appendix 2
Appendix 3
Human Rights Commission of El Salvador (CDHES)
Summary of Presented Documents, by Type of Violation and Year of Event7
Year EA DF Tt Mc DI VS Az Ps Am Db Dp Dd Hd Ro Ot Vt Pb In Cs
1973 3 1 5 1 5 0 3 3 2 0 0 0 0 0 2 5 0 1 1
1974 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 1 1
1975 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 2 0 2 1
1977 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 2 0 2 2
1978 0 0 2 0 4 0 0 0 0 0 0 0 0 0 1 4 0 2 2
1979 20 13 13 1 21 0 2 1 1 1 0 1 1 1 13 29 0 14 13
1980 496 262 238 12 494 34 44 33 72 22 7 36 295 24 95 1237 15 388 370
1981 1610 327 328 18 692 23 87 21 173 77 5 50 10 58 81 2221 10 481 464
1982 419 471 297 9 1000 31 54 13 260 56 10 105 16 30 177 1488 18 722 681
1983 234 172 113 6 467 7 16 6 46 26 23 82 10 25 26 626 1 353 346
1984 96 154 188 15 566 9 31 10 76 14 4 115 13 9 80 835 2 557 541
1985 60 90 159 1 863 5 63 15 98 28 7 44 32 36 86 1012 13 668 650
1986 97 45 188 2 514 3 87 86 131 71 56 38 64 32 20 724 15 367 349
1987 73 55 204 3 410 12 165 63 90 41 12 15 43 20 96 558 10 293 260
1988 91 68 351 3 834 9 273 137 123 53 8 42 63 66 114 1203 44 611 500
1989 115 119 1003 3 1753 19 539 147 330 134 40 39 79 132 233 2209 45 1012 924
1990 86 90 378 2 770 15 320 122 178 55 15 40 66 55 103 1180 35 678 622
1991 46 24 340 0 959 8 571 135 148 159 25 36 98 105 257 1446 87 693 597
1992 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1
Unknown 11 4 16 0 27 0 19 13 5 2 5 0 4 5 9 55 4 40 34
Total 3460 1896 3825 76 9383 175 2275 806 1733 739 217 644 794 598 1394 14838 299 6886 6359
7
For meaning of violation type codes, see Table 1. The other codes are as follows: Vt, total number of victims; Pb, collective
victims; In, total number of events; Cs, total number of cases. No data are given for 1976 since none was available.
Appendix 4
Human Rights Commission of El Salvador (CDHES)
Individuals with Alleged Command Responsibility, Typical entries
Document numbers:
487/82
EA DF Tt Mc DI VS Az Ps Am Db Dp Dd Hd Ro Ot Cs Vt Pb
2 0 2 0 2 0 0 0 0 0 0 0 0 0 0 1 2 0
===============================================================
PINEDA VILLALTA, Humberto
1981 National Police, Personnel Chief
Document numbers:
100/81, 103/81, 105/81, 1079/81, ..., 76/81, 90/81, 97/81
EA DF Tt Mc DI VS Az Ps Am Db Dp Dd Hd Ro Ot Cs Vt Pb
230 24 35 1 73 0 2 0 35 0 0 0 0 0 28 39 321 1
1978-1981 Navy, Commander
Document numbers:
325/80, 417/80, 264.1/1984, 271.a/1983, 67/85, 82/85
EA DF Tt Mc DI VS Az Ps Am Db Dp Dd Hd Ro Ot Cs Vt Pb
0 2 1 0 8 0 1 0 2 0 0 0 0 0 1 6 8 0
1980-1984 Navy, Unknown
Document numbers:
117/82, 325/80, 417/80, 535/82
EA DF Tt Mc DI VS Az Ps Am Db Dp Dd Hd Ro Ot Cs Vt Pb
0 3 0 0 6 0 1 0 2 0 0 0 0 0 0 4 6 0
total: 230 29 36 1 87 0 4 0 39 0 0 0 0 0 29 49 335 1
References
Ball, Patrick. 1996. Who Did What to Whom? Planning and implementing a Large Scale Human Rights Data
Project. Washington, DC: American Association for the Advancement of Science.
Ball, Patrick, Ricardo Cifuentes, Judith Dueck, Romilly Gregory, Daniel Salcedo, and Carlos Saldarriaga. 1994. A
Definition of Database Design Standards for Human Rights Agencies. Washington, DC: American Asso-
ciation for the Advancement of Science and Human Rights Information and Documentation Systems Inter-
national.
Chapter 2
The Haitian National Commission for Truth and Justice:
Collecting Information, Data Processing, Database Representation,
and Generating Analytical Reports
1
Drs. Patrick Ball and Daniel Salcedo comprised the AAAS team.
2
A version of the report that does not include the appendices describing the work of the AAAS team is avail-
able at www.haiti.org/truth/table.htm
3
Roussiere, D. and Danroc, G., “Soif de justice en Haïti,” Le Monde Diplomatique, May 1998, pp. 22-3. The
original text follows:
Ce rapport, à la grande déception de tous, est étrangement resté, durant de très longs mois, caché dans les
tiroirs du ministre de la justice, M. Pierre-Max Antoine. Après de nombreuses protestations, celui-ci ne
l'a publié qu'au compte-gouttes. La population et les nombreuses victimes attendent toujours sa diffusion
en créole. La majorité des recommandations finales n'ont pas été mises en oeuvre. D'anciens bourreaux
ont occupé des fonctions dans la nouvelle police nationale ou encore comme gardiens de prison : l'un
d'eux était même dans le corps de sécurité du palais national alors que, pourtant, son nom figurait dans
l'annexe 4 du rapport final CNVJ (page 1-b code P 0402). Pourtant une Commission vérité, sans compé-
tence pénale, ne peut être efficace qu'en informant largement la société civile et en transformant véri-
tablement le système judiciaire ainsi que le fonctionnement des administrations. Rien n'y a fait, la paraly-
sie, l'inertie et le laxisme demeurent.
Chapter Two: The Haitian National Commission for Truth and Justice
dix 4 of the final CNVJ report (Page 1-b, code P 0402). However, a truth commis-
sion without the ability to punish can only be effective if it can fully inform soci-
ety at large, and transform the judicial system and the functioning of administra-
tive bodies. Nothing has been done; paralysis, inertia, and inactivity reign.
The failure to publicize the report was the responsibility of the administration of President
René Préval, not of the commissioners.
Interviews, Data Processing, and the Database
Interviews
The sequence of data collection in the CNVJ interviews was as follows. A person, the dénunciateur,4
comes to the interview team to give information about an abuse. The violent events being
reported may have happened to the dénunciateur, and no one else. Or, the dénunciateur may be
reporting abuses that happened to other people. Thus, the dénunciateur may or may not be a vic-
tim. Furthermore, there may be other victims. Thus a single interview may yield information about
one, two, three, or any number of victims.
Each victim may have suffered one or many violations. The violations may have happened at
one or many points in time. That is, a victim may have been detained and tortured on one date in
one place, but raped and murdered on a subsequent date in a different place.
Furthermore, one or many identifiable perpetrators may have committed each violation. That is,
“Antoine” and “Pierre” were responsible for the hypothetical detention and torture in the previous
paragraph, but “Pierre” and “Michel” committed the rape and murder. Complex relationships among
these various entities existed, and were captured in the interviews.
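These one-to-many relationships can be written down as a small data structure. The sketch below is illustrative only: the class and field names are hypothetical and are not the CNVJ database design. The example data restate the hypothetical case above, with placeholder dates and places.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Violation:
    kind: str
    date: str
    place: str
    perpetrators: List[str] = field(default_factory=list)

@dataclass
class Victim:
    name: str
    violations: List[Violation] = field(default_factory=list)

@dataclass
class Interview:
    denunciateur: str
    victims: List[Victim] = field(default_factory=list)

# The hypothetical case from the text, with placeholder dates and places.
interview = Interview(
    denunciateur="Deponent 1",
    victims=[Victim("Victim 1", [
        Violation("detention", "date 1", "place 1", ["Antoine", "Pierre"]),
        Violation("torture", "date 1", "place 1", ["Antoine", "Pierre"]),
        Violation("rape", "date 2", "place 2", ["Pierre", "Michel"]),
        Violation("murder", "date 2", "place 2", ["Pierre", "Michel"]),
    ])],
)

def violations_by_perpetrator(interview):
    """List, for each perpetrator, the violations attributed to him."""
    result = {}
    for victim in interview.victims:
        for violation in victim.violations:
            for perpetrator in violation.perpetrators:
                result.setdefault(perpetrator, []).append(violation.kind)
    return result

by_perpetrator = violations_by_perpetrator(interview)
```

Keeping perpetrators attached to individual violations, rather than to the interview as a whole, is what allows a question such as “which violations are attributed to Pierre?” to be answered.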
The teams conducted more than 7,000 interviews. Interview teams consisted of one Haitian
and one international team member. This pairing was largely to satisfy an explicit mandate of the
CNVJ that interview teams comprise both Haitians and internationals. Few internationals – even
Francophones – speak Haitian Créole, so it was an absolute necessity that each interview team
include a Haitian.
Data Processing
A serious handicap to the work of the CNVJ was the late start of data processing, which did
not begin until the teams had completed interviewing at the end of August 1995. This was in large
part due to a significant leadership vacuum that took some months to fill. This lapse occurred be-
cause of serious political differences and the lack of staff experience with research of this kind. As
a consequence, the directors of the interviewing teams resigned in late August 1995.
Analysts were chosen from among the interview teams, and they applied the codes following
the methods that most projects use. Again, as in most projects, the definitions of key concepts
changed and new ideas were added to the analysis after most of the interviews had been coded.
The analysts re-coded the entire set of interviews at least three times, although during the second
and third reads on each interview they were re-coding only for specific themes.
All data entry was done by means of FoxPro “browse windows,” with field and record level
validation. For example, all the codes (for types of violation, geography, victim or perpetrator refer-
ences) were checked as the users typed them to at least assure that the codes were valid. The six
workstations were all freestanding. There was no sharing or serving of files across the network
(although the machines did share a printer).
4
The standard terminology for a person complaining about a human rights violation in the Central American
and Caribbean regions is denunciador (Spanish) and dénunciateur (French and Créole). The word is more
closely related to the English “complain” or “report” than to “denounce.”
Patrick Ball and Herbert F. Spirer
[Figure: The CNVJ data representation model. In each interview, each victim may suffer one or many violations. Each violation (execution, detention, rape, etc.) happened at a certain time and place. One or many perpetrators may be identified; each violation may have been committed by one or many perpetrators, and each perpetrator may have been accused of one or many violations. With each perpetrator, we may know his name, organization, or other characteristics.]

All entered data were aggregated into a single database on a central machine. We achieved
this result without duplication or loss of entered data by assigning a unique block of key values to
each workstation. Before aggregation, the relational integrity of the data was checked by tracing
each foreign key to the primary table from which it originated.5 This was necessary because the
database software used (FoxPro for Macintosh) did not do internal integrity checking. The data
were merged into the common database on which the analysis was run. Our major programming
task was to carry out the preceding functions essential to creating a common database with
assured data integrity.
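The aggregation steps described above can be illustrated with a minimal Python sketch. It is not the FoxPro code used at the CNVJ; the block size, the table layouts, and the function names (`key_block`, `merge`, `orphaned_foreign_keys`) are assumptions made for illustration only.

```python
from typing import Dict, List, Set

BLOCK_SIZE = 100_000  # hypothetical size of the key range given to each workstation


def key_block(workstation: int) -> range:
    """Return the range of primary-key values reserved for one workstation."""
    start = workstation * BLOCK_SIZE
    return range(start, start + BLOCK_SIZE)


def merge(tables: List[List[Dict]], key_field: str) -> List[Dict]:
    """Concatenate per-workstation tables, refusing any duplicated primary key."""
    merged: List[Dict] = []
    seen: Set[int] = set()
    for table in tables:
        for row in table:
            if row[key_field] in seen:
                raise ValueError(f"duplicate key {row[key_field]}")
            seen.add(row[key_field])
            merged.append(row)
    return merged


def orphaned_foreign_keys(child_rows: List[Dict], fk_field: str,
                          parent_keys: Set[int]) -> List[Dict]:
    """Return child rows whose foreign key matches no row in the primary table."""
    return [row for row in child_rows if row[fk_field] not in parent_keys]


# Hypothetical data: one victim entered on each of two workstations.
victims_ws0 = [{"victim_id": key_block(0)[0], "name": "victim A"}]
victims_ws1 = [{"victim_id": key_block(1)[0], "name": "victim B"}]
violations = [
    {"violation_id": 1, "victim_id": key_block(0)[0]},
    {"violation_id": 2, "victim_id": 999},  # points at no victim
]

victims = merge([victims_ws0, victims_ws1], "victim_id")
orphans = orphaned_foreign_keys(
    violations, "victim_id", {v["victim_id"] for v in victims}
)
```

Because each workstation draws keys only from its own block, the merge cannot collide, and any row left in `orphans` marks a broken relationship that must be repaired before analysis.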
As part of the aggregation process we standardized several codes that were not originally con-
trolled in the data entry, such as political affiliation of the victims. We achieved this result by cre-
ating a unique list of all the phrases and abbreviations found in the free text field. An analyst re-
viewed these selected phrases and abbreviations and assigned a code to each of the text fields. We
then merged the code and text combination back into the original data, thereby assigning a code to
each of the previously uncontrolled fields.
But even with the recoding, many analytic categories remained poorly reported. For example,
many deponents were reluctant to report their political affiliation, and so the only analysis that
could be done on this field used only a small fraction of the data and was therefore unstable.
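The standardization of an uncontrolled free-text field can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented: the field names, the sample phrases, the codes, and the case-and-spacing normalization in `normalize` are all hypothetical.

```python
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace (an assumed cleanup step)."""
    return " ".join(text.lower().split())


def unique_phrases(records: List[Dict], field: str) -> List[str]:
    """Build the unique list of non-empty phrases found in the free-text field."""
    return sorted({normalize(r[field]) for r in records if r[field].strip()})


def apply_codes(records: List[Dict], field: str, code_field: str,
                phrase_to_code: Dict[str, str]) -> List[Dict]:
    """Merge the analyst's phrase-to-code table back into the original records."""
    coded = []
    for r in records:
        code: Optional[str] = phrase_to_code.get(normalize(r[field]))
        coded.append({**r, code_field: code})
    return coded


# Hypothetical records with an uncontrolled affiliation field.
records = [
    {"victim_id": 1, "affiliation": "Party A"},
    {"victim_id": 2, "affiliation": "party  a"},
    {"victim_id": 3, "affiliation": "P.A."},
    {"victim_id": 4, "affiliation": ""},
]

phrases = unique_phrases(records, "affiliation")   # list handed to the analyst
analyst_codes = {"party a": "PA", "p.a.": "PA"}    # the analyst's decisions
coded = apply_codes(records, "affiliation", "affiliation_code", analyst_codes)
```

Records with an empty field keep a code of `None`, which makes the extent of non-response visible instead of hiding it.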
Database
We designed the CNVJ's database to the standards established in (Ball et al., 1994). In accordance
with those standards, we followed two fundamental rules:
5
When a primary key from a table is incorporated into another table to form a relationship between the ta-
bles, it is called a “foreign” key.
Chapter Two: The Haitian National Commission for Truth and Justice
1. The database must not introduce additional ambiguity into the data. That is, to the ex-
tent that the original sources permit, the database must be absolutely precise regarding
who committed which violations against whom.
2. To represent a wide range of abuses, interventions, people, and organizations, and to
represent unambiguously the complex relations among these entities, the database must be
as extensive and amenable to change as is consistent with available resources.
As we have mentioned, any of the entities in a human rights violation may have complex rela-
tionships with none, some, or all the other entities; it is important that the data model enables an
appropriate representation in the database. The diagram below shows in schematic form the data
representation model that we used for the CNVJ.
A database designed in accordance with this model allowed the CNVJ to represent the
complex stories that the dénunciateurs reported to the CNVJ interview teams.
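The relationships in this model (one interview, many victims; one victim, many violations; one violation, many perpetrators) can be sketched with a few Python data classes. The class and field names are assumptions for illustration and do not reproduce the CNVJ table definitions; the example encodes the hypothetical “Antoine,” “Pierre,” and “Michel” case from the Interviews section, with placeholder dates and places.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Perpetrator:
    name: Optional[str] = None          # may be unknown
    organization: Optional[str] = None  # may be unknown


@dataclass
class Violation:
    kind: str                           # e.g. "detention", "torture"
    date: Optional[str] = None
    place: Optional[str] = None
    perpetrators: List[Perpetrator] = field(default_factory=list)


@dataclass
class Victim:
    name: str
    violations: List[Violation] = field(default_factory=list)


@dataclass
class Interview:
    denunciateur: str
    victims: List[Victim] = field(default_factory=list)


antoine, pierre, michel = (Perpetrator(n) for n in ("Antoine", "Pierre", "Michel"))
victim = Victim("victim 1", [
    Violation("detention", "date 1", "place 1", [antoine, pierre]),
    Violation("torture", "date 1", "place 1", [antoine, pierre]),
    Violation("rape", "date 2", "place 2", [pierre, michel]),
    Violation("execution", "date 2", "place 2", [pierre, michel]),
])
interview = Interview("dénunciateur 1", [victim])

# One victim, four violations, three distinct named perpetrators.
violation_count = sum(len(v.violations) for v in interview.victims)
named = {p.name for v in interview.victims for x in v.violations for p in x.perpetrators}
```

Keeping perpetrators attached to each violation, not to the victim or the interview, is what preserves "who committed which violations against whom."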
Analyses
The teams conducted 5,453 interviews, in which we heard about 18,629 violations that were
committed against 8,667 victims. During the course of the project, we carried out many statistical
analyses using the relational structure described above as a basis for analyzing counts of viola-
tions by type, time, and geographic location.
A discussion of some of the statistical analyses follows. Analysis at the CNVJ met with a set
of problems similar to those dealt with at the TRC and CEH.6 Continuous efforts to establish and
maintain data quality at all stages kept the database itself in a state of change until hours before the
results had to be reproduced for distribution. The challenge for the system designers – and to
some extent, those that implemented their design – was to define the entire analytic process in
ways that supported dynamic updating.
As we have discussed at greater length elsewhere, our experience has convinced us that every
human rights project is unique and has different attributes from the others.7 Some analyses are
general and likely to be common to most human rights projects (such as victim, perpetrator, witness
attributes, and violations by time and place), but many of the analytical issues are particular to a
given project.

Table 1. CNVJ Human Rights Violations Categories and Types
Category (Right)        Violation types
Life                    execution, disappearance
Liberty and Integrity   torture, detention, rape
Property                theft, attacks on goods, attacks on property

In the following sections, we first
describe the nature of many of the analyses performed that are of the general category, and then
discuss several of the analyses that were particular to this project.
A subset of analyses reported on the violation of rape. These analyses included a histogram
of victims’ ages, affiliation,8 the number of total female victims by department and month and the
corresponding proportion raped, and the proportion of rape victims assaulted in a named place such as
a barracks or a military post.9 (Note that the affiliation data are weak for the reasons already stated.)
6
At this point, we recommend that the concerned reader read or reread relevant sections of Chapters 3, 4, and
7
See the introductory chapter in this volume.
8
In French, denoted appartenance.
To understand the nature of the problems with affiliation and to extract such limited information
as we could from the incomplete and inconsistent data, we determined the proportion of victims
with some affiliation. For this purpose, we coded a victim as having affiliation if there was any
text in the relevant descriptive field. In most of the analyses for which there are sufficient data for
reliable estimates, the percentage of victims with an affiliation ranged from 50% to 70%. Other
analyses covered victims driven into internal refugee status,10 finding their proportions by
department, age and sex of the victim.
Perpetrators
We analyzed perpetrators with monthly and annual time series by CNVJ human rights violation
category (Life, Integrity and Liberty, and Property) and by the affiliation of the perpetrator. Table 2
shows the five categories of perpetrator that we tracked (FadH, FRAPH, Police rural, Attaché, and
Other). Violations without perpetrators identified by category were not included even if there were
perpetrators identified for other violations against the same victim at the same time and place. We
also did these analyses by department.
FRAPH          Front for the Advancement and Progress of Haiti. Paramilitary enforcers for the de facto regime. The acronym is a pun on "blow" or "beating."
Police rural   Militia.
Attaché        In urban areas, a semi-legal deputy to the police; in rural areas, to the militia.
Other          Other.
There are special considerations in analyzing data about victims and perpetrators in combina-
tion. One or many violations could have happened to each victim. Thus, sums of the numbers of
violations are usually significantly different from sums of numbers of victims. This disparity is
logical, since in a given interview, a violation may have been committed against the same victim more
than once, on several dates. Similarly, none, one, two, or many identified perpetrators may have
committed each violation. Consequently, no count of perpetrators from one or more given organizations
can be summed with counts for other perpetrating organizations unless the perpetrators are com-
bined in categories, as we describe below.
Our final analysis in this section looked at combinations of perpetrators. Since any violation
may have been committed by one or more perpetrators who were not identified, or one, two, or
many identified perpetrators, we had to combine categories of perpetrators to analyze how actual
violations were committed. For example, it is clear that the Haitian army alone committed the bulk of
violations in which a perpetrator is identified. Also, it is much easier to identify “two soldiers” than
to identify random civilians. Substantial numbers of violations were committed by the Attachés
working in conjunction with the FRAPH, by the militia working with the Haitian army, and by the
militia. However, the single largest category is “no identified perpetrator.”
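The combining of perpetrator categories described above can be sketched as a count of violations by the set of categories identified for each one. The input layout (one list of category labels per violation) and the sample rows are hypothetical; only the category names come from Table 2.

```python
from collections import Counter
from typing import Iterable, List, Tuple

NO_PERPETRATOR = "no identified perpetrator"


def combination(categories: Iterable[str]) -> Tuple[str, ...]:
    """Reduce the perpetrators of one violation to a sorted tuple of distinct categories."""
    unique = sorted(set(categories))
    return tuple(unique) if unique else (NO_PERPETRATOR,)


def count_combinations(violations: List[List[str]]) -> Counter:
    """Count violations by combination, so every violation is counted exactly once."""
    return Counter(combination(v) for v in violations)


# Hypothetical violations, each with the categories of its identified perpetrators.
example = [
    ["FadH"],
    ["FadH", "FadH"],        # two soldiers still form the single category FadH
    ["Attaché", "FRAPH"],
    ["FRAPH", "Attaché"],    # order of entry does not matter
    [],                      # nobody identified
]
counts = count_combinations(example)
```

Because each violation falls into exactly one combination, these counts can be summed across combinations, which is not true of counts made separately for each perpetrating organization.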
9
In French, caserne and avant-poste, respectively.
10
For convenience, we refer to these internal refugees by the French term, marronage, which literally means
“runaway.”
11
Dr. Mercedes Doretti and her international team of forensic experts were a second component of AAAS
assistance to the CNVJ.
From the scatterplot, a viewer can readily ascertain that there is a strong tendency for a rise in
the EAAF series to be coincident with a rise in the CNVJ series, and similarly, for coincident be-
havior of the declines. In addition, there is an obvious tendency for the magnitude of the differ-
ences to correlate: the greater the rise in one series, the greater the rise in the other.
For the analytical measurement, we calculated the Pearson Product Moment Correlation
Coefficient (r) for these series to be r=.865 (r²=.748), which is quite high. If these data were from random
samples from two series in which there was no correlation among first differences, the probability
of such a high correlation occurring by chance would be vanishingly small (z = 10.4, p < .000000).12
This was strong evidence of a similarity in the two series, supporting the validity of the CNVJ
team data as a representation of the trends of violations.
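The calculation can be sketched in Python as follows. The functions are a minimal illustration using the standard-error formula given in note 12; the function names and the short sample series are assumptions, and the original monthly series are not reproduced here.

```python
import math
from typing import List


def first_differences(series: List[float]) -> List[float]:
    """Month-to-month changes: each value minus the value before it."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def z_for_r(r: float, n: int) -> float:
    """z statistic for r under a zero population correlation (normal approximation)."""
    standard_error = math.sqrt((1 - r * r) / (n - 2))
    return r / standard_error


# Hypothetical short series whose changes move together exactly.
series_a = [1, 2, 4, 7]
series_b = [2, 4, 8, 14]
r_example = pearson_r(first_differences(series_a), first_differences(series_b))

# Plugging in the rounded values reported in the text (r = .865, n = 38).
z_reported = z_for_r(0.865, 38)
```

With the rounded values r = .865 and n = 38, `z_for_r` returns a value a little above 10, the same order of magnitude as the figure reported in the text.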
12
In this case, n=38. For n>30, when the population correlation coefficient is zero, the standard error is
sqrt((1-r²)/(n-2)), and the sampling distribution is normal. In this case, (1-r²) = .252, (n-2)=36, and the
standard error is .084; z=.865/.084=10.4.
13
Our analyses of trends in time are independent of differences in the absolute magnitudes of the series being
considered. Accordingly, issues of regional coverage in the sampling do not affect the trend analyses.
the same times, and to rise and fall together, then we have a strong suggestion that people who
perpetrate violence are responding to the same influences. If, however, some kinds of violence fol-
low very different patterns in time, then we would have to conclude that influences to commit vio-
lence are different for different kinds of violence or that the data that we have do not support the
contention that they shared a common influence.
As we did previously (in Validity of the Survey with Respect to Time), we compared the EAAF
University Hospital cadaver data and the CNVJ summary execution data. We obtained the correla-
tion of the first differences of the monthly counts of violations for each of ten types of violation.
Table 3 shows the violation codes and Table 4 shows the correlation coefficients for these viola-
tions.
Table 3: Violation codes.
VBN Attacks on goods VMP Threats and persecution
VTT Torture
Table 4. Correlation matrix of monthly first differences between different types of human
rights violations at the national level in Haiti, September 1991 to October 1994.
Type of violation  VDS  VDT  VES  VEX  VLB  VMP  VSX  VTE  VTT
VBN                .66  .88  .88  .81  .88  .93  .74  .93  .94
VTE                                                        .96
We performed hypothesis tests on each of the coefficients to determine which of the coeffi-
cients were significantly different from zero at the α = .01 level. This is a more stringent requirement
than the usual α = .05 level. Of the 45 correlation coefficients, 45 were statistically significant at the
α = .01 level.
Because we made multiple hypothesis tests, we could not assume that the significance level is
truly α = .01. To determine the possibility that this many correlation coefficients could have been
found significant by chance, we determined the probability that, out of the 45 “trials,” all 45 of these
hypothesis tests would turn out significant by chance alone.
Since the probability of a hypothesis test turning out to be significant by chance is .01 (by our
choice of α = .01), we modeled this process with the binomial distribution for 45 trials and 45 “suc-
cesses” each with a probability of success of p=.01. The result is p<.000000; it is essentially impos-
sible to have this many significant correlation coefficients by chance. The violation counts are
measuring an underlying, common phenomenon. Additional support for this contention comes
from the magnitude of the correlation coefficients. The coefficients range from .61 to .98. The mean
is .85, and the median, .87. These coefficients are not only statistically significant (i.e., not zero),
they are practically significant; these are strong correlations. Additional support for this
statement is found by looking at the r² values in Table 5.
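The binomial reasoning above can be sketched in a few lines of Python. The function name `binomial_tail` is an assumption for illustration, and `math.comb` requires Python 3.8 or later; the sketch shows the form of the calculation and does not claim to reproduce the software used for the report.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of k or more successes in n independent trials, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All 45 of 45 tests significant by chance when each has probability .01:
# this is .01 raised to the 45th power, far below any printable threshold.
p_all_45 = binomial_tail(45, 45, 0.01)

# A small check on the function itself: at least one head in two fair coin tosses.
p_coin = binomial_tail(2, 1, 0.5)
```

The same function gives the probability of "this many or more" significant coefficients for any smaller count, under the assumption that the tests are independent.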
Table 5: Matrix of squares of the correlation coefficients (r²) for the monthly first differences
between different types of human rights violations at the national level in Haiti, September
1991 to October 1994.
Type of violation  VDS  VDT  VES  VEX  VLB  VMP  VSX  VTE  VTT
VBN                .44  .77  .77  .66  .77  .86  .55  .86  .88
VTE                                                        .92
Table 6. Number of significant correlation coefficients for the monthly first differences
between different types of human rights violations by department in Haiti, September 1991 to
October 1994.
Type of violation   Number of significant correlation   Probability of this many or more
                    coefficients out of 45              occurring by chance
Torture             29                                  .0000000000
Extortion           14                                  .0000000000
For all of these kinds of violations, it would be extremely unlikely to find so many significant
correlation coefficients by chance (e.g., for rape, seven significant correlation coefficients in 45
pairs would occur by chance on average 16 times in ten billion trials). We find that the departments
are much more consistent for the kinds of violations that have many more instances
(torture, arbitrary detention, attacks on goods). More than half of all possible pairs of departments
have significant, nonzero correlation coefficients between their monthly first differences of instances
of torture and arbitrary detention.
Although the findings are weaker, it is also true that instances of rape and sexual abuse,
extortion, and arbitrary execution are consistent across departments. Across all the kinds of violations
examined here, the number of significant correlation coefficients found is sufficient to conclude that
the violence was committed consistently in time across different departments.
A concern of the project was the extent to which data could support the hypothesis that there
was national influence on local perpetrators.14 We took the approach of seeing if the
several departmental time series for relevant variables were, in fact, more correlated in their trends
than would be likely by chance. Similarity of movements of a variable in all departments would be
taken as evidence of a national influence. For example, if in most departments rape rose during the
same several months and fell during others, this would be an indication of some kind of national
control.15
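The departmental comparison can be sketched as a test of every pair of departments. This is an illustration only: the department labels and monthly counts are hypothetical, the significance test reuses the normal approximation of note 12 (which that note applies to n > 30, so the short sample series serve only to exercise the code), and the critical value 2.576 is the standard two-sided normal value for α = .01.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Z_CRITICAL = 2.576  # two-sided normal critical value for alpha = .01


def first_differences(series: List[float]) -> List[float]:
    """Month-to-month changes in a series of counts."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def z_statistic(r: float, n: int) -> float:
    """z for r under zero population correlation; infinite when r is exactly +1 or -1."""
    variance = max(0.0, (1 - r * r) / (n - 2))
    if variance == 0.0:
        return math.copysign(math.inf, r)
    return r / math.sqrt(variance)


def significant_pairs(series_by_department: Dict[str, List[float]]) -> List[Tuple[str, str]]:
    """Return the department pairs whose first differences are significantly correlated."""
    diffs = {name: first_differences(s) for name, s in series_by_department.items()}
    found = []
    for a, b in combinations(sorted(diffs), 2):
        r = pearson_r(diffs[a], diffs[b])
        if abs(z_statistic(r, len(diffs[a]))) > Z_CRITICAL:
            found.append((a, b))
    return found


# Hypothetical monthly counts: A and B rise and fall together, C does not.
monthly_counts = {
    "Department A": [0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Department B": [1, 3, 1, 3, 1, 3, 1, 3, 1],
    "Department C": [0, 1, 2, 1, 0, 1, 2, 1, 0],
}
pairs = significant_pairs(monthly_counts)
```

The number of pairs returned is the count that would then be compared against chance with the binomial calculation.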
14
It was beyond the scope of the AAAS scientific component of the CNVJ activities to attempt to determine
the nature of that influence.
15
While national motivation or stimulation of actions is a likely cause, it is also possible for apparently disor-
ganized local perpetrators to communicate about their activities or be the recipients of information coming
from other departments.
Lessons Learned
Commission level decisions affecting the effectiveness and efficiency of information system work | Conflicts over goals related to political issues can affect the work of the database area by increasing mid-course changes, and delaying imperative operational decisions. | Project personnel must develop ways to effectively work with commissioners to communicate the consequences of commission-level decisions. | Project personnel below the leadership level may have no voice at the commission level. Some “political” choices, despite their effects on the information system, may be consistent with the commission’s mission. Lack of understanding of information systems and research at the commission level. The commission has the final word.

Decision-making at the project administration level | Leadership counts. Lack of leadership means delays affecting effectiveness and efficiency of project | Project personnel must inform commission management of the dangers of delays. | Developing the communication skills needed to speak to commission management in terms and ways that enable them to understand the consequences of leadership vacuums.

Release of scientific findings | Suppression or non-release of scientific findings. | Negotiate for controlled release of scientific findings, based on meeting security conditions, limiting output, access to data, etc. | Negotiation of these terms and conditions should be done at the start of the project.

Variables in data collection | Variables that seem relevant at start may not draw responses, or such responses as are obtained are not useful | Pilot test proposed questionnaires for field interviewing, and put through a pilot data processing and analysis cycle. | Unwillingness of some personnel to invest the effort up front to save effort in the future, when there is pressure for immediate results.

Missing data for certain variables | Despite pilot testing, field interviews may develop high rates of non-response on certain variables | Detect the problem quickly and determine corrective action | Early detection means concurrent processing of interviews
References
Ball, Patrick. 1996. Who Did What to Whom? Planning and Implementing a Large Scale Human
Rights Data Project. Washington, DC: American Association for the Advancement of Science.
Ball, Patrick, Ricardo Cifuentes, Judith Dueck, Romilly Gregory, Daniel Salcedo, and Carlos
Saldarriaga. 1994. A Definition of Database Design Standards for Human Rights Agencies.
Washington, DC: American Association for the Advancement of Science and Human Rights
Information and Documentation Systems International.
Doretti, Mercedes, and Ignacio Cano. n.d. “Violations of the Right to Life in Haiti: 1985 to 1995.”
Paper available from authors.
Chapter 3
The South African Truth and Reconciliation Commission: Data
Processing
Themba Kubheka
To gather information about the gross violations of human rights suffered by South Africans,
a Gross Human Rights Violation (GHRV) statement was developed and referred to as the protocol.
The GHRV statement gave victims an opportunity to relate the violations they suffered, and in so
doing, provided the information for data processing. As the commission went about its work, the
GHRV statement went through several conceptual stages as ordered below:
• Tell your story. It started as a narrative statement, but developed into a questionnaire to
make it easy for victims to understand.
• Give the deponents the emotional space to tell the story in their own way. This meant
presenting the events and highlighting the issues as perceived by the statement-giver.
However, some regional officers believed that the TRC had to serve the deponent’s emotional
needs.
• For many people the act of giving a statement was a mini-hearing. The GHRV statement
fell into two main groups of deponents’ statements: those made by victims themselves
and those made on behalf of victims.
Information also came to the TRC by letter. Initially, letters were screened, and only those
whose narratives fell within the mandate of the commission were accepted.
Later in the course of the work, a Designated Statement Program helped in-house statement takers
reach out to thousands of South Africans who suffered gross human rights abuses. This program
was administered by non-governmental organizations.
In the next section, I give a chronological history of the Data Processing Unit of the TRC, and
a summary of its functions and work practices as they progressively developed during the project.
Chapter Three: The South African Truth and Reconciliation Commission
was derived from the "who did what to whom?" model, based on the experience with databases for
other truth commissions.
The database consultant1 applied a “Who did what to whom?” model. In this model, which is
based on his experience with other truth commissions, the first principle is to record “who did what
to whom.” For example, in “John hit Jane on the head,” the victim and perpetrator are linked
through the recorded act of violence. “When and where” place the act in context and are
recorded through the narrative that links acts into a coherent whole. Recorded acts are specific and
are the building blocks of the system. For example, a number of acts, e.g., blow to the head, electric
shock to the genitals, make up an incident of torture. An event comprises several incidents, such
as “The arrest, detention and torture of Mr. Masinga.”
The rationale for detailed recording of each act is the complexity of incidents of human rights
violations. During a single event such as the Boipatong massacre, there can be many linked perpe-
trators, victims and acts separated from each other at various times and places. To make sense of
this massive amount of information, it is important to break down the event into its component
parts in the greatest possible detail.
This conceptual model of a human rights violation protocol is the basis for the training of
statement takers and their subsequent work with deponents.
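The layering of acts, incidents, and events in this model can be sketched with Python data classes. The class and field names are assumptions for illustration, not the TRC database schema; the examples reuse the ones given above, with an unidentified perpetrator where the text names none.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Act:
    perpetrator: str   # who
    action: str        # did what
    victim: str        # to whom


@dataclass
class Incident:
    label: str
    acts: List[Act] = field(default_factory=list)


@dataclass
class Event:
    label: str
    incidents: List[Incident] = field(default_factory=list)

    def acts(self) -> List[Act]:
        """All acts in the event, gathered across its incidents."""
        return [act for incident in self.incidents for act in incident.acts]


# The simplest case from the text: victim and perpetrator linked by one act.
simple_act = Act("John", "hit on the head", "Jane")

# An incident of torture built from specific acts, inside a larger event.
torture = Incident("torture", [
    Act("unidentified", "blow to the head", "Mr. Masinga"),
    Act("unidentified", "electric shock to the genitals", "Mr. Masinga"),
])
event = Event(
    "The arrest, detention and torture of Mr. Masinga",
    [Incident("arrest"), Incident("detention"), torture],
)
```

Recording at the level of the act means counts can later be rolled up to incidents or events, while the reverse is not possible if only the event is recorded.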
Chronology of Events
We then initiated the process of collecting supporting information. This process involved con-
tacting a wide range of organizations and institutions to assemble comprehensive lists of neces-
sary and potential data sources. In this process, additional data processors and data entering per-
sonnel were employed by the TRC and were involved in the task of information collection. Because
there was a large amount of information that was not available in machine-readable form, it was
decided that the data processing unit would compile a range of resources. These could be stored in
hard copy as part of a resource pack or included in the computerized information on the database.
At this time, we started collating some of the information collected, such as cross-checking various
lists of trade unions.
We organized and conducted a one-day workshop to train new data processors and data en-
tering personnel. Training materials included an overview of the legislation governing the work of
the TRC and the “Who did what to whom?” model. In this workshop, we brainstormed acts of vio-
lence, adding additional acts of violence to the list already developed and discussing the hierarchi-
cal organization of the acts, in particular how this structure would relate to the TRC’s legislative
categories of human rights violations. The consensus was that some of the enlarged categories
were too broad to be analytically useful and that the hierarchies should be based on acts of vio-
lence; for example, the category of asphyxiation would include tear-gassing. How to link acts of
violation to the legislative categories was left open. Once the draft list of acts of violence had been
completed, a list of synonyms for the acts was compiled, as well as a non-hierarchical, alphabetical
list, as shown in Appendix 4 (Acts of Violence).
1
Patrick Ball of the American Association for the Advancement of Science acted as methodological advisor to
the TRC and led the database development group. The "who did what to whom?" model is outlined in (Ball
et al., 1994) and (Ball, 1996).
As the discussion of acts of violence continued, researcher Lydia Levin and systems analyst
Gerald O’Sullivan worked with the database software vendor, Oracle, to develop a database model.
Data processors brainstormed, proposed and reviewed possible questions to which the database
would supply answers. For almost three months the data processors and data entering personnel
met daily to review every possible act of violence one person could inflict on another. Every act was
written on the board and debated in detail. The results are tabulated in Appendix 4 (Acts of Violence).
Data processors also provided ongoing and ad hoc assistance to the research department, particularly
in compiling a legislative chronology.
Under the four categories, Killing, Abduction, Torture and Severe Ill-treatment, described in
the Act, we ended up with about 200 types of violations (Appendix 1, Initial Action Types and
Codes) which were later reduced to 90 (Appendix 2, Final Set of HRV Categories). The TRC added
the two categories, Attempted Killing and Associated Violations.
Using the initial codes of Appendix 1 (Initial Action Types and Codes), we could only detect
the act of violation from the outcomes, such as whether the victim died, was injured, or miscarried.
Hence, we would code one of the following outcomes: death, injury, damages to property, preg-
nancy, disappearance, abduction and forced removal as the violation.
The Associated Violations category, which is not a gross violation of human rights, is impor-
tant for understanding the context of the act. There are also two more categories for unclassified
cases: Other and Unknown violations. Each of these categories has several sub-headings which
explain how the violations took place (a person can be killed by different methods, so we need to
identify how they were killed). By breaking the categories into sub-headings, we can then do
meaningful counting for the final report.
In September of 1996, the three head data processors from Johannesburg, Durban and East
London met their counterparts and the head researchers in Cape Town. The CEO of the TRC and
the TRC’s methodological advisor decided that we should reduce the 200 acts of violation to about
50. After three days we could not reach consensus and returned to our regional offices to consult
with our respective data processors. Following that consultation, consensus was mandated, and
achieved. At our second workshop, we were told to produce the final product and we did.
The data processors, who were by this time using the initial codes of Appendix 1 (Initial Action
Types and Codes), put forward a number of observations, critical comments and objections,
expressing their concerns about the categories of Appendix 2 (Final Set of HRV Categories). They felt
a detailed violation was more meaningful in describing an act. For example, beating a victim with a
gun was different from beating the victim on the soles of the feet or whipping with a sjambok (initial
action codes in Appendix 1). In the new codes of Appendix 2 (Final Set of HRV Categories) these
different types of assaults were all subsumed under BEATING.
Of course, with the new codes, the data processors’ task was easier and we could process
more statements than before. The new codes of Appendix 2 (Final Set of HRV Categories) were
retained. To synchronize the acts already captured with the new ones, the systems analyst Gerald
O’Sullivan created an Excel spreadsheet for each regional office and instructed head data
processors to change the old acts to the new ones, line by line. The Johannesburg office had over 10,000
lines of violation codes to be modified, but the job was done and the corrected spreadsheets were
loaded into the database.
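The recoding step can be sketched as a table lookup. At the TRC the change was made by hand in spreadsheets; the Python below is only an illustration of the mapping idea, and the code labels, field names, and the function `recode` are hypothetical (the text confirms only that several detailed assault codes were subsumed under BEATING).

```python
from typing import Dict, List, Tuple


def recode(rows: List[Dict], old_to_new: Dict[str, str]) -> Tuple[List[Dict], List[Dict]]:
    """Replace each old act code with its new code; set aside rows with no mapping."""
    updated: List[Dict] = []
    unmapped: List[Dict] = []
    for row in rows:
        new_code = old_to_new.get(row["act_code"])
        if new_code is None:
            unmapped.append(row)          # needs a human decision
        else:
            updated.append({**row, "act_code": new_code})
    return updated, unmapped


# Hypothetical old labels mapped to the new, broader category.
old_to_new = {
    "BEATING WITH GUN": "BEATING",
    "BEATING ON SOLES OF FEET": "BEATING",
    "WHIPPING WITH SJAMBOK": "BEATING",
}
rows = [
    {"statement": 1, "act_code": "BEATING WITH GUN"},
    {"statement": 2, "act_code": "WHIPPING WITH SJAMBOK"},
    {"statement": 3, "act_code": "UNLISTED OLD CODE"},
]
recoded, unmapped = recode(rows, old_to_new)
```

Returning the unmapped rows separately keeps any code that lacks an agreed replacement from being silently dropped or guessed at.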
are needed to handle each information source. Dealing with these input issues to produce useful,
integrated results is data processing.
But how outputs are defined -- what will be done with the data once it is processed -- also
determines the nature of data processing. The TRC uses information from the database for many
purposes: to conduct research, to facilitate investigation, to record the testimony of victims of hu-
man rights violations, to record the evidence of amnesty applicants and to formulate a reparation
and rehabilitation policy. Not all of the output of the system will be tangible. The understanding of
dynamics, conflicts, and so forth that investigators, and possibly statement takers, develop is
information that will not necessarily be easily put into a protocol. To satisfy the objectives of the
TRC, data processing must make it possible for the database to serve all these needs.
Not only will data come from different sources, but it will also be gathered at different regional
and national levels. It is crucial that all these information-gathering processes are not carried out in
parallel to each other but are part of an integrated whole. At this level of integration, data process-
ing is the key element. It is the point at which the incoming data and information is managed and
organized, and where the analytical process begins. Data processors make a great number of deci-
sions about how to define the information. Such decisions might include the answers to questions
such as these: “Is this truly a gross human rights violation?” “Was it part of the Boipatong massa-
cre?” “Is Colonel Swanepoel the same man already implicated in numerous other torture cases?”
Data processing is where the investigation begins. Data processors deal on a daily basis with
the full spectrum of incoming information. They should be the first to pick up on discrepancies
between the stories of amnesty applicants and their victims, and first to identify the trends of
violations in particular areas or perpetrated by particular people, units, sectors of society, and so forth. A
structured means of feeding this information into the research and investigation processes on an
ongoing basis is crucial or these insights will be lost.
Also, data processing is the first point of contact between the national process of amnesty
applications and the regional processes of human rights violations reporting. It is the skills of data
processors that assure that amnesty applications can be cross-checked with reports of human
rights violations. National investigations and research processes are meaningless unless they can
draw on the full range of information available from different regions.
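The cross-checking described above can be pictured as a lookup keyed on a normalized name. The sketch below is illustrative only: the record fields (`ref`, `applicant`, `alleged_perpetrator`), the reference formats and the matching rule are assumptions made for this example, not the TRC's actual schema or procedure.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def cross_check(amnesty_applications, hrv_statements):
    """Map each amnesty reference to the HRV references naming the same person."""
    by_perpetrator = defaultdict(list)
    for stmt in hrv_statements:
        by_perpetrator[normalize(stmt["alleged_perpetrator"])].append(stmt["ref"])
    return {
        app["ref"]: by_perpetrator.get(normalize(app["applicant"]), [])
        for app in amnesty_applications
    }


# Hypothetical records, invented purely for illustration.
applications = [
    {"ref": "AM-1", "applicant": "Person  A"},
    {"ref": "AM-2", "applicant": "Person C"},
]
statements = [
    {"ref": "HRV-1", "alleged_perpetrator": "person a"},
    {"ref": "HRV-2", "alleged_perpetrator": "Person B"},
    {"ref": "HRV-3", "alleged_perpetrator": "PERSON A"},
]
links = cross_check(applications, statements)
```

Exact matching on a cleaned-up name is the simplest possible rule. In practice a data processor would still have to judge whether two similar names refer to the same person, which is the kind of decision the text assigns to human coders.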
None of these processes necessarily happens sequentially. As the TRC does its work, new
information continually verifies, contradicts, adds to or complicates the information already
collected. Hearing evidence from amnesty applicants or victims of human rights violations must be
recorded and linked to the original statements of witnesses or applications for amnesty.
Discrepancies and additions must be identified and fed into the research, investigation and
reparation processes.
In addition to its research and investigative functions, the TRC is also attempting to deliver
reparations to victims. It is crucial to this process to accurately record the individual consequences
of violations of human rights and the needs resulting from those violations. This information must
be systematically gathered and processed to generate a national policy on reparation and
rehabilitation and to ensure full attention to the needs of every victim of a human rights violation
reported to the TRC.
• Do not analyze a bundle of statements as a group and then capture them as a group. In so
doing, you may confuse the statements.
Themba Kubheka
• Finish one statement entirely before moving on to another. You must code exactly what is
on the statement, even if you believe the statement is inaccurate, is full of contradictions,
etc.
• You must be very careful not to allow any biases to creep in, and further, not to allow any
of your own commentary or observations to enter the coding. What is captured is exactly
what the statement says, but in a coded form.
• Remember that you are trying to extract as many acts and victims per statement as possible.
Even if you have scanty details about a particular event, code and capture what is
there. You may be able to gather more information through investigations, research, etc.
A step-by-step description of how data was captured is given in Appendix 7 (Data Processing
User Guide).
Problems
To show the nature of the work of data processing, the following is a list of some of the
problems we encountered and resolved. (Note: The lessons learned in this aspect of work on the TRC
information management system have been integrated with those of the database representation,
and are found in Chapter 4.)
Data flow
• How will new information collected through hearings, statements, informally, and through
the work of the investigators, be entered and again made available for further research and
investigation?
• How will the database or processing be used to handle the problem of various types of
information coming from the different sources?
• How will the information gathered be fed into the investigation and research processes?
• How to maintain a high rate of document processing in view of potentially lengthy
verifications and statements that are difficult to code?
Quality control
• How to minimize coding errors?
• How to establish and enforce consistent coding practices?
• How to check for typing errors?
• How to code ambiguous information?
• What is the impact of serious errors on the quality of data?
• What are the implications of changing information and updating the database?
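One common way to catch typing errors in coded data is to check every entered code against the controlled vocabulary and suggest near matches for anything unrecognized. The following is a minimal sketch of that idea, not a description of the checks the TRC actually ran. The vocabulary is a small sample of the act codes listed in Appendix 1, and the function name `check_code` is hypothetical.

```python
import difflib

# A small sample of act codes taken from Appendix 1; a real check would load
# the full controlled vocabulary.
VOCABULARY = ["ASS_HKSP", "ASS_KKLP", "ASS_SHRP", "BOM_BOMB", "BRN_NKLC", "SHT_RUBB"]


def check_code(entered, vocabulary=VOCABULARY):
    """Return (is_valid, suggestions) for a code typed by a data processor."""
    code = entered.strip().upper()
    if code in vocabulary:
        return True, []
    # difflib ranks vocabulary entries by string similarity to the typed code.
    return False, difflib.get_close_matches(code, vocabulary, n=3, cutoff=0.6)


valid, _ = check_code("ass_hksp")               # case differences are tolerated
invalid, suggestions = check_code("ASS_HSKP")   # two letters transposed
```

A check of this kind only catches codes that do not exist. A wrong but valid code (for example SHT_RUBB entered where live ammunition was reported) still needs a second reader or a review of the coder's sheet.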
Overall issues
• What is the role of the database in reparation and rehabilitation?
• How can the database be used as an integrative tool? For example, how do we link
national and regional processes, amnesty, HRV and reparations?
• What will be the benefits of feeding information to investigation and research processes?
• What is the desired role of the database in supporting the objectives of the TRC?
• How will the database be used as a corroborative tool (what for and how)?
Chapter Three: The South African Truth and Reconciliation Commission
• What is the role of the database in tracking various processes happening in the TRC?
• How will the database assist in making HRV findings?
• If we look for trends, how will we identify which trends to look for and how will these
trends be tracked?
System design
• How to incorporate the free text field?
• How to group acts into events and mega-events, and how to find connections between
persons, places, acts, and vehicles?
• Should the data processor attempt a preliminary corroboration of the file material and if so,
how?
together those acts belonging to that particular victim in a chronological order. Appendix 5 shows
a completed coder’s sheet and Appendix 6 is a complete TRC statement.
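Grouping a victim's acts in chronological order, as mentioned above, can be sketched in a few lines. The record layout (`victim`, `date`, `violation`, `description`) and the sample rows are assumptions invented for this example. The violation and description values are codes from Appendix 2, and the sketch assumes every act carries a known date.

```python
from collections import defaultdict
from datetime import date


def acts_by_victim(acts):
    """Group act records by victim and sort each group by date of the act."""
    grouped = defaultdict(list)
    for act in acts:
        grouped[act["victim"]].append(act)
    return {
        victim: sorted(items, key=lambda a: a["date"])
        for victim, items in grouped.items()
    }


# Hypothetical coded acts; victims, dates and codes are illustrative only.
acts = [
    {"victim": "V1", "date": date(1989, 2, 28), "violation": "SEVERE", "description": "BURNING"},
    {"victim": "V2", "date": date(1989, 2, 28), "violation": "KILLING", "description": "SHOOTING"},
    {"victim": "V1", "date": date(1986, 6, 1), "violation": "ASSOCIATED", "description": "INTIMIDATE"},
]
timeline = acts_by_victim(acts)
```

Statements often give only a month or a year for an act, so a real system would need a rule for partial dates before a sort like this could be relied on.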
Appendix 1
Top Level
Abduction ABD Harmful substances SBS
Subsidiary
Abduction
Forcible abduction ABD_FRDC Other ABD_OTHR
Unknown ABD_UNKN
Assault
Hitting / Kicking / Slapping / Punching ASS_HKSP Kaffir Klap ASS_KKLP
Stabbing and/or Hacking with a panga, knife, sharp object ASS_SHRP Banging head against wall ASS_BHDW
Beating with a gun e.g. rifle butt, pistol-whipping ASS_BGUN Arms and/or wrist twisted ASS_TWST
Bombing
Bomb BOM_BOMB Letter / parcel bombs BOM_LPBM
Burns
Chemicals BRN_CHEM Necklacing BRN_NKLC
Capital punishment
Judicial hanging CPP_JHAN Other CPP_OTHR
Unknown CPP_UNKN
Unknown ILL_UNKN
Deprivation
Deprivation of medical attention / treatment DEP_MEDC Deprivation of privacy DEP_PRIV
Unknown DEP_UNKN
Other DEP_OTHR
Drowning
Total submersion in water DRW_TSBM Unknown DRW_UNKN
Electric shock
Electric shock to the genitals ELS_GNTL Unknown ELS_UNKN
Financial impropriety
Bribery FIM_BRIB Blackmail FIM_BLML
Ransom FIM_RNSM
Framing
Person framed FRM_PRSN Other FRM_OTHR
Unknown FRM_UNKN
Harassment
Surveillance HRS_SRVY Telephone harassment HRS_TELE
Harmful substances
Poison SBS_POSN Unknown SBS_UNKN
Improper burial
Buried in shallow grave BRL_SHLW Anonymous burial BRL_ANON
Incarceration
Detention (if victim reports act as Arrest AND detention, only enter as DETENTION) INC_DETN Banning INC_BANN
Banishment INC_BNSH
Physical stress
Forced stationary postures PHY_FRSP Suspension - hanging victim by arms, legs, etc. PHY_SUSP
Psychological torture
Simulated execution PSY_EXCU Victim is forced to watch and/or listen to torture of others PSY_WTCH
Detention of significant other people PSY_DTEN Victim is forced to participate in the torture of others PSY_PART
False and alarming information PSY_FLSE Victim shown other torture victims PSY_SHOW
Sexual abuse
Forced sexual acts SEX_FRSX Pumping water into the uterus SEX_PUMP
Rape by someone of the opposite sex SEX_RAPE Abuse with bodily fluids SEX_DBFL
Rape by someone of the same sex SEX_RPSS Abuse using animals SEX_ANIM
Shooting
Rubber bullets SHT_RUBB Unknown SHT_UNKN
Suffocation
Hanging SUF_HANG Wet towel or bag over the head SUF_WETT
Vandalism THF_VAND
Threats
Death threats THR_DETH Derogatory language and/or insults THR_INSL
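The codes in Appendix 1 appear to follow a two-level pattern: a three-letter top-level prefix, an underscore, and a four-letter subsidiary code (for example ABD_OTHR). Assuming that pattern holds throughout, a code can be split into its two levels as sketched below. The helper names are hypothetical, and only the two top-level entries shown at the start of the appendix are filled in.

```python
# The two top-level entries printed at the start of Appendix 1; the full list
# is not reproduced here, so other prefixes are reported as unlisted.
TOP_LEVEL = {"ABD": "Abduction", "SBS": "Harmful substances"}


def split_code(code):
    """Split 'ABD_OTHR' into ('ABD', 'OTHR'); a bare top-level code has no subsidiary part."""
    top, _, sub = code.partition("_")
    return top, (sub or None)


def describe(code):
    """Return a readable label for a code, using the top-level table above."""
    top, sub = split_code(code)
    category = TOP_LEVEL.get(top, "unlisted category")
    return f"{category} / {sub}" if sub else category


label = describe("SBS_POSN")
```

Keeping the top-level category recoverable from the code itself means counts can be produced at either level without a separate lookup table.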
Appendix 2
Killing KILLING A killing is when a person dies, in one of the three ways: Assassination –
killing of a targeted person by a person or group who developed a secret
plan or plot to achieve this. Person is targeted because of his political
positions. Execution - capital punishment (death sentence) imposed and
carried out by a legal or authorized body such as a court of law or a
tribunal. Victim is aware of death sentence. Perpetrators are the state,
homeland governments or security structures of political movements. Killing
- all other deaths including a killing by a crowd of people.
Attempted Killing ATT KILLING This category is the same as that for killing. In attempted killing the victim
does not die but there was a clear intent to kill him/her.
Torture TORTURE Torture happens in captivity or in custody of any kind, formal or informal (for
example: prisons, police cells, detention camps, containers, private houses
or anywhere while tied up or bound to something). Torture is usually to get
information or to force the person to do something (for example to admit to a
crime or sign a statement). It includes mental or psychological torture (for
example: sleep deprivation or telling the person that their family is dead).
Severe Ill-treatment SEVERE Severe Ill-treatment covers all forms of inflicted suffering that did NOT
happen in custody (for example: injury by a car bomb or beaten up at a
rally).
Abduction ABDUCTION Abduction is when a person is forcibly and illegally taken away (for
example: kidnapping). It does NOT mean detention or arrest. It is not a gross
violation of human rights to be arrested (see Associated violations). If the
person is never found again, it is disappearance.
Associated Violation ASSOCIATED These are not gross violations of human rights but are important for
understanding the context of the violation (for example: detention,
harassment, framing and violating a corpse after death).
Other violations OTHER Violations that are described but do not fit into any of the above
categories.
Burnt to death BURNING Victim is killed in a fire or burnt to death using petrol,
chemical, fire, scalding, and arson but does NOT
include Necklacing or Petrol bomb. The last two are
separate codes.
Killed by exposure EXPOSURE Person dies after being subjected to extremes such as
heat, cold, weather, exercise, forced labor.
Killed by multiple causes MULTIPLE The person is killed in a variety of ways (use the
appropriate definitions from other categories).
Petrol bomb PETROLBOMB Killed by a burning bottle of petrol. Petrol burning falls
in between burning and bombing, so, like Necklacing, it is
useful to code it separately. It was also called Molotov
cocktail.
Shot dead SHOOTING Person is shot and killed by live bullet, gunshot, bird
shot, buck shot, pellets, and rubber bullet.
Stabbed to death STABBING Killed with a sharp object such as a knife, panga, axe,
scissors, spear (including assegai).
Suspicious suicide or accident STAGED Person dies in suspicious suicide or fatal accident.
This should only be used if it is not clear whether it was
really an accident or not, otherwise use the appropriate
category and explain in the description that there was a
cover-up. Examples: slipped on soap, jumped out of
window, fell down stairs, hanged himself, car
accident, booby trapped hand grenades or
explosives, shot himself.
Stoned to death STONING Person is killed with bricks, stones or other missiles
thrown at them.
Killing involving a vehicle VEHICLE Dragged behind, thrown out, driven over, put in
boot but NOT car bomb. (See Bombing). Specify what
type of vehicle was involved (for example: car, train,
truck, van, bakkie, hippo, casspir).
Other type of killing OTHER All other methods of killing including buried alive,
strangling, tear-gas, decapitation,
disembowelment. Make sure that it is clear in the
description of the act exactly how they died.
Unknown cause of death UNKNOWN Person is dead but there is no further information.
Attempted killing by beating BEATING Attempt to beat a person to death by hitting, kicking or
punching. State which part of the body was assaulted, if
known (for example: feet, face, head, genitals, breast).
If an object was used in the beating, specify the
object, e.g. sjambok, baton, gun, rifle, stick, whip, plank,
beat against the wall.
Attempted killing by burning BURNING Attempt to kill victim in a fire or by using petrol, chemical,
fire, scalding, and arson but does NOT include
Necklacing or Petrol Bomb. The last two are separate
codes.
Attempted killing by poisoning, drugs or chemical CHEMICALS Attempt to kill person by use of poison, drugs or
household substance such as bleach or drain cleaner.
Attempted killing by drowning DROWNING Attempt to kill the person by drowning in a river,
swimming pool or even in a bucket of water.
Attempted killing by exposure EXPOSURE Attempt to kill person by subjecting him/her to extremes
such as heat, cold, weather, exercise, and forced
labor.
Attempted killing by multiple causes MULTIPLE Attempt to kill the person in a variety of ways (use the
appropriate definitions from other categories).
Attempted killing by Necklacing NECKLACING Attempt to kill by burning with petrol and tire. Necklacing
is coded separately from Burning because it featured
heavily in the past, so it is useful to distinguish between
burning with petrol and a tire and burning in a house, for
example.
Attempted killing by petrol bomb PETROLBOMB Attempted killing by a burning bottle of petrol. Petrol
burning falls in between burning and bombing, so, like
Necklacing, it is useful to code it separately. It was also
called Molotov cocktail.
Attempted killing by shooting SHOOTING Person is shot and injured by live bullet, gunshot, bird
shot, buck shot, pellets, rubber bullet, or possibly
shot at close range or with deliberate intent to kill but not
injured.
Attempted killing by stabbing STABBING Attempted killing with a sharp object such as a knife,
panga, axe, scissors, and spear (including
assegai).
Attempted killing by suspicious suicide or accident STAGED Attempt to kill a person by staging a suspicious suicide
or fatal accident. This should only be used if it is not
clear whether it was really an accident or not, otherwise
use the appropriate category and explain in the
description that there was a cover-up. Examples: slipped
on soap, jumped out of window, fell down stairs,
hanged himself, car accident, booby trapped hand
grenades or explosives, shot himself.
Attempted killing by stoning STONING Attempt to kill a person by throwing bricks, stones or
other missiles at them.
Attempted killing by torturing TORTURE Attempt made to kill a person by torturing to death.
Attempted killing involving a vehicle VEHICLE Dragged behind, thrown out, driven over, put in
boot but NOT car bomb. (See Bombing). Specify what
type of vehicle was involved (for example: car, train,
truck, van, bakkie, hippo, casspir).
Other type of attempted killing OTHER All other methods of attempted killing including buried
alive, strangling, tear-gas, decapitation,
disembowelment. Make sure that it is clear in the
description of the act exactly how the attempt was made.
Torture by beating BEATING Person is tortured by being beaten severely or for a long time (example: hit,
kick, and punch). State which part of the body was assaulted (e.g. feet,
face, head, genitals, breast). If an object was used in the beating, specify
the object (example: sjambok, baton, gun, rifle, stick, rope, whip,
plank, beat against the wall). Specify if victim is pregnant or miscarried.
Torture with poison CHEMICALS Tortured with poison, drugs or household substance such as bleach or
drain cleaner.
Electric shock torture ELECTRIC Electric shock to the body. Specify which body part was shocked (for
example: genitals, breasts, fingers, toes, ears, etc).
Torture by exposure to extremes EXPOSURE Person is tortured by subjecting them to extremes such as heat, cold,
weather, exercise, labor, noise, darkness, light (including flashing
lights, blinding by light), blindfolding, confinement to small space,
smells, and immobilization.
Torture by bodily mutilation MUTILATE Torture involving injuries to the body where parts of the body are partly or
wholly cut, severed or broken. Specify body part, for example: genitals,
ears, fingernails, hair, etc. It includes amputation of the body parts,
breaking of bones, pulling out nails, hair or teeth, scalping.
Torture by forced posture POSTURE Person is tortured by forcing the body into painful positions, for example:
suspension, helicopter, tied up, handcuffed, stretching of body
parts, prolonged standing, standing on bricks, uncomfortable
position (including squatting, imaginary chair, standing on one leg, pebbles in
shoes), forced exercise, forced labor, blindfolding and gagging.
Torture by sexual assault or abuse SEXUAL Person is tortured by attacking them using their gender or genitals as a weak
point. This does NOT include electric shock, mutilation or beating (instead, use
those categories and specify genitals as the body part abused). It includes:
slamming genital or breast in a drawer or other device, suspension
of weights on genitals, squeezing genitals or breasts, rape by
opposite sex, rape by same sex, gang rape, forced sexual acts (e.g.
oral sex, simulating intercourse), introduction of objects into the
vagina or rectum, sexual abuse using animals, threats of rape,
touching, nakedness, sexual comments or insults, sexual
enticement, deprivation of sanitary facilities for menstruation.
Torture by suffocation SUFFOCATE Torture by stopping someone from breathing, for example by: bag, towel,
tube over head (wet or dry), drowning (head, whole body
submerged), choke, strangle, stifle, throttle, teargas, bury alive.
Other type of torture OTHER All other methods of torture. Make sure that it is clear in the description of the
act exactly how the person was tortured. It includes use of animals
(specify animal e.g. snake, tortoise, baboon), use of vehicle.
Unknown type of torture UNKNOWN Person is tortured but the method is not known.
Severely beaten BEATING Person is badly beaten, or beaten for a long period of time. They may
be hit, kicked, punched, twisted. State which part of the body
was assaulted if known. Example: feet, face, head, genitals, and
breast. If the person was beaten with an object, specify object (for
example: sjambok, baton, gun, rifle, stick, rope, whip, plank,
wall). Specify if victim is pregnant or miscarried.
Injured by burning BURNING Person is injured by burning with fire, petrol, chemical, scalding
but does NOT include necklacing or Petrol Bomb. The last two are
separate codes.
Injured by poison, drugs or chemical CHEMICALS Person was poisoned or injured by poison, drugs or household
substance such as bleach or drain cleaner.
Deprivation DEPRIVE This usually relates to treatment while incarcerated and would
include deprivation of food, medical treatment, sleep, and
clothing.
Injured in an explosion EXPLOSION Person is injured by a bomb or explosives but NOT petrol bomb (this
is coded separately; see below). Explosives include dynamite,
land-mine, limpet mine, car bomb, hand grenade, plastic
explosives, detonator, booby trap, letter bomb, parcel bomb,
special device (e.g. booby trapped Walkman).
Bodily mutilation MUTILATE Person is injured by having parts of their body mutilated or damaged.
Specify body part, for example genitals, fingernails, ears, hair,
etc.
Sexually assaulted or abused SEXUAL All forms of attack on a person using their gender or genitals as a
weak point, for example: rape by opposite sex, rape by same
sex, gang rape, forced sexual acts (e.g. oral sex, simulating
intercourse), introduction of objects or substances into
vagina or rectum, sexual abuse using animals.
Injured in a shooting SHOOTING Person is injured by being shot by live bullets, gunshot, birdshot,
buckshot, pellets, rubber bullet. Specify body part injured, if
known.
Stabbed or hacked with a sharp object STABBING Injured with a sharp object such as a knife, panga, axe, scissors,
spear (including assegai).
Injured in stoning STONING Person is injured with bricks, stones or other missiles thrown at
them.
Injury involving a vehicle VEHICLE Injury caused by being dragged behind, thrown out of, driven
over by, or put in the boot of a vehicle. Specify what type of vehicle was
involved (for example: car, train, truck, van, bakkie, hippo,
casspir).
Suffocated SUFFOCATE Injury or ill treatment by stopping someone from breathing, for
example by drowning (head, whole body submerged), choke,
stifle, throttle, teargas, bury alive.
Other type of ill-treatment OTHER All other methods of ill treatment. Make sure that it is clear in the
description of the act exactly how they were ill-treated.
Unknown type of ill-treatment UNKNOWN Person is ill-treated but the method is not known.
Illegal and forcible abduction ABDUCTION Victim is forcibly and illegally taken away (for example, kidnapping), but
the person is found again, returned or released. It does NOT mean
detention or arrest. It is not a gross violation of human rights to be
arrested (see Associated Violation).
Disappearance DISAPPEAR Victim is forcibly and illegally taken away and is never seen again. It
does NOT include cases where somebody goes into exile and never
returns. It must be done by force. This DOES include people who have
disappeared but it is not clear why they have gone (instead of
abduction, they might have just run away or were shot and buried). In
this case, a finding will be made and the code will be left as it is or
changed to Killing if the person was killed or changed to be out of the
mandate of the TRC.
Violation after death CORPSE Body of victim was violated after death, for example by: improper
burial, body mutilated or burnt or blown up, funeral
restrictions, funeral disruption, anonymous burial, mass
grave.
Destruction of property DESTROY Includes violations such as arson, destruction, vandalism, theft,
forced removal and eviction.
Person disappeared and has not been seen since DISAPPEAR This is for unresolved disappearance (not abductions and not killings).
The person may have disappeared while intending to go into exile, or
while in exile from a liberation movement camp, or while a
combatant in an operation within the country.
Financial impropriety FINANCIAL Person was subjected to bribery, extortion, pay-off, ransom,
blackmail and ruin of business.
Intimidation or harassment INTIMIDATE Victim is intimidated or harassed by dismissal from work, threats,
animals killed, visits, telephone calls, surveillance, boycott
enforcement, pointing of firearms (NOT in custody) and threat
of violence. It does NOT include vandalism or arson. This comes
under Destruction of Property.
Sexual harassment SEXUAL Person is sexually harassed. It includes threats of rape, touching,
nakedness, sexual comments or insults, sexual enticement,
deprivation of sanitary facilities for menstruation.
Professional misconduct PROFESS Person was subjected to professional misconduct by one of the
following: Doctors (district surgeon, private doctor) who neglect or
ignore injuries, collaborate in torture or conceal the cause of death or
injuries. Judiciary (magistrates, judges, etc.) who ignore torture
allegations, for example. Lawyers who neglect the case, ignore or
tamper with evidence, misappropriation of funds or failure to hand
over damages. Businesses who collaborate with perpetrators.
Tear-gassed TEARGAS Victim was tear-gassed but NOT while in custody (see Torture).
Theft or stealing THEFT Money or possessions were stolen from the victim.
Other type of associated violation OTHER All other types of associated violations, including released into
hostile environment, released into unknown place, left for
dead, rough ride, detention of family or loved ones. Give full
details in the description of the violation.
Unknown type of violation UNKNOWN Not clear from the statement what type of associated violation the
person suffered.
Other type of violation OTHER Other violations are specified by the victim, which do not fall into any
of the above classifications.
Unknown type of violation UNKNOWN Not clear from the statement what type of violation the person
suffered.
Appendix 3
KILLING BEATING
KILLING BURNING
KILLING CHEMICALS
KILLING DROWNING
KILLING ELECTRIC
KILLING EXECUTE
KILLING EXPLOSION
KILLING EXPOSURE
KILLING NECKLACING
KILLING SHOOTING
KILLING STABBING
KILLING STAGED
KILLING STONING
KILLING TORTURE
KILLING VEHICLE
KILLING OTHER
KILLING UNKNOWN
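A list of permitted pairings like the one above lends itself to a simple lookup that rejects any combination of violation type and description not on the list. The sketch below assumes that is how the list is meant to be used. It loads only the KILLING pairings printed above, and the function name `is_valid_pair` is hypothetical.

```python
# Permitted descriptions for the KILLING violation type, copied from the list above.
ALLOWED = {
    "KILLING": frozenset(
        "BEATING BURNING CHEMICALS DROWNING ELECTRIC EXECUTE EXPLOSION EXPOSURE "
        "NECKLACING SHOOTING STABBING STAGED STONING TORTURE VEHICLE OTHER UNKNOWN".split()
    ),
}


def is_valid_pair(violation, description):
    """Return True if the description is permitted for the violation type.

    Raises KeyError for violation types whose pairings have not been loaded,
    so that a missing table is never mistaken for an invalid code.
    """
    return description in ALLOWED[violation]


ok = is_valid_pair("KILLING", "SHOOTING")
bad = is_valid_pair("KILLING", "DEPRIVE")
```

Rejecting invalid pairings at data entry is cheaper than finding them later, when the coder's sheet and the original statement may no longer be at hand.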
Appendix 4
Acts of Violence
Specific acts of violence and their synonyms where relevant are shown in this appendix. As
described in the section, Chronology of Events, these acts are the result of three months of
discussion and brainstorming.
Asphyxiation (Synonym-Choke)
Strangling (Synonym-Throttle)
Suffocation (death)
Bag over head
Wet towel over head
Tear-gassing
Buried alive
Drowning
Submerge in water
Gagging
Assault (Synonym-Strike with an object)
Batoning
Beat (Synonyms-hit/batter) with a sharp object concealed in a cloth
Hacking
Sjambok
Stab (Synonym-cut/wound/gore)
Stoning
Cane
Flog (Synonym-whip/thrash/lash)
Beating with a rifle
Pistol whipping
Assault on Specific Parts of the Body
Beating on the soles of the feet
Beating pregnant women on the stomach
Clapping (Synonyms-whack/bang) on ears with both hands
Kaffir Klap (cheek)
Banging the head against a wall
Scalping (removal of hair from scalp with knife)
Removal of nails
Beating
Slapping (Synonyms-spank/thump/bump/strike/knock)
Kicking (Synonyms-boot/stomp)
Punching
Breaking (Synonyms-fracturing/crack/shattering/snapping) of bones
Assault Using vehicles
Dragged (Synonym-pull) behind a vehicle
Attached (Synonym-fastened) onto a moving vehicle
Thrown (Synonym-chuck) out of moving trains / taxis
Driven over
Rough ride
Put in boot
Abduction (Synonym-Kidnapping/ Apprehend/ Capture/Seize/Catch)
Disappearance
Bombing (Explode)
Land mine
Grenade
Mortar / shell
Hand grenade
Explosive / bomb
Booby trap bombs
Letter bombs
Car bomb
Burns (Synonym-scorch)
Chemicals
Cigarettes
Boiling water
Live fire / burning sticks
Necklacing
Arson
Deliberate (Synonyms-Premeditated/Planned) spreading of disease
Psychological Torture (Synonyms-Torment/Pain/Anguish/Suffering/Agony/Tribulation/Ill-treatment) - excludes
Threats.
Verbal abuse (Synonym-Mistreatment/Indignity/Violation/Insult/Offence/
Malign/Denounce/Defame/Misuse/Deceive/Subvert/Mishandle/Betray/
Unjust/Crime/Condemnation/Censure/Defamation)
Simulated execution
False and alarming information / disinformation
Detention of children and family members to extract information
Russian Roulette (Gun against the head with one bullet left)
Suspension (Synonyms-Hang/Dangle) from a great height/moving vehicle
Members of family forced to watch or participate in torture
Solitary confinement
Surveillance (Synonym-Watch)
Threatening acts e.g. brandishing guns
Dismissal from employment as a result of political affiliation
Harassment
Threats (Synonyms-Coercion/Intimidation/Warning)
Against the targeted person
Against a family member of the targeted person
Against a colleague or work associate of the targeted person
Against a friend of the targeted person
Against someone working on behalf of the targeted person e.g. lawyer, human rights worker
Threats against children
Verbal threats
Deprivation (Synonym-Loss)
Deprivation of medical attention, treatment
Deprivation of food and/or water
Deprivation of sleep
Deprivation of sanitary facilities
Denial of privacy
Overcrowding (Synonyms-Packed/Strafed/Crammed/Filled)
Placed in isolation
(Synonyms-Seclusion/Solitude/Isolation/Aloneness/Separation)
Confinement (Synonyms-Detention/Incarceration) in a small space
Degradation (Synonym-Shame/Embarrassment/Abasement/Humiliation)
Deprivation (Synonym-Loss of personal hygiene)
Denial (Synonyms-Refusal/Reject) of toilet facilities
Nakedness
Abuse with excrement
Denial of privacy
Derogatory (Synonym-Disparaging/Rude) language
Destruction (Synonyms-Damage/Ruin/Vandalize/Smash/Devastate/Wreck/Raze) of property
Destruction of homes/offices/schools/buildings/vehicles/personal property/arson
Extortion (Synonyms-Blackmail/Coercion/Ransom/Bribe/Pay-off)
Theft (Synonyms-Pillage/Plunder/Rob/Loot)
Poisoning (Synonyms-Contaminate/Pollute/Infect)
Poisoning of food
Poisoning of clothing
Intravenous poisoning
Murder (Synonyms-Liquidation/Permanent removal/Annihilation/Carnage/Manslaughter/Slay/Homicide)
Assassination
Extra-judicial/illegal/unlawful execution
Hanging
Electrocution
Ritual murder
Witchcraft
Use of animals
Sexual Molestation (Synonyms-Mistreatment/Violation/Abuse) and Rape
Forced performance of sexual acts other than rape
Introduction of objects into the rectum/vagina
Rape by someone of the opposite sex
Rape by someone of the same sex
Gang rape
Physical assault and touching
Body searching by members of the opposite sex
Pumping water into the uterus
Abuse with body fluids
Abuse with animals
Assault on genitals
Suspension of weights from the testicles
Imprisonment (Synonyms-Detention/Locking up/Confinement/Captivity/ Arrest/Incarceration)
Banning
Banishment
House arrest
Forced (Synonyms-Bound/Compelled/Obliged/Postures) position -
Physical Stress (Duress/Pressure/Force/Strain)
Suspension: hanging the victim by arms, legs, etc.
Forced exercise
Excessive exercise
Forced stationary posture - standing, kneeling, sitting, standing on two bricks
Forced labour
Stretching of limbs and trunk
"Helicopter" - hanging the victim from a stick between knees and arms bound
tightly together
Stopping of blood flow
Forced carrying of heavy weights
Buried alive
Stress to the Senses
Loud noises or music
Screams and voices
Powerful lights
Blindfolding
Exposure to extreme heat or cold
Bound or tied up
Complete immobilization
Overcrowding
Confined to small space
Bad smells
Staged accidents / suicide
Forced jumping or being thrown from heights
Car sabotage
Use of drugs
to effect psychological damage
to effect physical damage
Torture as a witness
Victim is forced to watch or listen to the torture of others
Victim is forced to participate in the torture / assault of others
Electric Shock
Electric shock to the genitals
Electric shock to the body - toes and fingers, etc.
Shooting
Random shooting
Rubber bullets
Live ammunition
Birdshot
Buckshot
Capital Punishment
Post Mortem - Violation after death
Mutilation
Decapitation
Disembowelment
Improper burial - burial in a shallow grave
Blowing up bodies or body parts
Burn or braai a body
Removal of body parts
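A synonym list of this kind can be turned into a lookup that maps whatever word appears in a statement to one canonical act, so that "kidnapping" and "abduction" are counted together. The sketch below uses four entries taken from this appendix. The dictionary layout and the function names are assumptions made for illustration.

```python
# Four entries from Appendix 4: canonical act -> synonyms listed for it.
SYNONYMS = {
    "Asphyxiation": ["Choke"],
    "Strangling": ["Throttle"],
    "Flog": ["whip", "thrash", "lash"],
    "Abduction": ["Kidnapping", "Apprehend", "Capture", "Seize", "Catch"],
}


def build_index(synonyms):
    """Build a case-insensitive reverse index from every term to its canonical act."""
    index = {}
    for canonical, alternatives in synonyms.items():
        for term in [canonical, *alternatives]:
            index[term.lower()] = canonical
    return index


INDEX = build_index(SYNONYMS)


def canonical_act(term):
    """Return the canonical act for a term, or None if the term is not listed."""
    return INDEX.get(term.strip().lower())


act = canonical_act("kidnapping")
```

Terms that return `None` are the useful output here: they show the coder where the vocabulary has no entry yet and a decision is needed.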
Appendix 5
KILLING SHOOTING The victim and his two comrades were from an MK (Umkhonto weSizwe) mission in
South Africa.
It was at the time when MK cadres infiltrated the country on
sabotage and other missions.
Appendix 6
A TRC Statement
Below is the complete statement made by an HRV victim.
The aim of a Gross Violation of Human Rights Statement is to try and gather as much
information as possible about the gross violations of human rights suffered by South Africans between 1
March 1960 and 5 May 1994. The questions that form the basis of the STATEMENT are designed
to make explicit the circumstances (broader context), the nature (type) and the consequences of the
violations.
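The period stated above gives data processors a simple first check on any reported act: does its date fall between 1 March 1960 and 5 May 1994? A minimal sketch follows. Treating both end dates as inclusive is an assumption, and the function name `in_mandate_period` is hypothetical.

```python
from datetime import date

# Period covered by the statement, as given in the text above.
MANDATE_START = date(1960, 3, 1)
MANDATE_END = date(1994, 5, 5)


def in_mandate_period(act_date):
    """Return True if the act falls within the period, end dates included (assumed)."""
    return MANDATE_START <= act_date <= MANDATE_END


inside = in_mandate_period(date(1989, 2, 28))
outside = in_mandate_period(date(1994, 5, 6))
```

A date outside the period does not by itself settle whether a statement qualifies, but it is a useful flag for review.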
What are “gross human rights violations”?
These are serious human rights violations such as the killing of people, the kidnapping of people,
torture, or the severe ill treatment of people.
Who are victims of gross human rights violations?
Victims of gross human rights violations are people who are killed, abducted, tortured or
severely ill-treated; and family members or dependants of a person who was killed or who
disappeared.
What happens to your statement?
Your statement will be recorded on the computer and you will be given a reference number
(JB04500/01GTSOW). The HRV Committee will carefully consider your statement. You might be
asked to come to a public hearing to talk about your case. The Committee will then decide if you
qualify as a victim in terms of the law that set up the Truth and Reconciliation Commission. It will
send you a letter telling you whether or not you qualify.
If the Committee on Human Rights Violations finds that you are a victim, it will include your
case in the report it sends to the Committee on Reparation and Rehabilitation. The Committee on
Reparation and Rehabilitation will look at all the cases sent to it.
Ms. Dudu Chili voluntarily gave the following statement to me and can be contacted at 27 11
331-3719 (W) and 27 11 462-7240 (H).
Ms. Dudu Chili’s statement
I, Dudu Chili, declare under oath in English that I am a female aged 54 years, ID number 411028
0191 084, and residing at number 7556 Maseko Street, Orlando West, P.O. Orlando 1804, Soweto in
the district of Johannesburg.
I wish to state that on the 28th February 1989 my house, at Orlando West in Soweto, was
bombed by the Mandela United Football Club (MUFC) and that I lost everything in it. My family
and I were left with what we were wearing.
I lost my niece – Finkie Msomi - who was thirteen years old. Finkie, who was in my bedroom,
was shot in the head with an AK47 and died on the spot. Thereafter petrol bombs were hurled into
my house and it was burnt down. My cousin Barbara Chili was also burnt while trying to save Finkie
from the fire. Barbara suffered third-degree burns on her waist. Finkie’s sister, Ntombenhle Msomi,
was slightly burnt on the foot.
Sometime in 1986, Winnie Madikizela-Mandela formed the Mandela United Football Club. She
demanded that all the youth in our area, Orlando West, should join her club. Those who refused
were labeled sell-outs and hunted down to be killed. Since my son, Sibusiso Chili, refused member-
ship of the club, he became a target and I tried to intervene to protect my son. I approached my
cousin, Matilda Dlamini, to plead with the MUFC to spare my son’s life. Matilda, a long-standing
best friend of Winnie Madikizela-Mandela, temporarily succeeded. Matilda was married to Mo-
setlha. Mosetlha’s daughter was married to President Mandela’s son, Makgatho.
Two years later, in 1989, the hunting down of Sibusiso started again. A former member of the
MUFC, Lerotodi Ikaneng, had deserted the club. No one was allowed to leave the team. Lerotodi
was later caught and had his throat cut with garden shears by Jerry Richardson – the former MUFC
coach. Lerotodi survived. Some months later after this incident, Lerotodi pointed out one of his
2
In some cases of multiple similar entities (e.g., perpetrators, witnesses), where it does not affect under-
standing, we have omitted one or more entities.
Chapter Three: The South African Truth and Reconciliation Commission
assailants to Sibusiso. Lerotodi informed Sibusiso that this man had held him (Lerotodi) down
while Jerry Richardson cut his throat. Sibusiso then suggested to Lerotodi that they approach this
man and ask him to accompany them to my house to explain why they tried to kill Lerotodi. The
man agreed. At that time, I was highly involved with the youth in Orlando West. Since it was late at
night, I promised to attend to the matter the following day and asked this man to spend the night at
my place. He agreed and slept with Sibusiso and the other boys. That night, I phoned Mrs. Sisulu
to come and help to solve this problem. Mrs. Sisulu agreed and contacted the other leaders in the
area.
The next day I phoned a Mr. Ndo, who was the co-president with Mrs. Sisulu, to attend the
meeting. I also phoned a Mr. Steward Ngwenya who was a member of the Soweto Civic and he
promised to attend.
Whilst waiting for the above civic leaders to come, the young assailant requested to go home
to wash and change into fresh clothing. He came back and was questioned on the motive to kill
Lerotodi and on the harassment of other youths that were not affiliated to the MUFC. He was also
asked why he was not attending school. The young man regretted his acts in the attempted murder
of Lerotodi and left.
I heard that some youths that were members of the MUFC reported to Winnie Madikizela-
Mandela that they saw this young man in the company of Sibusiso at my house. Lerotodi’s assail-
ant was summoned to appear before Winnie Madikizela-Mandela and her daughter, Zinzi Mandela,
to explain his visit to my house. In that meeting a decision was taken to eliminate, i.e. to kill, Lerotodi
and Sibusiso because they had become “too problematic”. Some MUFC members were mandated
to “carry out the order”. The late Maxwell Madondo and the self-exiled Katiza Cebekulu were part
of the group entrusted with the task to kill Sibusiso and Lerotodi. Katiza Cebekulu was also asked
to point out Sibusiso to the other members because they did not know him.
Immediately after the meeting, Dodo, a member of the MUFC, rushed to both Lerotodi’s
place and my house to warn us of the impending attack. On hearing this, I immediately called Alfred
Msomi – Finkie’s father – who lived at the back house opposite mine. Dodo immediately left the
township, fearing for his life for alerting both Lerotodi’s family and me about the decision to kill our
sons.
The following day I was surprised to see my house being strategically guarded by people
wearing scarves and balaclavas. I informed Finkie that these people were armed and apparently their
mission was to attack the house and kill Sibusiso. Sibusiso and his brothers had all gone into hid-
ing after being alerted by Dodo. This guarding of the house continued for several days – 24 hours
a day. These MUFC members apparently were not aware that we already knew of the attack.
I wish to point out that when the hunting down of both boys started, I had just arrived from
London. I had gone there to attend an anti-apartheid movement conference at Sherfield. There was
a concern shown by Winnie Madikizela-Mandela on my trip. I heard that she thought I had gone to
London to report her about the Stompie Sepei affair to the ANC leadership and other anti-apartheid
movements (i.e. the UDF, the Civic, the youth and the church leaders). Stompie Sepei – a young
activist from the Free State - had been kidnapped and killed the previous December in 1988. Stom-
pie and three other youth – Kenny Kgase, Gabriel Megoe and another – had been kidnapped from
the Methodist manse under Rev. Paul Verryn and taken to Winnie Madikizela-Mandela’s house in
Diepkloof. The remaining youth at the manse reported the matter to me since Rev. Paul Verryn was
on holiday. I was the first person to hear of the kidnapping. This trip annoyed Winnie Madikizela-
Mandela and I also became her target.
During the change of guards, my sons would sneak home to wash, change clothing and rush
back to their different hideouts. We too, had our spies watching the changing of shifts and would
immediately notify Sibusiso and others. One day Sibusiso was on his way home when he met three
of the MUFC members and a fight ensued. Immediately the word went out in the township that
some MUFC members had caught up with Sibusiso. The township youth ran to Sibusiso’s rescue.
One of the three MUFC members, Maxwell Madondo, was clubbed and stoned to death. The other
two escaped and reported the killing of Maxwell to Winnie Madikizela-Mandela. Dempsey of the
South African police arrested me. First Dempsey said they were going to question me about my trip
to London. Dempsey wanted to know which ANC members I had met and talked to. When they
could not extract this information from me, I was charged with the murder of Maxwell Madondo.
When Maxwell was killed, I was in my house. I was detained for a week and my letters were confis-
cated. My house was bombed the same day I was arrested. The following day after my arrest,
Dempsey took me home in a police car. On our way, I read a poster stating in bold “Thirteen-year-old
girl dies”. It never occurred to me that this girl was my niece, Finkie. On arrival at my place, I
found my house destroyed by fire. Everything was completely gutted. All our belongings – furni-
ture, clothing, etc. – were burnt. Nothing was left except for the clothes we were wearing.
The police did not allow my neighbors to speak to me. My sisters informed me that my boys
were safe but that my niece Finkie had died and that my sister Barbara had burnt her foot and was in
hospital. She hurt herself while trying to drag the body of Finkie from the fire. I was taken to Kliptown
police station. During the court proceeding I was informed by the prosecutor that the charge
against me was withdrawn.
In conclusion I wish to state that Winnie Madikizela-Mandela was behind all the unfortunate
happenings both in Orlando West and at my home. She was in charge of the MUFC and the mem-
bers of this club took orders from her. She controlled the issuing of guns and ammunition. One of
the MUFC members – Charles “Bobo” Zwane - is serving a life sentence. Most of the MUFC mem-
bers refused to implicate her since they feared for their lives.
DEPONENT / VICTIM
Reference Number JB04500/01GTSOW
Person ID number 3
Surname CHILI
Aliases / Nicknames
Sex: Female
Race African
Employed: Yes
Contact name:
Contact Address:
Prison:
Contact phone:
Prison number:
3
VICTIMS/WITNESSES:
1. 2. 3. 4.
Aliases / Nicknames:
ID/Passport number:
Date of birth:
Occupation:
Employed:
Street Address: 7556 Maseko Street, 7556 Maseko Street, 7556 Maseko Street, 7556 Maseko Street,
Home phone: (011) 936-7278 (011) 936-7278 (011) 936-7278 (011) 936-7278
Work phone:
Contact name:
Contact Address:
Contact phone:
Prison:
Prison number:
3
Number 5 omitted for space reasons.
VICTIM
1.
Aliases / Nicknames
ID/Passport number:
Date of birth:
Sex: Female
Race African
Occupation:
Employed:
Orlando West,
Soweto,
Gauteng.
Work phone:
Contact name:
Contact Address:
Contact phone:
Prison:
Prison number:
WITNESSES:
1. 2. 3.
ID/Passport number:
Date of birth:
Occupation:
Employed:
Work phone:
Contact name:
Contact Address:
Contact phone:
Prison:
Prison number:
4
PERPETRATORS
1. 2. 3. 4.
ID/Passport number:
Date of birth:
Street Address: Protea/Norwood Police Station; Protea/Norwood Police Stations; Orlando West; Orlando West
Home phone:
Work phone:
Contact name:
Contact Address:
Contact phone:
Prison:
Prison number:
4
Perpetrators numbers 5, 6, and 7 omitted to save space.
Date of interview
Place of interview
Language of interview
Katiza Cebekhulu, a former Mandela United Football Club member who is now in London, is
alleged to have left the country before the Winnie Madikizela-Mandela trial in 1991, in which he
was a co-accused in the Stompie Sepei trial.
Maxwell Madondo, a cook at the Winnie Madikizela-Mandela house and a member of the
Mandela Football Club, was killed when Sibusiso Chili dropped a rock on his head in February
1992.
Chili’s defense was that he had acted in self-defense and that Madondo was part of a hit-squad
of three Football Club members who had instructions to kill him. In court two of the three
were named as Madondo and “Killer”. The third was not named.
However, a British Broadcasting Corporation (BBC) program later named the third person as
Cebekhulu and interviewed him. He said that at a meeting at the offices of Winnie Mandela, it had
been decided that Sibusiso Chili and another Football Club member, Lerotodi Ikaneng, should be
killed.
The hit-squad was to have killed five youths who were accused of selling out to the police, but
instead Madondo was killed and six youth stood trial. Police later found the hit-list with five names
at the home of Winnie Madikizela-Mandela, where the Football Club members were living.
According to the BBC, “the most extraordinary development came near the end of the trial – an
incident that surprisingly went unreported by the South African media. The defense and prosecu-
tion advocates stepped outside the courtroom to confer. The defense said they would call as wit-
ness the third unnamed youth who had been with Madondo just before he was killed.” The BBC
said they had learnt that this youth was Katiza Cebekhulu and he had made a statement for the
defense confirming there had been a meeting in Winnie Madikizela-Mandela’s office in Orlando
West, at which it had been decided that Chili and Ikaneng would be killed.
He told the lawyers that the meeting had been chaired by Winnie Madikizela-Mandela and that
Zinzi Mandela and Jerry Richardson were present. He named others who were there.
After conferring with the defense, the State read the following statement into the court record:
“The admission the State will make is that the deceased Maxwell Madondo was a member of
the Mandela Football Club and that a decision was made by Mrs. Mandela and the football club to
kill accused no. 1 (Ikaneng) and no. 6 (Chili). But the witness, m’lord, whose name I will not mention
now, together with “Killer” and the deceased, was instructed and went out to kill accused no. 1 and
no. 6. That the person known as “Killer” was in possession of a firearm was to carry out the man-
dated decision.”
Chili was the only one found guilty and he was sentenced to one year’s jail. However, Jerry
Richardson, who was sentenced to death – later commuted to life – has made a statement to the TRC
and he will be able to confirm or deny Katiza’s allegations.
Chili’s mother, Dudu Chili, told the BBC that she had worked with Albertina Sisulu to assist
boys to escape from Winnie Madikizela-Mandela’s home. She said she had been warned that a
decision had been taken to kill her son and she had warned him. Dudu Chili was one of seven
originally charged with Madondo’s murder. She was released on bail on condition that she stayed
away from Soweto for her own safety, and was discharged before the trial began. Her house was,
however, burnt down allegedly by Football Club members and her 11-year-old niece was shot and
burnt to death.
The summaries - read the entire case through highlighting the names of people mentioned and
make a short summary of the statement. It should include ‘WHO did WHAT to WHOM, WHEN,
WHERE and WHY’. Use names of victims and perpetrators.
The deponent, Dudu Chili, claims that her niece, Finkie Msomi was killed by a bomb and a
bullet shot on the 28th of February 1989 at Maseko Street in Orlando West. Winnie Madikizela-Mandela,
her daughter Zinzi Mandela and other members of the Mandela United Football Club are
implicated in this act of human rights violation. The deponent further claims that Madikizela-Mandela
and the MUFC members had begrudged her sons for refusing to join the club. They were
labeled sellouts. The person who had been targeted for murder was her son, Sibusiso Chili, who
narrowly escaped death after a mob, including Sibusiso, killed one of the assailants, by the name of
Madondo, in self-defense. Prior to these attacks, Lerotodi Ikaneng, another targeted youth, had
also escaped death after an attempt to murder him by cutting his throat. Ikaneng had sinned by
pulling out of the club. Chili asserts that they were thought to be dangerous because they had all
the information about the activities of the club.
(Give all known reference numbers of statements – HRV and amnesty - related to this case).
Refer: JB04520/01GTSOW, JB04637/01GTSOW, JB04519/01GTSOW, JB05408/01GTSO,
JB05194/01GTSOW, JB05714/01GTSOW, JB03657/02PS, JB05407/01MPNEL, JB05262/03NW,
JB05845/01GTSOW, JB05846/01GTSOW, AM2422/86, AM3690/96, AM6400/97, AM6401/97,
AM6402/97, AM7351/97, AM7511/97, KZN/MP/017/BL.
Extract as many acts, victims, witnesses and perpetrators as possible.
ACTS – Ensure that you use the controlled language when describing an event. For every de-
scription consult the controlled language and ensure that a word in bold is used. When multiple
injuries led to a death i.e. a person was bombed, shot and burnt, it is unclear which act was the
cause of death. State all the above three acts under Severe Ill-treatment and add a fourth under
Killing, thus Killing / Unknown.
VICTIMS – Write ‘DECEASED’ or ‘DISAPPEARED’ in brackets for all victims killed or disap-
peared respectively.
PERPETRATORS – The person who performed the act, people who gave orders or people
who were involved in the conceptualization of the act.
WITNESSES – Two categories of witnesses. Those who actually saw the event and those
who may not have seen it but can corroborate it or give more information.
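The rule for deaths with several possible causes can be sketched in code. This is only an illustration of the instruction above, not the TRC's software: the function name `code_acts` and the (category, description) tuple layout are assumptions, and the handling of a death with a single known cause is also an assumption, since the instruction covers only the multiple-injury case.

```python
def code_acts(injuries, victim_died):
    """Return (category, description) pairs for one victim.

    injuries: descriptions in controlled language, e.g. "Bombing".
    victim_died: True if the victim died as a result of the injuries.
    """
    if victim_died and len(injuries) == 1:
        # Assumption: a single known cause is coded directly as the killing.
        return [("Killing", injuries[0])]
    acts = [("Severe Ill-treatment", injury) for injury in injuries]
    if victim_died:
        # Several injuries: the cause of death cannot be attributed to one act.
        acts.append(("Killing", "Unknown"))
    return acts


for category, description in code_acts(["Bombing", "Shooting", "Burning"], True):
    print(f"{category} / {description}")
```

For a person who was bombed, shot and burnt, this yields three Severe Ill-treatment acts followed by a fourth act, Killing / Unknown.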
5
ACTS FROM THE ABOVE STATEMENT
Act 1
Victim CHILI, Dudu Olive
Age 45
Victim Number 3
Date 28-02-1989
Place Protea
Details Detained.
Outcome
Reason The police wanted Chili to give them the names of the ANC people she met in
London.
Political Context Because of the state repression at the time, an opportunity was created for
gangs like the Mandela Football Club to emerge. The club terrorized the
community around Soweto. The club was under the leadership of Winnie
Madikizela-Mandela. Anybody not cooperating with the club was branded as
a sell-out and liable to be killed.
PERPETRATORS
Name Number Organization
5
Thirteen acts were defined based on this statement. We show only Acts 1, 6, and 13.
Act 6
Victim MSOMI (DECEASED), Finkie Maria
Age 13
Organization
Date 28-02-1989
Outcome Injury
PERPETRATORS:
Name Number Organization
WITNESSES:
Name Number Eye witness
Act 13
Victim CHILI, Sibusiso
Age
Organization
Place Johannesburg
Details Kept in isolation for almost a year in a dirty and filthy cell.
Outcome Detained.
Reason Punished for the Maxwell Madondo killing. Also the police wanted to know
the whereabouts of his other brothers: Mbuso, Nhlanhla and Kelly.
PERPETRATORS:
Name Number Organization
WITNESSES:
Name Number Eye witness
A letter of acknowledgement.
The following letter is sent to each deponent / victim immediately after his / her statement has
been processed.
(letter head)
Reference No.: JB04500/01GTSOW
Ms. Dudu Chili
P.O. Box 925,
Johannesburg,
2000
Gauteng
16th September 1998
We would like to thank you for making a statement to the Truth and Reconciliation Commis-
sion. We apologize for the long delay in responding and ask for your understanding in this regard.
The Human Rights Violation Committee of the Commission is in the process of determining
whether or not you or the persons mentioned in your statement are victims of gross violations of
human rights as defined by its mandate. You will be notified of our finding by no later than 31st
March 1998.
When a finding has been made, those who were found to be victims will be referred to the
Reparation and Rehabilitation Committee. This committee will send these victims a Reparation Application
form in due course. The Reparation and Rehabilitation Committee will then make recommendations
to the State President on how the government should help those victims found to have
suffered gross violations of human rights.
Your willingness to trust the Commission with your memories will assist us to find out the
truth about South Africa’s past and will help bring about the healing that you and our country
need.
Thank you very much for volunteering to be part of the process of healing and reconciliation
in our country.
Yours sincerely,
Chairperson
JB04500/01GTSOW
JB Johannesburg regional office
team 01
Other Examples
JB00099/01ERKWA
JB as above
00099 as above
01 as above
JB01238/03VT
03 team 03
JB04211/03WR
WR West Rand – Randfontein, Krugersdorp, Carletonville, Mohlakeng
JB03331/02PS
02 team 02
JB04100/02NW
NW North West – Mafikeng, Zeerust, Potchefstroom
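The structure of these reference numbers can be illustrated with a small parser. This is a sketch under stated assumptions: the pattern (two letters, five digits, a slash, a two-digit team, then an area code) is inferred only from the examples above, the names `parse_reference`, `OFFICES` and `AREAS` are hypothetical, and only the codes spelled out above are decoded. The meaning of the five-digit number is not given here, so it is returned unchanged.

```python
import re

# Only codes that the examples above spell out; others are left undecoded.
OFFICES = {"JB": "Johannesburg regional office"}
AREAS = {"WR": "West Rand", "NW": "North West"}

# Assumed layout: office (2 letters), number (5 digits), "/", team (2 digits), area.
REFERENCE = re.compile(r"^([A-Z]{2})(\d{5})/(\d{2})([A-Z]+)$")


def parse_reference(ref):
    """Split a reference such as 'JB04500/01GTSOW' into its parts."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError(f"unrecognized reference number: {ref!r}")
    office, number, team, area = match.groups()
    return {
        "office": OFFICES.get(office, office),
        "number": number,
        "team": f"team {team}",
        "area": AREAS.get(area, area),
    }


print(parse_reference("JB04211/03WR"))
```

References with a different layout, such as the amnesty numbers (for example AM3690/96), do not match this pattern and are rejected rather than guessed at.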
Appendix 7
Source Menu
Accesses all modules relating to GHRV statements, amnesty applications and any other
sources of information.
Note: To save space, we list all field names appearing on screens in the left to right order in
which they appear on the actual screen.
Reference Number; Violation Type; Document Date; Place Taken; TRC office; Person ID Number; Surname; First Names
Source Details
Once a source document has been registered and all necessary processing has taken place, de-
tails regarding the HRV, amnesty application are entered. All the information entered at registration
time will appear on the first page.
The second page of the Source Details module allows for the capturing of details of specific
acts of human rights violations, which appear in the source document (see Figure 1, above). Any
number of perpetrators or witnesses can be captured for a single act.
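The relationship described here, one source document holding several acts, each with any number of perpetrators and witnesses, might be modeled as follows. This is a minimal sketch, not the TRC database schema: the class and field names are assumptions, and the perpetrator and witness person numbers are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Witness:
    person_id: int
    eye_witness: bool  # True if the person actually saw the event


@dataclass
class Act:
    victim_id: int
    date: str
    description: str
    perpetrator_ids: list = field(default_factory=list)
    witnesses: list = field(default_factory=list)


@dataclass
class SourceDocument:
    reference_number: str
    acts: list = field(default_factory=list)


statement = SourceDocument("JB04500/01GTSOW")
act = Act(victim_id=3, date="28-02-1989", description="Detained.")
act.perpetrator_ids.extend([101, 102])  # hypothetical person numbers
act.witnesses.append(Witness(person_id=201, eye_witness=False))
statement.acts.append(act)
print(len(statement.acts), len(act.perpetrator_ids), len(act.witnesses))
```

Keeping perpetrators and witnesses as lists on the act, rather than as fixed columns, is what allows any number of them to be attached to a single violation.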
Person Details
Accesses all modules relating to people referenced within the system, including their relations
and biographies.
Person ID Residential Surname First Names
person’s surname. This field may also be used as descriptor if nothing specific is known about the
person. The more detailed the information entered here, the more powerful the analysis and re-
search that may be done on it at a later stage.
The system also provides for the capturing of aliases and nicknames. It also allows for the
logging of multiple biographical episodes or periods for each person stored in the database. This
then serves for the building up of a political curriculum vitae or any other biographical image of interest
for each individual person referenced within the system. If a person is deceased or disappeared,
indicate by writing (DECEASED) or (DISAPPEARED) in brackets after the surname.
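The surname convention can be shown with a short helper. The function name `format_surname` is hypothetical; the output format follows the coder's sheet entry "MSOMI (DECEASED), Finkie Maria".

```python
def format_surname(surname, status=None):
    """Append the status marker to a surname.

    status: None, "DECEASED" or "DISAPPEARED".
    """
    if status is None:
        return surname
    if status not in ("DECEASED", "DISAPPEARED"):
        raise ValueError(f"unknown status: {status!r}")
    return f"{surname} ({status})"


print(format_surname("MSOMI", "DECEASED") + ", Finkie Maria")
```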
Registration Menu
Reference Number Violation Type Document Date Place Taken
“identified.” Construct sentences carefully: “they’s” and “he’s” in a sentence without names can
be confusing. The summary must give some indication of the political motive (it is sufficient if this
is clear from the organizations of the victim and perpetrator). Be brief.
Fill in the “Date captured” and put your name, i.e., your person number in the fields for
“Processed by” and “Captured by”.
The “Notes” field is for the following type of comments: (a) reference to other statements, (b)
victim has appeared in a hearing, (c) an indication that the statement was not clear which perpetra-
tor was linked to which violation and (d) an indication that the statement was confusing or there
were discrepancies with dates, etc. This is where we need to capture any observation which we
may have made, such as the fact that it is linked to Boipatong massacre, Trustfeed massacre, etc. It
is NOT for the data processor to comment on whether or not the statement describes a gross violation of
human rights - that is up to the HRV Committee to decide.
Save your work.
Move to the next screen for the violations.
Capturing the violations:
Refer to Appendix 5 (Completed Coder’s Sheet).
Call up the victim - by "person number" - under the acts.
Log the "date" of the violation. You need at least the year. If no date is given, you might be
able to work it out from checking the TRC chronologies and/or checking related statements. If you
do this, then state it in the "Notes" field. If you cannot work out the date, log the year as 00 and
put an explanation in the "Notes" field.
Log the “town” where the violation happened. If this is not specified but from the context
seems to be the hometown of the victim, then use this. You can usually work this out from the con-
text of the statement. Often the victim’s hometown is also the deponent’s hometown. If you cannot
work it out, put it as "Unknown". Put a note in the "Notes" field if the town was not specified.
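The fallback rules for a missing date or town can be summarized in a short sketch. The function name `log_date_and_town`, the field names and the wording of the generated notes are all assumptions. The sketch covers only the case where a value cannot be worked out at all; when a date or town is inferred from other sources, the data processor would still need to add the corresponding note by hand.

```python
def log_date_and_town(year=None, town=None, notes=""):
    """Apply the fallback rules for a missing year or town.

    year: four-digit year as an int, or None if it cannot be worked out.
    town: town name, or None if it cannot be worked out.
    """
    extra = []
    if year is None:
        year_field = "00"
        extra.append("Date of violation could not be determined.")
    else:
        year_field = str(year)
    if town is None:
        town_field = "Unknown"
        extra.append("Town was not specified.")
    else:
        town_field = town
    notes_field = " ".join([notes] + extra).strip()
    return {"year": year_field, "town": town_field, "notes": notes_field}


print(log_date_and_town(year=1989, town=None))
```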
Add the "Description of place." Give whatever detail possible, e.g., "At home, White City,
Soweto" or "at John Vorster Police station" or "open ground next to the main road through Duncan
Village." This is essential in order for the researchers to code properly for the "location."
Add the "Description of violation." This is the detailed free text area that was left out of the
summary.
You do not have to repeat any information captured in other areas, such as who did it or where
it was done. This area is crucial for capturing any information concerning the act not captured
elsewhere. In essence you need to capture what happened, what was used, where on their body
they were injured, how many times, etc. An example is: "jumped out of window." Remember to spec-
ify if a woman was pregnant or miscarried or if a killing was an assassination.
Use the coding sheet definitions as a guide and include the necessary keywords. You need to
ensure that you are all using the controlled language for description of events. For every descrip-
tion consult the controlled language and ensure that a word in bold face is used, i.e., use the "catch
phrases." This is to ensure that at a later stage searches can be made on the free text searching
through key words from the controlled language. This should not just be one word but it must be a
description of what happened.
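A simple check can illustrate why the controlled language matters for later free-text searches. This is a sketch, not the TRC's tooling: `check_description` is a hypothetical name, and `CONTROLLED_TERMS` is a small illustrative subset drawn from the coding frame, since the full list of bold-face words is not reproduced here.

```python
# Illustrative subset only; the real controlled language is much longer.
CONTROLLED_TERMS = ["electric shock", "shooting", "mutilation", "decapitation"]


def check_description(description):
    """Return the controlled terms found, or raise if the text is unusable."""
    text = description.lower().strip(" .")
    found = [term for term in CONTROLLED_TERMS if term in text]
    if not found:
        raise ValueError("description uses no term from the controlled language")
    if text in CONTROLLED_TERMS:
        # The catch phrase alone is not a description of what happened.
        raise ValueError("description must say what happened, not only name the act")
    return found


print(check_description("Electric shock to the toes and fingers during interrogation."))
```

The substring match is deliberately crude. A production check would need to handle word boundaries and spelling variants.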
Add the Outcome and consequences. Here you capture all mental and physical injuries, such
as:
If they are not able to work as a result of their violation. This is not to be confused with if they
lost their job because of discrimination, or absenteeism whilst in detention, etc. These two fall un-
der Associated Violation.
If they lost any benefits which they should have received. Double check with the "R & R"
(Reparation and Rehabilitation) form and "Further Question" for this information as well.
If any friends or relatives, including the deponent have been affected, this needs to be cap-
tured here as well.
State anything that happened immediately after the event. Examples: "fled the area," "death,"
"released after seven (7) weeks detention," or "dies four (4) days later as a result of injuries," or
"permanently disabled and died of complications five (5) years later," or "signed a confession after
two (2) hours torture and subsequently convicted on arson charges." If outcome is not specified
under that section of the statement you can often pick it up somewhere else in the statement. In
cases of incarceration, state the length of time involved, not just the date of arrest.
Use the perpetrator "notes" field for any additional information.
Remember that there are two categories of witnesses: those who actually saw the event and
those who may not have seen it but can corroborate it, give more information, etc. You need to
indicate (Yes/No) whether or not they saw it. Examples are: "saw the victim shot" or "did not wit-
ness the shooting but saw the victim’s wounds the day after" or "was the doctor who attended to
the victim" or "acted as the deponent’s lawyer," etc.
Ensure that it is possible to understand the context around the witnesses and perpetrators - it
is not sufficient to just log the names onto the acts without explaining.
If a person fills in a statement himself/herself, that person is registered as the deponent. If
someone assists a deponent to fill in a statement the person who assisted should be mentioned in
Documentation below. At the same time not every single person named in the statement needs to
be extracted, e.g., if the deponent went to the morgue and found the deceased victim, and while
there bumped into a friend "X," "X" does not have to be captured as a witness.
If there is not much information on the perpetrators, just put what you have, e.g., SAP.
Save your work
Move to the next screen for details of documentation.
Documentation
Record documentation in the field marked "Statements made & other documents or items pro-
vided."
Only record documentation which the TRC actually has in possession and NOT documenta-
tion which we would like to have.
Mark whether or not the documents are attached.
If the items are too bulky to attach to the file, e.g., X-rays or a large file annexure, DO NOT in-
dicate that the items are attached. State clearly - in the details field - that the TRC is in possession
of the items but that they are filed elsewhere.
Include dates of documents
Use the attached list as a guide for how to list documents.
Save your work.
Move to the next screen for capturing Expectation & Consequences.
Court record Annexure D - Civil claim - Johnson vs. Ministry of Defense, case no. 108/89, Uitenhage court
Court record Annexure E - Criminal trial - State vs. Johnson, case 52/89, PE Supreme Court
NGO records Annexure H - extract from IDAF list of detainees, October 1989
NGO records Annexure I – Black Sash report on shooting incident, October 1989
Police records Annexure J – Photocopy of p172 of cell register, Jeffreys Bay police station, May-Dec 1989
Police records Annexure K - Police docket, CC95/89 - public violence charges against S. Johnson.
Police records Annexure L - Letter from Station Commander, 27/5/97, re: destruction of records.
Medical record Annexure N - Records from Frere Hospital, EL, for S. Johnson, 19/10/89
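The documentation rules above (record only what the TRC holds, and do not mark bulky items as attached) can be sketched as a small record type. The names `DocumentRecord` and `record_document` and the wording appended to the details field are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    category: str   # e.g. "Medical record"
    details: str    # free-text description, including dates
    attached: bool  # True only if physically attached to the file


def record_document(category, details, in_possession, too_bulky=False):
    """Create a record following the documentation rules, or None."""
    if not in_possession:
        # Only documents the TRC actually holds are recorded.
        return None
    if too_bulky:
        details += " (held by the TRC but filed elsewhere)"
    return DocumentRecord(category, details, attached=not too_bulky)


record = record_document(
    "Medical record",
    "Annexure N - Records from Frere Hospital, EL, for S. Johnson, 19/10/89",
    in_possession=True,
    too_bulky=True,  # hypothetical: treated as bulky for the example
)
print(record)
```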
References
Ball, Patrick, 1996. Who Did What to Whom? Planning and Implementing a Large Scale Human
Rights Data Project. Washington: American Association for the Advancement of Science.
Ball, Patrick, Ricardo Cifuentes, Judith Dueck, Romilly Gregory, Daniel Salcedo, and Carlos Saldar-
riaga. 1994. A Definition of Database Design Standards for Human Rights Agencies. Wash-
ington, DC: American Association for the Advancement of Science and Human Rights Infor-
mation and Documentation Systems International.
Chapter 4
The South African Truth and Reconciliation Commission: Database
Representation
Gerald O’Sullivan
Introduction
The work of the Truth and Reconciliation Commission (TRC) was dominated by information
processing. By the time the Human Rights Violations Committee of the TRC had completed its
work, it had gathered 21,298 statements, containing 37,672 gross violations of human rights. The
Amnesty Committee of the TRC received a total of 7,127 applications for amnesty. At this time
(mid-1999), the work of the Amnesty Committee is not complete, so the total number of violations
gathered by the amnesty process is not known, but could ultimately be in excess of 10,000.
The anticipated volume and complexity of the information was such that the Commission de-
cided to set up a wide-area network and develop its own database to process the data. As it turned
out, the network and database comprised the backbone of the organization, structuring its work in
a systematic way. The end result is a rich, complex, logically disaggregated set of corroborated data
which enables researchers to make powerful statements about human rights violations.
Information technology in South Africa is sophisticated despite South Africa’s violent past,
under-developed economy and years of sanctions. It has become more so in the years since the
ban on liberation movements was lifted in 1990. With the necessary hardware, software and
skills available, the TRC was able to rapidly build a powerful electronic infrastructure.
In this paper, I describe the TRC’s experience of putting together this electronic infrastructure.
I will describe 1) the basic network structure, 2) the organizational structure of the TRC, 3) the
information flow by which the data was loaded onto the database, 4) the logical model of the
database and finally 5) some examples of the analytical results that such a database model
provides. In the appendices, I give the complete statement used to gather data and the coding frame.
The editors excerpted and summarized lessons learned for this chapter and for Chapter 3. This
section appears as Appendix 3.
Chapter Four: The South African Truth and Reconciliation Commission
users with support. A commercial network service provider supported the Computer Officers by
performing the more complex hardware and networking tasks.
[Figure: The wide area network linking the Durban, Johannesburg, East London and Cape Town offices.]
The WAN allowed users to send e-mail from one office to another, transfer word-processed
documents between regions and share database information between the offices.
The commission network was not connected to the Internet for security reasons. Instead, each
office had one or more freestanding computers (i.e., without a connection to the network) with dial-
up access to an Internet Service Provider. There was no physical connection between the TRC
network and the Internet. This was the simplest, most reliable, least expensive way of isolating the
network from potential intruders, although more computer-literate users were frustrated by the lack
of e-mail connections to the outside world.
Human Rights Violation Committee (HRV Committee)   Collecting statements of human rights violations from
                                                   victims or their surviving relatives
Reparation and Rehabilitation Committee            Making recommendations for reparation and the
(R&R committee)                                    rehabilitation of victims identified by the TRC
The executive arm of the commission consisted of national portfolio holders reporting to the
chief executive officer (CEO). They worked with the managers of the four regional offices to carry
out the operational functions of the TRC and gathered and processed the HRV statements and
amnesty applications on which the commissioners made findings.
Responsibility for the database and network fell under my charge as the Information Systems
Manager. I worked closely with the Information Managers in each of the regional offices to ensure
that the database functioned as expected, making enhancements to the functionality as more proc-
esses in the information flow came on stream. The Information Managers kept the information flow
moving and ensured that the data gathered by each office was loaded onto the database efficiently
and accurately.
The structure of the commission was as shown in Figure 2.
[Figure 2: Structure of the commission. Boxes, from the top: Commission; Executive; then Information
systems, Research, Investigative Unit, Legal services, Finance, Executive secretaries, other portfolios,
Regional manager, Support services manager and Information manager.]
Vesting the responsibility for the electronic information systems in a position reporting directly
to the CEO assured the database of a high profile in the organization and avoided contests of
ownership. It was not relegated to a purely “research” function or subsumed in the work of the
investigative unit.
Indeed, the reverse was a greater problem. It was difficult to get the Research department, In-
vestigative Unit and Commissioners to take ownership of the data that fed their own processes.
The main focus of the work of the Commission was on the public hearings, rather than on gathering
statements. Thus, for nearly two years, the attention of the researchers, investigators and commis-
sioners was directed away from the database, towards the logistics of preparing for hearings.
In the absence of involvement from other portfolios and committees, the perception emerged
that the contents of the database (quality, volume, and integrity) were the responsibility of the In-
formation Systems portfolio and the Information Managers in the regional offices only. This had a
substantial negative impact on the quality of the data since none of the principal users added value
through active use of the data, until the findings process began in earnest and the writing of the
final report was started.
the violations reported in statements made by deponents were analyzed, captured onto the data-
base, corroborated by investigators and finally passed to commissioners who made findings on
whether the violations constituted gross violations of human rights as defined by law.
The information flow was as shown in Figure 3.
[Figure 3: Information flow. 1 Statement taking; 2 Registration; 3 Data processing; 4 Data capturing;
5b Research; 6 Findings process.]
The first four stages of the information flow were implemented early in the life of the commis-
sion. There was enormous pressure to get the database up and running and filled with data. As
soon as the first phase of the database development was completed (database engines installed on
the servers, input screens developed and installed on workstations), the registration, processing
and data capture began.
At the same time as the database development was underway, the commission started its pro-
cess of holding public hearings. These hearings generated enormous coverage for the work of the
commission and the statement-takers were able to harness the energy of the hearings to gather
statements. Unfortunately, the hearings diverted the focus of the commission from stages 5 and 6,
and the crucial processes of corroboration, research and the making of findings were put on hold. The
data processors and data capturers worked in isolation during this time, and received no feedback
on the quality or quantity of their work.
Once the process of corroboration began, and researchers began to rely on the primary data
from the statement-takers to prepare for hearings rather than using mainly secondary source
material as before, the quality of the data improved dramatically. The corroborative material (death
certificates, press clippings, medical files, photographs) added enormous value to the database. Late
in the life of the commission, the findings process started and the data were authenticated.
Although laborious, the process of corroboration proved invaluable and gave the findings a
legitimacy they might otherwise have lacked. Before this, the data gathered often represented the
data-processor’s understanding of a hastily written statement, translated into English during an
interview with a possibly traumatized deponent, recounting events which may have happened several
years previously. Under these conditions, the probability of, and the scope for, error were
enormous.
to describe the nature of the violation, the outcome of the violation and the description of where
the violation took place. By comparing the counts from the free-text statements to those from the
“license application form,” it was easy to see that we had lost almost all context and gained nothing
in the process. This form was dropped and the HRV Committee eventually compromised on a semi-
structured statement (see Appendix 1).
This semi-structured statement had advantages and disadvantages when compared to the
free-text statements, as I discovered by doing word-counts and by comparing the number of viola-
tions, victims and perpetrators per statement and the number of violations per victim. The results
were mixed but interesting.
In those offices where data processing was known to be weak, the numbers of violations, victims
and perpetrators improved, but in those where data processing was known to be better, the rates
dropped. The structure helped weak data processors to identify the relevant violations, victims and
perpetrators, which had previously been lost in the narrative. Better data processors, however, had
less narrative from which to draw, and the structure of the statement allowed for only one victim per
violation type, such as killing, torture, severe ill-treatment, etc. (see Appendix 1). Thus, they ended
up with fewer violations per victim and fewer victims and perpetrators per statement.
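The rates compared above can be sketched in a few lines. This is only an illustration, assuming a flat extract with one row per recorded violation; the row layout, the function name rates and the sample values are hypothetical and are not taken from the TRC database.

```python
# Hypothetical extract: one row per recorded violation, as
# (statement reference, victim identifier, alleged perpetrator or None).
violations = [
    ("S1", "V1", "P1"),
    ("S1", "V1", "P2"),
    ("S1", "V2", None),
    ("S2", "V3", "P1"),
]


def rates(rows):
    """Return the per-statement and per-victim averages for a set of rows."""
    statements = {ref for ref, _, _ in rows}
    # Victims and perpetrators are counted once per statement in which they appear.
    victims = {(ref, victim) for ref, victim, _ in rows}
    perpetrators = {(ref, perp) for ref, _, perp in rows if perp is not None}
    n_statements = len(statements)
    return {
        "violations_per_statement": len(rows) / n_statements,
        "victims_per_statement": len(victims) / n_statements,
        "perpetrators_per_statement": len(perpetrators) / n_statements,
        "violations_per_victim": len(rows) / len(victims),
    }


summary = rates(violations)
```

Running the same function on the rows from each office, once for each version of the statement, would give the figures needed for a before-and-after comparison of this kind.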
The word-counts showed little change in the amount of detail captured to describe each viola-
tion and the consequences of each violation (whether the office increased the number of violations
per statement or not). However, they did show a definite improvement in all offices where details
about the perpetrators, political context and the place of violation were concerned. It was clear that
the semi-structured statement focussed the attention of statement-takers on questions that had
been previously neglected. A deponent’s testimony is understandably centered on the trauma of
the violation itself, so less detail was gathered about the context in which the violation took place.
1 This is the Act of Parliament which established the TRC and defined its mandate.
The users had a suite of programs on their workstations that connected them to the database
servers. This arrangement allowed them to register statements and amnesty applications, capture
the contents of the violations, carry out complex searches on the data, extract data into spread-
sheets, and print a variety of computer-generated reports such as: the content of statements or
amnesty applications, corroboration carried out, letters of acknowledgement, perpetrator details,
incident reports, as well as statistics for monitoring the performance of the information flow.
The six central entities, with the attributes² of relevance for the purposes of this paper, were
PERSONS, SOURCES, ACTS, PERPETRATORS, WITNESSES and EVENTS. I first describe
these entities with their attributes and then show the relationships among these entities in a flow
chart.
PERSONS The PERSON entity consists of current or static information about the person,
whether he or she was a deponent, victim, perpetrator or witness to a violation,
and details about staff members.
person_seq sequential number to uniquely identify persons (only partial details may be
known about a person, so a system-generated primary key was used)
id_number South Africans have a unique 13-digit identification number which can be
used to determine date_of_birth or sex; this field could also be used to hold
passport numbers, or the old apartheid reference book number if the ID
number was not available
race human rights violations are often about ethnicity or race, uniquely so in the
South African context; this attribute was valuable when analyzing patterns
in the violence
date_of_birth the ages of victims at the time of the violation or at the time of taking the
statement can be calculated from the date of birth
A number of other PERSON attributes were on the system, but did not prove as useful as the
above, because the information was either unavailable or unreliable. These attributes included mari-
tal status, religion, employment status and language. Other attributes not included here involved
administrative functions - notes about the person, date of the victim finding, etc.
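How an id_number can yield a date of birth and sex, and how an age at the time of the violation follows from date_of_birth, can be sketched as below. The sketch assumes the commonly documented layout of the 13-digit South African ID number (six digits YYMMDD, followed by a four-digit sequence in which 5000 and above denotes male). That layout, the function names and the sample number are assumptions for illustration, not details of the TRC database. The two-digit year is ambiguous, so the century is passed in as a parameter.

```python
from datetime import date


def decode_id_number(id_number, century=1900):
    """Return (date_of_birth, sex) from a 13-digit ID string.

    Assumed layout: YYMMDD, then a 4-digit sequence (>= 5000 means male).
    The check digit is not validated in this sketch.
    """
    if len(id_number) != 13 or not id_number.isdigit():
        raise ValueError("expected a 13-digit number")
    year = century + int(id_number[0:2])
    born = date(year, int(id_number[2:4]), int(id_number[4:6]))
    sex = "male" if int(id_number[6:10]) >= 5000 else "female"
    return born, sex


def age_on(born, when):
    """Age in completed years on the date `when`."""
    had_birthday = (when.month, when.day) >= (born.month, born.day)
    return when.year - born.year - (0 if had_birthday else 1)


# Fictitious number, used only to exercise the functions.
born, sex = decode_id_number("6005125800080")
age_at_violation = age_on(born, date(1985, 3, 1))
```

Because passport numbers or reference book numbers could also be stored in this field, any real use would have to treat a failed decode as "unknown" rather than as an error in the record.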
2 For clarity the names of the attributes here are not exactly the same as were used in the database.
SOURCES The SOURCE entity holds details of the source of the information about the vio-
lations in question. In the case of the TRC, violations either came from Human
Rights Violation statements, or Amnesty applications. Secondary source material
was only used for corroborative purposes.
reference_no file reference number allocated to the document
protocol_type a code to indicate whether the document was an HRV statement or amnesty
application; because several different versions of the HRV statement were
used, the code also identified the version
deponent the identifier of the person who made the statement or submitted the
amnesty application; this had a foreign-key constraint to person_seq in the
PERSONS table
place the town where the statement was made or amnesty application lodged
status the status field was used to track where in the information flow the
document was: Registered, Processed, Corroborated, or Finding
date_taken the date the statement was taken, or amnesty application made
interviewer the identifier of the staff member who took the statement or application
registrar the identifier of the staff member who registered the document
processor the identifier of the staff member who processed the document
corroborated (by) the identifier of the staff member who corroborated the document
The dates and person identifiers above held valuable details of the progress of the document
through the information flow. They were particularly useful for monitoring blockages in the system,
finding the location of backlogs and monitoring the performance of individual staff members in
terms of speed and accuracy. For ease of programming and database performance, these fields were
not normalized. Strictly speaking, a SOURCE_HISTORY entity should have been used.
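A normalized SOURCE_HISTORY entity of the kind mentioned above might look like the following sketch, which uses an in-memory SQLite database. The table and column names, the protocol codes and the sample rows are all hypothetical; the sketch only shows how one row per completed step supports a backlog count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (
    reference_no  TEXT PRIMARY KEY,
    protocol_type TEXT
);
-- One row per step a document has completed, instead of one column per step.
CREATE TABLE source_history (
    reference_no TEXT REFERENCES sources(reference_no),
    status       TEXT,     -- Registered, Processed, Corroborated or Finding
    staff_id     INTEGER,  -- staff member who carried out the step
    status_date  TEXT      -- ISO date on which the step was completed
);
""")
conn.executemany(
    "INSERT INTO sources VALUES (?, ?)",
    [("REF-1", "HRV"), ("REF-2", "HRV"), ("REF-3", "AMN")],
)
conn.executemany(
    "INSERT INTO source_history VALUES (?, ?, ?, ?)",
    [
        ("REF-1", "Registered", 11, "1996-05-06"),
        ("REF-1", "Processed", 12, "1996-05-20"),
        ("REF-2", "Registered", 11, "1996-05-07"),
        ("REF-3", "Registered", 13, "1996-05-08"),
        ("REF-3", "Processed", 12, "1996-06-03"),
        ("REF-3", "Corroborated", 14, "1996-08-12"),
    ],
)

# Take the latest step of each document, then count documents per stage.
backlog = dict(conn.execute("""
    SELECT h.status, COUNT(*)
    FROM source_history AS h
    WHERE h.status_date = (
        SELECT MAX(status_date) FROM source_history
        WHERE reference_no = h.reference_no
    )
    GROUP BY h.status
""").fetchall())
```

The trade-off is the one noted in the text: the flat layout is simpler to program against, while the history table keeps every step and allows new stages to be added without changing the schema.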
ACTS The ACT entity was at the heart of the database. This entity held details of the
What and Whom, as well as When, Where, How and Why. It has a many-to-one
relation to SOURCES (one document can describe many violations) as well as to
PERSONS (one person can be violated many times).
reference_no file reference of the source document
violation_type code used to categorize the nature of the violation. In practice, the TRC
conflated the category of the violation as defined in terms of the legislation
with the modus operandi of the violation, so the codes were of the form
KILLING/SHOOTING or TORTURE/ELECTRIC; in retrospect, we should have
had two fields, one for the legislative category and one for the mode of the
violence. The approach used was the result of a lack of clarity regarding the
coding frame at the start of the process. (See Appendix 2 for the coding
frame)
outcome_type code used to categorize the outcome of the violation. Unfortunately, due to
time pressures, this was not used systematically, but it does have enormous
analytic capacity for assessing the human cost of gross violations of human
rights
location_desc narrative description of the location of the violation (in a police cell, for
example, or at the training camp, at the chief’s kraal)
location_type like the outcome_type, this was not used systematically, but had it been
used, it could have contributed to the recommendations chapter of the Final
Report
day the day of the month of the violation; the date of the violation was split into
its three components - day, month and year - because on many occasions,
only partial date details were given in the documents
victim_org the code of the organization to which the victim belonged. This was selected
from a lookup table to ensure uniformity of spelling, etc.
The ACT entity had several other attributes for administrative purposes, including a “veracity”
indicator. This was subsequently used to record the commissioners’ finding on whether the
violation constituted a gross violation of human rights, or whether amnesty was granted in respect
of the offence.
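Two of the points made about the ACT attributes lend themselves to a short sketch: separating a conflated violation_type code into the legislative category and the mode of violence, and working with dates stored as separate day, month and year components. The function names below are hypothetical, and the sketch assumes only what the text states, namely that codes took the form CATEGORY/MODE.

```python
def split_violation_type(code):
    """Split a conflated code such as 'KILLING/SHOOTING' into (category, mode)."""
    category, _, mode = code.partition("/")
    # A code without a slash carries no mode of violence.
    return category, (mode or None)


def date_precision(year, month=None, day=None):
    """Report how much of a partially known date is available."""
    if year is None:
        return "unknown"
    if month is None:
        return "year"
    if day is None:
        return "month"
    return "day"


def month_key(year, month):
    """Return a 'YYYY-MM' key for monthly counts, or None if the month is unknown."""
    if year is None or month is None:
        return None
    return f"{year:04d}-{month:02d}"


category, mode = split_violation_type("TORTURE/ELECTRIC")
precision = date_precision(1985, 3)
key = month_key(1985, 3)
```

Keeping the three date components separate means an act known only to the year can still be counted in yearly totals while being left out of monthly ones.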
PERPETRATORS The PERPETRATOR entity holds details of the individuals who carried
out the violation. It has a many-to-one relation to the violation, because
many perpetrators can carry out one violation.
perp_org the code of the organization to which the perpetrator allegedly belonged.
This was selected from the same lookup table as the list of victim
organizations
The perpetrator entity proved to be very useful for analytic purposes, especially with respect
to the alleged organizational allegiance of the perpetrator. However, in most cases, the rest of the
information was too sparse to be of much value for investigative purposes. In most cases depo-
nents remembered little of substance other than the name of the organization involved; the other
attributes, such as vehicle_used, or place_last_seen, were rarely used.
WITNESSES The WITNESS entity holds details of the individuals who witnessed the viola-
tion. It has a many-to-one relation to the violation, because many individuals can
witness one violation.
reference_no file reference of the source document
The WITNESS entity proved less useful than was anticipated at the start. It was intended to help
the investigators follow up the details of the case, but in most cases, the deponents themselves
were the best witnesses.
EVENTS The EVENT entity was used to group violations from a variety of documents into
conceptually meaningful events. For example, this entity was used to group all
violations pertaining to the Ratanda bus massacre in one event. The event was a
recursive entity, so small events could be grouped together into larger events.
event_id sequential number to uniquely identify events
The EVENT entity had great potential, but was not used to its full capacity by the researchers
who were expected to be the major users of this entity. Due to other pressures, they were unable to
devote enough time to learn how to make it useful for their needs. Ultimately, it proved useful to
the investigators preparing for hearings who used it to extract violations, which they then loaded
into a tool, which drew diagrams of links between thousands of incidents, perpetrators and victims
in a matter of seconds. The Event entity was also later used by the Amnesty Committee to plan
hearings by grouping violations from various amnesty applications together.
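The recursive grouping described above can be illustrated with a self-referencing table and a recursive query. This is a sketch under stated assumptions: the parent_event_id column, the acts.event_id link, the function name acts_in_event and the sample rows are hypothetical, since the text gives only event_id as an attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id        INTEGER PRIMARY KEY,
    name            TEXT,
    parent_event_id INTEGER REFERENCES events(event_id)  -- NULL for a top-level event
);
CREATE TABLE acts (
    act_id       INTEGER PRIMARY KEY,
    reference_no TEXT,
    event_id     INTEGER REFERENCES events(event_id)
);
INSERT INTO events VALUES
    (1, 'Larger event', NULL),
    (2, 'Sub-event A', 1),
    (3, 'Sub-event B', 1),
    (4, 'Unrelated event', NULL);
INSERT INTO acts VALUES
    (10, 'REF-1', 2), (11, 'REF-2', 2), (12, 'REF-3', 3), (13, 'REF-4', 4);
""")


def acts_in_event(conn, event_id):
    """Return the ids of all acts in an event or in any event nested inside it."""
    rows = conn.execute("""
        WITH RECURSIVE tree(event_id) AS (
            SELECT ?
            UNION ALL
            SELECT e.event_id
            FROM events AS e JOIN tree AS t ON e.parent_event_id = t.event_id
        )
        SELECT act_id FROM acts
        WHERE event_id IN (SELECT event_id FROM tree)
        ORDER BY act_id
    """, (event_id,)).fetchall()
    return [row[0] for row in rows]


larger = acts_in_event(conn, 1)
smaller = acts_in_event(conn, 2)
```

Because the acts keep their own reference_no, violations drawn from several documents can sit in the same event without being merged.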
Despite the relatively few entities in the data model, it was complex enough to model all the real
world events that were brought before the commission. For example, the same person could be a
victim at different times and in different places. A person could be a deponent telling about the
death of a relative, and simultaneously be a victim in his or her own right. A person may be the
victim of torture, and then perpetrate a gross violation of human rights in retaliation at a later date.
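[Figure: Relationships among the entities. Entity labels that survive: Sources, Witnesses, Perpetrator.
Relationship labels: made-statement, comprises, describes, victim-of, witnessed-by, perpetrated-by,
witnessed, perpetrated.]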
To keep the database design as straightforward as possible and to minimize the time spent on
the design and build phase, no history of changes to entities was maintained. Instead, the same
record was updated as new information became available or errors were identified.
Given more time, it would have been of great benefit to design a database capable of holding
various versions of the violation, for example, to keep the original version as told by the deponent
separate from the corroborated, or “the finding” version. With such a capability, researchers could
have investigated the nature of oral testimony as compared to the “official” version of history.
Operational managers could also have seen where and why errors were corrected and, if needed,
reverted to an earlier version.
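One way to hold several versions of a violation is to append a new version on every change instead of updating the record in place. The following is a minimal sketch of that idea, not the design the TRC used; the class names, the stage labels and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActVersion:
    stage: str    # e.g. "deponent", "corroborated" or "finding"
    values: dict  # attribute values as they stood at this version
    reason: str   # why the change was made


@dataclass
class ActRecord:
    act_id: int
    versions: list = field(default_factory=list)

    def revise(self, stage, reason, **changes):
        """Append a new version; earlier versions are never modified."""
        values = dict(self.versions[-1].values) if self.versions else {}
        values.update(changes)
        self.versions.append(ActVersion(stage, values, reason))

    def current(self):
        """Return the attribute values of the latest version."""
        return self.versions[-1].values

    def as_of(self, stage):
        """Return the latest values recorded at the given stage, or None."""
        for version in reversed(self.versions):
            if version.stage == stage:
                return version.values
        return None


act = ActRecord(act_id=1)
act.revise("deponent", "as told in the statement",
           violation_type="KILLING/SHOOTING", year=1985, month=None)
act.revise("corroborated", "death certificate gave the month", month=3)
```

With the deponent's version retained, as_of("deponent") supports the comparison of oral testimony with the corroborated record, and the stored reasons show where and why corrections were made.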
[Figure: Number of violations (vertical axis, 0 to 800) by date (horizontal axis, 1975-02 to 1995-02),
plotted separately for fatal and non-fatal violations.]
Other analyses were done on the ages of victims, their gender, their political affiliation, and by
the type of abuse suffered. For example, graphs were drawn of the different age cohorts of depo-
nents for each gender, which showed that the perception of statement-takers that most deponents
were middle-aged women was true.
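A tally of deponents by age cohort and gender, of the kind behind such graphs, can be sketched as follows. The ten-year cohort width, the function name cohort and the sample data are assumptions made for illustration; the data are fictitious.

```python
from collections import Counter


def cohort(age, width=10):
    """Return the label of the age band containing `age`, e.g. '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Fictitious (sex, age at the time of the statement) pairs.
deponents = [
    ("female", 52),
    ("female", 47),
    ("male", 33),
    ("female", 58),
    ("male", 61),
]

# Count deponents in each (sex, cohort) cell.
counts = Counter((sex, cohort(age)) for sex, age in deponents)
```

Each cell of counts corresponds to one bar in a chart of age cohorts drawn separately for each gender.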
Besides its analytic value, the database was used to monitor processes in the information flow.
For example, the graph below shows the progress of implementation of a pilot HRV statement in an
office.
[Figure: Number (vertical axis, 0 to 200) by week beginning (horizontal axis), for the series HRV,
PILOT and Grand Total.]
This type of analysis informed research work, as well as policy formulation for the Reparation
and Rehabilitation Committee and strategic planning of the commission’s work. The results
contributed substantially to the final report of the commission, underpinning the narrative text in a
way that dramatically highlighted the scale and extent of the violence of the past.
Appendix 1
The TRC Gross Violations of Human Rights Statement
Note: Throughout this appendix, we have reduced the spacing between lines and removed
blank space for entries to save space and make it easier for the reader to determine the
structure. Where the form has blank spaces for entries, their presence is indicated by entry lines
(“...........................................”) of varying length.
STATEMENT
concerning
GROSS VIOLATIONS OF HUMAN RIGHTS
The aim of this STATEMENT is to gather as much information as possible about the gross
violations of human rights suffered as a result of the political conflict in South Africa. According to
the legislation, gross human rights violations are:
Declaration
I, ……….………………………………………………………. solemnly declare that the informa-
tion I am about to give the Truth and Reconciliation Commission, is to the best of my knowledge,
true and correct and I consider the contents of this statement binding on my conscience.
_________________________________ ________________
Signature / Finger Print / Mark Date
_________________________________
Witness signature
If you are called to a public hearing, will you be prepared to appear? YES NO
[circle]
IMPORTANT:
• Some women testify about violations of human rights that happened to family members or
friends, but they also have suffered abuses. Don’t forget to tell us what happened to you your-
self if you were the victim of a gross human rights abuse.
Please fill in this section if somebody is HELPING you to make the statement.
Full name of person helping: …………………………………………………….
Relationship to person giving statement (for example, neighbour, friend):……………………
Address:..………………………………………………………………………..
………………………………………………………………..…………………
NO
(for example, your son, daughter, grandchild, mother, father, aunt, friend, etc.) [circle]
4. VICTIM DETAILS
Please list ALL the victims you have mentioned and give details as far as you know:

Full names of person      Sex and age     Race as per     Relationship of     Occupation   Organisational involvement
violated (i.e. victim)    at time of      Apartheid       person making the   at time of   (give dates and position)
                          violation       classification  statement to the    violation    (for example, Community Council,
                                                          victim                           SAP, ANC MK, APLA, SADF, trade
                                                                                           unions, women or youth organisation,
                                                                                           civics, religious group)

for example Jackie        female; 21 yrs  White           myself              student      UDF supporter (1983-85)
Jones                                                                                      Church deacon
for example Sam           male; 34 yrs    African         my son              taxi driver  COSAS branch chairperson (1987)
Majola                                                                                     MK member (since 1985)
5. POLITICAL CONTEXT
Please describe the political situation in the community at the time of each incident.
....................................................................................................................
(for example, there was a mass funeral in the community that day; stay-away; boycott; march; mutiny in the camp; political
rally; etc.)
6. PLEASE PROVIDE SPECIFIC DETAIL NEEDED BY THE TRUTH
AND RECONCILIATION COMMISSION
This section of the statement is to provide all the relevant information needed by the TRC concerning the spe-
cific gross human rights violations.
Please mark the boxes below, and then turn to the appropriate section and answer the questions afterwards as
far as you can.
The questions below are arranged according to the different types of gross human rights violations as defined
by Parliament. You are requested to:
• please indicate which categories are relevant to your experience by marking a cross (X) in the appropriate box. If
you have experienced more than one type or category of violation please indicate this by putting a cross (X) in
the appropriate boxes.
• If your experience does not fit exactly into any one of the types/categories of violations listed below, please use
the ADDITIONAL PAGES at the end of this form to write down your story.
Mark with an X
Killing
The person died as a result of a violation(s) (for example, shot by police at a politi-
cal funeral, died as a result of torture in detention).
Serious Injury or Severe Ill-Treatment
The person does not die. Examples include bombings, shootings, stabbings,
burnings, sexual abuse, attempted killings. These may have occurred in demonstra-
tions, political conflict between groups, armed combat, etc.
Torture
Systematic and intentional abuse with a particular purpose, for example, to get in-
formation, intimidation, or punishment. This happens in captivity or custody by
the state or other groups. The person, however, survived the ordeal.
Abduction or Disappearance
There is evidence that someone was taken away forcibly and illegally, or the per-
son vanished mysteriously and was never seen again.
EVENT
Name of Victim. …………………………………………...........................………
When was the person killed? (date and time): …………………..........................…
Where was the person killed? (exact location, including street, name of building, area, town):
……………..................................................……………………………….
(for example, in front of the house in Akker St.; at the taxi rank in Extension 4)
Please describe how the person was killed. Include details of what weapon was used to kill the person:
.........................................................................................................
Why was the person killed? ……....................………………………………………
Was there a post-mortem or inquest? If yes, what was the outcome?
(for example, did a doctor examine the body to find out the cause of death? Did you find out how the person was killed? Did you go
to court to find out what happened? Was anybody found responsible for the death?)
…………………………………………………………………………………………
PERPETRATORS
Can you identify the perpetrators in any way? Give names, rank and title, and physical description.
............................................................................................
(for example, Mr. Siyanda, member of people’s court; four men in balaclavas; a big man with a scar called Kallie)
How do you know who they were? ………………………………....……
(for example, I saw them; my neighbor told me; there was a court case)
What organization do you think they belong to or support? ……………...
(for example, SAP, UDF, witdoeke, PAC, comrades, SADF, Riot Squad, Town Council, Inkatha, ANC)
Can you specify who did what? Who was in charge? Who gave orders? Who was with them?
........................................................................................................
(for example, Mr. Siyanda ordered the killing, Vusi poured the petrol and Toto lit the match)
Where and when did you last see the perpetrator(s)? ..................................
Would you like to meet the perpetrator(s)? .................................................
WITNESSES
Is there anyone else who knows what happened to you or the alleged victim either before, during or after the kill-
ing? If yes; please answer the following questions as fully as possible.
Name of Witness          Contact address and telephone   What did this person see or hear?
                         number of witness
for example, Mrs         13 Esau St, Lenasia             She saw the shooting of my son
Moodley,                 tel (011) 123456                and told me about it.
my neighbour
ADDITIONAL INFORMATION
…………………………….…………………………………………………
SERIOUS INJURY OR SEVERE ILL-TREATMENT
The violation did not result in death. These may have occurred in demonstrations, political conflict
between groups, armed combat etc. Examples of severe ill-treatment include shootings, stabbings,
beatings, sexual abuse, burnings.
EVENT
Name of victim. ……………………………………......................……………
When did the violation occur? (date and time) ……………...…………………
Where did the violation occur? (exact location, including street, name of building, area, town):
..............................................................................................................
(for example, in front of the house in Akker St.; at the taxi rank in Extension 4)
Please describe in detail what was done to you and/or the person you are talking about?
.……………………..................................................................……….
Were you or the victim sexually assaulted? Please give details: ..........................
Was there a court case? If yes, what was the outcome? …………………...……
PERPETRATORS
Can you identify the perpetrators in any way? Give names, rank and title, or physical description.
........................................................................................................
(for example, Kitskonstable Jacobs; Mrs Daba and a group of comrades; four men in balaclavas)
How do you know who they were? …………………………………………...…
(for example, I saw them; my neighbor told me; there was a court case)
What organization do you think they belong to or support? ………………….....
(for example, SAP, UDF, witdoeke, PAC, comrades, SADF, Riot Squad, Town Council, Inkatha, ANC)
Can you specify who did what? Who was in charge? Who gave orders? Who was with them?
.................................................................................................................
(for example, Capt Coetzee ordered the shooting; Constable Denga shot me in the stomach)
Where and when did you last see the perpetrator(s)? …......…………………….
Would you like to meet the perpetrator(s)? …………………..........…………….
WITNESSES
Is there anyone else who knows what happened to you or the alleged victim either before, during or after the inci-
dent?
If yes; please answer the following questions as fully as possible.
Name of Witness          Contact address and telephone   What did this person see or hear?
                         number of witness
(for example) Joe Mini   1409 KwaMashu, Durban           He found me being beaten by Vusi
                         tel (031) 123456                and his friends
ADDITIONAL INFORMATION
.....................................................................................................................
EVENT
Name of victim. ………………………………………………....................…
When were you and/or the victim tortured? (dates, times, length of time) ……
Where did the torture occur? (exact location, including street, name of building, area, town)
:…………………………………………………………..……..........…
(for example, Loubscher’s office at the police station; in the detention centre near the camp)
Please describe in detail what was done to you or the person you are talking about. In other words,
describe the torture: ....................................................................
Were you sexually assaulted? Please give details: …………………………..….
Why were you or the person you are talking about tortured? …………………
(for example, to sign a statement, to become a state witness, punishment)
Describe the conditions of the captivity ……………………………............….
PERPETRATORS
Can you identify the perpetrators in any way? Give names, rank and title, or physical description
……………….....................................................................……
(for example, Kitskonstable Jacobs; Mrs Daba and a group of comrades; four men in balaclavas)
What organization do you think they belong to or support? ……….....……….
(for example, SAP, Security police, Mbokodo, ANC, SADF, Town Council, Inkatha, Transkei police)
Can you specify who did what? Who was in charge? Who was with them?
(Capt Piet was in charge of my interrogation; Botha applied electric shocks; Commander ‘Zizi’ suspended me upside down )
Where and when did you last see the perpetrator(s)? …….…………….....….
Would you like to meet the perpetrator(s)? ………........................................
ADDITIONAL INFORMATION
Describe any visits by doctors or District Surgeons. Give names and details: .....
Describe any visits with a magistrate. Give names and details: .........................
Did you see a lawyer? Was there a court case? Was the torture experience described in court? What was the out-
come of the case? .........................................................
Is there anything else you wish to tell the Commission about this experience of torture?
…………………………………………………………………….........….…….
Gerald O'Sullivan
WITNESSES
Is there anyone else who knows what happened to you or the alleged victim either before, during or after the inci-
dent? ..................................................................
If yes; please answer the following questions as fully as possible.
Name of Witness | Contact address and telephone number of witness | What did this person see or hear?
(for example) Mrs Khumalo | 14 Grange Str, Meadowlands, tel (011) 123456 | She was in the police cell with me and saw my wounds
(for example) District Surgeon, can’t remember name | Pretoria Central Prison | He saw my injuries and refused treatment
EVENT
Name of victim . …………………………………………………
When did the abduction/disappearance take place? (date and time) ……………
Where did it happen? (exact location, including street, name of building, area, town)
:………………………………………………………………………………….
(for example, from his house at 1711 Loerie St.; from the taxi rank in extension 5)
Please describe how it happened. ……………………………………………..
Where was the person taken to? (street, building, town) ……………………..
Why did it happen …………………………...........…………………………..
What was the outcome? Did the person come back? ……..…………………..
(for example, They let me go after two weeks; my son’s body was found the next day)
PERPETRATORS
Can you identify the perpetrators in any way? Give names, rank and title, or physical description.
.....................................................................................................
(for example, Mr Siyanda, member of people’s court; Chief Ndlela, leader of Mbokodo; four men in balaclavas)
How do you know who they were? ……………………………………………
(for example, I saw them; my neighbor told me; there was a court case)
What organization do you think they belong to or support? ………………….
(for example, Security police, vigilantes, comrades, Mbokodo, Town Council, Inkatha, ANC, SADF)
Where and when did you last see the perpetrator(s)? ...……………………….
Would you like to meet the perpetrator(s)? ……………...…………………….
WITNESSES
Are there any witnesses to the violation either before, during or after the incident?
.........................................................................................................................
If yes; please answer the following questions as fully as possible.
Name of Witness | Contact address and telephone number of witness | What did this person see or hear?
(for example) Mr Mpokeli | 629 Site C, Khayelitsha | He saw my son being dragged into a taxi by five men in balaclavas.
7. EXPECTATIONS
An important part of the Truth and Reconciliation Commission’s proposals to the President will be about symbolic
acts which will help us remember the past, honour the dead, acknowledge the victims and their families and further
the cause of reconciliation.
Please give us your opinion on what should be done:
7.1 For individuals: ...................................................................................
(for example, medals, certificates, street names, memorials, grave stones, etc.)
7.2 For the Community: ............................................................................
(for example, a peace park, build a school, special ceremony, annual religious service, etc.)
7.3 For the Nation: ....................................................................................
(for example, a monument, national day of remembrance, etc.)
9. DOCUMENTATION DETAILS
Have you already made one or more statements about this incident? YES NO [circle]
If yes, please specify:
To WHOM statement was made? | WHEN? | CONTACT details / person
(for example, Foundation for Equality before the Law) | (for example, 1993) | (for example, Adv. Strydom tel. (***) - *** ***)
Do you have any documents that will help the Commission understand the situation and experience you have de-
scribed? YES NO [circle]
(for example, Doctor’s Certificate, Membership card, Diary, Newspaper clippings, Legal Documents, Post-Mortem report, Hospital
records, Police records, Court records, Inquest reports etc).
Type of Document | Where is this document at the moment?
(for example) Inquest report | with the lawyer Smith, Jones and Associates
(for example) Death certificate | at home
What legal action did you or the victim take? Please give dates and the name of the lawyers, magistrates and judges
if you can. ...............................................................
(for example, was there a court case about the violation? Did you sue the perpetrators for damages? Did you lay charges against the
perpetrators?)
What was the result? .................................................................................................
ADDITIONAL PAGE
Please mark clearly which question or paragraph you are answering on this page.
.........................................................................................................................
……………………………………………………………………
RELEASE FORM:
Medico-Legal Records
I, ………………..…………………………………………....
(name of person giving permission)
hereby grant permission for the Investigative Unit of the Truth and Reconciliation
Commission to obtain copies of all
medico-legal records of
…..……….………………………………………..………. who is
(name of victim)
……………..……………………….........,
(relationship to victim, for example, myself, my son, my daughter)
for the purposes of ongoing investigation being conducted by the Truth and
Reconciliation Commission.
Yours faithfully,
Appendix 2
Introduction
The task of the TRC is to identify those people who suffered gross violations of human rights,
which are defined as follows: Killing, Abduction, Torture and Severe Ill-treatment. In addition to
these four, there is a fifth category which is not a gross violation of human rights, but is important
for understanding the context, called an Associated Violation.
Each of the five categories has several sub-headings, which explain how the violation took
place (a person can be killed in different sorts of ways, so we need to identify how they were
killed). By breaking the categories into sub-headings, we can then do meaningful counting for the
final report.
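The counting that this coding frame makes possible can be sketched in a few lines of Python. This is a minimal illustration, not the Commission's software: the codes are a subset copied from the tables in this appendix, while the name `CODING_FRAME`, the function `tally` and the sample acts are invented for the example.

```python
from collections import Counter

# A subset of the coding frame: category code -> allowed sub-heading codes.
# The codes come from this appendix; the structure itself is an assumption.
CODING_FRAME = {
    "TORTURE": {"BEATING", "BURNING", "ELECTRIC", "SUFFOCATE", "UNKNOWN"},
    "SEVERE": {"BEATING", "SHOOTING", "STABBING", "UNKNOWN"},
}


def tally(acts):
    """Count acts per (category, sub-heading), rejecting codes not in the frame."""
    counts = Counter()
    for category, subcode in acts:
        if subcode not in CODING_FRAME.get(category, ()):
            raise ValueError(f"unknown code: {category}/{subcode}")
        counts[(category, subcode)] += 1
    return counts


# Invented sample acts, as they might be coded from statements.
acts = [
    ("TORTURE", "ELECTRIC"),
    ("TORTURE", "BEATING"),
    ("SEVERE", "BEATING"),
    ("TORTURE", "BEATING"),
]
counts = tally(acts)
for key, n in sorted(counts.items()):
    print(key, n)
```

Keying the count on the pair of category and sub-heading matters because the same sub-heading code (BEATING, for instance) appears under more than one category and means something different in each.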
Torture | TORTURE | Torture happens in captivity or in custody of any kind, formal or informal (for example, prisons, police cells, detention camps, private houses, containers, or anywhere while tied up or bound to something).
Severe Ill-treatment | SEVERE | Severe Ill-treatment covers attempted killing and all forms of inflicted suffering which caused extreme bodily and/or mental harm.
Abduction | ABDUCTION | Abduction is when a person is forcibly and illegally taken away (for example, kidnapping). It does NOT mean detention or arrest. It is not a gross violation of human rights to be arrested (see Associated violations).
Associated violation | ASSOCIATED | These are not gross violations of human rights, but are important for understanding the context of the violation (for example, detention, harassment, framing, violating a corpse after death).
Beaten to death | BEATING | Person is beaten to death by being hit, kicked, punched. State which part of body assaulted if known, e.g., feet, face, head, genitals, breasts.
Burnt to death | BURNING | Victim is killed in a fire or burnt to death using petrol, chemical, fire, scalding, arson, but does NOT include Necklacing or Petrol Bomb (these are separate codes).
Killed by poison, drugs or chemicals | CHEMICALS | Killed by poison, drugs, or household substance, such as bleach or drain cleaner.
Killed by drowning | DROWNING | The person is drowned in a river, swimming pool, or even in a bucket of water.
Killing by death sentence | EXECUTE | Hanged or shot as decided by a formal body (court or tribunal) such as the state, homeland state, or political party.
Killed in an explosion | EXPLOSION | Killed by any manufactured explosive or bomb, but NOT petrol bomb (see below).
Killed by exposure | EXPOSURE | Person dies after being subjected to extremes such as heat, cold, weather, exercise, forced labour.
Necklacing | NECKLACING | Burnt with petrol and tire. Necklacing is coded separately from Burning, because it featured heavily in the past, so it is useful to distinguish between burning with petrol and a tire and burning in a house, for example.
Other type of killing | OTHER | All other methods of killing including buried alive, strangling, tear-gas, decapitation, disembowelment. Make sure that it is clear in the description of the act exactly how they died.
Petrol bomb | PETROLBOMB | Killed by a burning bottle of petrol. Petrol bombing falls in between burning and bombing, so, like necklacing, it is useful to code it separately. Also called Molotov cocktail.
Shot dead | SHOOTING | Person is shot and killed by live bullet, gunshot, birdshot, buckshot, pellets, rubber bullet.
Stabbed to death | STABBING | Killed with a sharp object, such as a knife, panga, axe, scissors, spear (including assegai).
Stoned to death | STONING | Person is killed with bricks, stones or other missile thrown at them.
Unknown cause of death | UNKNOWN | Person is dead, but there is no further information.
Killing involving a vehicle | VEHICLE | Dragged behind, thrown out, driven over, put in boot, but NOT car bomb (see Bombing). Specify what type of vehicle was involved (for example, car, train, truck, van, bakkie, hippo, casspir).
Torture by beating | BEATING | Person is tortured by being beaten severely or for a long time (for example, hit, kick, punch). State which part of body was assaulted, e.g., feet, face, head, genitals, breasts.
Torture by burning | BURNING | Person is burnt, with cigarettes, or fire, for example.
Torture with poison, drugs or chemicals | CHEMICALS | Tortured with poison, drugs, or household substance, such as bleach or drain cleaner.
Electric shock torture | ELECTRIC | Electric shocks to the body. Specify which body part was shocked (for example, genitals, breasts, fingers, toes, ears, etc.).
Torture by bodily mutilation | MUTILATION | Torture involving injuries to the body where parts of the body are partly or wholly cut, severed or broken.
Other type of torture | OTHER | All other methods of torture. Make sure that it is clear in the description of the act exactly how the person was tortured. It includes use of animals (specify animal, e.g., snake, tortoise, baboon), use of vehicle.
Torture by forced posture | POSTURE | Person is tortured by forcing the body into painful positions, for example, suspension, helicopter, tied up, handcuffed, stretching of body parts, prolonged standing, standing on bricks.
Torture by sexual assault or abuse | SEXUAL | Person is tortured by attacking them using their gender or genitals as a weak point.
Torture by suffocation | SUFFOCATE | Torture by stopping someone from breathing, for example by bag, towel, tube (wet or dry) over head, drowning (head, whole body submerged), choke, strangle, stifle, throttle, teargas, bury alive.
Unknown type of torture | UNKNOWN | Person is tortured, but the method is not known.
Severely beaten | BEATING | Person is badly or severely beaten, or beaten for a long period of time. They may be hit, kicked, punched, twisted. State which part of the body was assaulted (e.g., feet, face, head, genitals, breasts).
Injured by burning | BURNING | Person is injured by burning with fire, petrol, chemical, scalding, but NOT Necklacing or Petrol Bomb (these are separate. See below).
Injured by poison, drugs or chemicals | CHEMICALS | Person was poisoned or injured by poison, drugs, household substance (for example, bleach or drain cleaner).
Injured in an explosion | EXPLOSION | Person is injured by a bomb or explosives, but NOT petrol bomb (this is coded separately. See below). Explosives include dynamite, land-mine, limpet mine, car bomb, hand-grenade, plastic explosives, detonator, booby trap, letter bomb, parcel bomb, special device (e.g., booby-trapped Walkman).
Bodily mutilation | MUTILATE | Person is injured by having parts of their body mutilated or damaged. Specify body part, for example, genitals, fingernails, ears, hair, etc.
Other type of severe ill-treatment | OTHER | All other types of severe ill-treatment. Make sure that it is clear in the description of the act exactly how they were ill-treated. It includes strangling, drowning, spreading of disease.
Sexually assaulted or abused | SEXUAL | All forms of attack on a person using their gender or genitals as a weak point, for example
Injured in a shooting | SHOOTING | Person is injured by being shot with live bullets, gunshot, birdshot, buckshot, pellets, rubber bullet. Specify body part injured, if known.
Stabbed or hacked with a sharp object | STABBING | Injured with a sharp object, such as a knife, panga, axe, scissors, spear (including assegai).
Injured in a stoning | STONING | Person is injured with bricks or stones thrown at them.
Unknown type of severe ill-treatment | UNKNOWN | Person was severely ill-treated, but it is not clear how.
Injury involving a vehicle | VEHICLE | Injuries caused by being dragged behind, thrown out, driven over, put in boot of a vehicle. Specify the vehicle (for example, car, train, truck, van, bakkie, hippo, casspir).
Illegal and forcible abduction | ABDUCTION | Victim is forcibly and illegally taken away (for example, kidnapping), but the person is found again, returned or released.
Disappearance | DISAPPEAR | Victim is forcibly and illegally taken away and is never seen again.
Violation after death | CORPSE | Body of victim was violated after death, for example by improper burial, body mutilated or burnt or blown up, funeral restrictions, funeral disruption, anonymous burial, mass grave.
Other type of associated violation | OTHER | All other types of associated violations, including released into hostile environment, released into unknown place, left for dead, rough ride, detention of family or loved ones.
Petrol bombing | PETROLBOMB | Severely injured by a burning bottle of petrol. Also called Molotov cocktail.
Teargassed | TEARGAS | Victim was teargassed, but NOT while in custody (see Torture).
Theft or stealing | THEFT | Money or possessions were stolen from the victim.
Appendix 3
Lessons Learned
By the editors
Volume and complexity of information | Wide Area Network and development of own database facilitated work. | Don’t even think of working without a network. Don’t use “standard” human rights software | Scope and nature of networking. Whether to outsource software development, network, or database design
Network software and hardware | Domain structure of Microsoft NT complicated network management; stability of the servers compromised by shortcomings of OS | Choice of OS calls for intense study | Having individuals with sufficient experience and skills to make good judgements; getting sufficient time and funds to make a considered decision
Security of system | Security and Internet access can be achieved | Free-standing computers connected by dial-up to Internet is simple, reliable, inexpensive way to provide Internet access | Computer-literate users will be frustrated by the lack of outside e-mail connections
Ownership of information system | Contests of ownership and a high profile can be assured. | Have the persons with responsibility for the electronic information systems in a position reporting directly to the CEO | Having supporters of this recommendation in a position to make it happen
Ownership of data and information | Users may not take ownership of data they use until late in process | Get the users involved early in the project | Getting the message across to users
Corroboration, research, getting findings | If the system serves several purposes with higher political profiles, corroboration, research, getting findings will be delayed | Work to maintain these activities despite distractions | Easy to say, hard to do. Stakeholders in the system are in conflict and highest political priority may take over
Data collection | Free-flowing narrative may be too slow, rigidly structured form may lose context | Balance these two requirements to produce a form appropriate to the job mission, conditions, and resources | Prior to some initial data collection, it may be impossible to make a good compromise
Acts of violation | Must be kept to a reasonable number | Reduce to a reasonable number by appropriate method | Finding “appropriate method.” At TRC, head processors and researchers could not reach consensus until top management mandated that consensus be achieved in a finite time. This approach may not work in all situations
Chapter 5
The United Nations Mission for the Verification of Human Rights in
Guatemala: Database Representation
Ken Ward
Introduction
The United Nations Mission for the Verification of Human Rights in Guatemala (MINUGUA)
was created within the framework of the peace negotiations between the government of Guatemala
and the National Revolutionary Union of Guatemala (URNG). In the Comprehensive Agreement on
Human Rights signed on March 29, 1994, the parties asked the Secretary General of the United Na-
tions to establish a mission for the verification of the status of human rights and compliance with
the commitments of the agreement.
On September 20, 1994, one day after the UN General Assembly approved the establishment of
MINUGUA, a technical team was sent to Guatemala to work out the logistical arrangements for the
mission’s installment. This included drafting a handbook on verification methods and designing
training seminars for the international monitors who were to verify the human rights situation in
the country. MINUGUA was formally installed on November 20 and its first regional office was
opened three days later in Guatemala City.
MINUGUA’s mandate was to cooperate with national institutions and entities for the effective
protection and promotion of human rights, sponsor technical cooperation programs, carry out insti-
tution building, and support the judiciary, the prosecutor’s office and governmental human rights
offices. Thus, its central role was monitoring and reporting on human rights violations.
By the time its first report was issued in March 1995, MINUGUA had eight regional offices and
five sub-regional offices and a staff of 211 international members, including 72 UN Volunteers and
30 civilian police observers almost exclusively involved in human rights monitoring. By the time the
peace accords were signed and the mandate of the mission expanded to include other aspects of
the accords, approximately 150 members of the mission were involved directly in monitoring human
rights.
In addition to simply monitoring human rights violations, officials in the field offices worked to
prevent human rights violations or intervened to prevent additional violations.
1 The terms case and event are not used synonymously. Several cases might be generated by one event. This
was 1992, early in the development of the AAAS methodology and definitions were not in a state of
development. Some of this growth in understanding is evident in this paper and others.
Chapter Five: The United Nations Mission for the Verification of Human Rights in
Guatemala
Each office used a list control sheet to monitor the status of their cases. It consisted of a basic
table with each row containing the event number, event location, victim name, primary violation,
perpetrator, and the status of the verification.
Methodological Problems
By limiting recording of an event to its “primary” violation (that is, the violation deemed to be
most serious), only one violation will be recorded for a victim suffering several. This is a gross un-
derstatement of the nature of the victimization of the individual and leads to a false view of the
events and distortion of trends. To illustrate this latter problem, consider Table 1, below, repre-
senting the recording of counts of violations in this “one victim-one most serious violation”
schema.
Table 1. Example of the recording of counts and violations in the “one victim-one most
serious violation” schema
Violation June July
Arbitrary Detention 2 0
Torture 1 0
Arbitrary Execution 1 3
From this table, it would seem that the number of victims of Arbitrary Detention declined from
two in June to zero in July. But given the three cases of arbitrary execution that happened in July,
we cannot be sure that this decrease is real. The executed people may have been detained and tor-
tured before they were killed, in which case detention and torture went up in July. Once data have
been coded and represented in this way there is no way to find out what happened in each event.
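The distortion can be made concrete with a small Python sketch. The July records below are invented so that the primary-only count reproduces the July column of Table 1. The severity ranking is also an assumption, since the mission never defined “most serious.”

```python
from collections import Counter

# Hypothetical severity ranking (higher = more serious). This is an
# assumption made for the example only.
SEVERITY = {"Arbitrary Detention": 1, "Torture": 2, "Arbitrary Execution": 3}

# Invented July records: one list per victim, naming every violation suffered.
july_victims = [
    ["Arbitrary Detention", "Torture", "Arbitrary Execution"],
    ["Arbitrary Detention", "Arbitrary Execution"],
    ["Arbitrary Execution"],
]


def primary_only(victims):
    """Count one violation per victim: the most serious one."""
    return Counter(max(v, key=SEVERITY.__getitem__) for v in victims)


def all_violations(victims):
    """Count every violation suffered by every victim."""
    return Counter(violation for v in victims for violation in v)


primary = primary_only(july_victims)
full = all_violations(july_victims)
print("primary only:", dict(primary))
print("all violations:", dict(full))
```

Under the primary-only count, detention and torture vanish from July even though, in these invented records, two of the three people executed were detained first and one was tortured. Only the full count preserves that information.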
Also, by separating victims of the same event into different case files, the relationship of the
victims to the same event can be confused or lost. If an event involves many victims, many differ-
ent violations and/or multiple perpetrators, important information on individuals and acts will be
disassociated, hidden in text, or lost altogether.
Thus, when trying to analyze what happened, there is confusion as to what exactly was re-
corded. When the mission talked about human rights “cases,” it was not clear whether a case re-
ferred to a single human rights violation against one victim, an event with many violations with
only one violation having been recorded or one victim in an event where there were many victims.
All of these interpretations are equally possible.
A second problem with recording only a primary violation involves information management.
Since the functions of the field offices include prevention and intervention, in-depth knowledge of
the human rights situation in a particular region is essential for effective results. However, for any
one person to understand the case history of an office that person would need either personal
knowledge of the caseload or knowledge gained by extensive reading of individual files. For a new
member of the human rights team to determine if a perpetrator had a history of committing violations
or if an individual had suffered previous attacks, it would be necessary to consult individual
office members, who relied on memory or personal notes, or to review each case file, a
time-consuming and arduous task.
Thus, the primary source of information about the connections between cases and events was
the individual employee, who depended on memory or personal notes. At best, this is a poor solu-
tion to the problem. However, at MINUGUA, it was compounded by the continuing rotation of
personnel in the regional offices. Police observers were usually assigned to the mission for only six
months. UN volunteers rotated from one office to another after six months to a year.
Analysis of trends and patterns of violations was equally difficult without personal knowledge
of each case. This problem was even more pronounced in the main office where verification
officers worked from case summaries and lists sent from the regional offices. The consolidation of
cases from several regional offices increased the workload for the individual verification officers at
MINUGUA’s headquarters and made it harder for them to extract hidden details of cases.
A Report Example
In March 1995 MINUGUA presented its first report to the Secretary General of the United Na-
tions on human rights in Guatemala, including anecdotal cases of human rights violations and a
table representing 288 cases of reported human rights violations admitted for verification, classified
by violation.
A footnote in the table explained that when there was more than one violation per case, only
the “most serious” was considered, although “most serious” was not defined. See, for example,
Appendix 1, Table 1.
The table of the number of cases gave a possibly misleading impression of relative importance
of each violation (as measured by rate of occurrence). For example: cases where the violation
against the right to life (extra judicial killings, tentative killings, and death threats) was the primary
violation represented 37% of all cases accepted. Cases where the primary violation was reported as
violations against physical integrity represented only 23% and cases of personal liberty only 12%.
There may have been a great many cases of personal liberty violations that were not deemed “pri-
mary violations” occurring in cases where right to life was the only recorded violation. A ranking
based on the primary violation of the case might then lead to distorted understanding of the human
rights situation.2
2 The mission’s periodic human rights report continued this format until its November 1996 report. In May
1995, Patrick Ball was employed as a consultant to MINUGUA and helped to change the ways in which
these data were recorded and reported.
3 I was not fully informed of the managerial decision-making process.
The simplified model used for the first year and a half could not represent this complexity; it
reduced a case to one violation, one victim, and one perpetrator. However, most human rights
cases or events are complex collections of one or more violations or acts, suffered by one or more
victims, possibly at the hands of one or more perpetrators. In addition, it is possible that in each
event not every victim suffers the same series of violations and not every perpetrator commits each
violent act. I designed the new database to represent this complex structure of human rights cases
and preserve information relating to the number of victims, acts and perpetrators. By using this
structure, it would later be possible to recreate exactly who suffered what violation and who com-
mitted that violation.
In addition, a person’s role at the time of a violation (victim, witness, or perpetrator) is not part
of who that person is; rather, it reflects his/her place in a violation at a specific time. The same
person could be a victim in one human rights violation event, a witness in a subsequent event and a
perpetrator in another. Therefore, the database represented individuals not
as victims or perpetrators but rather as members of the list of all people who are in some way asso-
ciated with human rights cases. Personal information on each individual was stored in the person’s
record, such as name, date of birth, ethnicity, etc. References linked the individuals to the roles
played in each event. This structure allowed for accurately counting exactly how many victims of
violations there were and permitted the analysis of patterns of behavior, for example, of a public
prosecutor who is repeatedly involved in obstruction of justice cases.
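A structure of this kind can be sketched with Python's built-in sqlite3 module. This is an illustrative schema under stated assumptions, not the MINUGUA design: the table and column names and all sample rows are invented, and perpetrators are simplified to persons, although the real system also allowed institutions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE event  (id INTEGER PRIMARY KEY, location TEXT);
-- One row per act: who did what to whom, in which event.
-- Roles live here, not in the person table.
CREATE TABLE act (
    event_id       INTEGER NOT NULL REFERENCES event(id),
    victim_id      INTEGER NOT NULL REFERENCES person(id),
    perpetrator_id INTEGER NOT NULL REFERENCES person(id),
    violation      TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Person A"), (2, "Person B"), (3, "Person C")])
conn.executemany("INSERT INTO event VALUES (?, ?)",
                 [(1, "Place X"), (2, "Place Y")])
conn.executemany("INSERT INTO act VALUES (?, ?, ?, ?)", [
    (1, 1, 3, "Arbitrary Detention"),
    (1, 1, 3, "Torture"),
    (1, 2, 3, "Arbitrary Detention"),
    (2, 3, 2, "Death Threat"),  # Person C, a perpetrator in event 1, is a victim here
])

# Each victim is counted once, however many violations he or she suffered.
victims = conn.execute(
    "SELECT COUNT(DISTINCT victim_id) FROM act").fetchone()[0]

# Number of acts attributed to each perpetrator (pattern-of-behavior analysis).
by_perp = conn.execute(
    "SELECT perpetrator_id, COUNT(*) FROM act "
    "GROUP BY perpetrator_id ORDER BY perpetrator_id").fetchall()

# People who appear both as victim and as perpetrator across events.
both_roles = conn.execute(
    "SELECT DISTINCT victim_id FROM act "
    "WHERE victim_id IN (SELECT perpetrator_id FROM act) "
    "ORDER BY victim_id").fetchall()

print(victims, by_perp, both_roles)
```

Because each act row names its own victim, perpetrator and violation, the same person record can carry different roles in different events without being duplicated.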
the user finished entering the personal information and selected an Accept button closing
this window and revealing the overlaid Case Window.
• The user selected a button for acts to add victims, their associated violations and alleged
perpetrators.
• The same Person Window was then superimposed on the Case Window, but this time the
title would specify that a victim was being added. Again, individuals could first be
searched for in the persons lookup list or the user entered their information if they were
new to the system. Selecting Accept created the relationship between this individual and
their role as victim in this event.
• Since an act requires a victim, a violation, and a perpetrator, the following step did not
return to the Case Window but led the user to a third window where a perpetrator (or
perpetrators) and a violation (or violations) could be selected. Adding perpetrator(s) followed
the same process as before, with an individual or an institution (only possible in this case)
being defined or selected. Once again, when the user accepted the perpetrator(s), their role in
the event was established. Violations were selected from a control lookup list of possible
violations. After specifying all pertinent information, the user selected the Accept button
and the system created the relationships among victims, violations and perpetrators.
• Additional fields on the main case window allowed the users to add text for qualitative
case follow up and analysis.
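The data-entry steps above can be sketched as follows. This is a hypothetical reconstruction in Python, not the original application: the class `CaseEntry` and its methods are invented names, and linking every selected violation to every selected perpetrator on Accept is an assumption about how the relationships were created.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class CaseEntry:
    persons: dict = field(default_factory=dict)  # name -> person id (lookup list)
    acts: list = field(default_factory=list)     # (victim_id, violation, perpetrator_id)

    def lookup_or_add(self, name):
        """Return the id of a person already in the lookup list, or add a new one."""
        return self.persons.setdefault(name, len(self.persons) + 1)

    def accept(self, victim, violations, perpetrators):
        """On Accept, link the victim to each violation/perpetrator pair."""
        victim_id = self.lookup_or_add(victim)
        perp_ids = [self.lookup_or_add(p) for p in perpetrators]
        for violation, perp_id in product(violations, perp_ids):
            self.acts.append((victim_id, violation, perp_id))


# Invented sample entries.
case = CaseEntry()
case.accept("Person A", ["Arbitrary Detention", "Torture"], ["Officer X"])
case.accept("Person B", ["Arbitrary Detention"], ["Officer X", "Officer Y"])
print(case.persons)
print(case.acts)
```

Searching the lookup list before adding a person is what keeps "Officer X" as a single record across both entries, so that later counts of victims and perpetrators are not inflated by duplicates.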
We completed the design of the database in November 1995, and users started installation and
training in each of the 13 field offices. By the end of January 1996, every office had incorporated its
prior caseload (created since the beginning of the mission) and added new cases as they arrived.
Once a month at first, and later, every two weeks, the information was transmitted to the head office
using electronic mail. There it would be consolidated with that of the other regional offices. To as-
sure confidentiality the information was encrypted prior to transmission using Pretty Good Privacy
(PGP) public key encryption software with keys of 1024 bits.
In December 1996, the mission hired a UN volunteer to work full time on maintaining and modi-
fying the database system. This person was also in charge of producing statistical tables and lists
used by the verification officers in the Human Rights Division and other areas for analysis, creating
a standard list of statistical reports and performing ad hoc queries for data. These results were
produced as hard copy and given to the requesting party.
counts for only 14 cases, a greater than four to one ratio of killings to detention. But if we look at
the number of reported violations of extra-judicial killings compared to the number of reported
violations of arbitrary detention we see the ratio is almost one to one (69 to 66). If we look at
confirmed violations, arbitrary detention outranks extra-judicial killings by three to one (18 to 6).
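The two ratios quoted above can be checked directly. The short Python sketch below uses only the counts given in the text; the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the text: extra-judicial killings vs. arbitrary detention.
reported = {"killings": 69, "detention": 66}
confirmed = {"killings": 6, "detention": 18}

# Reported violations: killings per detention, close to one to one.
reported_ratio = reported["killings"] / reported["detention"]

# Confirmed violations: detentions per killing, exactly three to one.
confirmed_ratio = Fraction(confirmed["detention"], confirmed["killings"])

print(f"reported killings per detention: {reported_ratio:.2f}")
print(f"confirmed detentions per killing: {confirmed_ratio}")
```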
From my preceding review of the developmental process, it is clear how different database rep-
resentations may lead to different views of reality. I feel that the original approach of counting only
one violation per case presented a misleadingly simplistic view of the human rights situation in
Guatemala. It is important to carry out the database structuring correctly, as the findings are dra-
matically affected by the nature of the system. Of course, in human rights situations it is hard to
know exactly what the nature of reality is until data collection has been in process. Ideally, the da-
tabase designers will create a design that is flexible and robust so that it can deal with changes as
the project proceeds.
4
When Ríos Montt took power, he expanded the civic action aspects of the counterinsurgency efforts, in-
cluding the peasant militias, under the name “civilian self-defense patrols.” During the subsequent transition
to civilian government, the army changed the name to “Voluntary Civilian Defense Committees” and renamed
local comandantes as “committee presidents.”
Ken Ward
Lessons Learned
Carrying out full analysis of large-scale human rights violations | A relational database is needed. | The establishment of such a database into the mission should be an essential part of the commission’s activities. | Will the initial planning, often dominated by legal and political parties, have the knowledge and understanding of the need for and requirements of such a database?

Design and implementation of database | Without the self-directed proposal of a volunteer, it is uncertain whether or when the database would have been ready. | Incorporate database needs into the initial planning for the project. Do not depend on chance events, such as the possibility that someone on the staff will have the skills and volunteer to do the work. | Does achieving this recommendation depend on the presence on the commission of an advocate for such a database? If no knowledgeable persons are part of the managerial and administrative structure, can this recommendation be achieved?

Data structure and unit of analysis | Don’t use the structure, “one victim-one most serious violation.” | Follow the guidance in Ball (1994) for the data structures based on who did what to whom. | Database designers need to be familiar with the rationale discussed in these proceedings. Will they? Law enforcement often uses the “one victim-one most serious violation” method; users may not realize the implications in a human rights situation.
Conclusions
For the year and a half prior to the implementation of a violations database, MINUGUA had
only the capacity to draw broad conclusions about the human rights situation in Guatemala. Reports
to the Secretary General of the United Nations (MINUGUA’s official evaluation of the human
rights situation in the country) relied almost exclusively on anecdotal evidence. The design
and implementation of a large-scale relational database has changed that situation.
The implementation of a database allowed the mission to present a more profound analysis of
trends and patterns of violations. The violations database has also allowed the mission to signal
concretely the government’s noncompliance with its commitments, as in the case of the National Civil
Police Academy, and has allowed the fluid interchange of information between previously isolated
regional offices.
A final note: As shown in this paper, the implementation of MINUGUA’s violation database
was ad hoc. Such a database was not incorporated into initial planning and apparently its impor-
tance was not understood by decision-makers until after a year and a half of operational experience.
Even then, but for the availability and willingness of a skilled volunteer on the staff, we can only
guess how much longer it would have taken to undertake a design and implementation project.
MINUGUA could have made better-supported, stronger arguments at a much earlier time, exploit-
ing the wealth of information collected by a large team if a relational database system had been
planned and implemented from the start of the project.
Appendix I
Table 1. Second Report to the Secretary General of the United Nations, August 1995,
Complaints admitted by category of presumed violations*
Right to Life
Extrajudicial execution or death in violation of judicial guarantees 54
Tentative extrajudicial execution 25
Death threat 146
Total 225
Political Rights 2
Total 570
(*) The number of complaints by right violated may change during the verification process
Table 2. Fifth Report to the Secretary General of the United Nations, September 1996
Columns: (1) Complaints admitted*; (2) Number of violations; (3) Violations verified; (4) Violations proven

                                                                         (1)   (2)   (3)   (4)
Right to Life
  Extrajudicial execution or death in violation of judicial guarantees    61    69    13     6
  Tentative extrajudicial execution                                       19    54    42    39
  Death threat                                                           101   267    91    53
  Total                                                                  181   390   146    98
Political Rights                                                           3     4     3     2
References
Ball, Patrick, Ricardo Cifuentes, Judith Dueck, Romilly Gregory, Daniel Salcedo, and Carlos Saldar-
riaga. 1994. A Definition of Database Design Standards for Human Rights Agencies. Wash-
ington, DC: American Association for the Advancement of Science and Human Rights Infor-
mation and Documentation Systems International.
United Nations. 1996. “Report of the Director of the United Nations Mission for the Verification of
Human Rights and of Compliance with the Commitments of the Comprehensive Agreement on
Human Rights in Guatemala (MINUGUA).” 4th Report: UN Doc. A/50/878, 24 Feb. 1996.
United Nations. 1996. --. 5th Report: UN Doc. A/51/1006, 19 Jul. 1996.
United Nations. 1997. --. 6th Report: UN Doc. A/51/790, 31 Jan. 1997.
Chapter 6
The Recovery of Historical Memory Project of the Human Rights
Office of the Archbishop of Guatemala: Data Processing, Database
Representation
Oliver Mazariegos
Introduction
The REMHI (Recovery of Historical Memory) project in Guatemala originated at the Human
Rights Office of the Archbishop of Guatemala (ODHAG), when the peace agreement negotiated
by the Guatemalan government and the Guatemalan National Revolutionary Union (URNG) ap-
proved the creation of the Historical Clarification Commission (CEH). The mission of the CEH
was to investigate crimes of the 36-year history of armed conflict.
The draft agreement allotted a working time of six months to one year for the CEH investiga-
tion. ODHAG was concerned about this limited amount of time for the CEH to operate. Familiar
with the experience of El Salvador, ODHAG knew the difficulty of gathering evidence in such a
limited time. They recognized the need for an in-depth investigation and preparation of a database
that could be transferred to the CEH, and set up REMHI. The REMHI project was to provide a
reconstruction of the country’s history from the victims’ perspective, not just supply a series of
unprocessed lists and statistics to transfer to the CEH.
The concept of the task is what differentiates REMHI’s work from other, similar organiza-
tions. REMHI’s purpose was not to attempt to reveal or interpret the history, but to arrange and
describe it through the voices of the very victims who, after all, had the best knowledge of the
truth.
This project was conceived and initiated by Bishop Juan Gerardi Conedera at the end of 1994
and was communicated to the rest of the bishops in the country with the intent that it would be
adopted by the Episcopal Conference in toto. The Episcopal Conference of Guatemala decided
that each bishop should individually choose whether to carry through the proposed work in his
own diocese. Accordingly, work on the project started on April 1, 1995, as the coordinated effort
of ten of the eleven dioceses in Guatemala.
REMHI’s work is defined as “interdiocesan” because it was the result of the dioceses’ coordination,
and it is precisely from their involvement, commitment and especially their “taking ownership”
of the project, that the project developed and enhanced its activities.
The project was therefore conceived not only as a contribution to the peace process, but also
as a factor in the reconciliation and reconstruction of the social fabric. This is why a fourth phase
known as “the return” was added to the initial three phases of the project (preparation, collection
of testimonies, and analysis).
This fourth phase is the principal contribution that the project can give to assist in the recon-
struction of the Guatemalan social fabric, for it started its work by listening to the demands and
proposals of the people interviewed. The return phase continues at the time of this writing (mid-
1999).
Work Methodology
The Human Rights Office of the Archbishop of Guatemala set up a work team whose function
was to establish the necessary foundations to complete the proposed work.
This team — known as the Central Team — drafted an outline of the work methodology and
completed the first project phase: preparation. The diocesan bishops designated trustworthy people
to coordinate the work in their respective dioceses; they were the counterpart of the Central Team
for work in the countryside.
Throughout the preparation phase, the Central Team outlined the work methodology. The
diocesan coordinators completed these plans by expanding the proposals presented by the
Central Team.
[Figure: organization chart showing the General Coordination Team and, for each diocese, a Diocesan Coordinator.]
Sources of Information
Direct interviews are the basis of the information used by the REMHI project. In addition,
data were obtained by analyzing the print media (“journalistic monitoring”) from 1960 to 1996,
case studies (civilian defense patrols, women, etc.), interviews with key informants (perpetrators
and experts on related subjects), declassified information provided by the National Security
Agency (NSA), and a series of studies known as monographs. The latter were documents covering
investigations of the leading actors of the internal armed conflict (the church, guerrillas, etc.).
To complete the interview information, the experts used monographs as a starting point, and
used journalistic analyses to obtain information on context; informants filled in any gaps.
Interview Database
One of the most important and difficult steps at the start of the project was defining what was
expected of the information system, since the response to this question had implications that
would influence the total development of the project.
The main definitional difficulty was articulating the project objective. According to the origi-
nal conception of the project, we were to assemble a database with statistical and documentary
aims, which could quickly transfer information to the CEH.
I explain the design of the database as having three principal phases, in chronological order:
1. Specification of the interview form
2. Creation of the database
3. Analysis of the first interviews
In addition, we made a series of practical changes to improve the interview form. These in-
cluded using larger letters and more readable fonts, providing ample writing spaces for organizers
with little aptitude for writing, numbering the forms, identifying the different sections included in
the form (deponent, victim, cover page, etc.) with different colors, etc. With prior authorization of
the deponent, a tape recorder could be used to record the complete interview.
We also included specific spaces in the form to record information relating to coding. The
purpose of this provision was to facilitate data processing.
The final version of the interview form is shown in Appendix 1.
[Figure: conceptual diagram with the elements Victim, Perpetrator, Act, and Deponent.]
We recognized that the primary goal of the database was statistical documentation. To serve
this goal, we attempted to classify the greatest possible amount of information: sociodemographic
data, individual data, information relating to time and space, etc. The main challenge in this proc-
ess was to break down the information to a level that would make possible the reconstruction of
the facts.
Logical schema
Our concept of the appropriate information methodology was based on the following logical
principles: There are three actors: one victim, one perpetrator, and one deponent (on whom we
rely). These three people are related to each other by one act, the violation.
The roles that individuals play are neither fixed nor exclusive. The deponent can be the
victim in another violation, or the victim or the perpetrator can also be the deponent,
etc. Moreover, the count of each of these four units of information can be zero (in the
perpetrator’s case) or multiple, since in a violent act there can be various victims, various
perpetrators, different deponents or various abuses.
This large number of possible combinations was the main complication in the design. It led to
a series of questions that were difficult to resolve. At first we required that the database tell us
who did what to whom, and in addition, who reported this information. This requirement greatly
complicated counting the actors, since the greater the breakdown we tried to achieve, the more
complicated it was to maintain a structure (links) that would permit us to reconstruct the facts from
the systematized information.
We confronted such dilemmas as how to create a database that in addition to showing the vic-
tims and cases would tell us exactly what the deponents reported. Thus, we tried to create a data-
base that could relate what a certain deponent stated and who the deponent identified. In case an-
other deponent mentioned another victim or other victims later, it was necessary to know the level
of overlap that the interviews presented in order to affirm that deponent x mentioned victim m
while deponent z mentioned victim m, and also deponent n.
At the level of database design this situation could have been easily resolved. However, such
a solution would have complicated inputting the information to a database and in the long run
would have been impractical. In view of this situation, we settled on only maintaining the link by
case. Hence, we would know who were the perpetrators, the victims and the deponents.
As a result of this reasoning, the structure represented above has the case at its center. This is
the linking mechanism for the three actors (victim, perpetrator, and deponent) to achieve the goal
of indicating who did what to whom, and furthermore who told us. From this point we can choose
the most convenient unit of analysis, which could be the interview, victim, violation, victim per
violation, etc. The important thing was that the database should not limit this choice so that we
could make a final decision later, since we were not set on any of the three choices from the start.
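A minimal sketch of this case-centered linking follows. It is not the REMHI schema: the class, the field names, and the sample people are hypothetical, and identifying people by name alone is a simplification. It only illustrates how keeping the link at the level of the case leaves the unit of analysis open until the counting step.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """One case: the only link kept between the actors (hypothetical fields)."""
    case_id: int
    victims: list = field(default_factory=list)
    perpetrators: list = field(default_factory=list)  # may be empty if unknown
    deponents: list = field(default_factory=list)
    violations: list = field(default_factory=list)    # (victim, violation type)

def count_by(cases, unit):
    """Count the same data under different units of analysis."""
    if unit == "case":
        return len(cases)
    if unit == "victim":
        return len({v for c in cases for v in c.victims})
    if unit == "deponent":
        return len({d for c in cases for d in c.deponents})
    if unit == "violation":
        return sum(len(c.violations) for c in cases)
    raise ValueError(f"unknown unit of analysis: {unit}")

# Hypothetical data. "Ana" is a victim in case 1 and a deponent in both cases,
# so roles are not exclusive; case 2 has no identified perpetrator.
cases = [
    Case(1, victims=["Ana", "Luis"], perpetrators=["Army"], deponents=["Ana"],
         violations=[("Ana", "torture"), ("Luis", "torture"), ("Luis", "killing")]),
    Case(2, victims=["Rosa"], perpetrators=[], deponents=["Ana", "Pedro"],
         violations=[("Rosa", "disappearance")]),
]

for unit in ("case", "victim", "deponent", "violation"):
    print(unit, count_by(cases, unit))
```

The same two cases yield different totals depending on the unit chosen, which is why the choice was left open in the design.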
Data structure
For the definition of a fact, we considered that a fact could contain different violations, each
one with its own respective data (date, place, responsible force, etc.) that bear a close relation such
as causality, context, etc. This definition, similar to what Patrick Ball defines as context, is what
permits us to differentiate a series of violations committed together against one or various victims
from another series of violations committed independently one by one. It is what permits us to
maintain the relation in a disappearance-torture-murder modus operandi and differentiate an act of
torture and murder performed on one victim but carried out in a different context.
Below is a schematic representation of the data structure. We explain it, working from left to
right:
Data Processing
With the final design of the interview forms and the first database design completed, we
started our work on the processing of data. This function was supposed to last approximately three
months. It depended on a work team of five to eight people whose task was to input the data from
the interviews.
Coding
For this work, two major tasks were identified, coding and data-input. Coding was the task of
assigning codes to diverse classifications on which we relied, such as the place of the events, sec-
toral classifications, responsible forces, etc. Data-input was the task of transcribing the forms on
paper to the database system.
Due to (1) the nature of coding, the mechanism designed for data entry, (2) the short period of
time needed for the team to accomplish the work, (3) the status of computer technology at the time
(the beginning of 1995), and (4) the systems analyst’s experience, it was urgent to start the work
as soon as possible. We decided that inputting the information to the database would be done with
a text-based interface and that subsequently we would create a system using a graphic user inter-
face for data query.
The Human Rights Office of the Archbishop of Guatemala at the time relied on a Novell
Netware 3.1 network operating system running over Ethernet in a star topology. A small
computing center was established with four workstations on a bus topology, chosen for
cost reasons. To avoid overloading traffic on the network, an additional network card was installed
in the server exclusively for the computing center. The database was developed on FoxPro 2.6 for
DOS, the same as the journalistic database.
To have a level of specialization that would allow us to determine whether an interview was
related to what was mentioned in another interview, we had the analysts distribute the work by
geographical areas. This helped to determine the exact date of a massacre, for example.
Consequently, the list of victims was maintained separately from the main list, but kept ex-
actly the same structure. However, from the information on which we could draw we managed to
obtain the name, sex, age, and at times, the ethnic group identification of the victim.
Among the new scenarios encountered upon receiving the interviews and entering them into
the database, we noticed that when there was more than one deponent for the same case, we would
come across data that could be either complementary or contradictory. For example, one deponent
might report a number of victims and another deponent gave us a different number. Even worse
were cases where one deponent informed us of a disappeared victim and another deponent men-
tioned the death of the same victim.
Since the project did not investigate or dig deeper into the interviews we received, in many in-
stances we lacked sufficient resources to disqualify an interview. The answer to this dilemma was
that we would have to adapt the database so it was able to store different versions of the same
case.
This decision implied a potential artificial inflation of the statistics. Therefore, at the time of
calculating the statistics we had to make decisions to resolve this problem to avoid biasing the
results. At the level of the database structure we resolved this problem in the following manner:
1. The information was complementary. For example, one deponent is specific about the
date of the violation, but the other deponent is not. We would then modify the violation
previously stored in the database, use the same pattern number, violation number, collec-
tive number and order number, but specify a different interview number.
2. The information was contradictory. We recorded everything anew as if it were its own
case so that in the end, we could group the patterns by victim and decide which of the dif-
ferent versions we would use in the final analysis.
3. The information was neither complementary nor contradictory. Duplication was taken
into account in the creation of statistics and final lists with the aim of not artificially in-
flating the statistics.
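The three situations above can be sketched as follows. The record layout is a simplification assumed for illustration (the actual system used pattern, violation, collective, order and interview numbers), and the data are invented. The point is that every version is stored, while the statistics count each victim once, using the version chosen by the analysts.

```python
# Hypothetical records: (victim_id, interview_no, violation, version)
records = [
    ("V1", "00012", "killing", 1),
    ("V1", "00047", "killing", 1),        # second deponent, same account
    ("V2", "00012", "disappearance", 1),
    ("V2", "00051", "killing", 2),        # contradictory account, kept as version 2
]

def count_violations(records, chosen_version):
    """Count distinct (victim, violation) pairs, using one version per victim.

    chosen_version maps a victim to the version selected for the final
    analysis; victims not listed default to version 1.
    """
    seen = set()
    for victim, _interview, violation, version in records:
        if version == chosen_version.get(victim, 1):
            seen.add((victim, violation))
    return len(seen)

naive = len(records)                               # counts every stored row
deduped = count_violations(records, {"V2": 2})     # one version per victim
print(naive, deduped)
```

Counting stored rows gives four violations; counting after selecting one version per victim gives two.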
It was during the analysis of the first interviews that the analysts and investigators discovered
the great potential of interviews as investigative material. However, until that time we had not
taken measures at the level of the database so that we could recover this information.
The Thesaurus
Since this material was mainly qualitative information, the cost of incorporating it in the data-
base made it an almost impossible task. We therefore created the Thesaurus, which was a list of
keywords identified by project investigators. The words dealt with subjects such as the modus op-
erandi, effects on victims and their families and communities, demands, proposals, cultural ques-
tions, ethnic issues, etc.
The Thesaurus was initially proposed by investigators according to subject (religion,
perpetrators, effects, demands, etc.) and throughout its use was enhanced by the information
processing team. The Thesaurus is summarized in Appendix 5.
The Thesaurus-based system was the tool on which investigators depended when maximizing
the narrative capacity of those interviewed. In this way we hoped to conduct a detailed investiga-
tion (the individual effects on women in a certain region in the western part of the country, for
example) that would cross the base information regarding violations with Thesaurus keys to obtain
a list of interviews mentioning the subject. Thus, our conceptualization of information would re-
main as is specified in the following figure:
[Figure: conceptual diagram with the elements Victim, Perpetrator, Act, Deponent, Thesaurus, and Testimony.]
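The kind of cross described above, between base violation data and Thesaurus keys, might look like the following sketch. The tuples, keyword strings, and function name are hypothetical; the original system was built in FoxPro and its actual queries are not reproduced here.

```python
# Hypothetical base data: (interview_no, violation type, region, victim sex)
violations = [
    ("00012", "torture", "west", "F"),
    ("00047", "killing", "west", "M"),
    ("00051", "torture", "east", "F"),
]

# Hypothetical Thesaurus coding: interview_no -> set of keywords
keywords = {
    "00012": {"effects on family", "demands"},
    "00047": {"modus operandi"},
    "00051": {"effects on family"},
}

def find_interviews(violations, keywords, key, region=None, sex=None):
    """List interviews that carry a Thesaurus key and match the base criteria."""
    hits = set()
    for interview, _vtype, v_region, v_sex in violations:
        if region is not None and v_region != region:
            continue
        if sex is not None and v_sex != sex:
            continue
        if key in keywords.get(interview, set()):
            hits.add(interview)
    return sorted(hits)

result = find_interviews(violations, keywords, "effects on family",
                         region="west", sex="F")
print(result)
```

The result is a reading list of interviews rather than a statistic, which matches the investigative use described for the Thesaurus.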
This new complexity and revised use of the database created the need for a database interface
that was easy to use. The new interface would allow investigators to perform reference
and cross checks in the database. We developed this new interface with FoxPro 2.6, in a
Windows 3.11 graphic environment.
Data Input
The data processing team (coders) had to carry out tasks and develop methods that had
not at first been contemplated. Among the most important was transcribing the interviews, which in
some cases were six hours in length. This called for analyzing the Thesaurus, interacting with analysts,
and discussing the parameters and policies that guided how decisions were made (such as the
difference between a disappearance and a forced disappearance). Inputting information to
the database was a process that ultimately involved 18 people and took 20 months.
Once information input to the database was complete, we created cleaning processes to
reduce duplications in the database. We did this even though from the beginning, the computerized
system indicated the actors whose first name and surname coincided with data that was specified
at the time the information was inputted.
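A minimal sketch of name-based duplicate flagging is shown below. The text says only that the system flagged actors whose first name and surname coincided; the normalization step (case, accents, spacing), the function names, and the sample people are assumptions added for illustration.

```python
import unicodedata
from collections import defaultdict

def normalize(name):
    """Lowercase, strip accents and collapse whitespace (an assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

def possible_duplicates(people):
    """Group person ids whose given name and surname coincide after normalizing."""
    groups = defaultdict(list)
    for person_id, given, surname in people:
        groups[(normalize(given), normalize(surname))].append(person_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical entries: (person_id, given name, surname)
people = [
    (1, "José", "Pérez"),
    (2, "Jose ", "PEREZ"),
    (3, "María", "López"),
]

flagged = possible_duplicates(people)
print(flagged)
```

A match of this kind is only a candidate duplicate; deciding whether two records describe the same person still requires review by the coding team.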
To calculate descriptive statistics, we exported the database to Excel and through pivot tables
(dynamic crosstabulations) we were able to perform most calculations and create desired charts.
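For readers without a spreadsheet at hand, the same kind of crosstabulation can be sketched in a few lines. The project itself used Excel pivot tables; the code below is a stand-in with invented rows, not the project's procedure.

```python
from collections import Counter

# Hypothetical export: one row per violation, as (year, violation type)
rows = [
    ("1981", "killing"),
    ("1981", "torture"),
    ("1982", "killing"),
    ("1982", "killing"),
]

def crosstab(rows):
    """Build a year-by-type table of counts, filling empty cells with zero."""
    counts = Counter(rows)
    row_keys = sorted({r for r, _ in rows})
    col_keys = sorted({c for _, c in rows})
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}

table = crosstab(rows)
for year, cells in table.items():
    print(year, cells)
```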
Lessons Learned
Problem | Solution | Issues

Lack of uniformity. Everyone did not always understand the policies and take similar actions in similar situations. | The decisions were made by the coding team, which took into consideration the opinions of all project personnel. Important decisions concerned the violation type, use of thesaurus, classifications. Internal workshops to structure the discussion, training in different aspects (gender, ethnic affairs, etc.) and sharing of experiences. | Sometimes, the discussions seemed annoying and tedious, but in the end were perceived as helpful. The quality and profile of the coding team is an important factor in success of the discussions. The coding team was the key source for every detail.

Finding qualitative information. It was hard to find qualitative information integrated into 6,000 interviews. | Set up and used keywords (Thesaurus). | The elaboration of the thesaurus is sensitive work; anything not specified in it will be untraceable.

Mixed violations. | Don’t do it again! | Treat massacres just like all other violations.

Horror of codifying. | Any code used in the database should have a zero value option. |

Control of existing work. | Even though the input of a whole interview can take a long time, the input of general information about the interview itself is a task that consumes little time. Thus, every time interviews were received, the coding team inputted into the database the id # of the interview, and some general information (date, place of interview, and so forth) | This practice proved to be helpful for other purposes such as control of flow, distribution of work, interviews tracking, etc. When controlling the development of the activities of the coding team, it’s easy to know how much has the team done, but hard to know how much is left. All you can do is to make an estimate.

Fatigue, emotional issues. | Workshops to discuss these issues. Be creative. Don’t ignore this issue! | Working in data entry in a database that deals with human rights violations means more than keypunching. The “key-punchers” are people who must deal with atrocities and horrors, the pain of others, etc.

Where to start in database design. | Read Ball, Who did What to Whom, Washington: AAAS (1996). | Don’t try to re-invent the wheel, find out what has already been done.

Incomplete information. | Build a system capable of managing incomplete data. We made printed forms for the victims of massacre. Since the original forms used one sheet per victim and most of the data was missing, we made a special form for listing the victims, their names, gender, date of birth and ethnic group. | Try always for the highest level of completeness of data. However, when working with this type of information (from a period of 5-35 years), it is certain that much of the data will be incomplete and imprecise, especially dates.

Dispersion of decision-making. | Log decisions, so you can gather all the decisions in the data analysis phase. | It is impractical and not advisable to centralize the decision-making process in one person. The process of decision-making, is carried on throughout the course of the project and is distributed in space, time and throughout the organization. An inevitable risk that must be dealt with.

Success of the project | Those individuals who worked in more operational tasks (interviewers, encoders, etc.) are the best source of evaluations, ideas and understanding of how to make the project a success. | Facing the fact that preparation, capacity and experience of the people who design, structure and direct the project is necessary, but not sufficient.
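Two of the lessons above, a zero value option for every code and a system that tolerates incomplete data, can be sketched together. The code list, class, and function names are hypothetical and are not taken from the REMHI database; the convention that 0 means "not known" is the assumption being illustrated.

```python
from dataclasses import dataclass

# Hypothetical code list: 0 is reserved for "unknown" in every classification.
ETHNIC_GROUP = {0: "unknown", 1: "group A", 2: "group B"}

def label(code_list, code=0):
    """Return the label for a code; an unrecorded field defaults to code 0."""
    return code_list[code]

@dataclass
class PartialDate:
    """A date whose parts may be missing; 0 means the part is not known."""
    year: int = 0
    month: int = 0
    day: int = 0

    def precision(self):
        """Report how much of the date was actually recorded."""
        if self.year == 0:
            return "unknown"
        if self.month == 0:
            return "year"
        if self.day == 0:
            return "month"
        return "day"

print(label(ETHNIC_GROUP))                    # field left blank on the form
print(PartialDate(1982).precision())          # only the year was remembered
print(PartialDate(1982, 7, 14).precision())   # full date
```

Recording the precision explicitly lets later analysis group events by year without discarding records whose month or day was never known.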
Appendix 1
Interview forms
Cover page
Interview number five digits
Location of interview
Location of violations
Victim
Interview number five digits
Surname(s)
Given name(s)
Pregnant? Yes, No
Document number
Expires in
Age count
(Year)
Place name
Municipality
Department
Country
Of which groups a member? (Political, Category of group, name of group, dates, duties
military, social, community, trade union,
refugee, displaced person, etc.)
Comments
1
Caserio is a smaller division than a village. Several Caserios comprise a village.
Summary
In narrative form, answers to the following questions:
1. Who was the victim?
Perpetrator
Interview number five digits
Surname(s)
Given name(s)
Document number
Expires in
Age count
(Year)
Plot
House
Place name
Municipality
Department
Country
Who was responsible for these plus two digit code plus one digit code
violations?
Comments
Deponent
Interview number five digits
Surname(s)
Given name(s)
Document number
Expires in
Age count
Date dd/mm/yy
Comments
Appendix 2
Appendix 3
Appendix 4
Appendix 5
Thesaurus
Types of Violations
1. Death caused by:
1.1 Extrajudicial execution
1.2 Indiscriminate attack
1.3 Bomb
1.4 Artillery
1.5 Explosives
1.6 Mines
1.7 Crossfire
1.8 Other
2. Death resulting from Persecution:
2.1 Suicide
2.2 Hunger
2.3 Illness
2.4 Accident
2.5 Other
3. Forced Disappearance:
3.1 No Reappearance
3.2 Reappeared Alive
3.3 Reappeared Dead Date of Reappearance: ___/___/___
3.4 Unknown
4. Disappearance:
4.1 No Reappearance
4.2 Reappeared Alive
4.3 Reappeared Dead Date of Reappearance: ___/___/___
4.4 Unknown
5. Forced Detention
6. Torture:
6.1 Cruel and inhumane treatment
6.2 Torture
7. Sexual Violation
8. Attack against personal integrity with injury:
8.1 Knives, etc.
8.2 Firearm
8.3 Bomb
8.4 Artillery
8.5 Explosives/Mines
8.6 Other
9. Attack against an institution or group with damage:
9.1 Firearms
9.2 Bomb
9.3 Artillery
9.4 Explosives/Mines
9.5 Other
10. Threats against people:
10.1 Bomb Alarm
10.2 Death Threat
10.3 Intimidation
10.4 Other
11. Threats against an institution or group:
11.1 Bomb Alarm
Responsible Forces
1. Army
1.1 EMP: General Presidential Staff (or Estado Mayor Presidencial)
1.2 DSP: Office of Presidential Security (or Dirección de Seguridad Presidencial)
(Archive)
1.3 Presidential Guard
1.4 G-2 Place
1.5 S-5 Place
1.6 Kaibiles 2 Place
1.7 Traveling Military Police
1.8 Specialists Place
1.9 Ministry of Defense
1.10 General Defense Staff
1.11 Air Force
1.12 Brigade
1.13 Military Zone
1.14 Military Base
1.15 Special Command
1.16 Outpost
1.17 Other
2. Police
2.1 National Police
Section
Station
Substation
2.2 Special Command
2.3 National Guard
2.4 Municipal Police Place
2.5 Judicial Police Place
2.6 Other
3. Combined Forces
4. Irregular Forces
4.1 Commissioned Soldiers Place
4.2 PAC: Civilian Self-Defense Patrols Place
5. Death Squads
5.1 Mano Blanca
5.2 ESA: Secret Anticommunist Army (Ejército Secreto Anticomunista)
5.3 NOA: New Anticommunist Organization (Nueva Organización Anticomunista)
5.4 JJ: Avenging Jaguar (Jaguar Justiciero)
5.5 Other
6. Insurgent Forces
6.1 EGP: Guerrilla Army of the Poor
6.2 ORPA: Organization of the People in Arms
6.3 FAR: Armed Rebel Forces
6.4 PGT: Guatemalan Workers’ Party
6.5 Unitary Front
6.6 URNG
6.7 Other
2
Special Task Force
7. Unidentified
7.1 Civilian clothed
7.2 Uniformed
7.3 Unknown
8. Mayors, Farmers, Private security forces, etc.
9. Others
Types of Responsibility
1. Lone vigil (patrolling or similar)
2. Participation in violation of Physical Integrity (executing, torturing, etc.)
3. Participation in violation against Property (burning houses, destroying crops, etc.)
4. Intellectual Responsibility (commanding)
5. Collaborator
Chapter 7
The International Center for Human Rights Investigations: Generating
Analytical Reports
Herbert F. Spirer
Introduction
In this case study, I review my work in conducting the analysis of the data and generating the
graphs and tables for the joint International Center for Human Rights Investigations (CIIDH) and
American Association for the Advancement of Science (AAAS) report on Guatemala (Ball, Kobrak
and Spirer, 1999). The purpose of the report was to use statistics in conjunction with historical
analysis to tell the story of state violence in Guatemala from 1960 to 1996. The published report of
154 pages contains 42 graphs, 9 tables, and numerous statistics appearing in a text that largely re-
flects the information in the figures. Despite the small number of graphs and tables in the final re-
port, it was informed by many hundreds of figures, analyses, and statistics, created over a nine-
month period.
I give a summary of the lessons learned from this work, and make recommendations to help
others working on similar projects. The project organization, analytical tools, and working relation-
ships used on this project are generally related to those used by industrial analysts. In view of the
growing use of large-scale datasets in human rights, I expect that with time the human rights field
will develop its own approaches to data analysis. This paper is intended to be a contribution to
that developmental process.
The statistical methodology (described below) used in the CIIDH/AAAS Guatemala project
work is straightforward and well established; neither sophisticated nor novel methods were used.
Because of the need to maintain the highest standards of credibility, the dominant issue in the sta-
tistical analysis was the avoidance of error and control of the process of generation and use of
analyses. For that reason, my focus in this case study is to show how we met that need.
I believe that we were effective in meeting the standard of credibility necessary for a human
rights report that established state responsibility for political violence. Other workers in this field
should be able to use knowledge of this case study to achieve the same standard, and may do so
more efficiently. In the section Lessons Learned, I review the lessons learned on this project, make
recommendations, and discuss how those lessons could be applied in future projects.
There was considerable similarity between the process of generating analytical reports for the
CIIDH and CEH projects as carried out by the analysts. Accordingly, the analysts for these two
projects, Eva Scheibreithner and myself, jointly prepared recommendations for future large-scale
human rights data analysis. These recommendations appear in Appendix 1 of Chapter 10, Data
Analysis Recommendations.
Preliminaries
p. v.) EDA uses the statistical measures of descriptive statistics, and in addition a number of other
methods that involve creative analysis and interpretation. These methods include tabulations and
crosstabulations, time series plots, scatterplots, transformations of variables, autocorrelation
analysis, regression analysis, difference analyses, and others. Generalizations drawn from EDA
may be extended to a larger universe, but cannot be given meaning in terms of mathematical prob-
ability.
Inferential statistics
In statistical inference, the analyst generalizes from sample data to make probability-based
statements about the larger universe from which the data were obtained. These probability state-
ments are usually expressed in hypothesis test results or as confidence intervals. For example, in
the CIIDH report, Appendix 5, Monthly Seasonal Variation Analysis, I use a hypothesis test to
infer that the observed monthly seasonal pattern of killings and disappearances is essentially cer-
tain to have been caused by an organized plan.
The Data
The CIIDH database is a relational database consisting of cases culled from press sources,
documentary sources, and direct testimonies. CIIDH team members collected over 10,000 cases from
newspapers, by reading every newspaper published during the 36-year period of armed conflict in
Guatemala. Four thousand additional cases came from documentary sources such as the archives of
several Guatemalan non-governmental organizations and the publications of the Justice and Peace
Committee of the Guatemalan Church in Exile. Members of the CIIDH team directly collected over
5,000 testimonies for inclusion in the database.
We define a case as the information given by a single source (press report, or interview, or
document) concerning violations that are reported as having happened at a particular time and
place. “Violations” are instances of violence, including killings, disappearances, torture, kidnap-
ping, and injury. “Victims” are people who suffer violations. A case may be simple (one victim who
suffered one violation) or complex (many victims, each of whom suffered many different violations).
In the CIIDH analyses, the unit of analysis is almost always the violation.
The basic data with which I worked were contained in four flat datasets (two-dimensional ta-
bles of information without established relationships to other tables), each with variables chosen
from a common set of variables. Complex Structured Query Language1 (SQL) queries and extensive
programming produced these datasets with variables selectively chosen from the listing of vari-
ables shown in the data dictionary of Appendix 1. Unfortunately, the variables did not keep the
same definitions in all data sets.
The four basic datasets were denoted by ctanon, ctnmd, rtanon, and rtnmd as indicated in Ap-
pendix 1. In this terminology, the prefix “ct” denotes complete, in that these are the data net of
overlaps among data sources (interviews, documents, and periodicals). The prefix “rt” denotes
reduced, in which the source categories “other” and “non-CIIDH interviews” were folded into the
“documents” category.
The suffix “anon” indicates that the dataset consists of both anonymous and named viola-
tions for which victim identification exists, and the suffix “nmd” indicates that the dataset consists
only of precisely named violations.
I also worked with four additional datasets in which only killings appear with additional vari-
ables to describe the nature of the killings and the size of the group in which they occurred. These
datasets carry the additional suffix “k”. All datasets were followed by “v” with an integer (1, 2, 3,…)
suffix to indicate the version of the dataset. By the completion of the report, the version number
had reached 16. Table 1 is a summary of the datasets.
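The naming convention described above can be checked mechanically. The following is a minimal sketch in Python, not a tool used on the project; the function and class names are hypothetical, and the pattern assumes names are built exactly as described (prefix, suffix, optional "k", then "v" and a version number), as in rtanonkv7.

```python
import re
from typing import NamedTuple


class DatasetName(NamedTuple):
    coverage: str        # "complete" (prefix ct) or "reduced" (prefix rt)
    naming: str          # "anon" or "nmd"
    killings_only: bool  # True when the extra suffix "k" is present
    version: int         # integer following "v"


# Assumed layout: (ct|rt)(anon|nmd)[k]v<integer>
_PATTERN = re.compile(r"^(ct|rt)(anon|nmd)(k?)v(\d+)$")


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name into its parts, or raise if it breaks the convention."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognized dataset name: {name!r}")
    prefix, suffix, k_flag, version = match.groups()
    return DatasetName(
        coverage="complete" if prefix == "ct" else "reduced",
        naming=suffix,
        killings_only=(k_flag == "k"),
        version=int(version),
    )
```

Rejecting any name that does not fit the pattern is a deliberate choice: a misnamed file is itself a sign that something in the process needs attention.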
Table 1. Datasets.
Data set name Description
1 Structured Query Language is a computer language used to retrieve, update, and manage data.
tures that were not anticipated, calling for a recoding or revised query of the dataset. Conse-
quently, transitions were frequent.
For this project, the general sequence of transitions was as is shown below in steps 1-12:
1. Patrick Ball (PB) creates a file.
2. PB transmits the file to me (HS) as an e-mail attachment in .dbf or .xls format, whichever was
convenient for one or both of us. We had to use both formats because of initial conflicts
in Excel versions. My actions would then be to:
3. Download the file in native form and archive it.
4. Convert the file to Excel format and archive it.
5. Make a working copy of the file.
6. Filter, reorder, consolidate, summarize, and otherwise manipulate the data to facilitate a
desired analysis.
7. Perform the desired analyses.
8. Transmit the results to PB.
9. Create the graphs.
10. Transmit the graphs to PB.
11. Revise the graphs in accordance with format and analytical needs through joint exchanges
with PB and Paul Kobrak (PK).
12. Transmit the graphs as attachments by e-mail.
The likelihood and form of the data integrity challenge at a transition is dependent on the tran-
sition and the circumstances. For example, I cannot recall an instance in which we found an error
resulting from download transmission or format conversion (2-5, 8, and 12, above). However, I only
know that these transitions were error-free because I was checking the results. Many errors,
often minor, did develop in the other transitions; these were detected and corrected. We attended to
even minor errors to avoid any possible negative effect on our credibility.
There were also challenges to the integrity of our results that relate to handling the data. For
example, Excel apparently has internal instabilities, or as yet undocumented capacity limitations. On
a number of occasions I returned to a workbook several days or weeks after creating graphs and
found that the graph had disappeared or that formatting features were altered. I never had this
problem in a small worksheet. Archiving the original data and any revised datasets that entered
into analysis is essential. However, this action is another transition where the integrity of the da-
taset itself is in jeopardy from the failure to archive the latest version, or the inadvertent deletion of
a file.
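One way to confirm that an archived or transmitted copy is byte-for-byte identical to its original is to compare cryptographic digests. This is a hedged sketch using Python's standard library, not the procedure followed on the project; the function names are hypothetical. It applies only to copies: a format conversion (for example .dbf to Excel) changes the bytes by design and has to be checked with the descriptive statistics discussed in the next section.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to handle large files."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def copies_match(original: Path, copy: Path) -> bool:
    """True when the archived copy has the same contents as the original."""
    return file_digest(original) == file_digest(copy)
```

Recording the digest alongside each archived version would also make an inadvertent later change to an archived file detectable.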
Throughout the process described above, I carried out different levels of checking, as I judged
appropriate, as discussed in the next section.
Verification Methods
My approach to verification is based on applying descriptive statistical methods to the dataset
or pair of datasets (before a transition and after a transition). By definition, summary statistics reduce
the information content of the data to facilitate an understanding of the whole set. Table 2 lists the
descriptive statistical measures used for numerical and categorical variables.
Crosstabulation Crosstabulation
Mean
Median
Sum
For a single dataset, I look for reasonableness in the values. For example, if a dataset contains
the variable SEXO for gender, a one-way tabulation should show some number of males and females,
which may be coded “m” and “f”. What other value might reasonably appear in the tabulation?
If there has been agreement on the representation of unknown gender values as d (for
desconocido), then we expect some number of d’s to appear in a one-way tabulation. If I find no d
values but a number of –1’s, then I suspect that there may have been a change in the assignment
of unknown values in this dataset. Of course, this would have to be reconciled.
But if both –1’s and d’s appear, then something is seriously wrong. It may be miscoding or a
more fundamental problem. Or perhaps the tabulation includes blanks. What might a blank signify:
a missing value that was not properly coded or entered, an input error, or a blank record
(which may reflect a serious error)?
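The reasonableness check on a categorical variable can be sketched as a one-way tabulation followed by a comparison against the agreed codes. The Python below is illustrative only (the project used Excel); the sample values and the function names are hypothetical, and the allowed set m, f, d is taken from the SEXO example above.

```python
from collections import Counter


def one_way_tabulation(values):
    """Count how often each distinct value occurs."""
    return Counter(values)


def unexpected_codes(values, allowed):
    """Return the codes (with counts) that are not in the agreed code list."""
    tabulation = one_way_tabulation(values)
    return {code: n for code, n in tabulation.items() if code not in allowed}


# Hypothetical SEXO column containing a stray -1 and a blank.
sexo = ["m", "f", "m", "d", "-1", "", "f", "m"]
tabulation = one_way_tabulation(sexo)
problems = unexpected_codes(sexo, allowed={"m", "f", "d"})
```

Here `problems` holds one "-1" and one blank. The code only surfaces the anomaly; deciding what it means still requires the analyst's knowledge of the data.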
With two datasets – one before and one after – I look for a reasonable comparison in the val-
ues. If there are two datasets, and the second is one in which records have been removed from the
first dataset described above, then only m, f, or the missing data value should appear, and in no
case with a higher count than in the first set.
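The before-and-after comparison for a step that only removes records can be expressed as a rule: no code may appear in the second dataset that was absent from the first, and no count may grow. A minimal Python sketch under that assumption follows; the function name is hypothetical.

```python
from collections import Counter


def grown_counts(before, after):
    """Return {code: (count_before, count_after)} for every code whose count
    is higher after the transition than before it. An empty result is what
    we expect when records were only removed."""
    counts_before = Counter(before)
    counts_after = Counter(after)
    return {
        code: (counts_before[code], n)
        for code, n in counts_after.items()
        if n > counts_before[code]
    }
```

A code that newly appears after the transition is reported with a "before" count of zero, so miscodings introduced by the step show up in the same check.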
Extreme values of numerical variables (maximum, minimum) can be a symptom of a problem. If
there are a large number of numerical values, a one-way tabulation is usually more confusing than
revealing. Extreme values may be outliers in the sense that they either are unreasonable or differ
greatly from the normal range of deviations. For example, although –1 might be used as a missing
value indicator for ages, what do we make of a –2 also appearing in the dataset? Is a maximum
observed age of 95 an error?
Comparison of the median and mean values is a quick way to determine skewness of the distri-
bution of numerical data. To carry out this comparison, the analyst needs a sense of what the dis-
tribution of the data is, or should be, or how it would be changed by some transitional step using
before and after comparisons.
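The extreme-value and mean-versus-median checks from the two preceding paragraphs can be collected into one summary. This is a sketch using Python's standard `statistics` module rather than the Excel functions used on the project; the ages are invented for illustration, and treating -1 as the only missing-value code is an assumption.

```python
import statistics


def numeric_summary(values, missing=(-1,)):
    """Summarize a numerical variable after setting aside missing-value codes."""
    valid = [v for v in values if v not in missing]
    return {
        "count": len(valid),
        "missing": len(values) - len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
    }


# Hypothetical ages; -1 marks a missing value.
ages = [23, 31, -1, 19, 45, 95, 27, -1, 30]
summary = numeric_summary(ages)
```

In this example the mean exceeds the median, which points to a long upper tail driven by the age of 95. Whether that value is an error is a judgment the summary cannot make.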
The sum of columns is a simple check and it is easy in Excel to maintain sums of numerical
fields at the bottom of the dataset. I monitored sums and record counts on a continual basis while
working with a particular dataset. Using the sum on a continuing basis is a process that has its own
problems, because of the automatic selection of data by Excel for certain procedures, and my own
errors.
Many of the desired analyses are crosstabulations, and in themselves provide a basis for
checking the dataset integrity. While I infrequently made crosstabulations as a check on a dataset,
I almost invariably compared marginal totals in crosstabulations to the values produced by inde-
pendent one-way tabulations.
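Comparing crosstabulation marginals with an independent one-way tabulation can be sketched as follows. This is an illustrative Python version, not the Excel workflow used on the project; the field name "area" and its codes are hypothetical.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Two-way table as a Counter keyed by (row value, column value)."""
    return Counter((r[row_key], r[col_key]) for r in records)


def row_marginals(table):
    """Sum each row of the two-way table."""
    totals = Counter()
    for (row_value, _col_value), count in table.items():
        totals[row_value] += count
    return totals


def marginals_match(table, records, row_key):
    """Compare crosstab row totals with an independent one-way tabulation."""
    one_way = Counter(r[row_key] for r in records)
    return row_marginals(table) == one_way


records = [
    {"sexo": "m", "area": "u"},
    {"sexo": "m", "area": "r"},
    {"sexo": "f", "area": "r"},
    {"sexo": "d", "area": "r"},
]
table = crosstab(records, "sexo", "area")
```

When both tabulations are computed from the same list in the same program, agreement is nearly automatic. The check earns its keep when the crosstabulation comes from a separate operation, such as a pivot table built over a manually selected range.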
It is tempting to think of automating these checks and verifications to reduce the dependency
on human intervention. Without automation, some person has to make a conscious effort to carry
out the check. But with automation, you may have another source of errors and lose the judgmental
insight that can only come from knowledge of the data and what its attributes should be, or are
most likely to be. Accordingly, during the project, I used only the Excel built-in functions (where
appropriate) to obtain values for the verifications discussed above. However, as will be discussed
in Lessons Learned, I also used an intermediate approach.
In the final analysis, human intervention is critical. In one case, the routine checks suggested a
possible incorrect coding. To track this down, I visually scanned approximately 10,000 records by
observing the “play” of the patterns on the screen as I scrolled rapidly through the dataset. Using
this method, Patrick Ball found a coding error, traceable to key entry at the source.
There is no substitute for vigilance and scrutiny.
Examples of errors
The following are a few examples of errors that were detected in the process of analysis. Some
related to problems existing in the database or the query process. These have significance for proj-
ect personnel other than data analysts and therefore have general interest and applicability in the
management and implementation of the information system. However, the overwhelming majority of
errors were the results of my own actions, occurring on a continual basis and which, by and large,
can only be generalized to the need for each individual analyst to work constantly at avoidance,
detection, and correction of error.
Early in the project, time series plots showed a clear, pronounced midyear peak in violations in
the sixth month, June. At first we were concerned only with revealing
this pattern, but attempting to find out why such a pattern should exist led to the investigation of
the coding process by which violations were assigned to a particular month. When the precision of
the date of the violation was one year (that is, the violation could only be placed somewhere or at
some time within a particular year), the violation was arbitrarily assigned to June. This resolved the
problem of giving it a date, and would not affect any analysis of annual patterns. However, when
the data were summarized by month across all years, the number of violations in June was improp-
erly inflated by violations that could have happened at any time during the year.
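The remedy for the June artifact is to restrict monthly summaries to violations whose date is known to the month or better. The sketch below illustrates this in Python with invented records. The field names mon_vlcn and certfech appear in the appendices, and the date-precision codes 2 (month) and 5 (year) are listed in the data dictionary; the meaning of code 1 (more precise than month) is an assumption.

```python
from collections import Counter


def monthly_counts(records, max_precision_code=2):
    """Count violations by month, keeping only records whose date precision
    code is at or below the given threshold (2 = known to the month)."""
    return Counter(
        r["mon_vlcn"] for r in records if r["certfech"] <= max_precision_code
    )


# Hypothetical records; certfech 5 means only the year is known,
# so the month was set to June (6) by default.
records = [
    {"mon_vlcn": 3, "certfech": 1},
    {"mon_vlcn": 6, "certfech": 2},
    {"mon_vlcn": 6, "certfech": 5},
    {"mon_vlcn": 6, "certfech": 5},
    {"mon_vlcn": 11, "certfech": 2},
]
naive = Counter(r["mon_vlcn"] for r in records)
filtered = monthly_counts(records)
```

In this toy data the unfiltered count for June is three and the filtered count is one. Annual summaries can use the looser threshold, since a year-only date is still assigned to the right year.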
When analyzing the patterns of collective and individual killings that required the use of
named datasets, I routinely summed killings by individual and obtained the maximum and minimum
values in the column of sums. A minimum of zero would indicate the presence of a zero due to one
or more entry errors, corruption of a cell, or records that should not have been in the data set. A
maximum above 1 would indicate miscoding, entry, or corruption errors. Two different cases were
uncovered by this check:
1. In the early phases of analysis, I found instances where an individual was reported as suf-
fering more than one death. This anomaly resulted from more than one source of data re-
porting an individual’s death. This problem was traced back to duplicate reporting leading
to miscoding.
2. In another case, the same individual was reported as killed by the same source at different
dates. This is a genuine error, but I found only one.
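The check described above, summing killings by individual and inspecting the minimum and maximum, can be sketched in a few lines. This Python version is illustrative only; the victim identifiers and the 0/1 killing indicator are hypothetical stand-ins for the named-dataset fields.

```python
from collections import defaultdict


def killing_sums(records):
    """Sum the killing indicator (0 or 1) for each named individual."""
    sums = defaultdict(int)
    for victim_id, killed in records:
        sums[victim_id] += killed
    return dict(sums)


def anomalies(sums):
    """In a killings-only dataset every individual should sum to exactly 1.
    Return the individuals whose sum is 0 or greater than 1."""
    return {victim: total for victim, total in sums.items() if total != 1}


# Hypothetical records: (victim identifier, killing indicator).
records = [("A17", 1), ("B02", 1), ("B02", 1), ("C33", 0)]
sums = killing_sums(records)
suspect = anomalies(sums)
```

Listing the offending individuals, rather than reporting only the minimum and maximum, shortens the trace back to the duplicated or miscoded source records.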
In the data description associated with a dataset and the data block associated with an analy-
sis, the number of violations is reported. The dataset rtanonkv7 contains only killings and hence,
its violations total should have been the same as the count of killings in its source dataset, rta-
nonv7. Observation revealed that it was not the same, 34,747 compared to 34,659, a difference of 85!
While this is an error of only 0.2%, we could not overlook it for reasons of credibility and because
it might reflect larger compensating errors. On examination, Patrick Ball found that those 85 death
records were reported under more than one of the three killing categories (cadavers, individual,
and collective). His new program brought the two totals into agreement.
Early in the analysis, a one-way tabulation of ages in the named dataset showed ages of 0 and
-1. Both values had been used to represent missing values of age. Conflicts in the number of
missing values found at the same time were traced back to a revision of the coding process that
caused the loss of the ages of 540 people (out of about 10,000, depending on the dataset).
coverage of the actual events must – as it did – come from the interplay and conjunction of the
quantitative knowledge gained from the data and the equally relevant anecdotal and qualitative
knowledge of subject matter experts.
Our data could not be obtained by probability sampling, which would have enabled the use of
inferential statistics and its related disciplines of statistical hypothesis testing and confidence in-
terval estimation. However, we did use probabilistic approaches to evaluate the apparent monthly
pattern (Ball et al., 1999: Appendix 1).
Thus, most of the tools that are usually called “statistical methods” in the educational process
and in much research did not apply to the analyses used in the body of the CIIDH report. The
challenge in this project was to apply simple methods to complex, large-scale datasets in such a
way that the voice of the data is heard and understood by both the knowledgeable members of the
project team and the lay audience of CEH, researchers, and the interested public.
Methods of data analysis
Accordingly, we used summary statistics, tables and graphs as our primary tools of analysis.
In Excel, graphs are called “charts,” reflecting the orientation of Excel to business applications.
Most statistical programs (e.g., Stata) call them graphs, as we do in this case study.
Our use of tables did not extend beyond the two-way crosstabulation. In our analyses, we
used graph formats (e.g., logarithmic axes) and types (e.g., scatterplots) that we did not present in
the final report. In fact, with few exceptions, the tables and graphs appearing in the report are fully
described in the AAAS/HURIDOCS handbook, Data Analysis for Monitoring Human Rights
(Spirer and Spirer, 1993). One exception is the comparative histogram that relates absolute and relative
rates of killing by age (Ball et al., 1999: Figure 16.2); another is the time series plot of percent
of victims by age that uses stacked line plots (Ball et al., 1999: Figure 11.4).
Since readers of this paper may want to relate our approach to the formal discipline of statis-
tics, we reiterate that we have used the tools of descriptive statistics -- describing, presenting, and
summarizing data to reveal or gain a better understanding about the processes that created the
data. Exploratory Data Analysis (EDA) is a related set of techniques for understanding, analyzing,
and presenting data, its structure and systematic patterns (Tukey, 1977). Easily understood by
non-professionals, these methods have much to offer in the adversarial human rights environment.
Their effectiveness has been demonstrated, as in (Hoaglin and Velleman, 1995: 277):
In time series plots, I omitted point markers, used ticks sparingly, and labeled only major values,
each associated with a tick. Appendix 5 shows a typical time series plot. Appendix 6 shows
that a spare style does not mean that complex relationships are not portrayed.
Not all of Tufte’s recommendations can be achieved through formatting. Graphical presenta-
tion and analysis are interdependent. Table 3 shows the frequency distribution for the types of
graphs used in the final report. This is by no means a summary of the graphs used during the
analysis, but indicates the types of graphs that are likely to be used in an analytical report on hu-
man rights violations for this type of audience.
Histogram 1 1
Comparative histogram 2 1
TOTAL — 42
transform to the logarithm of the variable. This transformation makes it possible to view all values
without an indecipherable cluster at the low end of the axis. If the transformed distribution is nor-
mal, have we learned anything about the process?
In this case we know that the number of killings has a skewed distribution because of factors
pertaining to the actions of the state, and that the skewed high end results from the actions of the
Laugerud García and Ríos Montt regimes. If no comparison is being made (for example, between the
number of killings of males and females, where the high ends differ by a large factor), there is no
good reason for using the logarithmic transformation. A logarithmic transformation is often used to
normalize a skewed data distribution in order to use the methods associated with normal distribu-
tions; in this case there is no need for such a transformation except to make the scale of values
visually tractable.
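As a display device only, the logarithmic transformation compresses a range that spans several orders of magnitude so that small and large values can share one axis. The following Python sketch uses invented counts; how to show zero counts on a logarithmic axis is a separate decision, and mapping them to None here is an assumption.

```python
import math


def log10_scale(counts):
    """Return log10 of each positive count; zero has no logarithm and maps to None."""
    return [math.log10(c) if c > 0 else None for c in counts]


# Hypothetical yearly counts spanning several orders of magnitude.
yearly_counts = [3, 12, 40, 0, 950, 18000]
scaled = log10_scale(yearly_counts)
```

On the original scale the largest value is thousands of times the smallest; after the transformation every plotted value lies between 0 and 5.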
The relevant model for the time series of violations is analogous to the standard model of in-
dustrial quality control. In this model, many sources of variation common to all data points (called
“common causes”) accumulate to give a background level of random variation. In terms of this
model, the time sequence of killing in Guatemala has a “background” level due to common causes.
One or more significantly large deviations from that level would be denoted as due to “systematic
causes.” The analyst then searches for the systematic cause, which in this case usually is the im-
position of state policies.
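The quality-control model can be illustrated with a simple control-limit calculation: estimate the background level and its variation from a baseline period, then flag points that fall outside the limits. This Python sketch is not the analysis performed for the report. The three-standard-deviation limit is the conventional industrial choice and is an assumption here, as are the function names and all the numbers.

```python
import statistics


def control_limits(baseline, sigmas=3.0):
    """Lower and upper limits around the baseline mean (common-cause variation)."""
    center = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def flag_systematic(series, baseline, sigmas=3.0):
    """Return the (period, value) pairs lying outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [(period, value) for period, value in series if value > upper or value < lower]


# Hypothetical counts: a quiet baseline period, then a series to be screened.
baseline = [12, 9, 11, 10, 13, 8, 10, 11]
series = [(1, 12), (2, 14), (3, 60), (4, 9), (5, 140), (6, 11)]
flagged = flag_systematic(series, baseline)
```

With these numbers the limits are 6.0 and 15.0, so periods 3 and 5 are flagged. A flag only says that the deviation is too large to attribute to background variation; identifying the systematic cause remains a matter for historical analysis.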
• The date of the analysis provides a control for error. For example, the date of the analysis
must be consistent with the date of the source dataset. Also, the date reveals whether the
latest desired version of the analysis has been performed by reference to the Figure List.
Thus, Patrick Ball proposed, and I agreed, that I would associate an informational data block
with each analysis. This block was attached to every analysis until the final report was prepared. It
took several iterations before we fixed on a standard format, shown below:
Date of analysis: [date]
Analyzed by: [analyst]
Records included: [count of the records used in the analysis]
Violations included: [count of violations used in the analysis]
File Reference: [workbook(s), worksheet(s)]
The data block count of the records used in the analysis was not necessarily the total included
either in the source dataset, the workbook derived from it, or the particular worksheet. It was the
number used to perform the particular analysis. Of course, I made errors in the data blocks and had
to apply the same constant verification as in the case of the datasets themselves. However, in the
long run, these data blocks proved to be invaluable in verification and in finding a way among the
many dozens of subsidiary workbooks and sheets when I needed to revise or verify and analyze.
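A data block of this kind maps naturally onto a small record type. The sketch below is one possible Python representation, not something used on the project; the class and function names are hypothetical, and reading "consistent with the date of the source dataset" as "not earlier than" is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataBlock:
    date_of_analysis: date
    analyzed_by: str
    records_included: int      # records used in this particular analysis
    violations_included: int   # violations used in this particular analysis
    file_reference: str        # workbook(s) and worksheet(s)

    def render(self) -> str:
        """Format the block in the five-line layout shown above."""
        return "\n".join([
            f"Date of analysis: {self.date_of_analysis.isoformat()}",
            f"Analyzed by: {self.analyzed_by}",
            f"Records included: {self.records_included}",
            f"Violations included: {self.violations_included}",
            f"File Reference: {self.file_reference}",
        ])


def consistent_with_source(block: DataBlock, source_dataset_date: date) -> bool:
    """An analysis cannot predate the dataset it was derived from."""
    return block.date_of_analysis >= source_dataset_date
```

Making the block immutable (`frozen=True`) reflects its role as a record of what was done; a revised analysis gets a new block rather than an edited one.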
Backup
I started with a simple backup strategy. I backed up my work locally on removable disks and
each week mailed a complete compressed copy of my project files on a 100MB ZIP disk to the
AAAS for archival storage. Since we finished the project and have been able to create full elec-
tronic archives at the end, this aspect of the project could be considered a success. We can trace
any analysis in the final report to a figure including a data block, and hence, reconstruct the origi-
nal analysis.
However, my inconsistent directory structure and file naming conventions made this more dif-
ficult than it should have been, as will be discussed in the following section, Lessons Learned.
These problems came in part from the fluid nature of the project, which was essential to a creative
process.
Lessons Learned
In a successful project such as this one, the retrospective issue is to set the stage to carry out
successful projects in the future. By showing what we did in the preceding sections, I hope that
others will get guidance for their own future work. In this section, I specifically target functions and
methods that worked well, and those that did not work well, in order to make recommendations that
can be applied both by others and us in similar large-scale human rights data analysis projects.
Large-scale analysis of human rights data rarely occurs in the same environment twice; it is
much closer to social science research than industrial statistical analysis. A common issue in ap-
plying lessons learned to recommendations is a tendency to introduce central control, uniformity,
standard procedures, and conformance to rules as a way to improve efficiency and effectiveness.
This is a valid approach in situations where control is important. On the other hand, freedom of
action and tolerance of diversity are vital to creativity. I regard the establishment of the appropriate
balance between these two poles as the major administrative and personal challenge that we face.
My own preference at this stage is to lean toward promoting creativity. As a minimum, each con-
tributor should have a unique individual approach to resolving the common problems – but in such
a way that other team members can access and comprehend his or her work.
Another common and general issue is self-discipline. If you set up a rule for naming files or a
procedure for backups, and so forth, stick to it. This is not easy when trying to get new answers to
new problems under time pressure, but it is precisely those circumstances when lack of discipline
will hurt the work the most.
Our lessons learned and related recommendations are summarized in Table 4, following. A
more comprehensive jointly authored set of recommendations for data analysis, based on both the
CIIDH and CEH experiences appears in Appendix 1 of The Guatemalan Commission for Historical
Clarification: Generating Analytical Reports, by Eva Scheibreithner.
Directory structure | A rational project structure would help everyone. Backups from other team members would be comprehensible on sight. | Agree on a project directory structure for common use. | Will a common structure serve all? Can a single structure be used throughout the duration of the project?
File name system | Patrick Ball’s dataset naming rules worked well. Mine quickly became unsatisfactory. It was good only for a small-scale project, here no better than sequential serial numbers. | Use appropriate file naming rules that will be understandable to all. | Can satisfactory rules be set at start of project?
Field names | Ambiguity in field names is treacherous. | Don’t use the same field name for different variables even if appearing in different datasets. | Self-discipline.
Control documents | I can’t work without the control documents described in this report. | Some people need control and some don’t. Do what fits you. | Finding approaches to shared documents that are mutually satisfactory to team members working together.
Update of control documents | If you don’t keep your records updated, you may be sorry. | Don’t end up being sorry. | Self-discipline.
Errors, transitions | Human, machine, program, transmission errors happen. | At every stage, be vigilant and scrutinize. | Self-discipline. See also, Facilitating error checking and verification, below.
Backup | Backup of files is a Good Thing. | Have individual and project backup system. | Present system seems satisfactory; is it good enough for the next project?
Software | Different software, different versions lead to unnecessary inefficiencies and errors. I had to switch both computers and software versions to match his. These transitions caused a number of problems. | Have team members working together use the same programs and versions from the start. | Agreeing on software and versions at project start. Cost of upgrading team members’ resources. Site licenses and project-owned hardware: reasonable approaches?
Audit trail of procedures | No way to know what series of edits and operations have been performed in Excel. | Don’t use Excel for analysis in large-scale data projects. | Which programs to use for statistical analysis? Is Stata, for which AAAS has license, the statistical software of choice? Can we get adequate graphical output from Stata?
Graph objects | Encapsulated Postscript Graphics would have been a lot easier to work with in final report than Excel pictures. | Make sure that the statistical software produces graphical files that facilitate report production. | Is Stata good enough? Do we have to consider other alternatives?
Variable formats | It is not a Good Thing to define numerical variables to have textual values. | Don’t do it! | Be careful with variable definitions.
Conclusion
In this report I reviewed my work on the CIIDH/AAAS report, summarized the most important
lessons learned, and made recommendations for work of this nature on future projects.
I know that this summary will help me to do a better job on the next similar analysis project. I
hope that it will also help others, and in that spirit, close with this quote:
After spending many years with the Estonian Cancer Registry [I] now think more
intensively about data quality than about the application of refined statistical
and cartographic methods to data analysis.2
2 From a book review of Global Geocancerology: A World Geography of Human Cancers. The Scientific
American, Feb. 1987, pp. 27-31.
Appendix 1
ctanon
rtanon
ctnmd
rtnmd
AGE Age of victim integer >= 0 x x
-1 – missing value
2 – month
3 – quarter
4 – semester
5 – year
6 – decade
7 – season, no year
8 – no idea of date
1 – yes
1 – yes
1 – yes
12 – December
03 - Méndez Montenegro
04 – Arana Osorio
05 – Laugerud García
06 – Lucas García
07 - Ríos Montt
08 - Mejía Víctores
09 – Cerezo Arévalo
10 – Serrano Elias
11 - de León Carpio
03 – Verapaces
04 - Petén
05 – Oriente
06 – Meseta Central
M – male
d – unknown
PER – periodical
u – urban
d – unknown
no se sabe - unknown
no se sabe, no se sabe –
department and municipio
unknown
Indigenous
Ladino
otro – other
civ-ind – civilian-
indigenous organization
civ-uni – civilian,
university
M – male
d – unknown
no tenia trabaj –
unemployed
otro – other
blank – missing
Ds – disappeared
Se – kidnapping or illegal
detention
Hr – injury
To – torture
Appendix 2
3 Data sets with the prefix “f” for full were not used in my analysis.
Appendix 3
Filename No. of
muni_vlcn
mon_vlcn
type_sou
records
certfech
ape2cnt
hompct
yr_vlcn
v_ape1
munitot
c_nmd
svnum
c_tot
vlcn
age
sex
cnt
ftf2t2v1.dta 17,941 1 2 3 4 5 6 7 8
ftf2t1v1.dta 14,025 1 2 3 4 5 6 7 8 9
fintaba.txt 13,821 1 2 3 4 5 8 9 6 7
ft11.txt 14,025 1 2 3 4 5 6 7 8 9
ft1.txt 14,025 1 2 3 4 5 6 7 8 9
ft2.txt 17,941 1 2 3 4 5 6 7 8
mn4.txt,.xls, 34 1 2 3 4
.dta
mnap.txt, 410 2 3 1 7 4 5 6
.xls, .dta
4 Data sets with the prefix “f” for full were not used in my analysis.
Appendix 4
Notes:
1. If an entry is given in “Source,” then the file is derived from an original data set described in Data
Set Descriptions.doc.
2. Dummy blocks contain blocks of entries to add to data sets. With these blocks included, Pivot
Table will have something to chew on for each year. This makes the year variable continuous and
complete.
3. Proliferation of .xls files with same prefix on name was to avoid excessively large files. If I had to do
it over again, I would simply number them sequentially.
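The purpose of the dummy blocks in note 2, giving every year an entry so that the year axis is continuous, can be met outside a pivot table by filling the gaps explicitly. This is a minimal Python sketch with invented years, not the method used on the project; the function name is hypothetical.

```python
from collections import Counter


def counts_by_year(years, first, last):
    """Count records per year, reporting an explicit zero for years with no records."""
    observed = Counter(years)
    return {year: observed.get(year, 0) for year in range(first, last + 1)}


# Hypothetical year-of-violation values with no records for 1961 or 1963.
complete = counts_by_year([1960, 1960, 1962], 1960, 1963)
```

Because the zeros are produced at tabulation time, no artificial records need to be added to the dataset and later remembered and excluded.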
Appendix 5
Figure 8.3 Percent of killings and disappearances occurring in rural areas by year, 1960-1995
[Time series plot. Vertical axis: percent of killings and disappearances occurring in rural areas (0% to 100%). Horizontal axis: year (1960 to 1994).]
Appendix 6
Figure 9.1: Number of killings and disappearances reported in the press by area (rural vs.
urban) and by year
[Graph with two series, Rural and Urban. Vertical axis scale: 0 to 350. Horizontal axis: years 1959 to 1995.]
Appendix 7
st
01 ? 03 Monthly 1 diff sources and not needed
violations
10 9 Apr 98 All Break into numbered sections Done, needs watching and
agreement on style
Appendix 8
The three Fig# columns carry the dates 24 Oct, 14 Oct and 13 Oct.
Fig# | Fig# | Fig# | Figure title | Derived from figure, notes | Filters | Dataset | Resp.
01.1 | 1.1 | 1.1 | Number of killings and disappearances by year, 1960-1995 | | Net of URNG, certfech<=5 | CTanon | HFS
02.1 | 2.1 | 2.1 | Number of killings and disappearances by year, 1960-1969 | | Net of URNG, certfech<=5 | CTanon | HFS
03.1 | 3.1 | 3.1 | Number of killings and disappearances by year, 1970-1979 | | Net of URNG, certfech<=5 | CTanon | HFS
04.1 | 4.1 | 4.1 | Number of killings and disappearances by year, 1980-1989 | | Net of URNG, certfech<=5 | CTanon | HFS
05.1 | 5.1 | 5.1 | Number of killings and disappearances by year, 1990-1995 | | Net of URNG, certfech<=5 | CTanon | HFS
06.1 | 6.1 | 6.1 | Number of disappearances and killings, by regime | Figure 3.26, killings only, without means | Net of URNG, certfech<=2 | CTanon | HFS
06.2 | 6.2 | 6.2 | Average monthly number of deaths and disappearances, by regime | this is the other half of Fig 3.26; ordered by regime | Net of URNG, certfech<=2 | CTanon | HFS
06.3 | 6.3 | 7.1 | Number of killings and disappearances by month, July 1979-June 1984 | | Net of URNG, certfech<=2 | CTanon | HFS
07.1 | 7.1 | 8.1 & 8.2 | Number of killings and disappearances by year and source (press vs. documentary vs. interview) | Fig. 3.7; cut off vertical axis at 700, leaving the DOC and ENT peaks off the graph | Net of URNG, certfech<=5; PER and DOC | RTanon | HFS
07.2 | 7.2 | 8.4 | Number of killings and disappearances by regime and data source | Fig. 3.31, PER/DOC/ENT | Net of URNG, certfech<=25; PER, DOC and ENT | RTanon | HFS
References
Ball, Patrick, Kobrak, Paul, and Spirer, Herbert, 1999. State Violence in Guatemala, 1960-1996: A
Quantitative Reflection. Washington: American Association for the Advancement of Science
and Centro Internacional por Investigaciones en Derechos Humanos.
Hoaglin, David, and Velleman, Paul, 1995. “A Critical Look at Some Analyses of Major League
Baseball Salaries.” The American Statistician (August 1995), pp. 277-285.
Spirer, Herbert, and Spirer, Louise, 1993. Data Analysis for Monitoring Human Rights. Washington,
DC: American Association for the Advancement of Science and HURIDOCS.
Stata, 1997. Stata Reference Manual. Release 5. College Station, TX: Stata Press.
Tufte, Edward, 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics
Press.
Tukey, John W., 1977. Exploratory Data Analysis. Reading: Addison-Wesley Publishing Co.
Chapter 8
The Guatemalan Commission for Historical Clarification: Data
Processing
Rocio Mezquita
1
Editors’ note. The reader of other papers in this volume will notice that this structuring is defined elsewhere.
We retained these redundancies so that each paper is self-explanatory.
2
The original mandate of the CEH, specified in the Final Peace Accords, was that the period of interest
was 1960 to 1996. After a subsequent historical analysis, the CEH Commissioners decided that the
“internal conflict” started in 1962 and ended in 1996, with the signing of the Final Peace Accords.
Chapter Eight: The Guatemalan Commission for Historical Clarification
Perpetrator’s responsibility:
1. Material author
2. Collaborator
3. Intellectual author
4. Informant
5. Does not apply. This was used for specific violations such as “the person disappeared
for unknown reasons”, in which there was no perpetrator, and therefore responsibility
could not be assigned.
Perpetrator’s identity:
1. Deponent is a witness
2. There are other witnesses
3. Deponent suspects
4. It is publicly known
5. Documented evidence exists
6. Does not apply. This was used for specific violations such as “the person disappeared
for unknown reasons”, in which there was no perpetrator, and therefore responsibility
could not be assigned.
The data processing team therefore had to identify the following elements in a case:
who the victims were
what happened to them
who was responsible
where
when
violation’s certainty
perpetrator’s responsibility
certainty about the perpetrator’s identity
3
The victim was directly related to the pattern, but victim information was recorded separately.
The perpetrators could be identified in a collective way by the group to which they belonged.
The listing of such groups is given in Appendix 4. If the perpetrators’ names were known, they
could be related directly to the violation.
The victims
The victims were directly related to the pattern. Accordingly, there were three types of “count-
able” victims:
identified
anonymous
collective
Identified victims
The identified victims were those for whom we knew at least two of the three fields used to
identify a victim in the database: one for the first name and two for the last names (father’s and
mother’s).
Example #1:
NAME: Francisco
1st LAST NAME: Pop
2nd LAST NAME: X (unknown)
Example #2:
NAME: Juana
1st LAST NAME: X (unknown)
2nd LAST NAME: Ramirez
Example #3:
NAME: X (unknown)
1st LAST NAME: Cu
2nd LAST NAME: Caal
Anonymous victims
Anonymous victims were victims for whom there was no personal information. Until almost the
end of the project, the program also counted the “xx” (individual victims whose names we did not
know even though we knew their sex, age or ethnicity), as anonymous victims. Initially the program
did not count this valuable information in its statistics, and it was lost. Finally, this was changed
and the “xx” victim was automatically converted into a “collective” of one person. Thus, in the end,
this information was counted in the statistics.
Collective victims
This term denoted two or more victims for whom we had some information, such as sex or group
identity (e.g., catechists, or peasants from a specific village, or an ethnic group).4
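The three victim types can be summarized as a small decision rule. The sketch below is an
illustration in Python, not the CEH program: the field names and the has_attributes flag are
assumptions, and the treatment of an entry with only one known name field and no other information
(classified here as anonymous) is also an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass

UNKNOWN = "X"  # placeholder used in the examples above for an unknown name field


@dataclass
class VictimEntry:
    """One victim entry on a coding form (field names are illustrative)."""
    name: str = UNKNOWN
    last_name_1: str = UNKNOWN
    last_name_2: str = UNKNOWN
    count: int = 1                # number of people the entry stands for
    has_attributes: bool = False  # sex, age, ethnicity or group identity known


def known_name_fields(entry: VictimEntry) -> int:
    """Count how many of the three name fields are actually filled in."""
    fields = (entry.name, entry.last_name_1, entry.last_name_2)
    return sum(1 for f in fields if f.strip() and f.strip().upper() != UNKNOWN)


def victim_type(entry: VictimEntry) -> str:
    """Return 'identified', 'collective' or 'anonymous' for an entry."""
    if entry.count == 1 and known_name_fields(entry) >= 2:
        return "identified"
    if entry.has_attributes:
        # Covers groups of two or more and also the "xx" individual,
        # which in the end was treated as a collective of one person.
        return "collective"
    return "anonymous"


print(victim_type(VictimEntry("Francisco", "Pop", UNKNOWN)))  # Example #1 above
print(victim_type(VictimEntry(has_attributes=True)))          # an "xx" victim
```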
Difficulties Encountered
The definition of a massacre
How to define a massacre was an issue of ongoing concern throughout most of the CEH project.
As the term “massacre” was never a violation in itself, massacres were identified through key
words at the beginning of the project and, at the end, through the case title.
Initially, a key word code was used when the case testimony mentioned a massacre.
At that time it was the testimony that identified a massacre, and not the case, as there was a
many-to-many relationship between cases and testimonies.5 A massacre was at that time defined as the
killing of a “significant” number of people, but that number was not given in advance. Thus,
whatever number the data processor thought to be significant determined whether the event was
denoted a massacre. Unfortunately, a specific key word for massacre cases was not used. The case
could be a massacre, or its testimony could, among other things, relate a massacre that was
not the violation of the case itself. This was a confusing situation.
4
A discussion of the nature of the definitions of collective victims and the relationships inherent in these
definitions appears in Chapter 9.
5
One testimony could relate information about many massacres; each massacre might be described in many
testimonies.
Later in the project, when almost all of the data processing of the cases had been completed, a
final definition was agreed to. This final definition was: “A massacre shall be considered the
execution of five or more people, in the same place, as part of the same operation and whose victims
were in an indefensible state.”
A code was created to identify those cases that fell under the new definition. All the cases
that were already in the database and that had five or more victims of executions were revised
and re-coded. A problem was that some cases that were massacres did not appear in the list of
“five or more victims.” Thus, they could not be re-coded under the new definition. They were not
in the list because the victim type was collective, and when we did not know how many victims
there were, we counted the collective victim category as including only two victims.
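The way collective victims of unknown size could keep a case off the “five or more victims” list
can be seen in a short sketch. This is an illustration in Python under stated assumptions, not the
CEH program: the data layout and the sample cases are hypothetical, and only the numerical part of
the definition is checked (not place, operation, or the state of the victims).

```python
MASSACRE_THRESHOLD = 5       # "five or more people" in the final definition
UNKNOWN_COLLECTIVE_SIZE = 2  # size assumed for a collective victim of unknown size


def executed_count(victims):
    """Sum the execution victims of one case.

    `victims` is a list of (victim_type, count) pairs; count is None when
    the number of people is unknown.
    """
    total = 0
    for victim_type, count in victims:
        if count is None:
            count = UNKNOWN_COLLECTIVE_SIZE if victim_type == "collective" else 1
        total += count
    return total


def meets_count_criterion(victims):
    """True when the case has at least the threshold number of execution victims."""
    return executed_count(victims) >= MASSACRE_THRESHOLD


# Hypothetical cases, for illustration only.
case_a = [("identified", 1), ("identified", 1), ("identified", 1), ("collective", 4)]
case_b = [("identified", 1), ("collective", None)]  # true size of the group unknown

print(executed_count(case_a), meets_count_criterion(case_a))
print(executed_count(case_b), meets_count_criterion(case_b))
```

In the second case the group may well have been large, but the default size of two keeps the total
below the threshold, so the case never reaches the list to be re-coded.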
When this problem was found, it was decided to automatically assign the new massacre code to every
case that carried the first massacre code. It was assumed that the first coding was more limited and
that all the cases that had this code should also fall under the new, more specific definition of
massacre. Unfortunately, it was forgotten that the first massacre code had been used not only for
massacre cases, but for all the cases in which the testimony mentioned a massacre, whether it
belonged to the case or not.
As a consequence, the new code for massacre lost its relation to the new massacre definition,
because it now covered cases whose testimonies merely spoke of a massacre while the case itself was
not a massacre, massacre cases under the old definition, and massacre cases under the new definition.
There was no way to identify all the massacres in the database that fell under the new definition,
as the code had been altered. To solve this newly created problem, it was decided to search
for massacres by the “title” section, which appeared on the summary case form. This approach had
its own difficulties. Not all the massacres were identified correctly in the title,
because some interviewers used the wording “indiscriminate attack.” Another source of problems
was a spelling error, which made it impossible for the program to correctly identify the code. Final
resolution was achieved after a number of revisions of the entire database, and all the massacres
were identified and listed.
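A title search of this kind can be made more tolerant of alternate wording and misspellings. The
sketch below is a hedged illustration in Python using the standard difflib module; it is not the
search the CEH team ran. The key word, the alternate phrase list, the similarity threshold, and the
sample titles are all assumptions chosen for the example, and any hits would still need manual
review.

```python
from difflib import SequenceMatcher

KEYWORD = "massacre"                            # assumed search term
ALTERNATE_PHRASES = ("indiscriminate attack",)  # wording some interviewers used
MIN_SIMILARITY = 0.8                            # arbitrary cut-off for near-matches


def title_suggests_massacre(title: str) -> bool:
    """Flag a case title that names a massacre, allowing for small misspellings."""
    text = title.lower()
    if any(phrase in text for phrase in ALTERNATE_PHRASES):
        return True
    for word in text.split():
        word = word.strip(".,;:()\"'")
        if SequenceMatcher(None, word, KEYWORD).ratio() >= MIN_SIMILARITY:
            return True
    return False


# Hypothetical titles, for illustration only.
for title in ("Massacre in the village",
              "Masacre of peasants",
              "Indiscriminate attack on a hamlet",
              "Threats against a family"):
    print(title, "->", title_suggests_massacre(title))
```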
Names of the categories
At the time of data processing, the database did not distinguish different violations for different
perpetrators. For example, it did not distinguish extrajudicial execution by state actors (the term
used in international human rights law) from assassination or killing by the guerrilla forces.
Extrajudicial execution was accepted and coded equally for any perpetrator. At the time of the final
report, the category name was changed and the killing of a person was defined as “arbitrary execution.”
All such violations, which were denoted homicides in the sense that the reason for the killing
was personal and not political, were considered extrajudicial execution whether committed by state
agents or the guerrilla forces. The reasoning for this designation was that the perpetrators commit-
ted these violations under the impunity that the context of war offered.
The ambiguous category, wounded or killed, was created to keep a record of the combatants
who were mentioned in the different cases and were either wounded or killed. This category was
often used to identify those people who joined the guerrilla forces, and never came back. In such a
case, it was assigned with a certainty of “it is suspected.” There was no certainty that they were
killed or wounded. They may even have become refugees in another country. This information
could have been useful if it were decided to look for those combatants who never returned, and
whose families continue to search for them.
The identification of forced disappearances also had many problems. In some exceptional instances,
there was no specific information on whether a witness observed the kidnapping, or the
testimony clearly stated that no one witnessed it, yet the context strongly suggested a forced
disappearance. For example, the victim may have previously been threatened, or belonged to a
group likely to suffer political violence in the Guatemalan context. In these special instances, the
violation was classified as forced disappearance.
Other special cases arose when the body of the person was never found. In all of the following
cases, the violation was denoted a forced disappearance:
Deponent
Every case was constructed of one or more interviews. The person who gave the testimony was
related directly to the case itself, never to the specific information that appeared
in it. If more cases appeared that mentioned the same act of violation as an existing one, the
additional deponent information was also added to the case.
Frequently the interviews were collective interviews. Sometimes entire communities would
assemble and collectively give their testimony. This collective deponent was seldom identified by the
interviewer and, therefore, was not recorded in the database.
It is important to decide how and where the deponent’s form is going to be filed. At the CEH, the
deponents’ forms (which contained their names and signatures) were filed separately from the case,
for security reasons. Every testimony had to have at least one deponent sheet. This sheet had either
a name or, if the person did not want to give their name, a note saying “deponent is afraid of
giving the name” or stating whatever reason the name was suppressed. Only in
this way could we use the database to count all the people who gave testimony.
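The rule that every testimony must have at least one deponent sheet lends itself to a simple
consistency check. The following is a minimal sketch in Python under assumed data structures (the
tuple layout, identifiers and sample values are hypothetical); it counts sheets rather than
distinct persons, so a deponent who appears in several testimonies would be counted more than once.

```python
def deponent_sheet_report(testimony_ids, sheets):
    """Count deponent sheets and list testimonies that have none.

    `sheets` is a list of (testimony_id, name, note) tuples; `name` is None
    when the deponent withheld it, and `note` then records the reason.
    """
    with_sheet = {testimony_id for testimony_id, _name, _note in sheets}
    missing = sorted(set(testimony_ids) - with_sheet)
    withheld = sum(1 for _tid, name, _note in sheets if name is None)
    return {
        "deponents": len(sheets),
        "name_withheld": withheld,
        "testimonies_without_sheet": missing,
    }


# Hypothetical sample data, for illustration only.
sample_sheets = [
    ("T1", "Name on file", ""),
    ("T1", None, "deponent is afraid of giving the name"),
    ("T2", None, "deponent is afraid of giving the name"),
]
print(deponent_sheet_report(["T1", "T2", "T3"], sample_sheets))
```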
The disadvantages included the difficulty of reading other people’s handwriting, which was a
continuing problem. Team personnel would have to contact the persons who carried out prior tasks
to get clarification. This caused some lost time and may have resulted in undetected errors by
persons who did not realize they should seek help. Also, most of the interviewers, data processors,
and data entry personnel were not Guatemalan and were not always sure of the spelling of names.
In a few cases, the data processor might fail to enter some information that was required. Errors due
to these causes were kept out of the final data, but time and effort were expended correcting them.
However, specialization had its advantages. Data entry personnel were able to increase the
speed of entry as they became more experienced (further along the learning curve). Data proces-
sors could concentrate on reading the testimonies and dealing with the specific problems that ap-
peared when determining and interpreting the acts comprising a violation or a statement.
The database coordinator and her assistant trained the data processors. The topics covered in
the training were the following:
• Information to be obtained from testimonies (violation, date, perpetrator’s group name,
perpetrator’s individual name, etc.)
• CEH violations
• Key words
• How to properly fill in the forms
• Use of codes and textual entries
• Coding categories, for example, how to determine an identified, collective or anonymous
victim or perpetrator
• Explanation of how to properly complete the forms, specifying which information should
be coded and which should be text
Lessons Learned
Problem: Codes and definitions were not unique throughout the project.
Lesson learned: A unique code should be kept for each definition. If a new definition is to be coded, do not code it under an existing code. Give that definition its own unique code.
Issues: Enable the system to keep track of all the information in a separate way. Then, the system will be robust with respect to changes introduced by heads of the investigation, interviewers or analysts.

Problem: Inefficiencies resulting from the lack of knowledge of the Guatemalan languages, history and geography.
Lesson learned: Assure that the database team personnel have a good knowledge of the language, history and geography of the country. This can be achieved through training and education.
Issues: It would be good to have country nationals “seeded” throughout the teams if the proportion of foreign personnel is high.

Problem: The same problems were not always resolved in the same way.
Lesson learned: Make consistent rules for resolution of problems and distribute to team personnel.

Problem: Different rules for determining duplicate names were used by different team members at different times.
Lesson learned: Make consistent rules for resolution of duplicates and distribute to team personnel.
Issues: This is the most important special case of the prior lesson learned.

Problem: Problem of apparently anonymous victim who bears a known relationship to an identified victim.
Lesson learned: Consider on an individual project basis.
Issues: For example, Juan Perez is an identified victim and it is known that another victim is his son, but cannot be fully identified by name.

Problem: Not all violations had a controlled vocabulary at all times.
Lesson learned: Do not allow violation coding without a controlled vocabulary.

Problem: Filtering for duplicate names was inefficient and consumed too much time at end of project (two months!).
Lesson learned: Filter suspected duplicate names early in the process of data entry. Provide an entry on forms for indicating that an interviewer, data processor or other person believes that a form contains a duplicated name. Store this notation with the record in the database.
Issues: When duplicate names are not cleansed early in the process, each duplicate entry is processed. Thus, many data processing operations are duplicated unnecessarily.

Problem: Inefficient re-coding because it is done by a different person.
Lesson learned: Identify who does coding so that person can re-code if needed.

Problem: Balancing too little information with too much.
Lesson learned: Education of data processing team on objectives of project, especially if they are changed.
Issues: It is not always clear in advance what information will be useful. Too much is better than too little, except that resources, especially time, limit what can be done.

Problem: Inefficiencies at the start of the project because of incomplete organization, confused ideas about information to be collected, and lack of understanding of principles of case/violation structure.
Lesson learned: Establish principles and rules at start of project; inform data processing team members, making sure that they are aware of changes as they occur.
Issues: Unfortunately, there was still disagreement over some of these principles going on after data collection had started.
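The lesson about filtering suspected duplicate names early can be supported by a simple comparison
key computed at data entry. The sketch below is one possible approach in Python, offered under
stated assumptions rather than as the CEH procedure: the normalization rules (lower case, accents
removed, spaces collapsed), the record layout, and the sample records are illustrative, and records
that share a key are only candidates for human review.

```python
import unicodedata
from collections import defaultdict


def name_key(name, last_name_1, last_name_2):
    """Build a comparison key: lower case, accents removed, spaces collapsed."""
    def clean(text):
        decomposed = unicodedata.normalize("NFKD", text)
        no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
        return " ".join(no_accents.lower().split())
    return (clean(name), clean(last_name_1), clean(last_name_2))


def suspected_duplicates(records):
    """Group record ids that share a key; return only groups of two or more."""
    groups = defaultdict(list)
    for record_id, name, last_name_1, last_name_2 in records:
        groups[name_key(name, last_name_1, last_name_2)].append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records, for illustration only.
sample_records = [
    (1, "Juana", "Pérez", "Ramirez"),
    (2, "juana ", "Perez", "Ramírez"),
    (3, "Francisco", "Pop", "X"),
]
print(suspected_duplicates(sample_records))
```

Applying one shared key function at entry time is also a way of making the duplicate rules
consistent across team members, since everyone then compares names by the same rule.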
Appendix 1
Victim:
1st last name
Name
Identity document: number and date of issue; one of the following was accepted: Identity Card, Birth Certificate, Refugee Card, Demobilization Card, Passport. This information was almost never completed
Date of birth or age at the moment of the violation. Certainty of this information: there were several levels of certainty options: 1) total 2) 1-2 years 3) 3-5 years 4) 6-10 years 5) none
Place of birth: department, municipality, town, village, with a code number from a coded geographical dictionary of Guatemala
Mother’s language: A coded list of languages spoken in Guatemala, as well as other languages, was used to answer this question
Type of victim: multiple and non exclusive options were allowed here
Where did the victim live at the moment of the violation?: text, not coded
Marital status: at the moment of the violation, options were: 1) single, 2) married 3) widowed 4) divorced
Name and age of the daughters and sons: for the age, the deponent had the option to tell the age at the moment of the violation, or the age when the testimony was given
Perpetrator (individual):
1st last name
Name
Sex: M/F
Identity document: number and date of issue. One of the following was accepted: Identity card, Birth Certificate, Refugee card, Demobilization card, Passport
Date of birth or age at the moment of the violation. Certainty on this information: there were several levels of certainty options: 1) total 2) 1-2 years 3) 3-5 years 4) 6-10 years 5) none
Place of birth: department, municipality, village. With a code number from a coded geographical dictionary of Guatemala
Mother tongue: a coded list of languages spoken in Guatemala, as well as other languages, was used to answer this question.
Group(s)/organization(s) to which the perpetrator belonged: multiple and non-exclusive options were allowed here. A coded list of groups of perpetrators was used.
Violations for which the perpetrator is responsible: The violation had to be related to the pattern and to the order in the sequence of violations. This section included the kind of responsibility (perpetrator, intellectual responsibility or informant) and the type of evidence (deponent is witness, other people saw him, deponent suspects, “everybody knows,” there are documented proofs).
Deponent:
1st last name
Name
Sex: M/F
Identity document: number and date of issue. One of the following was accepted: Identity card, Birth Certificate, Refugee card, Demobilization card, Passport. This information was almost never completed.
Mother tongue: a coded list of languages spoken in Guatemala, as well as other languages, was used to answer this question.
Other characteristics of the deponent: one of following options should be selected if it corresponds: a) refugee b) displaced c) reinserted ex-combatant d) returned refugee, e) survivor of a massacre f) victim of non typified violations g) witness or survivor of the political violence h) other text
Relation of the deponent with the victims: as the deponent could be the same in several cases, this section specified the victim, the victim’s number, and the case in which that victim appears. Then, the relation was specified. There was a coded list of the most common relationships.
Other people who the deponent knows were also witnesses of violations: Text: name of the witness, how to find her/him, and any security problems there may be for this witness if s/he is looked for by a CEH interviewer
Language in which the interview was taken: a coded list of languages spoken in Guatemala, as well as other languages, was used to answer this question.
Appendix 2
CEH Violations
The CEH used definitions for most violations that were derived from international humanitarian
law, except for some violations that had CEH definitions. The following were the CEH accepted
violation categories with the approved definitions.
a) Human rights violations and cases of violence resulting in death: This is a general category. However, the database processing team used this category to classify all conflict-related deaths that did not correspond to any of the approved specific violations under this general category. For example, people who died as a consequence of torture (not immediately, but several months or years later) were included in this category, as well as related suicides.
    Extrajudicial execution: Legal definition, including incidents where a guerrilla was the perpetrator. Also, when a person died immediately after or as a clear consequence of torture.
    Death by forced displacement: People who died as a consequence of the displacement that persecution, fear, and massacres produced. Includes deaths due to hunger, sickness, depression, lack of medical attention due to displacement (pregnant women who died in labor, etc.).
    Civilian death during hostilities: According to the international humanitarian law definition for “hostilities.”
    Civilian death by indiscriminate attack: According to the international humanitarian law definition for “indiscriminate attack.”
    Civilian death by the use of mines: Death as a consequence of touching or walking over an anti-personnel mine.
    Death resulting from use as a human shield: Deaths in events in which civilians or paramilitary state-approved forces (e.g., PACs) were used by military forces on patrol to protect themselves from guerrilla forces. Not easy to identify, as not all deponents would have known that these people were being used as human shields at the moment they were killed.
b) Human rights violations and cases of violence resulting in grave injuries: This is a general category. However, the database processing team used this category to classify all conflict-related wounds and injuries, which did not correspond to any of the approved specific violations under this general category.
    Wounded during an attempt on one’s life: Victims who survived an attempt at extrajudicial execution, but were wounded in the attempt.
    Wounded during forced displacement: People who were wounded as a consequence of the displacement that persecution, fear, and massacres produced. Includes wounds due to hunger, sickness, depression, lack of medical attention due to displacement (pregnant women injured in labor, etc.).
    Civilian wounded during hostilities: According to the international humanitarian law definition for “hostilities.”
    Civilian wounded by indiscriminate attack: According to the international humanitarian law definition for “indiscriminate attack.”
    Civilian wounded by the use of mines: Injury as a consequence of touching or walking over an anti-personnel mine.
    Wounded while being used as a human shield: Injuries or wounds in events in which civilians or paramilitary state approved forces (e.g., PACs) were used by military forces on patrol to protect themselves from guerrilla forces. Not easy to identify, as not all deponents would have known that these people were being used as human shields at the moment they were injured or wounded.
c) Torture and cruel, inhuman and degrading treatment: As defined in international humanitarian law, but applied to state agents, guerrillas or any other group. This category includes ill treatment.
d) Sexual violations: Sexual abuse by any perpetrator. If a person was raped by more than one perpetrator at the same place and date, the database team counted only one act of sexual violation.
    Forced disappearance: Used in general only when the victim was seen being disappeared by a perpetrator. In cases where the context (e.g., prior threats, membership in a targeted group) strongly suggested a forced disappearance, this category was also assigned.
    Disappearance by unknown cause: This category was used in the absence of information about the circumstances of the disappearance (e.g., a person left his/her house or was last seen somewhere, and after that, never seen again.). If there was a suspicion that a specific perpetrator disappeared the person, then the violation should be “suspected forced disappearance”.
f) Kidnapping: This category was used only where the kidnapping was from guerrilla actions, when extortion was involved. If the person died as a result of the kidnapping, the violation should be classified as kidnapping and arbitrary execution.
g) Others: This category is for violations not included in the original list, but which are violations or other events that the team considered valuable for future analysis or investigation.
    Threats, intimidation: Originally to be used only when there was no other CEH violation which made it possible for the violation to be recorded as a case. Later, also used when this violation was important to a case.
    Burning crops: Rarely used. Defined to allow recording of this act as part of a context.
    Deprivation of one’s liberty: Any action which aims to deprive the victim of physical liberty. This category was mainly used to classify those violations in which a victim was kept as a captive for a specific period of time, and then freed, tortured, killed or disappeared. This violation could happen several times to the same person, if she or he was transported from one place to another. For example, often a victim would be kept in a specific military building, and then moved to another one where s/he could have been tortured, and then to another location, and so on until s/he was freed, killed, or never appeared again.
    Forced recruitment: Not easy to distinguish from other deprivations of liberty. Only when the testimony gave specific information that the victim had been forced into military service was this violation assigned.
    Homicide: Not a CEH violation. This violation was used when the testimony gave clear elements to conclude that the death of a person was not the result of political violence.
    Dead or injured combatants: Not a CEH violation. For historical record only.
    Prisoners of war: Not a CEH violation. Recorded combatants who were taken prisoner by the military. If other CEH violations (e.g., torture) occurred during the detention, these were recorded.
Appendix 3
Key Words
Key words are categories that make it possible to classify information according to qualitative
criteria. These key words were classified into the following 12 major groups. A summary of the key
words follows:
Human rights violations and cases of violence: violations that were not registered as types; i.e., forcing a person to commit acts of violence against one’s family or community; forcing a person to witness acts of violence; publicly displaying cadavers; committing extremely cruel actions; destroying housing; violating other rights, such as civil or political rights; persecuting populations; etc.
Violations to cultural integrity: violations committed in relation to the ethnic background of the victims (indigenous people)
Strategies of parties to the peace accords; Specific military strategies; Specific guerrilla strategies: strategies and actions employed by armed actors against the population: among others, forced recruitment; massive oppression; sociological actions; prisoners of war; infiltration; scorched earth policies; use of paramilitary groups; use of informants; accusing a person of being a guerrilla or collaborator with the military, etc.
Modus operandi: themes related to the way in which parties to the peace accords acted: the use of disguises; acting like other parties to the peace accords; wearing hoods; use of arms; use of specific vehicles, etc.
General context: information on local conflicts; local power structures (social, economic, political, religious, and the state); impunity, etc.
Community reaction mechanisms: alternatives sought by the civilian population to protect themselves from violence: forming popular organizations; forming communities of people in resistance; fleeing; displacement; hiding in the mountains, etc.
Consequences of armed conflict: the effects of war: poverty; displacement; dispossession of land; physical and psychological illnesses; marginalization; returnees, etc.
Women victims of violence; Child victims of violence: specific violence against women and children respectively: sexual assault; exploitation and forced work against women; persecution; abandonment; trade and exploitation of children, among others
Appendix 4
Perpetrator groups
The perpetrators could be identified in a collective way by the group to which they belonged.
The following is the listing of the perpetrator groups that were used in the project.
Security forces
Military Commissioners
Paramilitary groups (“Death Squads”) at least 10 different death squads were identified
Armed groups
Unidentified
Public employees
Mexican military
Mexican police
Ex-military
Ex-guerrilla
Chapter 9
The Guatemalan Commission for Historical Clarification: Database
Representation
Humberto Sequeira
Introduction
The purpose of the database for the Guatemalan Commission for Historical Clarification (CEH)
was to make it possible to process the human rights violations reports collected by the investiga-
tors who did the field work in the areas most affected by the violence in Guatemala from 1960 to
1996. The design was straightforward and simple to implement, and allowed for improvement and
expansion of the database to other areas of the project.
When I joined the CEH, my experience in Information Systems was in traditional commercial
areas: Point of Sales, Inventory, Export/Import, Container Control, etc. The concept of a human
rights information system was new to me, and my first reaction was negative. My initial concern
was that the case capturing system and database designs were almost complete. In addition, Visual
FoxPro, the database application program in use, was not what I would have expected to use for a
large-scale project. But I focused on the project as a whole and decided to do the best work I could
with the available tools.
The choice of Visual FoxPro was a good one, despite some initial problems. Its ease of design
and programming proved invaluable, and we were not limited in the number of users accessing
the database. Its in-line SQL statements also proved to be excellent for testing and searching
the database for errors and basic information, and for producing the tables needed for the CEH
team’s analyses and studies. While the selection of Windows NT Server as our server software
was not optimal from the standpoint of speed, it was easy to administer. In fact, the server went
down only twice in a year’s work. Speed was the major problem with our server, and if the CEH had
been network-ready, we would have had difficulty accessing, manipulating and distributing the
information. Our server was too large for the database, considering it took less than 500 MB to run
the database and its software. With distributed systems we could have done much more than only
serve the database team. Distributed systems would have supported more extensive collaboration
on various subjects, and the ability to share resources would have been an asset.
One major disappointment was that the CEH was not network-ready in their central offices.
This was the most troublesome aspect of the otherwise excellent work environment. Without a
network we could not give some teams the timely information they needed to do their jobs. Also,
we could not track the information the database was producing and some members lacked the
technical and statistical knowledge needed to understand some of the graphs, statistics and re-
ports they received. This created an uncomfortable relationship between the database team and
other CEH workers.
In this paper, I discuss the information we used to design, implement and develop a database
system that addresses human rights violations, which was our principal interest. I describe the
current system, its design, development, testing, training, implementation, contingency proposals
and time/cost estimation, together with proposed improvements. In addition, I offer
recommendations for future human rights database systems in Appendix 1.
Database Representation
At first, an in-house programmer was hired to design the system to collect information from
testimonies received by the CEH during their fieldwork. However, the pressure of deadlines,
working with new software and an intense work environment caused him to resign. Subsequently,
Assist, a Guatemalan software firm, was hired to carry out the design.
From a user’s point of view, the current system was divided into the following parts:
1. Case Summary
2. Violation(s) Pattern
3. Victim(s) General Information
4. Person(s) Denounced for the Violation(s), that is, the Perpetrator(s), the person(s) respon-
sible for the violation(s)
5. Person(s) who Denounced the Perpetrator (Deponent)
This information comprises a Case. Using this structure, we obtained valuable information. I
explain and discuss each of the above parts in detail in the following sections.1
Case Summary.
The case summary is where we kept general information about the violation(s) being reported. This
is general information, and was not used for statistical analysis.
In this summary we had information such as:
• researcher name
• interview date and place
• if the interview was taped and how many cassettes were used
• documentation given to the CEH
• given case name (from the analysts’ team)
• general date of the violation or the period in which the violations occurred (also from the
analysts’ team)
• case summary (general overview of the case)
• comments and questions about the case raised by the database analyst
• general geographic information
• key words in the violation(s) pattern
• whether this case was reported to other organizations
• number of victims mentioned in the case
We added markers to validate the cases and violations. A team of CEH lawyers made legal judge-
ments about the cases and violations to provide the validations.
Some information collected in the Case Summary is redundant, but it was useful for an overall
view of the case. Among the redundancies were the date and place where the violation(s) took place
and the number of victims mentioned in the case. This information was also stored in the
Violation(s) Pattern (2), and the victims were also counted in the Victim(s) General Information (3),
discussed below.
These redundancies were a primary cause of discrepancies in the counts of victims, but they were
an aid to quickly showing the number of victims in cases. Thus, massacres were identified more
easily. In my opinion, these redundancies should not be eliminated from the Case Summary. They
should appear in the Case Summary and the system should provide a count of them, but this
should not be a user-editable field.
Keywords gave us the ability to store information that otherwise would have been lost. This
was where data such as Modus Operandi, Military Strategy, Violence Against Children and
Women, Cruel Actions, Destruction of Goods, and Religious Violations, etc., was stored.
However, researchers should not draw conclusions from statistical analysis of the keywords,
for two reasons. First, the keywords qualified cases, not victims or violations, and cases
do not have clear substantive boundaries that allow them to be meaningful quantitative units.
Second, the keywords were not always applied with the care and precision that were customarily
employed in other classifications. The purpose of the keywords was to allow analysts to group cases
that share some idea, so that the cases could be found again and revisited for qualitative in-depth
analysis.
We were able to group Cases in important categories using the Name Assigned to the Case.
We used it to differentiate between Normal Cases and Massacre Cases. We were originally using
the keyword Massacre to identify the Cases that had more than five Arbitrary Executions. How-
ever, problems in the definition of massacre caused some cases to be identified as massacres and
others to be left out. So in the final stage of the CEH, we agreed to use the first part of the case
name as Massacre to identify massacres. This was important, because we then could differentiate
between Massacres and Normal Cases. A keyword validation is a more useful method, because the
name can change and all references to a case being considered as massacre will be lost. Moreover,
with a keyword identifying each case called a massacre, we have a historical record of the
changing definition of massacre during the project.
1
Editors' note. The reader of other papers in these proceedings will notice that this structuring is defined
elsewhere. We retained these redundancies so that each paper is self-explanatory.
Violation(s) Pattern.
The Violation(s) Pattern is where we kept track of what, where, when and to whom. As discussed in
(Ball, 1996), this structure is essential when the case has more than one violation. Clarifying the
structure of this information component was the most discussed aspect of how the data were rep-
resented in the database.
Here we stored the following information:
• Violation order
• Type of violation
• Certainty level for the violation
• Place of the violation
• Date of the violation
• Group responsible for the violation
• Type of responsibility for the group responsible
• Type of evidence about the responsible group
• Number of identified victims
• Number of collective and anonymous victims affected by the violation(s)
• Number of victims who died
• Disposition of body
• Location of the burial site
• Information about victim death
• Was body identified
• Was body found
• Was the victim tortured
• What kind of torture the victim received
• How the person reporting the violation knew who committed the violation.
As you can see, the Violation(s) Pattern was the category where we stored most of the
information given to members of the CEH. It includes some redundant information, namely the
count of victims for each violation pattern. The violation pattern was the easiest way for
us to represent violations to one or more persons. Some information here, like torture information,
was also presented in graphical form.
Due to this design the representation of multiple victims receiving one or more violations was
excellent, but it was not the best way to represent one person receiving one or more violations. The
reason is that the basis of our system design was to use violations as the unit of analysis rather
than victims. We could give our co-workers breakdowns based on victims, but we wanted them to think
in terms of violations. Some of the officials at the CEH could not grasp this approach. This is
understandable, because the information they wanted to share with others was personal accounts,
such as how many people were affected, not how many times they were affected.2
For example, some violations, like Arbitrary Execution (or any other in which the victim died or
disappeared), were straightforward mappings of persons to violations in a one-to-one relation. But
others, like Freedom Deprivation, were not so clear. In such a case, one person can be deprived of
his/her liberty on more than one occasion, not only in the same case, but also if the person
appeared in another case and was again deprived of liberty. Thus, we would have a one-to-many
relation of one person to multiple violations.
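The difference between counting violations and counting victims can be illustrated with a minimal Python sketch. The records, person identifiers and function names below are hypothetical and are not taken from the CEH database; they only show why the two counts diverge once one person suffers the same violation more than once.

```python
from collections import Counter

# Hypothetical records: (person identifier, violation type). Illustrative only.
records = [
    ("P1", "Arbitrary Execution"),
    ("P2", "Freedom Deprivation"),
    ("P2", "Freedom Deprivation"),  # the same person deprived of liberty again
    ("P2", "Sexual Violation"),
]

def count_violations(rows):
    """Count every violation, so repeated abuses of one person all count."""
    return Counter(violation for _, violation in rows)

def count_victims(rows):
    """Count distinct persons affected by each violation type."""
    people = {}
    for person, violation in rows:
        people.setdefault(violation, set()).add(person)
    return {violation: len(ids) for violation, ids in people.items()}

violations = count_violations(records)
victims = count_victims(records)
```

Here Freedom Deprivation is counted twice as a violation but only once as a victim, which is the distinction between "how many times" and "how many people" discussed above.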
It is important to remember not to have more than one death-causing violation in one pattern.
This may seem pretty clear, but it happened to us occasionally. We took the necessary steps to fix
this problem. We used a listing on the person’s name, place and time of violations, along with age
and parents’ names. We also got some help in dealing with this problem when we stored violations.
For instance, we have the Violation Order field, as we capture violations in an orderly fashion. So if
a Freedom violation occurred first and then a Sexual Violation, then the Freedom violation will be
given a lower order number. The order field was also useful in pinpointing the responsible forces
participating in the violation, so in the same order number we could have the Army participating in
the same violation with the Military Commissioners.3 The restriction for this use is that the
violation must occur at the same place and on the same date.
2
For an explanation of the importance of this approach, see (Ball, 1996). Patrick Ball’s concepts and guidance
were of great assistance to me in my work on this project.
The following is an example in which the Army, Military Police and Civilians participate in one
Sexual Violation.
Order   Responsible group
1       Army
1       Military Police
1       Civilians
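The example above, in which several groups share one order number, can be sketched in Python as follows. The row layout and the function name are assumptions made for illustration; the point is that rows sharing an order number describe one violation with several responsible groups, not several violations.

```python
from collections import defaultdict

# Hypothetical rows: (order number, violation, responsible group).
rows = [
    (1, "Sexual Violation", "Army"),
    (1, "Sexual Violation", "Military Police"),
    (1, "Sexual Violation", "Civilians"),
]

def group_by_order(rows):
    """Collect the responsible groups recorded under each order number."""
    grouped = defaultdict(lambda: {"violation": None, "groups": []})
    for order, violation, group in rows:
        grouped[order]["violation"] = violation
        grouped[order]["groups"].append(group)
    return dict(grouped)

grouped = group_by_order(rows)
violation_count = len(grouped)  # one violation, although there are three rows
```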
Some information was infrequently used, either for information analysis or questioning by the
investigators making the interview. An example is information about the disposition of the body
and whether the burial site of the victim was known. Also, in this category was information as to
the death of the victim, and if the body was identified, or not found at all.
As we discussed, during the course of the project we realized that we had to update the Viola-
tion Pattern, and we came up with a solution, which will be discussed in detail in the section on
database design.
3
Ejército and Comisionados Militares, respectively.
As mentioned earlier, the victims were categorized as Individual or Group. The information
stored for both categories was the same except for some information that was unnecessary for
Group victims such as age, date of birth, relatives’ names, etc.
In the Group information we entered information regarding a group of persons, such as Sex,
Birthplace, Location Where Found, Age Range, and some other general terms. We used this infor-
mation in the name we gave to the group. For example, if a group of children was found dead near
the river Rio Negro, we used the Last Name, Second Last Name and Name to call them “Children of
Rio Negro.”
We tried to determine an approximate number for how many people were affected by the
violation using these classes: 2, 4, 5, 10, 11-20, 20-50, more than 50.
Sometimes it was not possible to use the classes, and terms such as “many,” “a few,” and “a
lot” would appear, indicating that we had no approximate number to use. As time went on we
needed a number to assign to these victims. We were conservative and assigned an approximate
value of two. We also were faced with the anonymous victim assignment problem and gave them
the same approximate number of two. The Anonymous victims were people about whom we knew
almost nothing, and for that reason we stored their information in the Violation(s) Pattern.
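The conservative rule described above can be written as a small Python helper. This is only a sketch: the function name and the set of vague terms are assumptions, and the text does not say how the numeric classes themselves were converted to numbers, so the helper handles only explicit numbers and vague terms.

```python
# Vague quantity terms quoted in the text; the set itself is an assumption.
VAGUE_TERMS = {"many", "a few", "a lot"}

# Conservative value assigned when no approximate number was available.
CONSERVATIVE_COUNT = 2

def estimate_victims(description):
    """Return a victim count for an explicit number or a vague term."""
    text = description.strip().lower()
    if text in VAGUE_TERMS:
        return CONSERVATIVE_COUNT
    return int(text)  # raises ValueError for anything else
```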
The most secret materials kept at the CEH were the perpetrator’s name along with the informa-
tion about the person reporting the violation, because it was essential that the information given to
the CEH should not be used for revenge. In fact, the most important aspect of the Database is to
produce information that is without bias and not manipulated in any way to benefit any group or
person in particular.
The data stored here was information about the individual perpetrators identified in a violation
or violations. We soon found out that an individual perpetrator could appear in more than one
case. The same is true of victims and of persons reporting a violation to the CEH, and we made the
necessary adjustment to reflect this relation. A victim can also be a person denounced for a
violation and/or a person reporting a violation in some other case. Because accurate information
was sometimes lacking, a person might have been mistakenly referenced as a perpetrator;
however, the analysis team made its best effort to avoid this situation.
There can be many persons responsible for a case, but only one can be represented as the
Person Denounced for the Violation(s), the Perpetrator who is the person responsible for the viola-
tion(s). Also, we did not accept a general description such as “Person living in the village of Bar-
raza in 1991.” We demanded a name or at least, an alias.
Once again it is important to note that we should always protect the identity of the persons
who have trusted the CEH with the information stored by this screen.4 It should remain only in the
archives of the CEH.
It is essential not to change the stored information in some tables. For example, David Sequeira
is a victim (victim #2), his mother is Magda Sequeira and his father is Walter Sequeira. We save this
information on the victims screen. In this case Magda is also a victim (victim #6). Since David is
also the person reporting the violation, some users entered incorrect information stating that the
relationship was son/daughter, thus changing the relationship between Magda and David.
Database Design
The following flow chart is an overview of database design at the CEH.
4
When we refer to a screen, we are referring both to the screen as shown by the computer for data entry and
the physical form that may have been used to record the information.
[Flow chart of the database design. Recoverable labels: One to Many Relationship; Case; Pattern;
Violation; Violation Responsible (Institution); Person (Deponent, Victim); Person (Perpetrator).]
As mentioned previously, Patrick Ball designed the CEH relational database so that we could
use SQL syntax to code a wide range of information. The design not only counted violations as the
primary unit of analysis, but was flexible enough to support other units of analysis.
The first table used is the CASE table,5 which is the main table in the database. In this table,
we stored the basic case information, such as the information stored in the Case Summary. The
field investigators assigned most of the case numbers according to a limited set of numbers given
for each area, so that an area such as Guatemala City could range from 1 to 1000, for Zacapa, 1001
to 2000, for Quiché, 2001-3000. In a similar fashion, ranges of numbers covered all the sites where
CEH offices operated.
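A range-based numbering scheme of this kind can be sketched in Python. The three ranges are the examples given in the text; the dictionary and function names are hypothetical, and the full list of CEH areas is not reproduced here.

```python
# Example ranges from the text; other areas would be added the same way.
AREA_RANGES = {
    "Guatemala City": range(1, 1001),
    "Zacapa": range(1001, 2001),
    "Quiché": range(2001, 3001),
}

def area_for_case(case_number):
    """Return the area whose range contains the case number, or None."""
    for area, numbers in AREA_RANGES.items():
        if case_number in numbers:
            return area
    return None
```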
We used a technique called automatic number generation, in which case #9 will not necessar-
ily be #9 internally in the database. Instead, case #9 could be internally represented by CA0000257.
The cases were numbered, as entered, one by one by the database and given a sequential number.
Although you might normally expect that case #9 would be entered before case #1125, with auto-
matic number generation, this is not necessarily true.
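The following Python sketch illustrates the idea of automatic number generation. The identifier format (the prefix CA followed by seven digits) is inferred from the example CA0000257; the class and method names are assumptions and do not describe the Visual FoxPro implementation.

```python
import itertools

class CaseRegistry:
    """Assign sequential internal identifiers in order of data entry."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._internal_by_case = {}

    def register(self, case_number):
        """Return the internal ID for a case, creating it on first entry."""
        if case_number not in self._internal_by_case:
            self._internal_by_case[case_number] = f"CA{next(self._counter):07d}"
        return self._internal_by_case[case_number]

registry = CaseRegistry()
first = registry.register(1125)   # entered first, so it gets the lowest ID
second = registry.register(9)     # entered later despite the lower case number
```

The internal identifier reflects the order of entry, not the case number assigned in the field.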
In addition to the information in the Case Summary, we stored here the Case Creation Date,
Case Modification Date, User Who Entered the Case and User who Modified the Case.
The master key for this table was the CASE_ID, which was used throughout the system.
Following with the case information we also had a table to store the keywords used to qualify
a case, CASE_CLV. A case can have more than one qualifying keyword. The information was
CASE_ID, PALA_COD and AUTHOR. Using this table, it is possible to record some essential in-
formation that was not recorded in the VIOL table and otherwise would have been impossible to
obtain. PALA_COD is the keyword code.
If a case was reported to another institution besides the CEH we used a table, CASE_DEN.
Here we stored the CASE_ID, the person who reported it, the institution that received the report
and the date.
The PATTERN table is the glue that holds together the information in the database, as here we
store the Pattern number along with the number of identified, collective and anonymous victims,
and the estimated magnitude of the number of persons mentioned (“many,” “a few,” “a lot,” etc.).
This table is directly linked to the CASE table in such a way that a case can have several
PATTERNS. The key for this table is CASE_ID+PATTERN_ID.
The PATTERN table is the parent table of the VIOL table. In the VIOL table, we stored the
pattern number, order number, the violation, the date, the place, some geographic description to
help us identify the place, and the violation certainty level. The full key for this table is
PATTERN_ID+ORDEN_NUM+VIOL_ID, which we can link to the CASE table using the
PATTERN table. Thus, if we had only the VIOL table, some of the information would be available,
such as what, when and where, but we would still be missing who the perpetrator(s) were, which is
stored in the table I describe next.
5
In the discussions that follow, I follow conventional practice in information system design and
implementation of capitalizing the names of tables, keys, codes, and field names when appropriate.
We stored information about the organization responsible for the violation in the table
VIOL_RSP. We stored the VIOL_ID, the code for the perpetrator group, the type of responsibility,
the evidence, and the place of the group assigned to the violation. In this table we can show if
more than one group shared the violation, up to n groups. The key for this table is the VIOL_ID.
The table PATR_VICT stores the victim’s id number and the pattern in which the victim’s vio-
lation occurred, so we only need the PATR_ID, PERSON_ID and victim number. Thus, we have all
the violations that a person suffered in a given case or in more than one case. It doesn’t matter if
the victim is individual or collective; we only need the PERSON_ID number. We also use the VICT
table as a backup for PATR_VICT; the only difference here was that we also stored the CASE_ID
number.
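To make the relationships among these tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names and the key fields CASE_ID, PATTERN_ID, ORDEN_NUM and VIOL_ID follow the text. Everything else is an assumption made for illustration: the column types, the columns VIOL_TYPE and GROUP_COD, the sample rows, and the treatment of PATTERN_ID as unique across cases. It is not the CEH's Visual FoxPro schema.

```python
import sqlite3

# Table and key names follow the text; all other column names and all
# column types are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE "CASE"   (CASE_ID TEXT PRIMARY KEY);
CREATE TABLE PATTERN  (CASE_ID TEXT, PATTERN_ID INTEGER,
                       PRIMARY KEY (CASE_ID, PATTERN_ID));
CREATE TABLE VIOL     (PATTERN_ID INTEGER, ORDEN_NUM INTEGER, VIOL_ID TEXT,
                       VIOL_TYPE TEXT,
                       PRIMARY KEY (PATTERN_ID, ORDEN_NUM, VIOL_ID));
CREATE TABLE VIOL_RSP (VIOL_ID TEXT, GROUP_COD TEXT,
                       PRIMARY KEY (VIOL_ID, GROUP_COD));
"""

def build_demo():
    """Create an in-memory database holding a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute('INSERT INTO "CASE" VALUES (?)', ("CA0000257",))
    conn.execute("INSERT INTO PATTERN VALUES (?, ?)", ("CA0000257", 1))
    conn.executemany("INSERT INTO VIOL VALUES (?, ?, ?, ?)", [
        (1, 1, "V1", "Freedom Deprivation"),
        (1, 2, "V2", "Sexual Violation"),
    ])
    conn.executemany("INSERT INTO VIOL_RSP VALUES (?, ?)", [
        ("V1", "Army"), ("V2", "Army"), ("V2", "Military Commissioners"),
    ])
    return conn

def violations_in_case(conn, case_id):
    """Reach a case's violations through PATTERN, in order of occurrence."""
    return conn.execute(
        """SELECT v.ORDEN_NUM, v.VIOL_TYPE
           FROM VIOL v JOIN PATTERN p ON p.PATTERN_ID = v.PATTERN_ID
           WHERE p.CASE_ID = ? ORDER BY v.ORDEN_NUM""", (case_id,)).fetchall()

def violations_by_group(conn):
    """Count violations attributed to each responsible group."""
    rows = conn.execute(
        """SELECT r.GROUP_COD, COUNT(*)
           FROM VIOL_RSP r JOIN VIOL v ON v.VIOL_ID = r.VIOL_ID
           GROUP BY r.GROUP_COD ORDER BY r.GROUP_COD""").fetchall()
    return dict(rows)
```

Because a violation can be shared by several groups, this sketch keys VIOL_RSP on the violation together with the group code. That composite key is a design choice of the sketch, not a statement about the original table.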
The PERS table stored all the information regarding a person, any kind of person. The key for
this table is PERS_ID. It holds Victims (Individual or Collective), Persons Accused of Violations,
Persons Reporting a Violation (Individual or Collective), Brothers, Parents, Sons/Daughters, Wives,
Husbands, etc. Every name of a known or unknown Individual or Collective person is here. By July 25,
1998, we had about 30,000 persons in the PERS table. We had three fields to mark if a person was a
Victim, a Perpetrator of Violations and a Person Reporting a Violation. We also had a field to show
if the victim was an Individual or a Collective person. The table also held important information
regarding the person, such as Full Name, Alias(es), Sex, Age, DOB, Nationality, Language, Documents,
Place of Birth, Civil Status, Comments, and Age for Deponent.
Because we wanted full information about a person we were faced with a problem in time rep-
resentation. For example, a person who doesn’t remember his DOB (a frequent occurrence) reports
a violation he suffered in May 1982 when he states his age as 21. He then reports a violation he
suffered in July 1993, when he states his age as 34. But this is 11 years later, and if his 1982 age
were correct, then he should be 33. The report was filed in 1996 and in it he says he is 36 years old.
If the 1982 age were correct, he should be 35. On the other hand, if the 1993 age was correct he
should be 37. Which age do you accept as the age of the person? We represented the last age en-
tered in the database, because the person can’t be entered in the database again to show a differ-
ent age. If this person was reporting a violation, we stored his age in a different field, and when
performing calculations we used the best age possible. Still, it is not a guarantee that we had the
person’s correct age.
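The inconsistency in this example can be checked mechanically. The sketch below is an illustration only, not something the CEH system did: it assumes that a person who states age a in year y was born in year y - a or y - a - 1, depending on whether the birthday had already passed, and tests whether the stated ages share a possible birth year. The function names are hypothetical.

```python
def implied_birth_years(reports):
    """For each (year, stated age) pair, return (earliest, latest) birth year."""
    return [(year - age - 1, year - age) for year, age in reports]

def ages_consistent(ranges):
    """True if at least one birth year is compatible with every stated age."""
    latest_start = max(low for low, _ in ranges)
    earliest_end = min(high for _, high in ranges)
    return latest_start <= earliest_end

# Stated ages from the example: 21 in 1982, 34 in 1993, 36 in 1996.
ranges = implied_birth_years([(1982, 21), (1993, 34), (1996, 36)])
all_agree = ages_consistent(ranges)
first_and_last_agree = ages_consistent([ranges[0], ranges[2]])
```

Under that assumption, the 1982 and 1996 statements are compatible with each other, but the three statements taken together are not.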
We also had problems with the Comment field. Some comments about the person were about
when he was a victim. Others were about when he was reporting a violation and others when he
was accused of a violation. This was because the victims, persons reporting a violation and perpe-
trator all share the same information; the individual can have more than one role. This was a pro-
gramming error and was fixed as soon as I found out about it.
For any dates used in the CEH database, we used the Russian date format, yyyymmdd (year,
month, day). We used this format so that we could use zero (00) in representing the month and day.
We had to, because on some dates we could not get any certainty at all, for example:
• In the winter of 1986
• Some day in January 1981
• At the end of Easter 1983
• I don’t remember when
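A minimal Python sketch of this date convention follows. The function names are hypothetical, and the sketch covers only unknown months and days; how a completely unknown date such as "I don't remember when" was stored is not stated in the text, so it is not modelled here.

```python
def encode_date(year, month=None, day=None):
    """Encode a possibly partial date as yyyymmdd, with 00 for unknown parts."""
    if month is None:
        day = None  # a day is meaningless without its month
    month_part = month or 0
    day_part = day or 0
    return f"{year:04d}{month_part:02d}{day_part:02d}"

def decode_date(code):
    """Split a yyyymmdd code into (year, month, day); unknown parts are None."""
    year, month, day = int(code[:4]), int(code[4:6]), int(code[6:8])
    return year, month or None, day or None
```

For instance, "some day in January 1981" would be stored as 19810100. Because the year comes first, sorting the codes as text also sorts them chronologically, with partial dates placed before complete dates in the same period.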
The IHCH table is where we kept the information about the individual accused of the violation
(the perpetrator), such as CASE_ID, PERS_ID and IHCH number. This is basic information about
this person and useful if we knew that a person participated in a violation, but were not sure what
the violation was. More complete information on the person’s participation can be found in
IHCH_VIOL, which included the CASE_ID, PATR_ID, VIOL_ID, and evidence and responsibility.
The PERS_CLS table stores information on the activities of the victim, or to what organizations
the person belonged (such as union, religious organization, etc.). The information needed is the
PERS_ID and VICTCLS_COD (Type of victim according to our catalog).
Still working with the PERS table, we used the PERS_HIJO to store relations between people.
Here we show the relationships of the victims or persons reporting the violations to the CEH. This
is a good approach and worked for a while, but catalog limitations (imposed by ourselves) made us
misplace some information. For example, if a person only had one parent, we could not know if it
was the person’s father or mother, because we used Mother/Father as one category.
We used the group of tables PATR_DS, PATR_MU and PATR_TO to store information on
whether the victim was disappeared, dead or tortured, respectively. This information is basic and
the only thing required here is the victim’s form or screen number along with the PATR_ID.
The USERS table stored the information about users who entered information on the database.
This information included user code, name, security level and password.
Some of the other tables in the database are the catalogs, which we used to complete the
information, such as the Institutions, Type of Victim, Key word, Relation, Language, Nationality
and others. We hope to see them when the CEH authorizes the use of the information stored in the
database.
Problem: Beginning and Ending date of the Case
Solution: This can be taken from the Pattern form automatically (scanning all violations)

Problem: Places where the case took place
Solution: This can be taken from the Pattern form automatically (scanning all violations)

Problem: Other institutions where a violation was reported
Solution: A catalog of Human Rights or other institutions can be made, including military
institutions

Problem: Only one interviewer can record the case
Solution: More than one interviewer can participate in recording the interview or case

Problem: Identified, Collective and Anonymous victims in the Case form
Solution: This should be eliminated from the case form, and can be obtained automatically from
the Pattern and Victims form

Problem: Identified and Collective victims from the Pattern Form
Solution: This should be eliminated from the pattern form, and can be obtained automatically from
the Victims form

Problem: Where the victim lived when the violation occurred
Solution: This should be coded from the Places Catalog

Problem: Language of the persons
Solution: This should be the ethnic group to which they belonged; their language should be stored
in a separate location

Problem: Comments of the victims
Solution: Should be shared in all the forms. Should be common to the person

Problem: Group to which the person belongs
Solution: Group cannot be repeated. The only case is when the charge inside the group changes;
thus we can keep a history of the person (mostly related to perpetrators)
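Several of the solutions listed above replace hand-entered summary fields with values derived from the violation records. A minimal Python sketch of that idea follows. The record layout, field names and sample values are assumptions for illustration, and dates are taken to be yyyymmdd strings so that they sort chronologically as text.

```python
def summarize_case(violations):
    """Derive the date range, places and victim count by scanning violations."""
    if not violations:
        raise ValueError("a case needs at least one violation")
    dates = sorted(v["date"] for v in violations)
    places = sorted({v["place"] for v in violations})
    victims = {person for v in violations for person in v["victims"]}
    return {
        "begin": dates[0],
        "end": dates[-1],
        "places": places,
        "victim_count": len(victims),  # each person is counted once per case
    }

# Hypothetical violations belonging to one case.
summary = summarize_case([
    {"date": "19820514", "place": "Quiché", "victims": ["P1", "P2"]},
    {"date": "19820300", "place": "Zacapa", "victims": ["P2"]},
])
```

Computing the summary this way means it cannot drift out of step with the underlying violation records, which is the reason for not making such fields user-editable.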
I created another sub-system for the Recommendations Team. In it, I showed the comments of
the deponents and allowed building keywords on each comment. We could produce statistics on
the most common keywords to represent the concepts appearing in the information being collected.
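Statistics of this kind amount to a frequency count over the keywords attached to each comment. The Python sketch below is illustrative only: the function name, comment identifiers and keywords are invented, and it assumes a keyword is counted at most once per comment.

```python
from collections import Counter

def keyword_frequencies(tagged_comments):
    """Return (keyword, number of comments) pairs, most frequent first."""
    counts = Counter()
    for keywords in tagged_comments.values():
        counts.update(set(keywords))  # count each keyword once per comment
    return counts.most_common()

# Hypothetical comments and the keywords built on them.
frequencies = keyword_frequencies({
    "comment-1": ["reparation", "justice"],
    "comment-2": ["justice"],
    "comment-3": ["justice", "justice", "memory"],
})
```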
In the Appendixes to this paper, I give recommendations for system automation in similar proj-
ects (Appendix 1), detailed information on our SQL queries (Appendix 2), and my recommendations
for information integrity and security (Appendix 3).
Lessons Learned
In Appendixes 1 and 3 I give the body of my recommendations for future similar projects. In
this section, I present other, more specific lessons that were learned in the course of my work on
the CEH project.
Case reports
Problem: Started giving general date and place; members of the CEH increasingly wanted more
exact information in their reports.
Solution: Show all the places and dates of the violations in a concise manner, take information
from the violation (i.e., where it occurred and when). Team must decide how to handle use of
information in case reports.
Notes: Must not be a user-editable field. Only so much information can be shown in a listing.

Violation pattern screen
Problem: Sometimes users would open screen and then stop.
Solution: Know who opens the violation pattern and then stops.

Personal information
Problem: Inconsistent ages reported by individuals in different cases.
Solution: Make table with PATR_ID, PERS_ID and AGE, to store the information of the person’s age
in more than one case.
Notes: This table is a modification of the PATR_VICT table in use at CEH, without the AGE field.

Screens
Problem: Inconsistent information.
Solution: Make all the information on the form or screen consistent.
Notes: Discipline in the design process.

Y2K problems
Problem: Unpredictable.
Solution: Store dates in the Russian format using 00 for month and days.
Notes: Easy to do, was done for CEH.

Hardware & software
Problem: Some machines had problems with certain installed virtual drivers.
Solution: Remove the offending drivers in advance.
Notes: Knowing in advance which drivers are likely to cause problems.

Workplace
Problem: Pressure on users resulting from daily quotas and other factors, limited individual
capabilities for pressure and for work.
Solution: Database administrators should review with data entry personnel to get speed without
compromise of accuracy, individual progress reports rather than competition.
Notes: May call for change in management style of database administrators. Verification of
experience. Getting users with similar capabilities.
Appendix 1
Recommendations on System Automation.6
Introduction
In this Appendix, I make recommendations for system automation in any similar project. In my
work at the CEH, I became concerned about the CEH structure’s weakness with relation to the
group work that was about to be performed for the CEH report. Also, I make observations on the
CEH computing structure and compare its structure with other more general projects and data-
bases. I believe that if followed on similar human rights projects, the recommendations in this Ap-
pendix will enable future systems to avoid these problems.
Today’s technology makes our work easy: it helps us to plan, organize and administer work. In
collaborative interfaces such as that found in the CEH, the lack of planning in topology for com-
puter networks and data servers for document administration had negative effects, causing set-
backs and problems in the administrative process.
I document the need for software (locally/internally produced programs or packages from es-
tablished companies) and hardware (equipment and accessories). Without these two components,
it would be impossible to achieve these tasks. Today, the need for access to, and administration of,
information mandates these programs communicate with each other. Hence, the organization can be
seen as a sensitive, responsive, and interconnected system. Some of the necessary programs for
working on all aspects of the documents are so simple and standardized that they can be internally
or locally produced. However, others are so complex that the best investment might be to buy an
external package that produces the desired results and meets our information and communication
needs. The decision-makers in future projects should make these decisions explicitly to fit their
particular needs.
All of the programs should be multipurpose and have the same user-friendly and intuitive in-
terface. And of course, in the final analysis, they should meet both the needs of the user team and
the organization.
As to hardware, the shared use of available resources (laser printers, color printers, modems,
scanners, and hard discs) is one of the most pressing issues. By sharing resources, the organiza-
tion can focus on acquiring the correct peripheral equipment and reduce costs. For example, a
printer for every computer is no longer necessary. Information could be stored in a centralized
manner in archive servers, and these servers could be divided into work groups.
Our purposes are special and human rights organizations are almost invariably under-funded.
One solution is to contact the vendors of software in the beginning of a project, and possibly the
Business Software Alliance (BSA), to obtain licenses for each package. By explaining the purposes
and objectives of the organization, it is likely that the organization could obtain savings or dis-
counts in acquiring software. The software vendors may be responsive to knowing that the use of
their product will become public knowledge. This has in fact occurred frequently in similar situa-
tions. Often the vendors are willing to donate or provide at a large discount the version prior to the
current version they are marketing.
6
I thank Walter Sequeira and Eduardo Meyer for their participation in this paper.
groups in which they would be allowed access to different documents generated and/or used in the
project. This card would be valid only during the employee’s physical participation in the project.
Physical security could be centralized with the personal identification number on the user’s
card; both the personnel and security offices would administer access to offices.
This magnetic code is generally the same as that seen on credit cards. The following benefits
would occur with the use of the magnetic identification system for project members:
• Any team member would be easily located.
• There would be oversight of the project team members’ time.
• Access to documents would appear more dynamic and beneficial (in combination with
point three) since every document would have an owner at all times (whether its place of
origin or the person who requested it) and would be easily located, ready for its return or
subsequent processing.
The project member’s photograph should be included on the identity card in order to make
visual confirmation simpler and faster. The identity card should be non-transferable and should be
destroyed once the employee terminates participation in the project.
Bar code identification system for documents, supplies, and office equipment
The bar code is a simple and economical way to label multiple classes of physical objects. In a
project of this magnitude, management of documents is a priority. Documents should be main-
tained in good condition while still being easy to access. Their security and sensitivity should also
be taken into account. Documents can be one page, lists, graphics or complete books; this makes
their complete identification necessary.
The use of the bar code is a viable and, at the same time, economical option. All of the docu-
ments that are accessible to project members should have a unique code created locally in the proj-
ect.
Frequent problems that occur in locating, receiving and delivering documents would be drasti-
cally reduced. With the combined use of project member identity cards, the following information
on documents could be known:
• to whom it was delivered
• who has it
• where it is
• how many times it has been requested and by whom
• what time it has been used and by whom
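The questions in the list above can be answered from a simple log of scan events. The following is a minimal sketch, not the system used in the project: the class names, the bar code and card identifiers, and the locations are all hypothetical, and it assumes every scan pairs a document bar code with a member identity card and is recorded in chronological order.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Movement:
    """One scan event: a document bar code paired with a member's card."""
    document_code: str
    member_id: str
    location: str
    action: str          # "checkout" or "return"
    timestamp: datetime


@dataclass
class CustodyLog:
    # Assumption: movements are appended in chronological order.
    movements: List[Movement] = field(default_factory=list)

    def scan(self, document_code, member_id, location, action, timestamp):
        self.movements.append(
            Movement(document_code, member_id, location, action, timestamp))

    def _history(self, document_code):
        return [m for m in self.movements if m.document_code == document_code]

    def current_holder(self, document_code) -> Optional[str]:
        """Who has it: the member of the latest checkout, or None if returned."""
        history = self._history(document_code)
        if not history or history[-1].action == "return":
            return None
        return history[-1].member_id

    def current_location(self, document_code) -> Optional[str]:
        """Where it is: the location of the most recent scan."""
        history = self._history(document_code)
        return history[-1].location if history else None

    def request_counts(self, document_code) -> dict:
        """How many times it has been requested, and by whom."""
        counts = {}
        for m in self._history(document_code):
            if m.action == "checkout":
                counts[m.member_id] = counts.get(m.member_id, 0) + 1
        return counts


# Hypothetical scans for one document.
log = CustodyLog()
log.scan("DOC-0001", "M-017", "Analysis office", "checkout", datetime(1998, 11, 2, 9, 0))
log.scan("DOC-0001", "M-017", "Archive", "return", datetime(1998, 11, 2, 17, 0))
log.scan("DOC-0001", "M-042", "Database office", "checkout", datetime(1998, 11, 3, 10, 0))
```

Because each scan keeps its timestamp, the time a document was in use can be derived from the same records by pairing each checkout with the following return.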
The use of bar codes for office equipment cuts administrative costs and time. Thus, control of
office equipment will be transparent and, above all, organized. The organization will know the
location of equipment and to whom it was assigned.
The administration of supplies in a project of this magnitude is of vital importance since it of-
fers:
• Projections that can be made monthly, quarterly, semiannually and annually.
• Reports on different teams’ and departments’ demands.
• Help in estimating the project budget using different available projections.
The use of bar codes together with the fourth theme of workflow would form a formidable
combination in relation to the access and classification of documents when the option of data entry
is used.
Structured wiring and necessary equipment for the project systems
None of the proposed systems in this paper would be feasible if the project did not have an
adequate infrastructure. The network is the spinal cord of the whole system. Its design should
plan for extra capacity relative to the physical capacity of the location at the time of planning,
maximize the number of workstations, and estimate the different types of users who can occupy
these places and their volume of demands and production.
Different types of project teams and their members generate different demands, such as:
• Groups that work exclusively as a team with multiple documents.
• Groups of members modifying the same documents.
• Team members with pressing demands for manipulating graphs.
• Team members with high levels of demand for printing, some in color.
• Specific employees from the different databases used in the project.
• Mobile uses, such as portable or cellular PCs.
Humberto Sequeira
All of these demands can be centralized in equipment (computers). The number of users will
determine these demands and their specific needs.
Once a locale is chosen, requests for networking should be solicited, always keeping in mind
an available route in case of growth.
One important consideration is the physical security of the wiring, since it cannot in any way
jeopardize the information that will flow in it. This is the main purpose the wiring fills for a company
or professional person, since the wiring necessary for linking more than one floor in a building is
laborious and the limitations of distance and security require different hardware and software selec-
tions and communication protocols.
Many kinds of available networks must be taken into consideration, and the types, models,
and features are constantly changing. At the current time (mid-1999), for an organization with fewer
than 200 members and individual teams with fewer than 30 members, I recommend a local area
Ethernet type network, the most popular, easiest to administer and most economical in a PC inter-
face.
Workflow system for automating the office
The organization of information, both electronic information and traditional, is a project of vital
importance. A new term in this realm is workflow. With the combined use of hardware and
software in the form of computer programs, scanners, faxes, modems and others, documents can be
administered in an orderly manner.
With these new technologies documents can be:
• Indexed with a unique key
• Stored in a specific location
• Consulted on screen
• Printed on some device
• Sent by fax
• Accessed by different search methods
• Modified for various users
Often, a document needs to be revised or modified by more than one person until it arrives at
its final destination. Here is a typical workflow for working with such a document; it goes from
Workstation 0 (origination) to Workstation 5 (delivery of final document) after having been revised
by employees 1, 2, 3 and 4. The workflow is defined in such a way that the document can be re-
turned to the previous employee if it contains errors, or passed to the next employee if the previous
revision is satisfactory. The document can be accessed for consultation at any point in the flow
and other users can make modifications on it (if that is how it was defined in the flow). Maximum
time periods that an employee can have the document can also be established. If the system de-
tects that the maximum time has passed, administrative alerts are sent to the predefined users.
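The flow described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class and method names are hypothetical, the six stations follow the example in the text (0 for origination, 5 for delivery), and the alert is reduced to a true/false check rather than a real notification to predefined users.

```python
from datetime import datetime, timedelta


class DocumentFlow:
    """A document moving through workstations 0 (origination) to 5 (delivery)."""

    FIRST, LAST = 0, 5

    def __init__(self, title, max_hold, started_at):
        self.title = title
        self.max_hold = max_hold          # longest time one station may hold it
        self.station = self.FIRST
        self.received_at = started_at     # when the current station received it

    def approve(self, now):
        """Revision is satisfactory: pass the document to the next station."""
        if self.station < self.LAST:
            self.station += 1
            self.received_at = now

    def reject(self, now):
        """Document contains errors: return it to the previous station."""
        if self.station > self.FIRST:
            self.station -= 1
            self.received_at = now

    def is_overdue(self, now):
        """True when the current station has held the document too long."""
        return self.station < self.LAST and now - self.received_at > self.max_hold


# Hypothetical run: two approvals, one rejection, then a long wait.
start = datetime(1999, 6, 1, 9, 0)
flow = DocumentFlow("Monthly memorandum", timedelta(days=2), start)
flow.approve(start + timedelta(hours=3))    # station 0 -> 1
flow.approve(start + timedelta(hours=6))    # station 1 -> 2
flow.reject(start + timedelta(hours=8))     # errors found: back to station 1
overdue = flow.is_overdue(start + timedelta(days=4))
```

In a real workflow package the overdue check would run on a schedule and send the administrative alert; here it is left to the caller.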
One must take into account that at no moment has the document been printed; all of the revi-
sions and modifications have been done electronically. This is a small example of a document. The
flow can be increased in complexity, demands, users, functions, etc. All of these functions define
an electronic office. Imagine how at the moment they are generated, all of our memoranda, faxes,
lists, graphs, etc., would have the ease of being sent to the necessary work posts.
One of the disadvantages of these systems is that the people who design the workflow must
have a clear idea of the needs and demands of every flow, since not all require the same restrictions
nor do they go to the same users. Another disadvantage is that it may be difficult initially for peo-
ple to work from the screen, rather than with hard copy. However, by providing them an adequate
monitor, an optimal resolution screen and an infrastructure of fast networks, these obstacles can be
overcome. The workflow will not completely replace paper since some lists, reports, and other
types of documents that call for revision will be printed.
There are different kinds of workflow software on the market today. Strategic project planning
is what determines if only one will be used or if the configuration will be a combination in relation
to the size of the work groups. That decision will determine the type of software to be used.
others will have almost all the correct technology available. Availability of product and support in a
given geographical location can also influence the choice of hardware and software. My choice for
software is to go with Microsoft® products, except for the database, where you can choose from
many vendors with more robust products.
Operating System | WINDOWS NT SERVER 4.0 or better; any UNIX (Aix, Digital, SCO) or LINUX flavor | WINDOWS NT WORKSTATION 4.0 or better; Windows 95/98 (if necessary, access through emulation, ODBC or Middleware)
Software Network Protocol | Will depend on OS, but TCP/IP for large projects. | Will depend on OS, but TCP/IP for large projects.
Appendix 2
Figure 1. Listings I. (table columns: Query, From Table, Comments)
Figure 2. Violations, I. (table columns: Query, From Table, Comments)
** This means we had three columns in the table to count the victims.
With the availability of the results from these queries, the number of requests from members of
the CEH team increased rapidly. Due to the organization of our Catalog, some of the killed were
adding to the violation count (for example, Arbitrary Execution, Death as a Result of Violence). For
that reason we made the grouping shown in Figure 3, below.
7 Figure and table numbers in this appendix are sequential within the appendix and do not relate to the
captions in the full paper.
Note: By using Generic Violation, you can obtain all the dead.
The analyses and investigations of the CEH team came to the point where they needed to include
particular keywords. A typical such request would be expressed in narrative form as “a listing
of cases that include the keywords ‘Violence Against Children,’ ‘Territorial Movements and/or
Religious Attacks’ in the year 1982 in the province of Huehuetenango, including ‘Responsible
Groups.’” The response would be a case-level listing such as in Figure 4, below.
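A request of this kind translates naturally into an SQL query. The sketch below is illustrative only: the table and column names (cases, case_keywords, case_groups) are hypothetical and are not the CEH schema, the rows are invented, and it assumes that a case qualifies when it carries any one of the requested keywords. It uses Python's built-in sqlite3 module simply to make the query runnable.

```python
import sqlite3

# Hypothetical schema and invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id INTEGER PRIMARY KEY, year INTEGER, department TEXT);
    CREATE TABLE case_keywords (case_id INTEGER, keyword TEXT);
    CREATE TABLE case_groups (case_id INTEGER, responsible_group TEXT);
""")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", [
    (1, 1982, "Huehuetenango"),
    (2, 1982, "Huehuetenango"),
    (3, 1981, "Huehuetenango"),
    (4, 1982, "Another department"),
])
conn.executemany("INSERT INTO case_keywords VALUES (?, ?)", [
    (1, "Violence Against Children"),
    (2, "Some other keyword"),
    (3, "Violence Against Children"),
    (4, "Violence Against Children"),
])
conn.executemany("INSERT INTO case_groups VALUES (?, ?)", [
    (1, "Group A"), (1, "Group B"), (2, "Group A"), (3, "Group A"), (4, "Group B"),
])

keywords = ["Violence Against Children",
            "Territorial Movements and/or Religious Attacks"]
placeholders = ", ".join("?" for _ in keywords)

# One output row per (case, responsible group), restricted by year,
# department, and the presence of at least one requested keyword.
query = f"""
    SELECT c.case_id, c.year, c.department, g.responsible_group
    FROM cases AS c
    JOIN case_groups AS g ON g.case_id = c.case_id
    WHERE c.year = ? AND c.department = ?
      AND c.case_id IN (SELECT case_id FROM case_keywords
                        WHERE keyword IN ({placeholders}))
    ORDER BY c.case_id, g.responsible_group
"""
rows = conn.execute(query, [1982, "Huehuetenango", *keywords]).fetchall()
```

Only the keyword placeholders are built dynamically; the values themselves are always passed as parameters.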
Age (Categorized/Grouped): Initially for only the Identified Victims, but later for Collective and Anonymous victims as well. We used the number –1 to identify the age for collective and anonymous victims, and mapped the violations to the general violation table.
Date (Month-Year/Trimester/Semester/Year): This was to make it possible to divide the year in any way users wanted.
Place (Department and Cities): At first only Departments, afterwards including cities where desired by the CEH teams.
Sex: Initially for only the Identified Victims, but later for Collective and Anonymous victims as well.
Forces Responsible (Institutional level Perpetrator): This is where the violations count will be higher than that from the general (base) violations table because more than one perpetrator can participate in a violation. The first production represented all the forces responsible. The next version would group them to represent groups of interest to the CEH (i.e., URNG, ORPA, and all the other guerrilla groups will be grouped in the “Guerrilla Group”). The same was done with the Government Institutions.
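The effect described for Forces Responsible, where counting by perpetrator yields more than the base number of violations, can be seen in a small worked example. This is a sketch on invented data: "Force A" and "Force B" are placeholder names, the mapping table is hypothetical, and the rule that a violation counts once per group is an assumption rather than a documented CEH counting rule.

```python
from collections import Counter

# Hypothetical rows: one row per (violation, force responsible) pair.
violation_forces = [
    ("V1", "Force A"),
    ("V1", "Force B"),   # two forces took part in violation V1
    ("V2", "URNG"),
    ("V3", "ORPA"),
]

# Assumed grouping table: guerrilla organizations fold into "Guerrilla Group";
# the placeholder forces fold into "Government Institutions".
force_group = {
    "URNG": "Guerrilla Group",
    "ORPA": "Guerrilla Group",
    "Force A": "Government Institutions",
    "Force B": "Government Institutions",
}

# Base count: each violation once, regardless of how many forces took part.
base_count = len({violation for violation, _ in violation_forces})

# Count by individual force: V1 is counted twice, once per force.
by_force = Counter(force for _, force in violation_forces)

# Count by group: a violation counts once per group (assumption), even if
# several forces of the same group took part in it.
grouped_pairs = {(violation, force_group[force]) for violation, force in violation_forces}
by_group = Counter(group for _, group in grouped_pairs)
```

Here the base table holds three violations, the per-force counts sum to four, and the per-group counts sum back to three because both forces in V1 belong to the same group.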
Of course the options were mixed, and we ended up with table analyses for many topics. These
included: Age/Sex, Place/Language/Sex/Age, Place/Language, Place/Type of Victim, Place/Date,
Place/Force Responsible, Place/Date/Force Responsible, and analyses for specific purposes:
Massacres, Non-Massacre Analysis, Non-Guerrilla Analysis, Government Forces Analysis, Range of
Victims Analysis, and so forth.
When half of the work was done, we decided to create table structures that would produce
consistent information. We built structures for Identified, Collective and Anonymous Victims.
These structures led to tables with information that would satisfy all the future requests we
could conceive at the time. If we had done this earlier, it would have simplified our work. These
structures were a success from both the user and programmer standpoints. With time, I became
quite glad that I had learned how to use SQL, which enabled us to easily and quickly program
queries to facilitate the work of the CEH teams.
Appendix 3
References
Ball, Patrick. 1996. Who Did What to Whom? Planning and Implementing a Large Scale Human
Rights Data Project. Washington, DC: American Association for the Advancement of Science.
Chapter 10
The Guatemalan Commission for Historical Clarification: Generating
Analytical Reports
Eva Scheibreithner
Introduction
1 Open Database Connectivity (ODBC) is a Microsoft standard using drivers to access database files in a
variety of formats.
would have had to assure that the result was correct if I had made calculations or manually added
columns as well. In addition, I would have needed an updated record of all files with all the sheets I
wanted to update.
Check New and Old Excel Files After the Import or Update of Data
I used a process to check the files that may appear time-consuming but which I felt was neces-
sary. I created a checking form (Figure 1, below) to standardize the process. I checked all the up-
dated data for the main questions: what, where, when, who, to whom and a special star-question, a
complex specific question involving the context (an example is given in the appendix).
First I compared the main violations. I required that all the totals agree with the general
overview. Then I looked for more specific details, comparing all the Excel files, one to the other,
always using three different attributes.
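The first step, requiring every file's totals to agree with the general overview, can be sketched as follows. The file names, violation labels other than Arbitrary Execution, and all numbers are invented for illustration, and the sketch assumes each file breaks the same set of violations down by a single attribute with no missing values.

```python
# Hypothetical totals by violation type from the general overview.
overview = {"Arbitrary Execution": 120, "Torture": 45, "Forced Disappearance": 30}

# Hypothetical breakdown files: violation type -> counts per category.
files = {
    "by_sex.xls": {
        "Arbitrary Execution": {"male": 90, "female": 30},
        "Torture": {"male": 40, "female": 5},
        "Forced Disappearance": {"male": 25, "female": 5},
    },
    "by_year.xls": {
        "Arbitrary Execution": {"1981": 50, "1982": 69},   # one record missing
        "Torture": {"1981": 20, "1982": 25},
        "Forced Disappearance": {"1981": 10, "1982": 20},
    },
}


def check_totals(overview, files):
    """Return (file, violation, expected, found) for every total that disagrees."""
    problems = []
    for name, breakdown in sorted(files.items()):
        for violation, expected in overview.items():
            found = sum(breakdown.get(violation, {}).values())
            if found != expected:
                problems.append((name, violation, expected, found))
    return problems


mismatches = check_totals(overview, files)
```

Any entry in the result points to a file and violation type that need to be traced back through the database chain before the output is used.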
Generally I spent more time checking the information than making graphs or other statistical
outputs. Sometimes the process took weeks. Checking and updating about 40 different files, each
with at least eight sheets, and verifying with the checking form was time-consuming. But in the end
I found mistakes that had been made by people in the database chain (programmers, typists, and
analysts), confirming my belief that checking is imperative and cannot be neglected. Don’t auto-
matically trust what you see on the screen!
Update Data
Before an update in the database I always discussed with the programmer the DBFs I needed
updated. There were DBFs for which we had no further use, and with every step developing new
possibilities, we decided not to update old DBFs (those with no further use). When the programmer
got my list of still-useful DBFs, he updated them, and, when completed, passed the checked list to
me. By the time the CEH report was finished, only about 20% of all the DBFs created in the whole
process had been updated.
In November 1998 we updated the database. This was a busy period for statistical analysis.
The investigators were finishing their reports, which created a high level of demand for graphs and
statistics. I had a list of about 60 files to update. When I gave this list to the programmer asking him
to update the DBFs, he was concerned about the amount of work required in view of the total
workload on the system. So I took back the list and reviewed it. I eliminated another 25 files and in
the end we updated only about 35 DBFs. The programmer’s reaction was correct, as it is time-
consuming to update files, and such time should not be spent on files that aren’t going to be used
in the future.
Final Update
The final update was a more extensive process to be done in a limited time. I still had many files
to check, which I had not eliminated. After the normal update check I checked all the files for “white
cells” (cells without information). I then passed them on to the programmers to have the cases
checked individually to see if there was indeed no further information to enter in the cells or if an
error had been made. If needed, the correction was made and we then completed the update to final
form.
To be prepared for every possible request I still kept many of the files until the final update. If I
had eliminated more of them I wouldn’t have had to spend so much time checking them. After
checking I did a “white-cells-check,” looking for cells without any entries for age, gender, violation,
region, etc. Usually the white-cell errors were typing errors, but on occasion some of the programming
was at fault. I still looked for the outliers, e.g., age=260, etc., and found some I had overlooked pre-
viously. As we had a special flag for “massacres,” I also checked the case number files for “massa-
cre.” We flagged a case as a “massacre” if it had at least five executed victims. We could detect
some “massacres” which had not been marked as “massacres” before, and others which were
flagged incorrectly. I had two files with the case numbers, so I could pass the case numbers I fil-
tered with these checks on to the data processors to check the cases again. These errors were both
typing and analysis errors.
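The three final checks described above (white cells, outliers, and the massacre flag) can be sketched in a few lines. The field names and records are hypothetical and the age cut-off is an assumed value; the threshold of five executed victims is the rule stated in the text, and the sketch assumes the –1 age code used for collective and anonymous victims is carried into these files.

```python
REQUIRED = ("age", "gender", "violation", "region")
MAX_PLAUSIBLE_AGE = 110      # assumed cut-off; the text gives age=260 as an outlier
MASSACRE_MIN_EXECUTED = 5    # rule stated in the text: at least five executed victims


def white_cells(record):
    """Names of required fields that are empty in this record."""
    return [f for f in REQUIRED if record.get(f) in (None, "")]


def age_outlier(record):
    """True for an implausible age; -1 (collective/anonymous victims) is skipped."""
    age = record.get("age")
    return isinstance(age, int) and age != -1 and not (0 <= age <= MAX_PLAUSIBLE_AGE)


def massacre_flag_errors(cases):
    """Case numbers whose massacre flag disagrees with the executed-victim count."""
    return [c["case_no"] for c in cases
            if c["massacre"] != (c["executed"] >= MASSACRE_MIN_EXECUTED)]


# Invented records for illustration.
victims = [
    {"case_no": 101, "age": 34, "gender": "F", "violation": "Torture", "region": "I"},
    {"case_no": 102, "age": 260, "gender": "M", "violation": "Torture", "region": "II"},
    {"case_no": 103, "age": 12, "gender": "", "violation": "Torture", "region": None},
]
cases = [
    {"case_no": 101, "executed": 7, "massacre": False},   # should have been flagged
    {"case_no": 102, "executed": 2, "massacre": True},    # flagged incorrectly
    {"case_no": 103, "executed": 5, "massacre": True},    # consistent
]

to_recheck_blank = {v["case_no"]: white_cells(v) for v in victims if white_cells(v)}
to_recheck_age = [v["case_no"] for v in victims if age_outlier(v)]
to_recheck_massacre = massacre_flag_errors(cases)
```

Each result is a list of case numbers that can be handed to the data processors for a second look, which matches the role the case-number files played in the process.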
Kinds of Analysis
Generally the information provided by the database was descriptive statistical information,
easy-to-read graphs and easy-to-understand figures (examples of some graphs are given in the
appendix). We primarily did calculations based on violation, but some special calculations were
based on victims. In general, there was no analysis based on cases. Usually we only produced sta-
tistical output based on the information the CEH collected.
I produced some graphs based on victims for the chapter on indigenous identity, to show the
percentage of individual victims identified for their ethnic characteristics. There were two excep-
tions. I made some graphs based on cases with the key word Massacre, in particular a time line
graph for the areas of military operation. In addition, I made some graphs from the information pro-
vided by the military (e.g., how many military commissioners they had recruited in the different re-
gions in the years of the armed conflict) that had not been collected by the CEH.
Lessons Learned
Problem: Copies from graphs originally printed in color, which led to unidentifiable graphs.
Alternative used: Only black and white graphs used.
Lesson learned: It is better to prevent the problem than to trust that everybody will know that copies usually are only black and white.

Problem: Output has to go out with some basic information for identification and checking purposes.
Alternative used: Layout of the sheet for providing the basic information which every output received.
Lesson learned: Carefully check every outgoing graph to see if all the variables of the layout are changed and updated.

Problem: There was no check after me.
Alternative used: Had to check even more carefully.
Lesson learned: It would be better to have someone else as a security check.
Lessons Learned
In this section, I discuss both the lessons learned and their implementation.
There was considerable similarity between the work for the CIIDH and CEH projects. Accord-
ingly, the analysts for these two projects, Herbert F. Spirer and myself, jointly prepared recommen-
dations for future large-scale human rights data analysis that appear at the end of this paper in Ap-
pendix 1, Data Analysis Recommendations.
Problem: No permanent person working on statistical analysis, which led to unique outputs, and inconsistent ways of archiving and naming different layouts.
Alternative used: Have the same person work on statistical analysis and output from start to finish.
Lesson learned: If not possible to have same analyst(s) throughout project, establish a uniform logical structure at the start.

Problem: No records were kept.
Alternative used: Because of the considerable effort that would have been necessary to recover and reorganize the materials stored under the former inconsistent structure, I started with a new recording system.
Lesson learned: If it is too time-consuming to restructure the existing material and if you are still at the beginning of the statistical analysis, create a good new system and take the loss of former material.

Problem: There was no detailed information about how the data were processed before they were used in the statistical analysis.
Alternative used: Immediate detection and correction of mistakes resulting from misunderstandings concerning what was in the input data.
Lesson learned: Start by asking for all the details you need to know for working with the data (former calculations, what the variables mean, how they are calculated, etc.).

Problem: Sometimes the DBFs provided by the programmer contained too much information.
Alternative used: Discussed with the programmer until we found a middle way: I received only the blocks I wanted (plus a few more).
Lesson learned: Specify exactly the needs for producing the statistical output. This means exactly specifying the variables requested.

Problem: Data has to be checked before using for statistical analysis.
Alternative used: I designed a large checking system with a first rough total check on import of the data and another specific widespread check afterwards.
Lesson learned: You can never check too much. It’s not so important how, but the important thing is that there are checking steps.

Problem: Updating only the minimal number of files used to meet the needs of investigators.
Alternative used: There were always too many files to update. This led to a long updating process.
Lesson learned: Eliminate the files with no further use, as there will be new ones as the archive always grows.

Problem: Data after the final update has to be as completely checked as possible and cleaned.
Alternative used: Extensive checking methods for the final update.
Lesson learned: There may always be some mistakes that you will overlook, but it’s always worth trying to eliminate all error.

Problem: We found mistakes when checking data.
Alternative used: We found mistakes from typists, analysts, and programmers.
Lesson learned: Detecting errors is necessary and positive, but it does not mean blaming someone! Errors happen.

Problem: The investigators complained that they didn’t receive what they had been told they would receive, because the investigator receiving the output didn’t really know what statistical output would look like.
Alternative used: Alleviated by having one full-time analyst (me).
Lesson learned: The statistical analyst making the output is the one receiving the requests, and should explain at the beginning how the investigators should make their requests and what they can expect to get.
Problem: The person receiving the requests from the investigators was not the same person as the one doing the statistical analysis.
Alternative used: Constantly working to stay in touch.
Lesson learned: Only statistically skilled persons should receive requests.

Problem: The person handing the output over to the investigator was not the same person as the one doing the statistical analysis.
Alternative used: Constantly working to stay in touch.
Lesson learned: Set up system so that analyst physically gives analyses to users. Analyst should explain the meaning of outputs.

Problem: Investigators didn’t receive explanations on what the output was about, how it was calculated, where the data had come from.
Lesson learned: May not be needed with statistically knowledgeable users.

Problem: When I started work, other persons without statistical understanding were obtaining statistical outputs. There was no control over the outgoing information. It wasn’t statistically checked, so mistakes went out, and incorrect records and different layouts frustrated investigators.
Alternative used: I was able to correct this situation, but could not undo the problems of the past.
Lesson learned: Only one qualified person produces statistical analysis to maintain control and records. Or, if more than one person produces statistical analysis, one person has to check everything for statistical correctness and maintain the records.

Problem: Investigators haven’t been educated in reading graphs and understanding statistics, no explanations were provided. Investigators deduced incorrect explanations of the figures in their chapters, even to the point of misunderstanding the meaning of the title of the graph. Also, investigators misinterpreted analytical findings and made hypotheses that did not correctly reflect the analytical findings.
Alternative used: I held a class and tried to inform everyone about statistics.
Lesson learned: Periodic workshops for investigators on the use and interpretations of basic statistics, explanation of the basic graphs.

Problem: Many people working on the project had problems using statistical reasoning. This is quite common where there has been no training in statistical methods. For example, people can confuse statements such as “20 percent of the women in Rabinal were assaulted” and “20 percent of the women who were assaulted were from Rabinal.”
Alternative used: See above.
Lesson learned: The path from the producer to the final user of the statistics should be as short as possible to guarantee a correct result in the final version. Each added intermediary is a potential source of error or confusion, especially if they are not fully qualified. Education for everyone is important; statistical reasoning can be unfamiliar to people.

Problem: Programming develops within the process. The ability to identify the different steps must be provided.
Alternative used: Identified within the file directory tree, every step another branch.
Lesson learned: Better to have everything written down and recorded in a logical structure.

Problem: Output must be easily identified.
Alternative used: Manually using a registration number and recording in the registration book.
Lesson learned: Registration is necessary, but my way was very “artificial” as it was kept manually in books.
Problem: Output must be easily and quickly retrievable.
Alternative used: Double file record archiving system to provide the possibility to look for the output by two different criteria.
Lesson learned: Copies of output are very useful for examples, and as proof and replacement, if needed later.
The following table reviews some of the specific actions that I took in order to put learned les-
sons to work during the project. The positive effects of these actions are also shown in the table.
How it was: I always put updated graphs and output in visible places in the team’s offices to keep the team informed. As almost everybody working in the database offices had been living and working in Guatemala before the CEH started working, they were knowledgeable about the history and actual situation.
Positive effects: As the whole database team is involved in the process providing the data for statistics, they know the whole chain and are interested in knowing what’s at the end. This led to a better identification with the group, refreshed their energy and strength, and reduced the widely held distance to statistics. They also made their own personal hypotheses and interpretations leading to interesting discussions in the team.

How it was: We produced a general updated overview with every database update, where I changed some expressions into more understandable words before handing it over to the commissioners as part of our agreement to keep them informed.
Positive effects: It was necessary to use common interpretations of technical terms to make the overview more understandable and easy to read. Then the commissioners, the central team and all the people working with statistical output received an easy-to-read overview periodically and were pleased that they were included in the process and could understand what they received.

How it was: The output had the same layout and basic information.
Positive effects: Led to a professional impression by the investigators and made any one graph more official.

How it was: I started with one three hour workshop, inviting all the investigators, commissioners, team leaders, etc. I explained the main graphs used up to this moment; the data processor explained the different variables and terms used and the programmer talked about the lists of cases provided by him.
Positive effects: The small audience that attended that meeting appreciated the effort and reported that they had learned a lot. From this favorable experience came my idea of periodically providing basic workshops in statistical reasoning.

How it was: I started keeping records of the visits from the investigators (unfortunately only for two months), noting their concerns and wishes.
Positive effects: It was easier to prevent misunderstandings and to reproduce acceptable materials later for the same person.
Appendix 1
Introduction
As part of the process at the Experts’ Meeting, we jointly reviewed our experiences and les-
sons learned and have integrated them into this set of recommendations for data analysts who will
be carrying out similar missions in the future.
We make some recommendations that are explicit statements of procedures that we believe
should be followed to maintain the integrity of the data while producing analytical results that
faithfully report on the findings of the project. Such recommendations are those required for Verifi-
cation.
We make other recommendations that are general and meant as guidance to the analysts. They are
for control of datasets, choice of statistical program, chart standards, an output identification sys-
tem, and education. In these cases, we hope and expect that analysts will recognize the validity and
value of our guidance and use it to formulate their own procedures and practices that are consis-
tent with the context in which they are working. Such recommendations are those concerning
Graph Standards.
Control of Datasets
As we have discussed, avoidance of error is critical in the analysis stage to maintain the credi-
bility of the final results. We have found that the following requirements are the minimum needed
to assure this freedom from major sources of error.
• The statistical analyst must maintain a current data dictionary. This data dictionary must
contain as a minimum, the variable (field) name, the meaning of the variable, and a list or
verbal description of the values that can appear in the corresponding field for the variable.
• The analyst must also maintain a cross-reference table of files and variable (field) names
so that the analyst and others will know which variables appear in which datasets and
which datasets contain a given variable.
• To avoid confusion among different versions of a dataset with a given name, the analyst
should use a separate directory (folder) for each version, numbered in accordance with the
sequence of the version. If database personnel produce these datasets through queries
and store the datasets in directories, they should organize the datasets in this manner.
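The three requirements above can be kept in very simple structures. The following is a minimal sketch, assuming the data dictionary and the list of fields per dataset are maintained by hand; the field names, file names, and the `v003` directory naming are hypothetical examples, not the conventions used in the projects.

```python
from pathlib import PurePosixPath

# Data dictionary: field name -> meaning and allowed values (hypothetical entries).
data_dictionary = {
    "AGE": {"meaning": "Age of victim in years", "values": "0-110, or -1 if not individual"},
    "SEX": {"meaning": "Sex of victim", "values": ["M", "F", "U"]},
    "DEPT": {"meaning": "Department where the violation occurred", "values": "name"},
}

# Which fields each dataset contains (hypothetical file names).
dataset_fields = {
    "age_sex.dbf": ["AGE", "SEX"],
    "place_sex.dbf": ["DEPT", "SEX"],
}


def cross_reference(dataset_fields):
    """Invert dataset -> fields into field -> sorted list of datasets."""
    index = {}
    for dataset, fields in dataset_fields.items():
        for name in fields:
            index.setdefault(name, []).append(dataset)
    return {name: sorted(datasets) for name, datasets in index.items()}


def undocumented_fields(dataset_fields, data_dictionary):
    """Fields used in some dataset but missing from the data dictionary."""
    used = {name for fields in dataset_fields.values() for name in fields}
    return sorted(used - set(data_dictionary))


def version_path(root, dataset, version):
    """One directory per version, numbered in sequence: root/v003/dataset."""
    return PurePosixPath(root) / f"v{version:03d}" / dataset


field_index = cross_reference(dataset_fields)
```

The cross-reference answers both questions in the second requirement: `dataset_fields` gives the variables in a dataset, and `field_index` gives the datasets that contain a variable.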
2
Choice of Statistical Program
We used Excel in performing our analyses and both of us found it to have problems as de-
scribed in this paper. In addition to our statistical issues with this program, it had the disadvantage
of limited graphic output capability. This latter limitation caused significant problems in the produc-
tion of the reports. Those problems could have been avoided by the use of Encapsulated Post-
Script files.
Encapsulated PostScript (EPS) is a standard format for importing and exporting graphic files in
all environments. The EPS file is included as an illustration in other PostScript language page de-
scriptions and can contain any combination of text, graphics, and images. Unfortunately, not all
PostScript-enabled printers are able to print the EPS files, creating a hardware or software issue that
must be resolved to facilitate the analyst’s work.
In addition, Excel does not produce a log recording the actions taken by the analyst and the
use of Visual Basic macros for this purpose is dangerous. Unless the analyst is diligent in keeping
records, in the absence of the analyst, other personnel on the project or outside auditors may have
no way to recreate the analysis, except to try to repeat the process. Unfortunately, the analyst
cannot recover the actions taken to produce a result from that result except by reverse engineering.
2 These observations also apply to the statistical work of the TRC. -PB.
We believe that the use of a particular program should not be dictated, and the analyst needs
the freedom to choose a program consistent with experience, abilities, and preferences. Balancing
of costs and benefits will lead to the best choice of a program. These considerations include the
skills and knowledge of the analyst. In a particular context compromise may be necessary.
Accordingly, we suggest the following as desirable goals.
• The graphical and tabular output will be in the form of Encapsulated PostScript files.
• Either the analyst or the program (preferred) will produce a detailed log of the actions
taken in manipulating the dataset to produce results.
• The analyst will use standard programs to make it easier for replacement analysts to check
the work rather than exotic programs or those not widely known.
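As an illustration of the second goal, the following is a minimal sketch of an action log, assuming the analyst works in a scripting language rather than a spreadsheet. It uses Python's standard logging module; the names make_analysis_logger and log_step, the file name analysis.log, and the sample records are hypothetical and are not taken from the projects described here.

```python
import logging
import os
import tempfile


def make_analysis_logger(path):
    """Return a logger that appends one timestamped line per analysis step."""
    logger = logging.getLogger("analysis_log")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


def log_step(logger, action, rows_before, rows_after):
    """Record what was done and how the number of records changed."""
    logger.info("%s | rows before=%d | rows after=%d", action, rows_before, rows_after)


if __name__ == "__main__":
    # Hypothetical records and log location, for illustration only.
    log_path = os.path.join(tempfile.mkdtemp(), "analysis.log")
    log = make_analysis_logger(log_path)
    records = [{"violation": "killing"}, {"violation": "torture"}, {"violation": "killing"}]
    killings = [r for r in records if r["violation"] == "killing"]
    log_step(log, "filter: keep killings only", len(records), len(killings))
    print(open(log_path, encoding="utf-8").read())
```

A log of this kind lets a replacement analyst or an auditor replay the sequence of filters without reverse engineering the result.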
Graph Standards
Choosing the appropriate graph to display information is a combination of technology and art,
essentially a creative process. To give specific rules is to stifle that creativity and in the long run,
will lead to results of limited value. Our approach to the visual display of our analyses conforms to
Tufte’s standards for Excellence in Graphical Representation, quoted in the paper The International
Center for Human Rights Research Investigation, in the section, Graphs: The Visual Display of In-
formation.
In addition to that general guidance, we recommend that:
• The purposes and needs of the data analysis be met in large part by strategic use of the
following types of graph: univariate time series plot (time line), overlaid time series plot,
vertical bar chart, horizontal bar chart, stacked bar chart, and histogram.
• The analysts avoid pie charts, which can be difficult to interpret and are often misleading.
• The analysts strive to avoid clutter, which means, among other things: use ticks, but don’t
use gridlines, don’t set charts in visible frames, and don’t use markers unless there is a
clear need.
• Any tables be spare, and without clutter. There are a number of examples of such table
layouts in the CIIDH report (Ball, Kobrak, and Spirer, 1999, pp. 70, 119, 122-3, 128-130).
Verification
The need for verification derives from both the human and machine elements at work in the
process of statistical analysis. Among the sources of error are:
• programmer errors in preparing the datasets
• analyst errors in doing the analysis
• program faults inherent in the current version of the analysis software
• consequences of computer crashes
• hardware limitations, inherent in the hardware and possibly unknown to the analyst
• key-entry errors, which can occur at any stage from the initial to the final output
The ideal situation is one in which none of these errors occurs. Analysts, programmers, and others
can, with experience and motivation, reduce the number of errors generated, but they can never
eliminate them. No software is ever completely bug-free, and hardware is prone to both inherent flaws
and degradation. Thus, to have credible analytical results, we need a verification process for
detecting errors. To this end, we recommend the following to statistical analysts:
• Have programmers producing working datasets supply totals and extremes for all numeri-
cal dataset variables as a part of each version.
• Use these totals and extremes as a check on the changes from the prior version of the da-
taset.
• Check the dataset as received from the programmer.
• Base checks on Table 5, following. The analyst should maintain the summary described in
this table. If the analyst uses a program generating a log and allowing the use of stored
programs, such as Stata or SPSS, a summary will be automatically retained.
Eva Scheibreithner
Totals
Extremes
*-questions
Key tabulations of categorical variables
Note: A *-question is some question about the data that will provide a check of context, such
as “what proportion of women were disappeared in the month of…?”
• Check the dataset as received.
• Check the dataset at every critical transition. When in doubt, check the dataset at every
change. Checking means comparing totals, extremes and *-questions for before and after
values.
• Be skeptical, vigilant, and scrutinize constantly.
• Since modifications of the dataset produced by the analyst to obtain particular outputs
produce new internal data sets, successive versions can be tracked by the use of upper
case suffixes, and versions from which outputs are produced by lower case prefixes.
Hence, RTANONV14A could represent the data set obtained by filtering out all
violations except killings, and bRTANONV14A the specific subset of that data set used to
create the second variation of a bar chart.
We note that the use of some identification and tracking systems is the key part of our recom-
mendation. Here we recommend a particular system based on our experiences, but other contexts
may call for other approaches.
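The checks recommended above (comparing totals and extremes before and after each change) can be sketched as follows. This is a minimal illustration, not the procedure used by the projects; the function names summarize and compare_versions and the two small dataset versions are hypothetical.

```python
def summarize(rows, numeric_fields):
    """Return {field: (total, minimum, maximum)} for each numeric field."""
    summary = {}
    for field in numeric_fields:
        values = [row[field] for row in rows]
        summary[field] = (sum(values), min(values), max(values))
    return summary


def compare_versions(before, after):
    """List the fields whose total or extremes differ between two summaries."""
    changed = []
    for field in before:
        if before[field] != after.get(field):
            changed.append((field, before[field], after.get(field)))
    return changed


if __name__ == "__main__":
    # Hypothetical versions of a working dataset; version 2 contains a key-entry error.
    version_1 = [{"victims": 3, "year": 1981}, {"victims": 1, "year": 1982}]
    version_2 = [{"victims": 3, "year": 1981}, {"victims": 10, "year": 1982}]
    fields = ["victims", "year"]
    for field, old, new in compare_versions(summarize(version_1, fields),
                                            summarize(version_2, fields)):
        print(field, "changed from", old, "to", new)
```

Any field reported by compare_versions is a prompt to inspect the transition, not proof of an error, since some changes between versions are intended.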
Education
The outputs of statistical analysis are a major part of the end result of a large-scale human
rights data project. They represent the physical realization of the logical process of drawing mean-
ingful conclusions about the data. To come to that point, a great deal of interaction among team
members is needed. Since most team members will not have had either education or experience in
statistical and analytical reasoning, we recommend that education in these topics be included in the
project plan and execution.
Education of the type we discuss will have the benefit of more effective, efficient work, and
better relationships among project team members. We make the following recommendations, under-
standing that their implementation will depend on the context and issues of resource limitations.
• Educational objectives are (1) how to interpret graphs and tables, (2) methods of descriptive
and exploratory data analysis, (3) the meaning of statistical statements, (4) how to
read titles and notes, (5) how to work with absolutes and percentages, and (6) how to
work with conditional statements about data.
• Project management should decide on what is best for the given project; whether the edu-
cational process should involve all team members, or functional groups, workshops or
classes, or continuing and periodic or episodic sessions.
• Because of the serious problems in how “statistics” is taught in schools, many people are
averse to the subject and it is usually necessary to mandate attendance.3 Team members
should know that practical data analysis often bears little relationship to the content of
conventional first statistics courses.
The amount of time required of team members for the educational process should be strictly
limited. Because much of the education in these methods will take place in the workplace, workshop
time can be limited to less than eight hours throughout the project.
References
Ball, Patrick, Kobrak, Paul, and Spirer, Herbert, 1999. State Violence in Guatemala, 1960-1996: A
Quantitative Reflection. Washington: American Association for the Advancement of Science
and Centro Internacional para Investigaciones en Derechos Humanos.
3
Teachers of statistics know of these problems. These issues are discussed at almost every meeting of the
profession. Unfortunately, there continues to be a difference between what teachers teach in the first course
in statistics and what these same people do when working in the field.
Chapter 11
The Guatemalan Commission for Historical Clarification: Generating
Analytic Reports
Inter-Sample Analysis
Patrick Ball
Introduction
This paper reports on an analytical study requested by the Commission for Historical Clarifica-
tion (CEH) and carried out by the American Association for the Advancement of Science.1 The
purpose of the study was to answer the question: How many people were killed in Guatemala dur-
ing the period of the CEH mandate, 1960-1996? To answer this question, we used the information in
three databases of human rights violations, resulting from three projects – one conducted by the
CEH, one by the CIIDH, and one by REMHI. These databases reported data from interviews with
direct witnesses and victims. As a consequence of having three sources, we must first ask a) how
many unduplicated killings are documented by the three projects? and then attempt to answer the
second question, b) how many killings were there in all during the internal armed conflict?
Our analysis deals with these two questions. We deal first with the information collected by
the three projects in light of the objectives of this analysis. We then explain the scientific methods
used to estimate rates and quantities that answer the specific empirical questions derived from
these objectives. We then present and interpret the results of applying the scientific methods to
the information from the three databases. We subsequently analyze four regions of Guatemala in
which genocide may have occurred during the period 1981-1983. Finally, using other statistical
methods we show that the three projects lead to similar implications about the patterns of violence
in Guatemala.
Note that in some tables, there are numbers that are not counts, but result from arithmetic op-
erations subject to rounding error. Thus, totals shown in the table will not exactly add up to the
totals of the related columns or rows. In some graphs we have retained the Spanish labels, as it is
our intent to present tables and graphs as they appeared in the CEH report.2
The Information
The three databases were created using information gathered from interviews with witnesses
and victims. Each contained a list of named victims who were killed, as well as numbers of people
who were killed but who could not be identified by name. The three projects did not define “politi-
cal killing” in the same ways. Therefore the measure we use in this study is deaths, and not the
more juridically precise term “extrajudicial execution” that is used elsewhere in the CEH report. The
three projects have unique definitions of murder, and to join them, it was necessary to use the
broadest possible definition of the killing as a human rights violation. Thus, the totals of deaths in
the AAAS study should be compared with the totals of deaths in the CEH report, and not with the
totals of extrajudicial execution.
Table 1 shows the number of documented killings (victims with and without names), by time
period, region, and database. Many killings were not reported to any project, and therefore, the
quantities presented in Table 1 are less than the total actual number of victims who were killed in
political violence in the period 1960 to 1996. Table 1 shows only those victims who were reported to
one or more documentation project.
1
At the request of the CEH, this analysis was conducted by Dr. Patrick Ball, Deputy Director of the Science
and Human Rights Program of the American Association for the Advancement of Science, with the assistance
and advice of Dr. Herbert F. Spirer (Adjunct Professor, Columbia University and consultant to the AAAS),
Dr. Frederick Scheuren (Senior Fellow of The Urban Institute, and Adjunct Professor at George Washington
University), and William Seltzer (Senior Researcher at Fordham University).
2
The meaning of the Spanish labels will be clear to any reader.
Chapter Eleven: The Guatemalan Commission for Historical Clarification
Table 1 (note 3): Number of documented killings (victims with and without names), by time period,
region, and project
Region CEH CIIDH (note 4) REMHI
The three projects did not equally cover all of the regions. All conducted investigations in the
Ixil area (Region I) during the period 1978-1996, but only the CEH collected adequate information in
San Martín Jilotepeque (Region IV) 6. It is also clear that none of the projects adequately covered
3
Table 1 excludes the victims for whom the year or place of death is not known.
4
Although the CIIDH also collected information from journalistic and documentary sources, this analysis
only includes the information from direct sources supported by the witness’ signature.
5
The definition of Region VI (the Zacualpa area) includes the municipios of Chiche and Joyabaj, and therefore
does not correspond exactly to the definition of the region in the section of the CEH report that examines
genocide. That section includes as Region VI only the municipio of Zacualpa.
6
The regions were defined in order to isolate areas in which there were big differences in the coverage rates
among projects.
the period 1960-1977, including the massacres of 1968-1973. Any estimate must take these
limitations into account.
If no victims were reported to more than one project, the total of documented victims would be
the sum of the three totals, that is 24,910+8,533+21,200 = 54,643. However, many of the same victims
were reported to two or three projects. Thus, we cannot assume that the total number of victims is
equal to this simple sum.
The projects were managed independently, and each victim could have been reported to more
than one project. For example, assume that a victim Juan Pérez was murdered. His wife may have
reported the killing to the CIIDH in 1994; his son may have given testimony to REMHI in 1996; and
Pérez’s neighbor might have related the story to the CEH in 1997. If we simply sum the three
databases, Pérez’s killing will be counted three times.
Duplicated reporting of deaths in more than one database is called “overlap.” To estimate the
total number of victims reported by the three databases, the overlap between databases must be
estimated to reduce the contribution of each database by its overlap rate.
Two possible conditions demonstrate the limits of the overlap problem. If none of the victims
in any database appear in any other database, then the total number of victims of killing and disap-
pearance is equal to the sum of the number of victims in the three databases (54,643). This is the
upper limit to the number of such victims. The lower limit can be found in the extreme case that the
largest of the three databases (here, the CEH database) contains all the cases reported in the other
two (REMHI and CIIDH). In this case, the total number of killings is simply the number of killings
reported in the largest database (405+24,505=24,910). The total number of unique victims
in the three databases must fall within these two limits, that is, between 24,910 and 54,643. The
purpose of our analyses estimating the total number of documented killings is to narrow this range.
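The two limits can be checked with a few lines of arithmetic. The sketch below uses the three database totals quoted in the text; the function name overlap_bounds is hypothetical.

```python
def overlap_bounds(database_totals):
    """Return (lower, upper) limits on the number of unique documented victims.

    Lower limit: the largest database contains every case in the others.
    Upper limit: no victim appears in more than one database.
    """
    lower = max(database_totals.values())
    upper = sum(database_totals.values())
    return lower, upper


# Totals of documented killings quoted in the text.
TOTALS = {"CEH": 24910, "CIIDH": 8533, "REMHI": 21200}

if __name__ == "__main__":
    low, high = overlap_bounds(TOTALS)
    print(f"Unique documented victims lie between {low:,} and {high:,}")
```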
Many killings were not reported to any of the three projects. In the following section, we carry
out a scientific analysis to estimate the total number of killings,7 including those reported to
none of the CEH, the CIIDH, or REMHI. The estimate from this analysis is that between 119,300 and
145,000 killings were committed, with the most likely figure being around 132,000. Figure 1 shows the
probabilities that the real value falls within various ranges around the estimate of 132,000.
7
Note that this analysis does not cover forced disappearances (they are handled separately). There were
insufficient time and resources to extend this analysis to disappearances.
Figure 1: Probability that the actual number of deaths falls within the indicated interval
[Histogram. Vertical axis: Probability, 0.000 to 0.045. Horizontal axis: Interval, from 112,470-113,127 to 150,565-151,222.]
Analysis of Overlap
The information in the three databases lists victims identified or estimated by witnesses. Some,
but not all, of the victims were identified by name.8 The total number of killings in each database is
referred to using the notation described below.
MCEH = the total number of victims in the CEH database
MCIIDH = the total number of victims in the CIIDH database
MREMHI = the total number of victims in the REMHI database
None of the three databases directly estimates the total number of killings in the country dur-
ing the full period of the CEH mandate. Each database is a list of victims of killings who were re-
ported directly to the project and verified according to the methodology of that particular project.
As has been mentioned, many victims were not reported to any project. The total number of victims
killed in Guatemala and reported (or not) to different projects can be expressed by eight categories,
defined below.
N000 = victims who were not reported to any of the three projects: not to the CEH, the
CIIDH, nor to REMHI
N111 = victims who were reported to all three projects
N110 = victims reported to the CEH and to the CIIDH, but not to REMHI
N101 = victims reported to the CEH and to REMHI, but not to the CIIDH
N011 = victims reported to the CIIDH and to REMHI, but not to the CEH
N100 = victims reported only to the CEH, and not to the CIIDH nor to REMHI
N010 = victims reported only to the CIIDH, and not to the CEH nor to REMHI
N001 = victims reported only to REMHI, and not to CEH nor to the CIIDH
The total number of victims of killing in Guatemala, N, is the sum of these eight values. The
total number of victims reported to one, two, or three projects, Nk, is the sum of the seven categories
that are calculated directly from the databases, that is, N111 to N001, as shown in Equation 1, below.
\[ N_k = N_{111} + N_{110} + N_{101} + N_{011} + N_{100} + N_{010} + N_{001} \qquad (1) \]
8
This analysis treats victims, not violations, but for killings, the two measures are identical and so this
distinction is not significant. See Ball (1996).
Matching
It is difficult to find the same victim in any two or all three of these databases using a computer
program. Victims are reported with varying information. Identical names may be spelled differently,
sometimes because they were inconsistently or idiosyncratically translated from Mayan languages.
Dates of birth and death can be uncertain or wrong.
Thus, it is neither practical nor accurate to match databases by automated means with com-
puter programs. To find a person from one database (the source) in another of the databases (the
target), an analyst must compare all of the data relevant to the killing, including the name, place,
and date of the killing from the source with all the records in the target. This process we call
“matching.”9
Many victims are not identified by name in the databases. Often the original witnesses would
mention only a group of people. Different witnesses of the same event often estimate different
numbers of victims who suffered the same violations. In our analysis we assume that the match
rates between unnamed victims are the same as the rates among the identified victims.
Matching databases is difficult, tedious and time-consuming. Instead of trying to match all the
records of each database against all the records in the other databases, stratified random samples
of the victims identified by name in each database were selected and matched against the records
in the other two. The samples were proportionally stratified by region to assure that all regions
were covered. The number of records taken in each sample is denoted by the letter m (mCEH, mCIIDH,
mREMHI). Including all the regions, the total number of records sampled and matched was 1,412,
1,351, and 1,122, respectively (see Table 2).10
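A proportionally stratified random sample of the kind described above might be drawn as in the following sketch. It is an illustration under stated assumptions, not the procedure actually used: the function name, the record layout, and the rule of taking at least one record per region are hypothetical, and because of rounding the sample can differ slightly from the requested size.

```python
import random


def proportional_stratified_sample(records, stratum_of, sample_size, seed=0):
    """Draw a sample whose strata are proportional in size to those in `records`."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated and audited
    strata = {}
    for record in records:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for members in strata.values():
        # Assumption: take at least one record per stratum so every region is covered.
        n = max(1, round(sample_size * len(members) / len(records)))
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


if __name__ == "__main__":
    # Hypothetical named-victim records, keyed by region.
    records = ([{"id": i, "region": "I"} for i in range(60)]
               + [{"id": i, "region": "II"} for i in range(60, 90)]
               + [{"id": i, "region": "III"} for i in range(90, 100)])
    chosen = proportional_stratified_sample(records, lambda r: r["region"], 10)
    print(sorted(r["region"] for r in chosen))
```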
Each person sampled (from each of the three databases) was compared against the records in
the other two databases. When the same person was found in one of the other two databases, it
was noted as a double-match; when the record was found in both of the other two databases, it
was noted as a triple match.
Four groups of samples were chosen from the three databases. One analyst from the CEH
matched one group, and a second analyst matched the other three groups. Many records were de-
liberately included in both samples. Only in a small number of cases were differences between the
analysts' decisions found. The implication from this finding is that the error resulting from non-
sampling factors was minimal.11 The numbers of matches are shown in Table 2.
9
Furthermore, many victims are not identified by name in the databases. When witnesses mentioned a group
of victims without specifying the victims’ names, different witnesses often refer to different numbers of vic-
tims. Given the already-mentioned difficulty that witnesses often confuse the exact dates of the events, it is
not possible to match groups of unnamed victims. This analysis assumes that the match rates between un-
named victims are the same as the rates between named victims.
10
Of the records mentioned in the text, 498 were resampled and matched a second time. We refer to these
records in the analysis of the reliability of the matching.
11
In the match analysis, what concerns us is that records that are true matches do not escape the analysts. Of
the 498 records matched twice, 171 were true matches. Comparing these 171 records matched by two
different analysts, 88% were coded identically.
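Table 2: Matches found for the records sampled from each database
       CEH     CIIDH   REMHI
m111   21      73      19
m110   48      153     -
m101   210     -       226
m011   -       121     27
m100   1,133   -       -
m010   -       1,004   -
m001   -       -       850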
Table 2 shows that of the sample of 1,412 victims selected from the CEH database, 21 were
found in the CIIDH database and in the REMHI database; these 21 are triple matches. Forty-eight
victims in the CEH database were found in the CIIDH database but not in the REMHI database. In
addition, 210 more victims in the CEH database were found in the REMHI but not in the CIIDH da-
tabase. A total of 1,133 of the victims sampled from the CEH database were not found in either of
the other two databases. The interpretation of the other two columns is the same.
We obtained the overlap rates shown in Table 3 by dividing each mxyz in Table 2 by the total
number of victims sampled in each database.
r100 80.2%
r010 74.3%
r001 75.8%
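The division that produces these rates can be sketched as follows, using match counts and sample sizes that appear in the text and in Table 2. The dictionary layout and the function name overlap_rates are hypothetical.

```python
# Sample sizes and match counts as given in the text and in Table 2.
SAMPLE_SIZES = {"CEH": 1412, "CIIDH": 1351, "REMHI": 1122}
MATCH_COUNTS = {
    "CEH": {"111": 21, "110": 48, "100": 1133},
    "CIIDH": {"111": 73, "110": 153, "010": 1004},
    "REMHI": {"111": 19, "001": 850},
}


def overlap_rates(match_counts, sample_sizes):
    """Divide each match count by the number of records sampled from its database."""
    rates = {}
    for database, counts in match_counts.items():
        size = sample_sizes[database]
        rates[database] = {pattern: count / size for pattern, count in counts.items()}
    return rates


RATES = overlap_rates(MATCH_COUNTS, SAMPLE_SIZES)

if __name__ == "__main__":
    for database, by_pattern in RATES.items():
        for pattern, rate in by_pattern.items():
            print(f"{database} r{pattern} = {rate:.1%}")
```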
To interpret this table, note that r110 on the second line indicates that 3.4% of the victims in the
CEH database are also in the CIIDH database. The database of the CIIDH is smaller than the CEH
database; the same estimation from the point of view of the CIIDH is that 11.3% of the victims re-
corded in the CIIDH database are also in the CEH database.
Note that the differences in the estimations of the rates are not exactly in proportion to the dif-
ferences in size among the databases. The differences occur because of the variability that results
in the process of taking a random sample, and from the errors in matching. We treat these issues in
the later section on the analysis of error.
N100 19,663
N010 6,317
N001 15,955
However, to calculate Nk, the unduplicated number of victims across all three databases, the several
estimates of the number of matched records (N111, N110, N101, and N011) must be reconciled. We used
the average of the estimates for each of these four components, providing the totals shown in Table 5, below.
Table 5: Estimated number of killings in all three databases (CEH, CIIDH, and REMHI).
Mean
N111 393
N110 898
N101 3,943
N011 634
N100 19,663
N010 6,317
N001 15,955
Nk 47,803
Thus, our estimate of the unduplicated number of reported killings in the three databases is
47,803. However, as we show below, this number is subject to a number of controllable biases.
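As a check on the arithmetic, the seven category means in Table 5 can be summed directly. The dictionary and function names below are hypothetical; the values are those printed in Table 5.

```python
# Mean estimates for the seven observed categories, as printed in Table 5.
TABLE_5_MEANS = {
    "N111": 393,
    "N110": 898,
    "N101": 3943,
    "N011": 634,
    "N100": 19663,
    "N010": 6317,
    "N001": 15955,
}


def unduplicated_total(categories):
    """Sum the seven observed categories to obtain Nk."""
    return sum(categories.values())


if __name__ == "__main__":
    print(f"Nk = {unduplicated_total(TABLE_5_MEANS):,}")
```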
Figure 2b: Distribution of databases in the universe of all violations (total overlap)
Figure 2c: Distribution of databases in the universe of all violations (partial overlap)
In Figure 2a, the databases share no violations. In Figure 2b, all the violations are contained in
the largest of the three databases. In Figure 2c, some violations are shared. From the previous sec-
tion, it is clear that Figure 2c is the correct representation of the three databases.
Assume that the three projects operated independently and consequently, that the probability
that a project has testimony about a certain violation has no influence on whether another project
has testimony about the same violation. What implication does this have for the universe of viola-
tions? In Figure 2a, the implication is that the universe of violations is large because working inde-
pendently, the databases do not overlap. In Figure 2b, the implication is the inverse, that the one
database is contained within the next larger, and the next larger is contained within the largest. In
Figure 2c, which corresponds to our situation, the levels of overlap are partial. With the assump-
tion of independence and the reality of overlap, the number of violations in the universe can be
inferred.
Consider the case of two projects, $P_A$ and $P_B$, whose databases have an overlap $M$ in a
universe of violations $N$.12 Note that the probability of any given killing being documented by Project
$P_A$ is $\Pr(A) = A/N$, that is, $N = A/\Pr(A)$, and the probability of any given killing being
documented by Project $P_B$ is $\Pr(B) = B/N$. The probability that a killing was documented by both
databases, $\Pr(M)$, is equal to $\Pr(M) = M/N$, and by the definition of an event composed of two
independent events,
\[ \Pr(M) = \Pr(A \cap B) = \Pr(A)\,\Pr(B). \]
Interchanging the terms, $\Pr(A) = \Pr(M)/\Pr(B)$, which reduces to
\[ \Pr(A) = \frac{M/N}{B/N} = \frac{M}{B}. \]
Combining the first relation $\Pr(A) = A/N$ with the previous result gives us $A/N = M/B$, and
therefore $N = AB/M$. In order to estimate only the killings that were excluded from the two projects,
\[ N_{00} = \frac{(A - M)(B - M)}{M}, \]
or in the notation of the three-database system,
\[ N_{00} = \frac{N_{10}\,N_{01}}{N_{11}} \qquad (2) \]
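The two-project relations can be sketched in code as follows. This is a minimal illustration of N = AB/M and Equation 2 under the independence assumption stated above; the function name and the example counts are hypothetical.

```python
def two_system_estimate(a, b, m):
    """Estimate the universe size and the number of cases missed by both projects.

    a, b: number of killings documented by projects A and B
    m:    number documented by both (the overlap)
    Assumes the two projects document killings independently.
    """
    if m <= 0:
        raise ValueError("the overlap must be positive to estimate the universe")
    total = a * b / m                 # N = AB / M
    missed = (a - m) * (b - m) / m    # N00 = (A - M)(B - M) / M
    return total, missed


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    total, missed = two_system_estimate(a=1000, b=800, m=200)
    documented = 1000 + 800 - 200
    print(total, missed, documented + missed)
```

The last printed value shows that the documented cases plus the estimated missed cases equal the estimated universe, as the algebra requires.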
With the same logic, it is possible to derive an estimator for N000, the measure of the number of
killings that were not documented by any of the three projects.13 This estimator is presented below
in Equation 3.
\[ N_{000} = \frac{N_{100} N_{010} + N_{100} N_{001} + N_{010} N_{001}}{N_{110} + N_{101} + N_{011}} \qquad (3) \]
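Equation 3 translates directly into code. The sketch below is an illustration only; the function name and the example counts are hypothetical, and it does not include the regional disaggregation or the jackknife applied in the analysis.

```python
def unreported_three_systems(n100, n010, n001, n110, n101, n011):
    """Estimate N000, the killings documented by none of three projects (Equation 3)."""
    pairwise_products = n100 * n010 + n100 * n001 + n010 * n001
    double_overlaps = n110 + n101 + n011
    if double_overlaps == 0:
        raise ValueError("at least one two-project overlap must be non-zero")
    return pairwise_products / double_overlaps


if __name__ == "__main__":
    # Hypothetical category counts, for illustration only.
    estimate = unreported_three_systems(n100=10, n010=20, n001=30,
                                        n110=2, n101=3, n011=5)
    print(estimate)
```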
12
This explanation is taken from Marks, Seltzer, and Krótki (1974, pp. 13-17).
13
See Marks, Seltzer, and Krótki (1974, equation 7.188). Two possible estimators are given, but we chose
the one preferred in cases such as ours, where there is likely to be correlation bias.
14
This section is largely based on Wolter (1985, pp. 154-155).
The other beneficial result of the jackknife method is that the values of $\hat{\theta}_\alpha$ are
distributed approximately normally.15 The standard error of the estimator (the square root of the
variance) is estimated in Equation 6.
\[ SE(\hat{\theta}) = \sqrt{\frac{1}{k(k-1)} \sum_{\alpha=1}^{k} \left(\hat{\theta}_\alpha - \hat{\theta}\right)^2} \qquad (6) \]
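A minimal sketch of the standard error in Equation 6 follows. It assumes the usual delete-one-group definition of the pseudovalues, in which each pseudovalue is k times the full-sample estimate minus (k - 1) times the estimate with group α removed, and it takes the jackknife estimate to be the mean of the pseudovalues. The function name and the example data are hypothetical.

```python
import math


def jackknife_standard_error(groups, estimator):
    """Return (jackknife estimate, standard error) from k groups of observations.

    Assumes the standard pseudovalue definition:
    pseudovalue_alpha = k * estimate(all) - (k - 1) * estimate(all but group alpha).
    """
    k = len(groups)
    if k < 2:
        raise ValueError("at least two groups are needed")
    full = estimator([x for group in groups for x in group])
    pseudovalues = []
    for alpha in range(k):
        rest = [x for i, group in enumerate(groups) if i != alpha for x in group]
        pseudovalues.append(k * full - (k - 1) * estimator(rest))
    theta = sum(pseudovalues) / k  # jackknife point estimate
    variance = sum((p - theta) ** 2 for p in pseudovalues) / (k * (k - 1))
    return theta, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical data: five groups of one observation each, estimator = mean.
    def mean(values):
        return sum(values) / len(values)

    print(jackknife_standard_error([[1], [2], [3], [4], [5]], mean))
```

For the sample mean, the pseudovalues are the observations themselves, so the result reduces to the familiar standard error of the mean; this makes the mean a convenient check on the implementation.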
15
The “pseudovalues” $\hat{\theta}_\alpha$ should be approximately independent and identically distributed. This assumption
was tested with a normal probability plot for each set of pseudovalues, and in each case the results were
consistent with this assumption.
Region 0 I II III IV V VI VII VIII IX X Total
N100 8,260 3,187 221 1,028 1,325 1,597 1,642 226 156 1,720 182 19,545
N001 5,228 3,999 295 926 59 765 1,166 77 1,099 2,054 106 15,773
Nk (without duplication) 17,679 11,870 844 3,328 1,418 2,569 3,416 347 1,339 4,501 396 47,706
N000 38,856 17,397 466 6,467 0 5,548 5,836 561 2,265 5,052 2,019 84,468
SE(N000) 3,809 2,045 105 1,152 0 1,826 1,890 350 3,062 995 1,840 6,388
N̂ 56,535 29,267 1,310 9,795 1,418 8,117 9,252 908 3,604 9,553 2,415 132,174
SE(N̂) 3,918 2,175 127 1,218 11 1,870 1,964 357 3,087 1,072 1,844 6,568
In Table 7, it can be seen that in Region 0 there were Nk = 17,679 killings documented across the
three projects. Over all regions, there were 47,706 killings documented, being the sum of the
regional estimations.16 The standard error SE(Nk) is not the simple sum of the regions, but rather the
square root of the sum of the squares of the regional values (i = 0, I, …, X):
\[ SE(N_k) = \sqrt{\sum_{i=0}^{X} SE(N_{ki})^2} \qquad (7) \]
Similarly, the values for N000 and for N are the sum of the regional values and the standard error
of N000 and N̂ is the square root of the sum of the squared regional values. In this way, we estimate
16
The estimation for Nk was 47,706 murders documented among all three projects, with a standard error of
228, yielding a 95% confidence interval of 47,259 to 48,152. Note that this range includes the value estimated
in Table 5, 47,803. The closeness of the value in Table 5 to the value estimated by the sum of the regions
by the jackknife method implies that there is not much bias in the simple estimation. Nonetheless, the bias
that required the disaggregation by regions may not have affected Nk but might still affect N000, and the
disaggregation is therefore still necessary.
that there were approximately 84,468 killings that were not reported to the CEH, to the CIIDH, or to
the REMHI project. Summing Nk and N000 to obtain N̂, we have as our final estimate that there were
132,174 killings in Guatemala between 1978 and 1996, with a standard error of 6,568.
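The combination of regional standard errors, and the interval implied by the final estimate, can be reproduced as in the sketch below. The regional values are those printed in the SE(N000) row of Table 7; the use of a normal multiplier of 1.96 for a 95% interval is an assumption of this sketch, and the function names are hypothetical.

```python
import math

# Regional standard errors of N000 (Regions 0, I, ..., X), as printed in Table 7.
REGIONAL_SE_N000 = [3809, 2045, 105, 1152, 0, 1826, 1890, 350, 3062, 995, 1840]


def combined_standard_error(regional_errors):
    """Square root of the sum of squared regional standard errors."""
    return math.sqrt(sum(se ** 2 for se in regional_errors))


def normal_interval(estimate, standard_error, z=1.96):
    """Approximate interval: estimate plus or minus z standard errors (z assumed normal)."""
    return estimate - z * standard_error, estimate + z * standard_error


if __name__ == "__main__":
    print(round(combined_standard_error(REGIONAL_SE_N000)))
    low, high = normal_interval(132174, 6568)
    print(round(low), round(high))
```

Under these assumptions the combined value agrees with the total printed in Table 7, and the interval agrees with the range of roughly 119,300 to 145,000 given earlier in the text.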
Matching errors
If the analysts who conducted the matching failed to find victims who were in multiple data-
bases, by accident or because there were inadequate data in the original sources, these omissions
would tend to depress the level of measured overlap and in consequence bias the estimation of N000
upwards. In preliminary investigations (all that are possible given the partial state of many cases),
only minimal effects of this kind were found. At most, they amounted to about 12% of the final
estimate of N000, implying about 8% of N̂. Considering the other sources of error listed in this section,
and recognizing that the data for this analysis were limited, we decided not to quantify this error (or
a correction for it) in the final analysis.
Internal duplication
All of the projects that receive information from primary sources may have problems with in-
ternal duplication that results from multiple reports of the same events.17 Internal duplication tends
to artificially increase the number of killings that are reported in a single database. All three projects
worked hard to clean their data to reduce internal duplications, but some always remain. A preliminary
analysis found too little duplication to require a correction for this source of error.
17
See, in this context, REMHI (1998, pp. XXXI-XXXII), and Ball, Kobrak, and Spirer (1999, p. 62, note
12).
different amounts of identifying information. As mentioned earlier, we assumed that the match rates
for unnamed and named victims were the same.
To calculate the killing rate, the number of victims is estimated. We did this estimation twice,
first to get the total number of documented victims, and then to get the estimated number of victims
using the methods outlined above.
The following are the steps in this estimation process:
1. The number of murders that occurred 1981-1983, less those attributed to the URNG, were
calculated by the ethnic group classifications indigenous, not indigenous, and unknown,
for each of the six regions in the three databases. This step is analogous to Table 1 above,
but limited to killings attributed to the state during the period 1981-1983 and disaggre-
gated by the ethnicity of the victims.
2. The number of matched victims and the corresponding rates were calculated for each of
the six regions (logically following the method shown for Tables 2 and 3).18
3. The number of victims for each ethnic group was estimated using the regional rates of
overlap and the number of victims in each database (similar to Table 4).
4. The estimates were made for each region by taking the average of each of the three data-
base estimates (similar to Table 5).
5. The jackknife method was applied to each defined group, following equations 4, 5, and 6,
in order to estimate Nk and N̂ (and their standard errors) for each ethnicity in each region.
The values of Nk are presented below in Tables 9a and 9b, and those for N̂ in Tables 11a
and 11b.
6. Taking from Table 9a the victims with known ethnicity, the victims without known ethnic-
ity were apportioned to the categories “indigenous” or “not-indigenous” according to the
proportions shown below in Table 10, creating the figures shown in Table 9b.
7. With the information from Tables 8, 9, and 10, the proportion killed of each ethnic group in
each region was calculated, along with its standard error. The data, presented below in
Figure 3, show, inter alia, that according to the information documented by the CEH,
CIIDH, and REMHI, more than 14% of the indigenous population living in the Ixil area in 1981
had been murdered by 1983, while in the same period and area only 2% of the non-indigenous
population was killed.
18
The overlap rates were not calculated by ethnic group. Instead the regional match rates for the period 1981-
1983 were applied equally to the ethnic groups in that region. This application assumes that the overlap rates
did not vary significantly among ethnic groups.
Table 9a: Number of documented killings (Nk) in three databases, 1981-1983, by region and
ethnicity
                          Region I   Region II   Region III   Region IV   Region V   Region VI
Non-indigenous   Nk          32          2           13          16          8          6
                 SE(Nk)     0.49        0.07        0.33        0.14        0.12       0.13
Table 9b: The number of documented killings (Nk) in three databases, by ethnic group,
including victims without known ethnicity, 1981-1983
Region I Region II Region III Region IV Region V Region VI
Non-indigenous Nk 127 3 33 16 13 13
Table 10: Indigenous victims as a percentage of all victims with known ethnicity in Table 9a
Region I Region II Region III Region IV Region V Region VI
Figure 3: Documented proportion of the population killed by State forces in Guatemala 1981-1983,
by region and ethnic group, with the 95% confidence interval (see note 19)
[Figure not reproduced. Vertical axis: proportion killed, 0% to 16%; horizontal axis: Regions I to VI.]
8. Note that the data presented for Region VI (Zacualpa area) in Figure 3 do not correspond
exactly to the statistics presented in the genocide section of the CEH report because the
definition of Region VI used here includes the municipios of Chiché, Joyabaj and
Zacualpa. In the genocide section of the report, only the municipio of Zacualpa is consid-
ered part of Region VI. The statistics for Region VI (and for all the regions) in Figure 3 and
in the genocide section were calculated with the same methods but with different popula-
tion and violation bases.
9. The projected totals ( N̂ ) by ethnicity and region were calculated using the same methods
described with equations 2-6, and with the same data as shown in Tables 8-10. The statis-
tics are presented in Tables 11a and 11b and rates are shown in Figure 4.
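Step 6 of the procedure above, the apportioning of victims without known ethnicity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is ours, the counts in the example are hypothetical and are not taken from Tables 9a, 9b or 10, and the indigenous share is computed directly from the known counts rather than read from Table 10.

```python
def apportion_unknown(known_indigenous, known_non_indigenous, unknown):
    """Split victims of unknown ethnicity in proportion to the known counts.

    Returns (indigenous_total, non_indigenous_total). The totals may be
    fractional, because the unknown victims are shared out proportionally.
    """
    known_total = known_indigenous + known_non_indigenous
    if known_total == 0:
        raise ValueError("no victims with known ethnicity to define a proportion")
    # Share of indigenous victims among those with known ethnicity
    # (the quantity that Table 10 reports for each region).
    share_indigenous = known_indigenous / known_total
    indigenous = known_indigenous + unknown * share_indigenous
    non_indigenous = known_non_indigenous + unknown * (1 - share_indigenous)
    return indigenous, non_indigenous


# Hypothetical counts for a single region, for illustration only.
indigenous, non_indigenous = apportion_unknown(900, 100, 200)
print(indigenous, non_indigenous)
```

The apportioning preserves the regional total: every victim of unknown ethnicity is assigned to one of the two categories, so the two results always add up to the known plus unknown counts.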
19
Source of the graph: 1981 census; testimonies received by the CEH, direct sources to the CIIDH, and
testimonies received by the REMHI project.
Table 11a: Number of projected killings ( N̂ ) in three databases, by ethnic group and region,
1981-1983
Region I Region II Region III Region IV Region V Region VI
Table 11b: Number of projected killings ( N̂ ) in three databases, by ethnic group and region,
including victims without identified ethnicity, 1981-1983
Region I Region II Region III Region IV Region V Region VI
Non-indigenous N̂ 371 4 70 16 13 13
Figure 4: Projected proportions of ethnic groups killed by state forces in Guatemala 1981-1983,
by region and ethnic group, with 95% confidence interval
[Figure not reproduced. Vertical axis: proportion killed, 0% to 60%; horizontal axis: Regions I to VI;
series: percent indigenous and percent non-indigenous.]
Note that Figures 3 and 4 support two interpretations.20 First, Regions I and III were the most
affected by state violence (based on rates). In these two regions there are clear quantitative signs
that the killing was so massive that it could have been genocide. Second, in all the regions the
victims were disproportionately indigenous. Note, for example, that as shown in Figure 4, in Region I
more than 40% of the indigenous population was killed, while approximately 8% of the non-indigenous
population was killed. The killing rates thus differ by a factor of about five. In the
structure of violence committed by the Guatemalan state, these are revealing differences.
Coincidence in Time
If the months are ordered in terms of how many killings are reported according to each of the
three databases, a relatively high level of agreement is found. In Table 12, the top ten months are
shown ordered as described, presenting the percentages of the total number of killings during the
entire period 1979-1984.
Table 12: The ten most violent months in the three databases, 1979-1984 (see note 21)
                       CEH                      CIIDH                    REMHI
Rank                   Month   Total   Pct.     Month   Total   Pct.     Month   Total   Pct.
1                      82-01   2,256     9%     82-02     610    12%     82-03   1,330    12%
2                      82-03   2,253     9%     81-06     390     7%     82-02     807     7%
3                      82-02   1,880     8%     83-03     297     6%     82-07     792     7%
4                      82-08   1,819     8%     82-06     279     5%     82-05     657     6%
5                      82-07   1,719     7%     82-07     234     4%     81-09     629     6%
6                      81-01   1,423     6%     82-01     233     4%     82-01     470     4%
7                      82-06   1,146     5%     82-04     222     4%     82-04     428     4%
8                      82-04     937     4%     82-05     210     4%     81-07     397     4%
9                      82-05     895     4%     83-08     180     3%     80-02     364     3%
10                     81-09     754     3%     81-02     174     3%     82-10     360     3%
Ten month total               15,082    63%             2,829    54%             6,234    56%
Total for 1979-1984           23,890   100%             5,275   100%            11,065   100%
Five months (shaded in the original table) are those for which the three databases show
concordance. Within the ten worst months in each database, the three systems agree on five months:
January, February, April, May, and July of 1982; other months of the same year (March and June)
coincide in two of the three databases.
20
Note that in both absolute and relative terms, the standard error for each statistic in Figure 3 is greater than
that for the analogous statistic in Figure 2. This is consistent, as the projections in Figure 3 incorporate more
uncertainty than the estimations in Figure 2. The size of the samples on which the estimation of the
unduplicated totals were based (Nk) are sufficient for those estimations which do not have such high errors as
to make them unusable. The projection, however, still contains significant uncertainty, reflected in the higher
error rates.
21
In Table 12, only killings identified with dates precise to the month are included.
The databases agree that more than half of the killings occurred in the ten worst months
(63%, 54%, and 56%). This concentration is in the spirit of Pareto’s Law, the rule of thumb that
roughly 80% of a given phenomenon occurs in roughly 20% of the categories: here ten of the 72
months in 1979-1984 (about 14%) account for more than half of the killings. The closeness of these
months to each other in time (most of them fall in the first half of 1982) is also strong evidence that
this period was the most intense period of political violence. Furthermore, the level of agreement
between the databases implies that although the projects did not investigate exactly the same regions
of the country, they found the same trends in time.
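The agreement among the three rankings can be checked mechanically. The following Python sketch uses only the month labels printed in Table 12; the variable names are ours and the tally is an illustration, not the procedure the projects themselves used.

```python
from collections import Counter

# Top-ten months (YY-MM) for each database, copied from Table 12.
top_ten = {
    "CEH":   ["82-01", "82-03", "82-02", "82-08", "82-07",
              "81-01", "82-06", "82-04", "82-05", "81-09"],
    "CIIDH": ["82-02", "81-06", "83-03", "82-06", "82-07",
              "82-01", "82-04", "82-05", "83-08", "81-02"],
    "REMHI": ["82-03", "82-02", "82-07", "82-05", "81-09",
              "82-01", "82-04", "81-07", "80-02", "82-10"],
}

# Count in how many of the three top-ten lists each month appears.
appearances = Counter(month for months in top_ten.values() for month in months)

in_all_three = sorted(m for m, n in appearances.items() if n == 3)
in_exactly_two = sorted(m for m, n in appearances.items() if n == 2)

print("All three databases:", in_all_three)
print("Two of the three:   ", in_exactly_two)
```

The months found in all three lists are January, February, April, May, and July of 1982, as stated in the text. The months found in exactly two lists are March and June of 1982 and, in addition, September of 1981.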
Guerrilla 5% 2% 6% 5%
22
This only includes violations with the perpetrator identified and with a date precise to the year. It is worth
reemphasizing that this analysis includes all killings, not only arbitrary executions.
23
The table is based on Figure 5, corresponding to Table 5 divided by the perpetrating entity, shown in
disaggregated form below.
Figure 5: Overlap rates for victims of killing documented by the CEH, the CIIDH, and by REMHI,
for violations committed by the state and guerrilla forces (with bars to indicate the 95%
confidence intervals).
[Figure not reproduced. Vertical axis: overlap rate, 0% to 16%; categories: State and Guerrilla.]
With the results of Figure 5, we found no evidence of a significant difference
between the level of coverage of violations committed by the guerrilla forces and those committed
by the state forces. Although there is a small difference in the overlap rates of the state forces
(12.4%) and the guerrilla forces (8.8%), the difference lies within the 95% confidence interval. Thus,
the difference cannot be distinguished from the sampling error of the matching process.
The difference between the overlap rate for killings committed by the state and by the
guerrillas is not significant, either in technical terms or in analytic terms. The analysis of the effect
on the estimated proportions follows. The technical test is the following:
\[
SE = \sqrt{\frac{p_E (1 - p_E)}{N_E} + \frac{p_G (1 - p_G)}{N_G}} = 0.0193,
\]
which yields a 95% confidence interval of +/- 3.8% (1.96 x 0.0193). The difference between the two
rates is 12.4% - 8.8% = 3.6%; the half-width of the confidence interval exceeds the difference, which
means that we cannot reject the hypothesis that the difference is equal to zero. This calculation
confirms the intuitive interpretation from Figure 5.
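The test can be reproduced numerically. In the Python sketch below, the overlap rates (12.4% and 8.8%) and the standard error (0.0193) are the figures reported in the text; the sample sizes NE and NG are not given in this section, so the general function is shown but the check itself starts from the reported standard error. The function and variable names are ours.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Figures reported in the text.
p_state = 0.124       # overlap rate, state forces
p_guerrilla = 0.088   # overlap rate, guerrilla forces
reported_se = 0.0193  # standard error of the difference

difference = p_state - p_guerrilla
margin = Z_95 * reported_se           # half-width of the 95% interval
significant = abs(difference) > margin

print(f"difference = {difference:.3f}, margin = {margin:.3f}, significant = {significant}")
```

Because the margin (about 0.038) is larger than the observed difference (0.036), the hypothesis of no difference is not rejected, which agrees with the conclusion in the text.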
The implication of Figure 5 is that all three projects investigated violations committed by the
guerrillas and violations committed by the state with approximately the same level of coverage and
intensity. Therefore there is no systematic disproportionality in the intensity of investigation be-
tween the two entities sufficient to change the interpretation of the proportions of responsibility
attributed to each.
There are two ways to consider the effect of the overlap rates on the proportion of
responsibility attributed to the state and the guerrillas. The proportions of attributed responsibility
that result from the Nk estimated in note 23 are presented below. The average does not come from all
three databases, as implied in Table 13 in the text. Note that this analysis excludes violations for
which responsibility is unclear; adding the unknown category would reduce both proportions
slightly.
Nk 41,147 1,860
N̂ 114,769 5,567
To see how little the disproportionality in coverage matters, n000 is calculated (using
Equation 3 for the state, but using Equation 2 for the guerrillas because the CIIDH did not report
sufficient guerrilla violations). The estimation of n000 includes the information about the overlap
rates, and in this way n000 controls for the effect of the disproportionality in coverage. Note that the
proportions calculated from N̂ are essentially the same as those calculated from Nk. The conclusion
is that the disproportionality in coverage of the state and the insurgents does not change the final
analysis about their relative responsibility.
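The comparison of proportions can be verified directly from the Nk and N̂ totals printed above. The Python sketch below assumes that the first figure in each row refers to the state and the second to the guerrillas; the column headings did not survive in this copy of the table, so that assignment is an inference from the surrounding discussion. The function name is ours.

```python
def state_share(state, guerrilla):
    """Share of attributed killings assigned to the state (unknowns excluded)."""
    return state / (state + guerrilla)


# Totals as printed above; column order (state, guerrilla) is assumed.
documented = state_share(41_147, 1_860)   # from Nk
projected = state_share(114_769, 5_567)   # from N-hat

print(f"State share from Nk:    {documented:.1%}")
print(f"State share from N-hat: {projected:.1%}")
print(f"Absolute difference:    {abs(documented - projected):.2%}")
```

Under that assumption both shares come out at roughly 95 to 96 percent, differing by well under one percentage point, which is consistent with the conclusion that coverage differences do not alter the attribution of responsibility.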
Appendix 1
Region I     Quiché           San Juan Cotzal
Region III   Baja Verapaz     Rabinal
Region IV    Chimaltenango    San Martín Jilotepeque
Region X     Santa Rosa       San Juan Tecuaco
Region X     Santa Rosa       Pueblo Nuevo Las Viñas
Region X     Santa Rosa       Nueva Santa Rosa
References
Ball, Patrick, 1996. Who Did What to Whom? Designing and Implementing a Large-Scale Human
Rights Data Project. Washington: American Association for the Advancement of Science.
Ball, Patrick, Kobrak, Paul, and Spirer, Herbert, 1999. State Violence in Guatemala, 1960-1996: A
Quantitative Reflection. Washington: American Association for the Advancement of Science
and Centro Internacional para Investigaciones en Derechos Humanos.
Marks, Eli S., Seltzer, William, and Krótki, Karol J., 1974. Population Growth Estimation: A
Handbook of Vital Statistics Measurement. New York: The Population Council.
REMHI, 1998. Project Report, Guatemala: Nunca Más, Tomo IV, Víctimas del Conflicto. Guatemala:
Oficina de Derechos Humanos del Arzobispado de Guatemala.
Wolter, Kirk M., 1985. Introduction to Variance Estimation. New York: Springer-Verlag.
Chapter 12
The Guatemalan Commission for Historical Clarification: Database
Representation and Data Processing
Sonia Zambrano
Introduction
In this report I review the processing and representation of information concerning human
rights violations and other violence that occurred during armed conflict in Guatemala from 1960 to
1996. The tasks of processing and representing information were conducted by the database team
of the Guatemalan Commission for Historical Clarification (CEH), which presented its final report in
February, 1999.
I analyze the database work as part of an integrated process that goes beyond the representa-
tion of information and involves all parts of the organization of a truth commission responsible for
reporting on large-scale violence. To achieve these aims, this report contains three parts. The first
part describes the internal capacity of the database and information processing. The second part
describes database functions in coordination with other CEH sectors. The third part contains my
conclusions and lessons learned based on my experience as the director of the CEH database.
Chapter Twelve: The Guatemalan Commission for Historical Clarification
6. The archive assistant was responsible for organizing the database archives, answering
demands for information by interviewers, and controlling the physical movement of infor-
mation to guarantee its integrity and security.
7. Nine analysts were responsible for analysis and preparation of information for subsequent
input to the database.
8. Six data entry specialists were responsible for inputting information to the database using
the program designed for that purpose.
The formation of the team involved a selection process (interviews and reviewing applicants’
backgrounds), hiring and training the selected people, and finally a process of frequent discus-
sions to guarantee methodological uniformity when processing information. Unfortunately, team
members were hired at different times. Not having the whole team together from the start meant a
loss of time, since bringing each new person up to speed entailed a new training and preparation
process before starting work. This affected the workflow and efficiency of the team members who
were already at work.
For example, information analysis started with five analysts. Unfortunately, a team of only five
people could not process the huge quantity of information in the desired time. This meant expand-
ing the team of analysts in the middle of the process, which called for repeating training for the new
people and discussions to establish a uniform methodology.
Forming the database infrastructure
This refers to forming teams, establishing the network, setting up programs, defining security
systems and protecting information, among other tasks.
Constructing the electronic database
This task involved designing and implementing the program, creating tables, defining rela-
tions, constructing the interface, testing and correcting the program, and so forth.
The consultants who started to work on creating the CEH database had to withdraw and could
not complete their part of the process. When new project managers arrived to help create the
database, we had to continue setting it up at the same time as field interviewers were collecting
information. Because of these difficulties, neither the database infrastructure nor the data entry
program was finished when the interviewers started collecting information in the field. This
mistiming set back the data entry and analysis process: it took longer than planned before the data
entry personnel could start to integrate the information that was arriving from the field.
This delay caused a backlog of information at the input to the database and a gap between
collecting and systematizing information, which affected the coordination between these two
phases. This situation demonstrated to us how important it is that both the physical and electronic
infrastructure of the database be completely finished before starting to collect information so that
the database can start to input information as soon as collection begins.
Creating the Database Archives
This phase consisted of creating a system for receiving, classifying and filing cases, as well as
designing a consultation system and service for interviewers who requested information from the
database.
Creating the case archives was an activity that entailed considerably more work than was an-
ticipated. The archival work starts at a high level from the moment the work on the database begins.
Receiving, classifying and filing cases, and controlling information to guarantee its security were
tasks that required care and the full-time attention of one person to perform them.
The archival work included many unanticipated activities that required a lot of time but were
essential to complete. For example, the database had to respond to interviewers’ requests to con-
sult the physical archives. These requests were based on a list of cases the database prepared for
interviewers according to their specific needs. This activity continued during the time it took for
the analysis and preparation of the final report, and the systematized information in the database
was a vital resource for the CEH.
Controlling and guarding information (that is to say, its security and integrity inside the
database) depended on the organization of the archives. To achieve these goals, we had to devise
a strategy for classifying and moving information (lending and filing) that allowed us to control
and maintain its integrity and security.
Collecting Information
Although this process was not strictly part of the database, I briefly discuss it because it is the
step immediately prior to analysis and recording information and is directly related to information
processing.2
This phase consisted of information collection by CEH interviewers, who collected approximately
11,000 testimonies (collated into 7,000 cases) on human rights violations and other violence
that occurred in Guatemala during the armed conflict.
This was made possible by setting up 14 regional CEH offices in central locations around the
country. This large number of regional offices was needed to get the widest possible coverage.
Information was collected over about eight months, during which the interviewers received
testimonies. These testimonies constitute the two kinds of primary, direct information listed below:
1. Testimonies on cases of human rights violations and cases of other violence.
2. Testimonies on the general situation or the context in which violations were committed.
In addition, CEH interviewers collected substantial information from other sources: documents,
books, and reports by official institutions and NGOs, among others. To file and systematize all of this
information, we set up a documentation center, in which electronic databases and physical archives
were maintained.
The database team was in charge of systematizing testimonies that were received by CEH in-
terviewers. Interviewers wrote regional reports in which all of the information on context that inter-
viewers collected in the field was retained. These reports were also kept in the documentation cen-
ter. The information in the database and the information in the documentation center were com-
pared and used in conjunction for making theoretical analyses, formulating hypotheses and draft-
ing the final CEH report.
Methodology of information collection
The methodology for collecting information that was prepared for interviewers was limited to
creating information collection instruments, or record forms of cases and a glossary of violation
types. The classification of information was based on these tools.
Basic concepts, criteria for analysis and general categories of classification were not defined in
this step but were left for a later process that, of necessity, was developed for the most part inside
the database. There the basic parameters of information classification could be defined. This work
properly belonged to an earlier phase, the definition of the general CEH methodology, but it was
carried out during the information analysis phase. This obliged the CEH to create parameters, and on
many occasions to reformulate criteria, for an information collection process that was already
underway.
We tried to overcome such difficulties through ongoing contact between the interviewers and
myself as database director. We used these meetings to work towards common standards and on
the minimum necessary modifications, while trying not to affect the collection process that had
already begun. For example, the violation types were created before starting the collection process
and were ready when interviewers went to the field. However, there was no careful discussion by
different sectors of CEH concerning the violation types. Later when they were applied in the field
and interviewers were more familiar with them, the violation types were reformulated to adjust them
to the reality in the field. Unfortunately there were no uniform standards for collecting and inter-
preting deponents’ testimonies or accounts.
In the first months of data collection, quantitative information was given priority over qualita-
tive. Thus, the work focused on filling out record forms accompanied by a short summary of
events. Subsequently, the database sector started to insist on the importance of the testimonies
and the need to recover qualitative information to achieve a more complete report. This suggestion
led to more detailed testimonies with more information with which the database could achieve much
more.
Nevertheless, the collection of testimonies continued without uniform criteria. For example,
there was no clear definition of the importance of the testimony and what could be obtained from it.
Every interviewer oriented his/her interviews according to his/her training and personal interests.
As a result, testimonies often differed significantly and it was not easy to apply systematic criteria
2
A detailed discussion of this process is given in Chapter 8.
to classify them. Lawyers looked for testimonies geared toward knowledge of legal instruments that
would elucidate the facts, while those concerned with the sociology of the events emphasized so-
cial aspects in the context of the region and the consequences of events on the affected popula-
tion. Those who had political training favored interpretations that supported their hypotheses.
Since these three versions were fundamental to attain a complete overview, each was dealt with
independently. As a consequence, there were frequent omissions of information in testimonies
making for incomplete descriptions of reality.
Some interviewers, who were interested in specific themes such as violence against women
and children, emphasized this aspect in the testimonies, while in my view others did not place
enough importance on such themes. One could only count on the few testimonies that were taken
by interviewers who were interested in a particular theme, and they were insufficient to quantita-
tively analyze the phenomenon.
Similarly, while some interviewers wrote the testimony just as they had collected it (that is,
they recorded the original testimony given by the deponent), other interviewers filed their reports
introducing their own interpretations of events. Thus, the database analysts could not distinguish
between the deponent’s version and that of the interviewer. When performing analyses, it was
difficult to create adequate bases for analysis in all situations.
Another difficulty was the lack of clarity in the way forms were filled in. The most obvious
case concerned the question on the victim form regarding the “mother tongue.” The purpose of
this question was to determine the ethnic identity of the victim, an important element in Guatemala,
where the indigenous population was the main victim of violence. Some interviewers correctly
recorded the mother tongue of the victim or the victim’s community (Mayan, Spanish, and
others), but other interviewers recorded the language that the victim or the victim’s community
currently spoke. Since Mayans speak Spanish in many Guatemalan communities, the data were
incorrect in those cases. Even if the victim spoke Spanish, s/he was indigenous, and that was the
primary concern.
The problem was detected only when the information was analyzed, by which time it was no longer
possible to return the cases to interviewers to recover the correct information. In this situation, the
database team, with the help of several Guatemalan interviewers, went back through the cases one
by one, cross-checking the information with data obtained from the indigenous category in the
glossary of victim types and recovering the correct information. Fortunately, the process was a
success, and the resulting information gave statistical and qualitative support to the conclusion that
the indigenous population represented the great majority of the victims of violence.
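The kind of cross-check described above can be sketched as a simple rule. The Python below is a hypothetical illustration only: the field values ("mayan", "spanish", "indigenous") and the function name are ours, and the actual review was carried out case by case by people, not by a program. The rule treats a recorded Spanish mother tongue as inconclusive, since it may reflect the language currently spoken rather than ethnic identity.

```python
def recode_ethnicity(recorded_tongue, victim_types):
    """Suggest an ethnicity code from the language field and the victim types.

    recorded_tongue: value written in the "mother tongue" field (hypothetical codes).
    victim_types: set of categories assigned from the glossary of victim types.
    """
    # The glossary category is independent of the language question, so it
    # takes precedence when present.
    if "indigenous" in victim_types:
        return "indigenous"
    # A Mayan mother tongue also identifies the victim as indigenous.
    if recorded_tongue == "mayan":
        return "indigenous"
    # Otherwise the form alone cannot settle the question.
    return "needs review"


print(recode_ethnicity("spanish", {"indigenous", "peasant"}))
print(recode_ethnicity("mayan", set()))
print(recode_ethnicity("spanish", {"student"}))
```

Returning "needs review" instead of "non-indigenous" is a deliberate choice in this sketch: it avoids repeating the original error of inferring ethnicity from the language a community currently speaks.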
The above shows that a defined methodology and clear parameters for collecting information
are important, since they have a direct bearing on subsequent information processing, on the
effectiveness of its results and on the efficiency of the process. When these parameters are drawn up
later, in the analysis phase, doing so necessarily affects the collection process.
Information collection instruments
To collect information, seven forms were created on which basic or necessary information was
recorded to subsequently obtain statistical results.
The forms that were created are shown in Table 1, following:
Case Summary Form          Contains information on the case number, date and place of events, number
                           of victims, a summary of the case, and key words found in the summary
Individual Victim Form     Contains information on names and surnames, age, sex, marital status, type
                           of victim, place and date of birth, and personal data regarding the victim
Collective Victim Form     Description of the collective victim (group, family, village, etc.)
Violation Pattern Form     Contains specific data regarding the violation or violations that occurred:
                           date and place where it occurred, the perpetrator, level of certainty that the
                           violation occurred, level of certainty regarding responsibility for the violation,
                           level of certainty regarding the alleged perpetrator, and total victims who
                           suffered the violation or set of violations
Individual Case Form       Contains information on names and surnames, age, sex, institution to which
                           the individual belongs, and his or her position
Individual Deponent Form   Contains information on names and surnames, age, sex, type of deponent,
                           relation to the victim, and date and place of birth
Collective Deponent Form   Specific data on the group, community or village that gave the collective
                           testimony
Although the forms were made before information was collected, they needed some modifica-
tions once they were tested in the field and discussed with interviewers. These alterations affected
the collection process that was already underway.
On some forms important questions were lacking and others were not precise. For example,
several questions that were important and that did not appear on the original form had to be added
to both the victim and deponent forms. The question regarding the type of victim (based on the
glossary of victim types that was described in detail in the section on analysis) was introduced in
the victim form. Also, the names and ages of victims’ children were registered. This information is
basic to determining whether more than one form listing the same given name and surname relates
to one or to several different persons.
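The use of children's names to separate homonyms can be illustrated with a short Python sketch. It is hypothetical: the record layout, the field names and the matching rule are ours, the names in the example are invented, and the database team's actual procedure is not documented in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VictimForm:
    form_id: str
    given_name: str
    surname: str
    children: frozenset = frozenset()  # children's names as written on the form


def _norm(text):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def same_person(a, b):
    """Return True, False, or None when the forms give too little to decide."""
    if (_norm(a.given_name), _norm(a.surname)) != (_norm(b.given_name), _norm(b.surname)):
        return False
    kids_a = {_norm(c) for c in a.children}
    kids_b = {_norm(c) for c in b.children}
    if not kids_a or not kids_b:
        return None  # same name, but no children recorded on one of the forms
    # Same name and at least one child in common: treat as the same person.
    return bool(kids_a & kids_b)


# Invented records, for illustration only.
f1 = VictimForm("F1", "Juan", "Pérez", frozenset({"Ana", "Luis"}))
f2 = VictimForm("F2", "juan", "PÉREZ", frozenset({"ana"}))
f3 = VictimForm("F3", "Juan", "Pérez", frozenset({"Marta"}))
f4 = VictimForm("F4", "Juan", "Pérez")

print(same_person(f1, f2), same_person(f1, f3), same_person(f1, f4))
```

The three-valued result keeps undecidable pairs visible for manual review instead of silently merging or splitting them.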
A change concerning the type of deponent was introduced in the deponent form so that peo-
ple who approached the CEH to testify (victims, relatives, survivors, witnesses, refugees, displaced
persons, etc.) could be identified later. Likewise, the question on the relation of the deponent to the
victim was modified since initially the question presented the victim with respect to the deponent.
This created inaccuracies since the deponent frequently referred to several victims, and the original
only had space for one relation.
On other forms questions were repeated. For example, the control form was abandoned after
being used for some time since information was repeated in the summary form. This meant a loss of
time for interviewers who had to write the same information several times on both forms. Further-
more, there were difficulties inputting information in the database since the recorded data on every
form did not always coincide. For example, in many cases the summary form presented a different
date for an event than that which appeared on the control form, even though they were both an-
swers to the same question. Analysts could not ascertain which was the correct date since the in-
formation on which they relied was insufficient. That obliged them to ask the interviewer who had
recorded the case, but the interviewer usually could not recall.
The summary and the violation pattern forms suffered from a similar problem. Both asked the
same question, to which there were frequently contradictory responses. For example, the summary
form asked for the initial and final dates of the case, whereas only one date could be noted for each
violation on the violation pattern form, so the dates often did not coincide. Analysts could not easily
ascertain the correct date, nor could interviewers recall the correct data or the reason why different
dates appeared.
Frequent contradictions were generated between the number of victims recorded on the sum-
mary form and the number recorded on the violation pattern form. To determine the count, informa-
tion contained in the pattern form was considered valid, since that form referred to every violation
and was more precise. The violation pattern form also created many difficulties for interviewers,
since they did not have enough time to complete the form and in many cases did not understand it.
The database team ended up having to complete and modify this form. These are just a few of the
examples of the problems that the forms created.
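Contradictions of this kind between forms can be detected automatically at data entry. The Python sketch below is a hypothetical illustration: the class and field names are ours, and the two rules (a violation date must fall within the case dates, and no violation may list more victims than the case summary) are assumptions chosen for the example, not rules documented by the CEH.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SummaryForm:
    case_id: str
    start_date: date
    end_date: date
    victim_count: int


@dataclass
class ViolationPattern:
    case_id: str
    violation_date: date
    victim_count: int


def find_conflicts(summary, patterns):
    """List contradictions between a case summary and its violation patterns."""
    conflicts = []
    for number, pattern in enumerate(patterns, start=1):
        if not summary.start_date <= pattern.violation_date <= summary.end_date:
            conflicts.append(
                f"violation {number}: date {pattern.violation_date} is outside the case dates"
            )
        if pattern.victim_count > summary.victim_count:
            conflicts.append(
                f"violation {number}: {pattern.victim_count} victims exceeds "
                f"the {summary.victim_count} on the summary form"
            )
    return conflicts


# Invented case, for illustration only.
summary = SummaryForm("C-001", date(1982, 3, 1), date(1982, 3, 15), 4)
patterns = [
    ViolationPattern("C-001", date(1982, 3, 10), 4),
    ViolationPattern("C-001", date(1982, 4, 2), 6),
]
for line in find_conflicts(summary, patterns):
    print(line)
```

Flagging such cases while the interviewer can still remember the testimony would address the difficulty noted above, where contradictions were found only after the interviewer could no longer recall the correct data.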
Modifications that were executed resulted in different forms (the initial and the final forms)
which made subsequent systematization of information difficult, since not all of the forms had the
same information, and consequently all of the information could not be fully utilized in analysis.
From the CEH’s experience, one can conclude that it is better to reduce the
number of forms, and that forms should contain only the information necessary for a statistical
count and for case analyses. In my opinion, this information must be given priority over the whole
story or qualitative descriptions of events. The forms should not keep information that might not
be used, and above all must not repeat information. They should be tested in the field before
being applied definitively, so that they can be adjusted to the reality of the country under study.
the final stages.3 This was so despite the importance of this knowledge for the general analysis
tasks. Such was the case for the category “disappearance by unknown cause” that illustrates and
describes the phenomenon of forced disappearances in Guatemala.
Creating glossaries and tables to classify information
This task was to construct the list of categories on which we based subsequent information
classification. The main glossaries were the Glossary of Perpetrators, Glossary of Victims, and the
Glossary of Key Words. The database team defined every category and kept interviewers informed.
However, it was impossible to create complete uniformity throughout the commission on category
concepts and meanings.
Glossary of Perpetrators
This glossary exemplifies the difficulty of creating complete uniformity. Some interviewers
spoke of paramilitary groups and others spoke of death squads in reference to the same type of
perpetrators. There was no consensus among interviewers on the concept of an “armed group,”
since some used this label when they were not certain of the perpetrator and others used it for
death squads lacking a specific name.
In the case of civilian self-defense patrols (PAC) and military commissioners, there was no
agreement on whether to count them as state agents. The database team anticipated this difficulty
by keeping the military, PAC and commissioner categories separate, so that they could be counted
independently or, if analysts wished, combined later into one set.
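The idea of storing the finest categories and grouping them only at analysis time can be sketched as follows. This is a minimal illustration, not the CEH's actual program: the record layout, the `STATE_AGENTS` grouping and the sample records are all hypothetical and invented for the example.

```python
from collections import Counter

# Hypothetical grouping requested by one analysis team; a different team
# could define a different set without touching the stored records.
STATE_AGENTS = {"military", "PAC", "military commissioner"}

def count_by_category(records):
    """Count records per perpetrator category, keeping categories separate."""
    return Counter(r["perpetrator"] for r in records)

def count_grouped(records, group, label):
    """Count records after collapsing the categories in `group` into `label`."""
    return Counter(
        label if r["perpetrator"] in group else r["perpetrator"]
        for r in records
    )

# Invented sample records, for illustration only.
records = [
    {"case": 1, "perpetrator": "military"},
    {"case": 2, "perpetrator": "PAC"},
    {"case": 3, "perpetrator": "military commissioner"},
    {"case": 4, "perpetrator": "death squad"},
    {"case": 5, "perpetrator": "military"},
]

separate = count_by_category(records)
grouped = count_grouped(records, STATE_AGENTS, "state agents")
```

Because the grouping is applied when counting rather than when entering data, the same records can serve teams that disagree about which perpetrators belong together.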
Every team of interviewers requested different groupings for their analysis. One example is the
case of massacres. Information on massacres was requested where the general category was “fed-
eral agents” including the military, PACs, commissioners and death squads in one single group.
This led to the creation of specific archives solely for analyzing massacres.
Glossary of Victims
The criterion for defining the victim’s category was to consider the victim’s characteristics,
political or social activities, or situation in relation to the armed conflict. Membership in
these groups represented possible causes of violence against victims. Using these categories made
it possible to determine the proportion of people killed who belonged to one or more of these
groups. The last two group categories, social sector (peasant, day laborer, farm worker, student,
shop owner, professor, etc.) and civilian population, were not treated as comparable to the
preceding ones. Rather, these categories were opened to record information on the victim whenever
it did not relate to the other categories.
This glossary allowed the team to find important information in cases. However, this informa-
tion could not be used to its full benefit because it was created after the collection of information
was already underway and interviewers did not readily grasp its utility. This did not prevent the
team from making analyses that highlighted tendencies. For example, the principal groups victim-
ized during certain years of violence could be identified. Also, the years when there was an in-
crease in violence against union members, students, or religious leaders could be determined. The
indigenous category made it possible to determine the proportion of the indigenous population
that was subject to the violence.
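The kind of tendency analysis described above, the share of victims per group in each year, can be sketched as below. This is only an illustration under assumptions: the field names `year` and `groups` and the sample victims are invented, and a victim is allowed to belong to more than one group, so the shares within a year need not sum to 1.

```python
from collections import defaultdict

def group_share_by_year(victims):
    """Return {year: {group: share of that year's victims in the group}}."""
    totals = defaultdict(int)
    by_group = defaultdict(lambda: defaultdict(int))
    for v in victims:
        totals[v["year"]] += 1
        # set() guards against a group being listed twice for one victim.
        for group in set(v["groups"]):
            by_group[v["year"]][group] += 1
    return {
        year: {g: n / totals[year] for g, n in groups.items()}
        for year, groups in by_group.items()
    }

# Invented sample data, for illustration only.
victims = [
    {"year": 1981, "groups": ["union member"]},
    {"year": 1981, "groups": ["student", "indigenous"]},
    {"year": 1982, "groups": ["indigenous"]},
    {"year": 1982, "groups": ["indigenous", "religious leader"]},
]

shares = group_share_by_year(victims)
```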
Glossary of Key Words
This is a list of themes or central ideas that the cases might contain, made with the purpose of
classifying information according to qualitative criteria.
The glossary was of great value, since it made it possible to recover important information that
appeared in personal accounts. There was no other way personal accounts could have been recorded
and used to classify information. Interviewers could look up classified cases by theme, which
allowed them to quote testimonies to support their arguments and to analyze grouped cases to
determine tendencies and strategies in the development of violence in Guatemala. For example, they
could review as one set all cases in which there were massacres and cruel actions; cases in which
economic or labor conflicts were perceived to have been caused by a violent event; cases in which
there was violence against children or women, and so forth.
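Looking up cases by theme amounts to an inverted index from key word to case. The sketch below assumes each case has already been tagged with key words from the glossary; the case identifiers and the function names are hypothetical, and the tags are taken from the examples in the text.

```python
from collections import defaultdict

def build_keyword_index(cases):
    """Map each key word to the set of case identifiers tagged with it."""
    index = defaultdict(set)
    for case_id, keywords in cases.items():
        for keyword in keywords:
            index[keyword].add(case_id)
    return index

def cases_with_all(index, *keywords):
    """Return the sorted case identifiers tagged with every given key word."""
    sets = [index.get(keyword, set()) for keyword in keywords]
    return sorted(set.intersection(*sets)) if sets else []

# Invented case identifiers, for illustration only.
cases = {
    "C001": {"massacre", "cruel actions"},
    "C002": {"massacre"},
    "C003": {"violence against children"},
}

index = build_keyword_index(cases)
both = cases_with_all(index, "massacre", "cruel actions")
massacres = cases_with_all(index, "massacre")
```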
3 The central team was in charge of coordination of the work of the CEH. It included the Executive
Secretary, the Investigations Director, the Operations Manager and the Report Coordinator.
1 Direct witness
To set the three general levels of certainty of a case, the types of certainty in Tables
2a and 2b were combined as shown in Table 3. To interpret this table, note that (1) a
Perpetrator Certainty of Level 1 and an Event Certainty of Level 1 give a Combined Certainty
of LEVEL 1 (entry in table), (2) a Perpetrator Certainty of Level 3 and an Event Certainty of
Level 3 give a Combined Certainty of LEVEL 3 (entry in table), etc.
Certainty of Perpetrator
In accordance with the above table, cases were ultimately classified into three levels of
certainty. Level 1 cases were the CEH’s best-supported cases, on which strong arguments
could be based. Level 2 cases carried a high level of doubt, and Level 3 cases usually
could not be confirmed by the CEH.
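The combination of perpetrator certainty and event certainty can be sketched as a small function. The text confirms only two entries of Table 3, (1, 1) gives Level 1 and (3, 3) gives Level 3. The rule used here for every other pair, taking the weaker of the two levels, is an assumption made for illustration and may not match the actual table.

```python
VALID_LEVELS = (1, 2, 3)

def combined_certainty(perpetrator_level, event_level):
    """Combine two certainty levels (1 = best supported, 3 = unconfirmed).

    ASSUMPTION: the combined level is the weaker (numerically larger) of the
    two inputs. Only the (1, 1) and (3, 3) entries are stated in the text.
    """
    for level in (perpetrator_level, event_level):
        if level not in VALID_LEVELS:
            raise ValueError(f"certainty level must be 1, 2 or 3, got {level!r}")
    return max(perpetrator_level, event_level)
```

A lookup dictionary keyed by `(perpetrator_level, event_level)` would be the safer choice once the real entries of the table are available, since it makes no assumption about the rule behind them.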
As with other aspects of the work, when the systematization of this information began,
we could see that interviewers did not apply the same meaning to the same levels of
certainty. For example, for some interviewers the level “it is public knowledge” meant
that nearly the entire community assumed or had a general idea of who was responsible.
For other interviewers it meant that all of the people in the community had seen who was
responsible (i.e., they were direct witnesses of the event). Such discrepancies generated
significant difficulties when it came time to systematize the information.
Type of Responsibility
CEH authorities agreed to structure the types of responsibility as follows:
Actual perpetrator
Collaborator
Mastermind
Informer
As with the levels of certainty, although these categories were on the original forms,
interviewers’ interpretations were not uniform. The database team had to devise a strategy
to standardize the meaning of these categories, which involved case-by-case revision to
determine the correct category.
Using Secondary Sources
The criterion used for the database was that the credibility of the information entered
was assessed on the basis of the testimony received. Credibility was prioritized according
to the certainty level described above. Other sources cited by interviewers, such as books
and printed NGO reports, helped to corroborate information in the case, but were not used
as a source for assessing the certainty of the event if the testimony met the minimum
conditions for credibility.
Reading cases, revising and classifying information
Every case (both forms and personal accounts) was read and revised using the methodology
previously defined and described. As the information analysis progressed, the analysis
methodology and the definition of criteria were refined through the ongoing revision and
reformulation that the process demanded. This process involved frequent discussions by
the analysis team to unify and corroborate or modify criteria that arose in studying the
cases.
Discussions served to:
• Apply previously defined typologies.
• Apply the glossary of key words, perpetrator types and victim types.
• Apply basic concepts such as massacre, case, deponent, victim and others.
• Define and perfect the operations strategy and coordination of the database team.
Although the discussion and unification of criteria were carried out by the database team,
continuous contact was maintained with interviewers and the CEH central team to ensure, to the
extent possible, uniformity of criteria and orientation between the database and other sectors of
the CEH.
The methodology used in the database guaranteed a minimum standard of rigor for the
systematization of information. It was consistent with the needs and objectives of the CEH and
allowed the commission to make good use of information collected by interviewers in the field. It
was also an important resource in subsequent analysis work, in formulating hypotheses and in
composing the final report.
Inputting Information
At the end of the analysis, every case passed to the data entry team, with its information
already prepared for input into the database using a program developed in advance. As in the
analysis process, data entry continually gave rise to the need to set up new patterns or standards
or to revise existing ones. For example, how should data be entered on different people who have
the same first name and surname? In this case the database team decided to append a number to the
first name to distinguish them: Pedro Coc, Pedro1 Coc, Pedro2 Coc. As for data left blank on the
form, the team decided that gaps should be filled with the option “not stated” or “none” so that
no blank spaces would remain in the database.
These are two examples of the many decisions of a practical nature that had to be made in re-
sponse to problems as the data entry process proceeded. This involved frequent discussions
among the members of the data entry team to agree on how information would be entered and to
establish uniform processing. Here, as elsewhere, the methodology was developed and polished
concurrently with the data entry.
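The two data entry rules described above can be sketched as follows. The numbering scheme (Pedro, Pedro1, Pedro2) and the “not stated” placeholder come from the text; the function names, the record layout and the field names are hypothetical.

```python
def disambiguate_names(people):
    """Append a number to the first name of repeated (first name, surname) pairs.

    The first occurrence is left unchanged; later ones become Pedro1, Pedro2...
    """
    seen = {}
    result = []
    for first, surname in people:
        count = seen.get((first, surname), 0)
        seen[(first, surname)] = count + 1
        result.append((first if count == 0 else f"{first}{count}", surname))
    return result

def fill_blanks(record, fields, placeholder="not stated"):
    """Return a copy of `record` with no blank value in any expected field."""
    filled = {}
    for field in fields:
        value = record.get(field)
        text = "" if value is None else str(value).strip()
        filled[field] = text if text else placeholder
    return filled

names = disambiguate_names([("Pedro", "Coc"), ("Pedro", "Coc"), ("Pedro", "Coc")])
entry = fill_blanks({"age": "", "sex": "M"}, ["age", "sex", "occupation"])
```

Filling gaps with an explicit value lets later counts distinguish "the form did not say" from "the field was never entered".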
were dates, places where the events occurred, types of victims, types of violations, perpetrators,
themes (list of key words), or any other criterion requested by the interviewer.
Based on lists prepared according to their needs, interviewers could consult physical archives
in the database (the case documents) grouped according to one or more specific criteria. For
example, they could specifically search for cases that occurred in the Chajul municipality between
1982 and 1983, each of which could have specifications about the type of victims, alleged
perpetrators, key words that describe the case, number of victims, etc.
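A request such as "cases in Chajul between 1982 and 1983" can be sketched as a filter over case records. The field names and the sample cases below are invented for illustration; only the Chajul example itself comes from the text.

```python
def select_cases(cases, municipality=None, year_from=None, year_to=None):
    """Return the cases matching every criterion that was supplied."""
    selected = []
    for case in cases:
        if municipality is not None and case["municipality"] != municipality:
            continue
        if year_from is not None and case["year"] < year_from:
            continue
        if year_to is not None and case["year"] > year_to:
            continue
        selected.append(case)
    return selected

# Invented sample cases, for illustration only.
cases = [
    {"id": "C001", "municipality": "Chajul", "year": 1982, "victims": 12},
    {"id": "C002", "municipality": "Chajul", "year": 1985, "victims": 3},
    {"id": "C003", "municipality": "Nebaj", "year": 1983, "victims": 7},
    {"id": "C004", "municipality": "Chajul", "year": 1983, "victims": 1},
]

chajul_list = select_cases(cases, municipality="Chajul", year_from=1982, year_to=1983)
```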
With the list, the interviewers could efficiently review case types and themes. Knowing in
advance which cases would meet their needs, they could select the ones that interested them,
which reduced the time needed to get results. In addition, this process facilitated an ongoing
exchange between the database team and interviewers. Through this exchange, they could define
precisely the criteria for grouping lists of cases, searching for the best information to meet
the interviewers’ needs.
The production of this information continued throughout the preparation of the report. It
involved more than 200 lists of cases constructed according to various selection criteria. This
information allowed for the efficient use of qualitative information contained in testimonies
collected by the CEH. The testimonies could be cited in the report to illustrate analyses that
were presented and served as qualitative support for descriptions and hypotheses concerning the
work.
For presentation purposes, the cases were organized under the following criteria, applied in the
order listed:
• Type of violation
• Place where the events occurred
• Perpetrator of the event
• Level of certainty of the event
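Applying the four criteria in that order corresponds to sorting on a compound key. The sketch below assumes alphabetical order within each text criterion and ascending order for the certainty level; the text does not say how values were ordered within a criterion, and the field names and sample cases are invented.

```python
def presentation_order(cases):
    """Sort by violation type, then place, then perpetrator, then certainty."""
    return sorted(
        cases,
        key=lambda c: (c["violation"], c["place"], c["perpetrator"], c["certainty"]),
    )

# Invented sample cases, for illustration only.
cases = [
    {"id": "C003", "violation": "torture", "place": "Chajul", "perpetrator": "military", "certainty": 1},
    {"id": "C001", "violation": "execution", "place": "Nebaj", "perpetrator": "PAC", "certainty": 2},
    {"id": "C002", "violation": "execution", "place": "Chajul", "perpetrator": "military", "certainty": 3},
    {"id": "C004", "violation": "execution", "place": "Chajul", "perpetrator": "military", "certainty": 1},
]

ordered_ids = [c["id"] for c in presentation_order(cases)]
```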
Drafting the statistical annex
This annex is composed of statistical graphics used in the report and referenced to different
chapters.
Composing the chapter on statistical overview
This chapter consists of the statistical analysis and general interpretation of the database re-
sults.
Writing the chapter on database methodology
Conclusion
Despite the problems that arose during the course of the CEH project, the information system
designed and implemented by the CEH database team achieved its major objective, to be the CEH’s
primary source of information. This information, based on the testimonies collected directly from
victims of violence in the Guatemalan population, was the essential resource for analysis and
preparation of the final report.
The database team assured a rigorous standard of information handling. The results demonstrated
that when the work of the database team is conceived as a part of a structured and integrated
process involving all sectors of the commission, problems can readily be solved and a successful
outcome achieved.
Lessons Learned
Problem: Lack of uniformity in taking testimonies. Testimonies often were different, reflecting
diverse backgrounds of personnel. Hence difficult to classify systematically.
Solution: Frequent discussion involving all concerned personnel.
Issues: Three viewpoints perceived: legal, social science research and political. All viewpoints
essential to complete overview. All should be involved in discussions.

Problem: Initial lack of recognition of the dominant role of database.
Solution: Recognition by coordinators of need to create working database team at the initiation of
the project. Coordinators to allocate sufficient physical and financial resources at start of
project.
Issues: Not clear how to make coordinators aware of the critical role of the database when project
being defined.

Problem: Delayed and changing definitions of methodology and clear parameters for collecting
information. Collection process adversely affected.
Solution: Greater emphasis on preparatory work in defining methodology and parameters.
Issues: Until data collection process has produced results, it may not be clear just what factors
are to be taken into account in definitions.