Preface: The Purpose of This Book

This document provides an introduction to natural language generation and describes the purpose and approach of the book. It discusses the key components of an NLG system including document planning, microplanning and realization. It also introduces a case study system that is used throughout the book.

Preface

The Purpose of this Book


This book describes natural language generation (NLG), which is a subfield of artificial intelligence and computational linguistics that is concerned
with building computer software systems which can produce meaningful texts
in English or other human languages from some underlying non-linguistic representation of information. NLG systems use knowledge about language and
the application domain to automatically produce documents, reports, help messages, and other kinds of texts.
As we enter the new millennium, work in natural language processing, and
in particular natural language generation, is at an exciting stage in its development. The mid-to-late 1990s have seen the emergence of the first fielded NLG
applications, and the first software houses specialising in the development of
NLG technology. At the time of writing, only a handful of systems are in everyday use; but many more are under development and should be fielded within the
next few years. The growing interest in applications of the technology has also
changed the nature of academic research in the field; in particular, more attention is now being paid to software engineering issues, and to using NLG within
a wider document generation process that incorporates graphical elements and
other realities of the Web-based information age such as hypertext links.
However, despite the growing interest in NLG in general and applied NLG in
particular, it is often difficult for people who are not already knowledgeable in
the field to obtain a comprehensive overview of what is involved in building a
natural language generation system. There are no existing textbooks on NLG;
most books in the area are either revised doctoral dissertations or edited collections derived from workshops and conferences. Textbooks in natural language
processing typically devote a single chapter to natural language generation.
There are some very good review articles, but most of these concentrate on
theoretical issues in NLG; and in any case, it is difficult to give a comprehensive
overview of a field as rich as NLG in 20 or 30 pages.
The goal of this book is to fill this void, and to provide a resource which
describes NLG from the perspective of what is involved in building complete NLG
systems. The book is intended to serve students, academics working in related
fields, software developers, and other people interested in the area. In particular
we hope to meet the needs of the following communities:
Students should be able to use our book as a textbook for a postgraduate
course in NLG, as supplemental reading in general courses on natural language processing, and as a general resource for learning about NLG in
institutions where formal courses on the subject are not available. We
know of many students who are interested in NLG but discouraged by the
fact that it is difficult to learn about the field; we hope that our book will
encourage more students to pursue their interests in the area.
Academics working in related areas such as natural language analysis,
writing-support tools, and advanced hypertext technologies can use our
book to understand the goals, underlying theories, representations, and
algorithms used in NLG, and how these relate to their own fields. We hope
this will encourage more interaction and cross-fertilisation between NLG
and these related areas.
Software developers working on applications which need to produce linguistic output, such as software which generates letters and reports, can use
our book to understand what NLG has to offer in building such systems,
and how to incorporate the relevant aspects of NLG into these systems.
We believe that one of the major impediments to the use of NLG in real
systems is the difficulty of learning about the technology, and a major goal
for our book is to help remove this barrier.
Last but not least, we hope that the synthesis of NLG work presented here will
help define a framework within which new and existing work in NLG can be
discussed. It is often difficult to compare or combine results from the work of
different researchers, because different people make different assumptions about
the inputs, outputs, and expected functionalities of the various components of
an NLG system. The problems in reconciling these differences are sometimes exacerbated by a lack of relevant detail in the published research literature. In this
book we present an NLG system architecture which embodies one particular set
of assumptions about inputs, outputs, and the modularisation of functionality
within an NLG system. We do not expect researchers in the field to adopt this
model wholesale, but our aim has been to provide a model that is sufficiently
well specified that it can be used as a basis for comparison of alternatives. We
have paid particular attention to the choice of the terminology used for component processes and representations, in the hope that this will help reduce
accidental inconsistencies, where apparent differences between approaches do
not reflect fundamental issues but are the consequence of relatively peripheral
decisions which could easily be changed.

The Approach Taken in This Book


The approach we take to NLG in this book is oriented towards the construction
of NLG systems. We discuss theoretical issues and models, but we generally keep
these discussions short and rely on the primary sources for further detail, unless
we can relate these issues quite directly to the construction of NLG systems. For
example, we spend many pages discussing Rhetorical Structure Theory (RST)
and Systemic Functional Grammars (SFG), because a good understanding of
RST is very useful in the context of document structuring, and similarly a good
understanding of SFG is very useful when using popular realisation packages such
as SURGE and KPML. On the other hand, our discussions of speech act theory
and psychological models of word meanings are very brief: although these are
interesting topics in their own right, it is less obvious that they have a significant
impact on building NLG systems.
Because of this approach, we also discuss engineering issues which are important to the developer but may be of less interest to the theorist: so, for example, we spend time discussing topics such as requirements analysis, domain
modelling, and knowledge acquisition. We also give some engineering-oriented
evaluations of when competing NLG techniques should be used, for example
when comparing schemas with more dynamic approaches to text structuring,
and when comparing the use of simple templates to more complex approaches
to syntactic realisation.
Throughout the book we illustrate the points we make by giving examples
from complete, working NLG systems. We make use of one central case study, the
SUMGEN-W system, which is designed to produce summaries of past weather conditions
on the basis of automatically collected meteorological data. This system was
developed by one of the authors and members of his research group at the same
time that this book was being written, in large part with the goal of being
used as a case study for this book. We hope that tracing the development of a
specific NLG system will help illustrate how the concepts we describe here can
be applied in practice.
Because SUMGEN-W only illustrates one particular point in the vast space of
possible NLG applications, we also include examples from many other complete
NLG systems. When possible we use examples from systems with which we have
had some personal involvement, such as IDAS, PEBA, and ModelExplainer;
but where appropriate we also use examples from other well-known applied NLG
systems, such as FoG and PlanDoc.
Our book is based around a reference architecture. Very briefly, the reference architecture decomposes the NLG task into three modules: document
planning, microplanning, and realisation. Document planning is our
name for what is often called text planning, and includes content determination and document structuring. Microplanning includes linguistic aggregation, referring expression generation, and some aspects
of lexicalisation. Realisation includes syntactic processing, morphological processing, and orthographic processing. All of these concepts
are explored in detail in the book.
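As a rough illustration of this three-module decomposition, the following Python sketch chains three toy stages. It is not the implementation of any system described in this book: the names Message, document_planning, microplanning, realisation, and generate, and the rainfall input format, are assumptions made purely for illustration, and each stage performs only one token task from its module.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    """A toy information-bearing unit (hypothetical structure)."""
    subject: str
    verb: str
    complement: str


def document_planning(data: Dict) -> List[Message]:
    """Toy content determination: decide which facts to report, and in what order."""
    rain = data["rain_mm"]
    return [
        Message(data["month"], "had", f"{sum(rain)} mm of rain in total"),
        Message(data["month"], "had", f"{max(rain)} mm of rain on its wettest day"),
    ]


def microplanning(messages: List[Message]) -> List[Message]:
    """Toy referring expression generation: use a pronoun for a repeated subject."""
    planned: List[Message] = []
    previous_subject: Optional[str] = None
    for m in messages:
        subject = "it" if m.subject == previous_subject else m.subject
        planned.append(Message(subject, m.verb, m.complement))
        previous_subject = m.subject
    return planned


def realisation(messages: List[Message]) -> str:
    """Toy orthographic processing: capitalise each sentence and add a full stop."""
    sentences = []
    for m in messages:
        text = f"{m.subject} {m.verb} {m.complement}"
        sentences.append(text[0].upper() + text[1:] + ".")
    return " ".join(sentences)


def generate(data: Dict) -> str:
    """Run the three stages in sequence, each consuming the previous stage's output."""
    return realisation(microplanning(document_planning(data)))


print(generate({"month": "June", "rain_mm": [0, 12, 3]}))
```

The point of the sketch is only the shape of the pipeline: each module has a well-defined input and output, so one stage can be replaced (say, by a grammar-based realiser) without touching the others.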
In terms of intermediate representations, the output of the document planner in our model is a document specification. This is a tree made up of
information-bearing units called messages, often with discourse relations
specified between parts of the tree. The output of the microplanner is a text
specification: this is a tree whose leaf nodes specify phrase specifications
(our category-neutral term for what are often called sentence plans in the
literature) which can be processed by realisers (such as KPML and RealPro),
and whose internal nodes specify the logical structure of the document in terms
of paragraphs, sections, and so forth.
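The two tree-shaped representations just described can be pictured with a small Python sketch. The class and field names below (Message, DocSpecNode, PhraseSpec, TextSpecNode, leaves) and the "elaboration" label are hypothetical choices for illustration only; they are not the data structures of any particular system or realiser.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Message:
    """Leaf of a document specification: an information-bearing unit."""
    content: str


@dataclass
class DocSpecNode:
    """Internal node of a document specification (hypothetical layout)."""
    children: List[Union["DocSpecNode", Message]]
    relation: Optional[str] = None  # discourse relation between the children, if any


@dataclass
class PhraseSpec:
    """Leaf of a text specification: something a realiser could process."""
    words: str


@dataclass
class TextSpecNode:
    """Internal node of a text specification: a unit of logical document structure."""
    kind: str  # e.g. "document", "section", "paragraph"
    children: List[Union["TextSpecNode", PhraseSpec]]


def leaves(node) -> list:
    """Collect the leaf nodes of either kind of tree, left to right."""
    if not hasattr(node, "children"):
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Output of a document planner: messages linked by a discourse relation.
doc_spec = DocSpecNode(
    relation="elaboration",
    children=[Message("total rainfall"), Message("wettest day")],
)

# Output of a microplanner: logical structure above, phrase specifications below.
text_spec = TextSpecNode("document", [
    TextSpecNode("paragraph", [
        PhraseSpec("June had 15 mm of rain in total"),
        PhraseSpec("it had 12 mm of rain on its wettest day"),
    ]),
])

print([m.content for m in leaves(doc_spec)])
print([p.words for p in leaves(text_spec)])
```

In this sketch the internal nodes of the two trees carry different information (discourse relations in one, paragraph and section structure in the other), which mirrors the distinction drawn above.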
Broad support for an architectural decomposition along these lines can be
found in the literature; as argued in [Reiter 1994], something similar to our
reference architecture is adopted by most extant NLG systems.

Using the Book


The book consists of the following chapters:
Chapter 1, Introduction: A definition of natural language generation; the
relation of NLG to natural language understanding, artificial intelligence,
and computer science; a short history of NLG; some example NLG applications; the SUMGEN-W case study.
Chapter 2, Building NLG Systems: When is NLG possible? Determining
information needs from a corpus analysis; when is NLG appropriate?
Contrasting NLG with graphics generation, mail-merge, and manual document authoring; issues in fielding NLG systems.
Chapter 3, Issues in Building NLG Systems: The inputs and outputs of
an NLG system; basic NLG tasks; the reference architecture; intermediate
representations; other possible architectures.
Chapter 4, Document Planning: Domain modelling; messages; discourse
relations and RST; document planner architecture; content determination:
summarising and reasoning with data, user and dialogue models; knowledge acquisition; document structuring: schemas and dynamic structuring.
Chapter 5, Microplanning: Microplanner architecture; sentence aggregation and paragraph formation; referring expression generation: types of
reference, discourse models, algorithms; lexicalisation: representing meaning, decision trees, graph rewriting, multilingual systems.
Chapter 6, Realisation: Syntactic realisation: benefits, grammar models,
systems (KPML, FUF, and RealPro), modifying a grammar; templates:
benefits, orthography, morphology, markups.
Chapter 7, Beyond Text Generation: Documents vs text; formatting:
types, choice rules, implementation; hypertext: uses, choice rules, implementation; graphics: how to best present information, similarities between text and graphics, combining text and graphics; speech output:
uses, information needed, implementation; further reading.
Chapter 8, Conclusion: Summary; pointers to resources.
Appendix A, Linguistic Concepts: Basic syntax and morphology; systemic
functional grammar; rhetorical structure theory.
Appendix B, An Extract from the SUMGEN-W Corpus.
Appendix C, NLG Systems: Brief descriptions of 20 to 30 well-known systems, with pointers to the literature.
We end this preface by offering some suggestions on how the book can be
used.
All readers should read the Introduction (Chapter 1) and the Architecture chapter (Chapter 3). In
particular, the description in the Introduction of the SUMGEN-W case study,
and the description in the Architecture chapter of the reference architecture
(including intermediate representations), are essential to understanding subsequent chapters.
The Requirements Analysis chapter (Chapter 2) is important for system builders, but
can be skipped by readers with primarily theoretical interests.
The Document Planning, Microplanning, and Realisation chapters are the
heart of the book, and describe techniques for performing specific NLG tasks.
Students should read all three of these chapters in depth. System builders and
academics with specialised interests may find it appropriate to initially skim
through these chapters, identify relevant sections, and read these in depth.
The Beyond Text Generation chapter is optional, but we strongly recommend that students and system builders read it, because it provides a valuable
perspective on NLG. Researchers with specialised interests may find this chapter
less useful.
In terms of background, we assume throughout the book that the reader
has some background in computer science and artificial intelligence, and has
at least been exposed to the basic concepts in linguistics and natural language
processing. Appendix A provides some background information on language
and linguistics for readers with little exposure to these fields.
