Handbook of Data Structures and Applications
Edited by
Dinesh P. Mehta
Sartaj Sahni
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the accuracy of the text or
exercises in this book. This book’s use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship
by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data
and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright
holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us
know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by
any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any
information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or
contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a
separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation
without intent to infringe.
PART I Fundamentals
1 Analysis of Algorithms ....................................................................................................................................... 3
Sartaj Sahni
2 Basic Structures ................................................................................................................................................. 23
Dinesh P. Mehta
3 Trees ................................................................................................................................................................... 35
Dinesh P. Mehta
4 Graphs ................................................................................................................................................................ 49
Narsingh Deo
PART V Miscellaneous
29 Tries ..................................................................................................................................................................445
Sartaj Sahni
30 Suffix Trees and Suffix Arrays ........................................................................................................................461
Srinivas Aluru
31 String Searching ..............................................................................................................................................477
Andrzej Ehrenfeucht and Ross M. McConnell
Preface

It has been over a decade since the first edition of this handbook was published in 2005. In this edition, we have attempted to
capture advances in data structures while retaining the seven-part structure of the first edition. As one would expect, the discipline
of data structures has matured, with advances no longer as rapidly forthcoming as in the twentieth century. Nevertheless, some
areas have seen significant progress, and these are the focus of this edition.
We have added four new chapters on Bloom Filters; Binary Decision Diagrams; Data Structures for Cheminformatics; and
Data Structures for Big Data Stores. In addition, we have updated 13 other chapters from the original edition.
Dinesh P. Mehta
Sartaj Sahni
October 2017
Preface to the First Edition
In the late 1960s, Donald Knuth, winner of the 1974 Turing Award, published his landmark book The Art of Computer Programming:
Fundamental Algorithms. This book brought together a body of knowledge that defined the data structures area. The term
data structure, itself, was defined in this book to be “a table of data including structural relationships.” Niklaus Wirth, the inventor
of the Pascal language and winner of the 1984 Turing Award, stated that “Algorithms + Data Structures = Programs.” The
importance of algorithms and data structures has been recognized by the community and consequently, every undergraduate
Computer Science curriculum has classes on data structures and algorithms. Both of these related areas have seen tremendous
advances in the decades since the appearance of the books by Knuth and Wirth. Although there are several advanced and specialized
texts and handbooks on algorithms (and related data structures), there is, to the best of our knowledge, no text or handbook
that focuses exclusively on the wide variety of data structures that have been reported in the literature. The goal of this handbook
is to provide a comprehensive survey of data structures of different types that are in existence today.
To this end, we have subdivided this handbook into seven parts, each of which addresses a different facet of data structures.
Part I is a review of introductory material. Although this material is covered in all standard data structures texts, it was included
to make the handbook self-contained and in recognition of the fact that there are many practitioners and programmers who
may not have had a formal education in computer science. Parts II through IV discuss Priority Queues, Dictionary Structures,
and Multidimensional structures, respectively. These are all well-known classes of data structures. Part V is a catch-all used for
well-known data structures that eluded easy classification. Parts I through V are largely theoretical in nature: they discuss the data
structures, their operations and their complexities. Part VI addresses mechanisms and tools that have been developed to facilitate
the use of data structures in real programs. Many of the data structures discussed in previous parts are very intricate and take
some effort to program. The development of data structure libraries and visualization tools by skilled programmers is of critical
importance in reducing the gap between theory and practice. Finally, Part VII examines applications of data structures. The
deployment of many data structures from Parts I through V in a variety of applications is discussed. Some of the data structures
discussed here have been invented solely in the context of these applications and are not well-known to the broader community.
Some of the applications discussed include Internet routing, web search engines, databases, data mining, scientific computing,
geographical information systems, computational geometry, computational biology, VLSI floorplanning and layout, computer
graphics and image processing.
For data structure and algorithm researchers, we hope that the handbook will suggest new ideas for research in data structures
and provide an appreciation of the application contexts in which data structures are deployed. For the practitioner who is devising
an algorithm, we hope that the handbook will lead to insights in organizing data that make it possible to solve the algorithmic
problem more cleanly and efficiently. For researchers in specific application areas, we hope that they will gain some insight from
the ways other areas have handled their data structuring problems.
Although we have attempted to make the handbook as complete as possible, it is impossible to undertake a task of this magnitude
without some omissions. For this, we apologize in advance and encourage readers to contact us with information about
significant data structures or applications that do not appear here. These could be included in future editions of this handbook.
We would like to thank the excellent team of authors, who are at the forefront of research in data structures, who have contributed
to this handbook. The handbook would not have been possible without their painstaking efforts. We are extremely saddened by
the untimely demise of a prominent data structures researcher, Professor Gísli R. Hjaltason, who was to write a chapter for this
handbook. He will be missed greatly by the computer science community. Finally, we would like to thank our families for their
support during the development of the handbook.
Dinesh P. Mehta
Sartaj Sahni
April 2004
MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: [email protected]
Web: www.mathworks.com
Editors
Dinesh P. Mehta has been on the faculty of the Colorado School of Mines since 2000, where he is currently professor in the
Department of Computer Science. He earned degrees (all in computer science) from the Indian Institute of Technology Bombay,
the University of Minnesota, and the University of Florida. Before joining Mines in 2000, he was on the faculty of the University
of Tennessee Space Institute, where he received the Vice President’s Award for Teaching Excellence in 1997. He was a visiting
professor at Intel’s Strategic CAD Labs for several months in 1996 and 1997 and at the Tata Research Development and Design
Center (in Pune, India) in 2007. He has also received graduate teaching awards at Mines in 2007, 2008, and 2009.
He was assistant department head from 2004 to 2008 and interim department head from 2008 to 2010 in the former Department
of Mathematical and Computer Sciences and served as president of the Mines Faculty Senate in 2016–2017.
Dr. Mehta is the coauthor of the book Fundamentals of Data Structures in C++ and coeditor of the Handbook of Algorithms
for VLSI Physical Design Automation. He serves as associate director of the ORwE (Operations Research with Engineering) PhD
program at Mines and is currently an associate editor of ACM Computing Surveys. His current research interests are in cheminformatics,
computational materials, and big graph analytics.
Sartaj Sahni is a distinguished professor of computer and information sciences and engineering at the University of Florida. He
is also a member of the European Academy of Sciences, a Fellow of IEEE, ACM, AAAS, and Minnesota Supercomputer Institute,
and a distinguished alumnus of the Indian Institute of Technology, Kanpur. Dr. Sahni is the recipient of the 1997 IEEE Computer
Society Taylor L. Booth Education Award, the 2003 IEEE Computer Society W. Wallace McDowell Award, and the 2003 ACM Karl
Karlstrom Outstanding Educator Award. Dr. Sahni earned his BTech (electrical engineering) degree from the Indian Institute of
Technology, Kanpur, and MS and PhD in computer science from Cornell University. Dr. Sahni has published over 400 research
papers and written 15 books. His research publications are on the design and analysis of efficient algorithms, parallel computing,
interconnection networks, design automation, and medical algorithms.
Dr. Sahni is the editor-in-chief of the ACM Computing Surveys, a managing editor of the International Journal of Foundations
of Computer Science, and a member of the editorial boards of 17 other journals. He is a past coeditor-in-chief of the Journal of
Parallel and Distributed Computing. He has served as program committee chair, general chair, and a keynote speaker at many
conferences. Dr. Sahni has served on several NSF (National Science Foundation) and NIH (National Institutes of Health) panels
and he has been involved as an external evaluator of several computer science and engineering departments.
Contributors
Dinesh P. Mehta
Department of Computer Science
Colorado School of Mines
Golden, Colorado

Haim Kaplan
Tel Aviv University
Tel Aviv, Israel

Peter Widmayer
ETH Zürich
Zürich, Switzerland
PART I Fundamentals
1 Analysis of Algorithms Sartaj Sahni ..............................................................................................................................3
Introduction • Operation Counts • Step Counts • Counting Cache Misses • Asymptotic
Complexity • Recurrence Equations • Amortized Complexity • Practical Complexities •
Acknowledgments • References
1 Analysis of Algorithms*
1.1 Introduction...................................................................................................................... 3
1.2 Operation Counts............................................................................................................. 4
1.3 Step Counts ....................................................................................................................... 5
1.4 Counting Cache Misses.................................................................................................... 7
A Simple Computer Model • Effect of Cache Misses on Run Time • Matrix Multiplication
1.5 Asymptotic Complexity ................................................................................................... 9
Big Oh Notation (O) • Omega (Ω) and Theta (Θ) Notations • Little Oh Notation (o)
1.6 Recurrence Equations .................................................................................................... 12
Substitution Method • Table-Lookup Method
1.7 Amortized Complexity................................................................................................... 14
What Is Amortized Complexity? • Maintenance Contract • The McWidget Company •
Subset Generation
1.8 Practical Complexities.................................................................................................... 20
Acknowledgments ..................................................................................................................... 21
References................................................................................................................................... 21
Sartaj Sahni
University of Florida
1.1 Introduction
The topic “Analysis of Algorithms” is concerned primarily with determining the memory (space) and time requirements
(complexity) of an algorithm. Since the techniques used to determine memory requirements are a subset of those used
to determine time requirements, in this chapter, we focus on the methods used to determine the time complexity of an
algorithm.
The time complexity (or simply, complexity) of an algorithm is measured as a function of the problem size. The remaining
sections of this chapter develop the following methods and notions used in time-complexity analysis:
1. Operation counts
2. Step counts
3. Counting cache misses
4. Asymptotic complexity
5. Recurrence equations
6. Amortized complexity
7. Practical complexities
* This chapter has been reprinted from the first edition of this Handbook, without any content updates.
EXAMPLE 1.1
[Max Element] Figure 1.1 gives an algorithm that returns the position of the largest element in the array a[0:n-1].
When n > 0, the time complexity of this algorithm can be estimated by determining the number of comparisons made
between elements of the array a. When n ≤ 1, the for loop is not entered. So no comparisons between elements of a
are made. When n > 1, each iteration of the for loop makes one comparison between two elements of a, and the total
number of element comparisons is n-1. Therefore, the number of element comparisons is max{n-1, 0}. The method max
performs other comparisons (e.g., each iteration of the for loop is preceded by a comparison between i and n) that are
not included in the estimate. Other operations such as initializing positionOfCurrentMax and incrementing the for
loop index i are also not included in the estimate.
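Figure 1.1 is not reproduced in this excerpt; the following C++ sketch is consistent with the description above (the variable name positionOfCurrentMax is taken from the text, but the code itself is an illustration, not the book's figure):

```cpp
// Returns the position of the largest element in a[0:n-1].
// Each iteration of the for loop makes exactly one comparison between
// elements of a, so the element-comparison count is max{n-1, 0}.
int maxPosition(const int a[], int n) {
    int positionOfCurrentMax = 0;
    for (int i = 1; i < n; i++)              // n-1 iterations when n > 1
        if (a[positionOfCurrentMax] < a[i])  // one element comparison per iteration
            positionOfCurrentMax = i;
    return positionOfCurrentMax;
}
```

The comparison between i and n in the loop test, the initialization, and the increments are exactly the operations the text excludes from the estimate.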
The algorithm of Figure 1.1 has the nice property that the operation count is precisely determined by the problem size.
For many other problems, however, this is not so. Figure 1.2 gives an algorithm that performs one pass of a bubble sort. In
this pass, the largest element in a[0:n-1] relocates to position a[n-1]. The number of swaps performed by this algorithm
depends not only on the problem size n but also on the particular values of the elements in the array a. The number of
swaps varies from a low of 0 to a high of n − 1.
Since the operation count isn’t always uniquely determined by the problem size, we ask for the best, worst, and average
counts.
EXAMPLE 1.2
[Sequential Search] Figure 1.3 gives an algorithm that searches a[0:n-1] for the first occurrence of x. The number
of comparisons between x and the elements of a isn’t uniquely determined by the problem size n. For example, if n = 100
and x = a[0], then only 1 comparison is made. However, if x isn’t equal to any of the a[i]s, then 100 comparisons are
made.
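Figure 1.3 is not reproduced in this excerpt; a sketch of a sequential search consistent with the description (the name sequentialSearch follows the text; returning -1 on failure is our choice):

```cpp
// Returns the position of the first occurrence of x in a[0:n-1], or -1
// if x is not present. A successful search that finds x at position j
// makes j+1 comparisons with elements of a; an unsuccessful search makes n.
int sequentialSearch(const int a[], int n, int x) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)    // one comparison between x and an element of a
            return i;
    return -1;            // unsuccessful search
}
```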
A search is successful when x is one of the a[i]s. All other searches are unsuccessful. Whenever we have an unsuccessful
search, the number of comparisons is n. For successful searches the best comparison count is 1, and the worst is n. For the
average count, assume that all array elements are distinct and that each is searched for with equal frequency. The average count is then (1 + 2 + · · · + n)/n = (n + 1)/2.
EXAMPLE 1.3
[Insertion into a Sorted Array] Figure 1.4 gives an algorithm to insert an element x into a sorted array
a[0:n-1].
We wish to determine the number of comparisons made between x and the elements of a. For the problem size, we
use the number n of elements initially in a. Assume that n ≥ 1. The best or minimum number of comparisons is 1, which
happens when the new element x
is to be inserted at the right end. The maximum number of comparisons is n, which happens when x is to be inserted at
the left end. For the average assume that x has an equal chance of being inserted into any of the possible n+1 positions. If
x is eventually inserted into position i+1 of a, i ≥ 0, then the number of comparisons is n-i. If x is inserted into a[0],
the number of comparisons is n. So the average count is
\[
\frac{1}{n+1}\left(\sum_{i=0}^{n-1}(n-i) + n\right)
= \frac{1}{n+1}\left(\sum_{j=1}^{n} j + n\right)
= \frac{1}{n+1}\left(\frac{n(n+1)}{2} + n\right)
= \frac{n}{2} + \frac{n}{n+1}
\]
This average count is almost 1 more than half the worst-case count.
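Figure 1.4 is not reproduced in this excerpt; the sketch below is consistent with the description (the array a is assumed to have room for n+1 elements; the comparisons with elements of a occur in the loop test):

```cpp
// Inserts x into the sorted array a[0:n-1] so that a[0:n] is sorted.
// Comparisons between x and elements of a range from 1 (x belongs at
// the right end) to n (x belongs at the left end).
void insertIntoSorted(int a[], int n, int x) {
    int i;
    for (i = n - 1; i >= 0 && x < a[i]; i--)
        a[i + 1] = a[i];   // shift right to make room for x
    a[i + 1] = x;          // insert x
}
```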
can be regarded as a single step if its execution time is independent of the problem size. We may also count a statement such as
x = y;
as a single step.
To determine the step count of an algorithm, we first determine the number of steps per execution (s/e) of each statement
and the total number of times (i.e., frequency) each statement is executed. Combining these two quantities gives us the total
contribution of each statement to the total step count. We then add the contributions of all statements to obtain the step count
for the entire algorithm.
EXAMPLE 1.4
[Sequential Search] Tables 1.1 and 1.2 show the best- and worst-case step-count analyses for sequentialSearch
(Figure 1.3).
For the average step-count analysis for a successful search, we assume that the n values in a are distinct and that in a
successful search, x has an equal probability of being any one of these values. Under these assumptions the average step
count for a successful search is the sum of the step counts for the n possible successful searches divided by n. To obtain this
average, we first obtain the step count for the case x = a[j] where j is in the range [0, n − 1] (see Table 1.3).
\[
\frac{1}{n}\sum_{j=0}^{n-1}(j + 4) = \frac{n + 7}{2}
\]
This value is a little more than half the step count for an unsuccessful search.
Now suppose that successful searches occur only 80% of the time and that each a[i] still has the same probability of
being searched for. The average step count for sequentialSearch is
0.8 ∗ (average count for successful searches) + 0.2 ∗ (count for an unsuccessful search)
= 0.8(n + 7)/2 + 0.2(n + 3)
= 0.6n + 3.4
where the load operations load data into registers and the store operation writes the result of the add to memory. The add
and the store together take two cycles. The two loads may take anywhere from 4 cycles to 200 cycles depending on whether
we get no cache miss, L1 misses, or L2 misses. So the total time for the statement a = b + c varies from 6 cycles to 202 cycles.
In practice, the variation in time is not as extreme because we can overlap the time spent on successive cache misses.
[Figure (not reproduced): a simple computer model with an ALU and registers (R), an L1 cache, an L2 cache, and main memory.]
Suppose that we have two algorithms that perform the same task. The first algorithm does 2000 adds that require 4000 load,
2000 add, and 2000 store operations and the second algorithm does 1000 adds. The data access pattern for the first algorithm
is such that 25% of the loads result in an L1 miss and another 25% result in an L2 miss. For our simplistic computer model, the
time required by the first algorithm is 2000 ∗ 2 (for the 50% loads that cause no cache miss) + 1000 ∗ 10 (for the 25% loads that
cause an L1 miss) + 1000 ∗ 100 (for the 25% loads that cause an L2 miss) + 2000 ∗ 1 (for the adds) + 2000 ∗ 1 (for the stores)
= 118,000 cycles. If the second algorithm has 100% L2 misses, it will take 2000 ∗ 100 (L2 misses) + 1000 ∗ 1 (adds) + 1000 ∗ 1
(stores) = 202,000 cycles. So the second algorithm, which does half the work done by the first, actually takes about 71% more time
than is taken by the first algorithm.
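The cycle arithmetic for the two algorithms can be checked against the model in the text (2 cycles for a miss-free load, 10 for an L1 miss, 100 for an L2 miss, 1 cycle per add and per store); the helper below is ours, not part of the book:

```cpp
// Total cycles under the simplistic cost model: loads are split into
// miss-free (2 cycles), L1-miss (10 cycles), and L2-miss (100 cycles)
// categories; each add and each store costs 1 cycle.
long cycles(long hitLoads, long l1Misses, long l2Misses, long adds, long stores) {
    return hitLoads * 2 + l1Misses * 10 + l2Misses * 100 + adds + stores;
}
```

For the first algorithm, 2000 miss-free loads, 1000 L1 misses, 1000 L2 misses, 2000 adds, and 2000 stores give 118,000 cycles; for the second, 2000 L2-miss loads, 1000 adds, and 1000 stores give 202,000 cycles.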
Computers use a number of strategies (such as preloading data that will be needed in the near future into cache, and when a
cache miss occurs, the needed data as well as data in some number of adjacent bytes are loaded into cache) to reduce the number
of cache misses and hence reduce the run time of a program. These strategies are most effective when successive computer
operations use adjacent bytes of main memory.
Although our discussion has focused on how cache is used for data, computers also use cache to reduce the time needed to
access instructions.
Figure 1.7 is an alternative algorithm that produces the same two-dimensional array c as is produced by Figure 1.6. We observe
that Figure 1.7 has two nested for loops that are not present in Figure 1.6 and does more work than is done by Figure 1.6 with
respect to indexing into the array c. The remainder of the work is the same.
You will notice that if you permute the order of the three nested for loops in Figure 1.7, you do not affect the result array c.
We refer to the loop order in Figure 1.7 as ijk order. When we swap the second and third for loops, we get ikj order. In all,
there are 3! = 6 ways in which we can order the three nested for loops. All six orderings result in methods that perform exactly
the same number of operations of each type; they differ only in their memory access patterns, and hence in their cache-miss counts and run times.
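Figures 1.6 and 1.7 are not reproduced in this excerpt; the sketch below follows the description of Figure 1.7 (two initialization loops plus an ijk-ordered accumulation), using flat row-major arrays as our own convention:

```cpp
// Matrix multiply c = a * b for n x n row-major matrices, in ijk order.
// The two nested initialization loops are the extra work the text
// attributes to Figure 1.7. Swapping the second and third loops of the
// accumulation yields ikj order; the result array c is unchanged, but
// the memory access pattern, and so the cache-miss count, differs.
void matrixMultiplyIJK(const double* a, const double* b, double* c, int n) {
    for (int i = 0; i < n; i++)          // initialize c to zero
        for (int j = 0; j < n; j++)
            c[i * n + j] = 0;
    for (int i = 0; i < n; i++)          // accumulate in ijk order
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```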