Rust Book en Us Shieber
Shieber
2023.03.28
Preface
The emergence of the transistor sparked a revolution in integrated circuits and chips, leading to the development of the central processing unit, large-capacity storage, and convenient communication facilities. Unix [1] was born out of the failure of Multics [2], which subsequently gave rise to the Linux kernel [3] and its various distributions [4]. By combining open source with network technology, the rapid development of IT was made possible. Technological progress provides a platform and tools for practical ideas, while social progress creates new demands and passions, further promoting technological progress. Although the upper layer of the computer world has given rise to the Internet, cloud computing, blockchain, AI, and the Internet of Things, the basic principles underlying the bottom layer remain
unchanged. The foundation of computing is the combination of hardware with its abstract data types and algorithms. Whether it is a regular computer, a supercomputer, or a quantum computer [5], its functions are built on some abstraction of data structures and algorithms.
This book focuses on the design, implementation, and use of abstract data types, which play an important role in computer science. By learning to design abstract data types, one can write better programs and deepen one's understanding of them. The algorithm implementations in this book are not the most optimal or general-purpose engineering implementations, as those tend to be verbose and obscure the key principles, which can hinder learning. Instead, the code in this book applies different simplifications to different cases, some using generics and others using concrete types. These measures are intended to keep the code simple and to ensure that each piece of code can be compiled and executed on its own to produce a result.
Prerequisite Knowledge
Understanding abstract data types does not require a specific language, but writing code requires choosing a concrete form, which in turn requires a certain level of proficiency in Rust. Although readers' familiarity with Rust and coding styles may vary, the basic requirements remain the same. To read this book, readers should ideally have the following abilities and interests:
• Ability to implement complete programs using Rust, including the use of Cargo, rustc, test, etc.
• Ability to use basic data types and structures, including structs, enums, loops, matches, etc.
• Ability to use Rust generics, lifetimes, ownership system, pointers, unsafe code, macros, etc.
• Ability to use the standard library and external crates, and to set up Cargo.toml.
If readers lack these abilities, please refer to the section on Rust learning materials in Chapter 1 and
find some recommended books and resources to learn Rust first, before returning to this book. The code
for this book is available on GitHub, organized by chapter and name. Readers are welcome to download and use it, and to point out any errors they find.
This book and all of its code were written on Ubuntu 20.04 with Rust 1.58. Code appears in two kinds of boxes: boxes with line numbers show code, while boxes without line numbers show results or other content. Except for simple or purely explanatory snippets, every piece of code comes with its output: shorter outputs are given as comments inside the current code box, while longer outputs are placed in a code box without line numbers.
Acknowledgments
Rust is an excellent language that provides efficiency, safety, and convenient engineering management tools, and it is gradually being integrated into the Linux kernel, making it a potential replacement for some C/C++ work in the future. Although several Rust books are available, the author found no book covering algorithms in Rust, which created many obstacles during learning. The author therefore decided to write a simple, accessible Rust book to help newcomers learn algorithms and data structures. After extensive research [6], thinking, and organizing, combined with the author's own learning experience, this book was completed. Despite Rust's steep learning curve, with the right direction and good resources one can learn the language well. The author hopes that this book makes a small contribution to the Rust learning community.
The primary aim of this book is to learn and promote Rust and to give back to the open-source community, which has enabled the author to learn and grow. The author is grateful to PingCAP for developing TiDB, for running its open-source community, and for its online courses. The author also thanks Zhang Handong, Mike Tang, and other members of the Rust Chinese community for organizing Rust conferences and maintaining the community; Linghu Yichong for sharing Rust learning videos on Bilibili; Zhang Handong again for promoting the Rust language through his book "The Tao of Rust Programming" and the RustMagazine Chinese monthly; and the Rust Foundation [7], established by Mozilla, AWS, Facebook, Google, Microsoft, and Huawei, for creating a platform that motivated the author to write this book despite the shortage of learning resources.
Finally, the author thanks the University of Electronic Science and Technology of China (UESTC) for providing resources and a supportive environment, and thanks his mentor and fellow students in the KC404 teaching and research room for their care and help; it was there that the author learned various technologies and cultures, grew up, and found the direction of his life journey.
Shieber
Contents
Preface 1
1 Rust Basic 1
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Install Rust and its Toolchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Learning Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.1 Books and Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.2 Related Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.3 Community and Blogs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.1 The History of Rust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.2 Keywords, Comments, Naming conventions . . . . . . . . . . . . . . . . . . . . 4
1.4.3 Constants, Variables, Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.4 Statement, Expression, Operators and Flow Control . . . . . . . . . . . . . 10
1.4.5 Function, Program Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.6 Ownership, Scope, Lifetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.7 Generic, Trait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.8 Enum and Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.9 Functional Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.10 Smart Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4.11 Exception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.4.12 Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.4.13 Code Organization and Dependency . . . . . . . . . . . . . . . . . . . . . . . . 35
1.4.14 Project: a password generator . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2 Computer Science 45
2.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.3 What is Computer Science? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4 What is Programming? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5 Why Study Data Structures and Abstract Data Types? . . . . . . . . . . . . . . . . . . . 47
2.6 Why Study Algorithms? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3 Algorithm Analysis 49
3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 What is Algorithm Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3 Big-O Notation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Anagram Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4.1 Brute Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5 Recursion 108
5.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2 What is Recursion? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2.1 The Three Laws of Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.2.2 Converting an Integer to a String in Any Base . . . . . . . . . . . . . . . . . . . 111
5.2.3 Tower of Hanoi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.3 Tail Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.3.1 Recursion VS Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.4 Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.4.1 What is Dynamic Programming? . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.4.2 Dynamic Programming VS Recursion . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6 Searching 123
6.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.2 What is Searching? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3 The Sequential Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3.1 Implementing a Sequential Search in Rust . . . . . . . . . . . . . . . . . . 124
6.3.2 Analysis of Sequential Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4 The Binary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4.1 Implementing a Binary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4.2 Analysis of Binary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.4.3 The Interpolation Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.4.4 The Exponential Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.5 The Hash Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.5.1 Hash Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.5.2 Collision Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.5.3 Implementing a HashMap in Rust . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.5.4 Analysis of HashMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7 Sorting 145
7.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.2 What is Sorting? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.3 The Bubble Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.4 The Quick Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.5 The Insertion Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.6 The Shell Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.7 The Merge Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.8 The Selection Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.9 The Heap Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.10 The Bucket Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.11 The Counting Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.12 The Radix Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.13 The Tim Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.14 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8 Trees 186
8.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.2 What is Tree? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.2.1 Vocabularies and Definitions of Tree . . . . . . . . . . . . . . . . . . . . . . . . 189
8.2.2 Tree Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
8.2.3 Parse Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.2.4 Tree Traversals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.3 Binary Heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.3.1 The Binary Heap Abstract Data Type . . . . . . . . . . . . . . . . . . . . . . . 202
8.3.2 Implementing a Binary Heap in Rust . . . . . . . . . . . . . . . . . . . . . 203
8.3.3 Analysis of Binary Heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.4 Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.4.1 The Binary Search Tree Abstract Data Type . . . . . . . . . . . . . . . . . . . . 210
8.4.2 Implementing a Binary Search Tree in Rust . . . . . . . . . . . . . . . . . . . . 211
8.4.3 Analysis of Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.5 Balanced Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.5.1 AVL Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
8.5.2 Implementing an AVL Tree in Rust . . . . . . . . . . . . . . . . . . . . . . 224
9 Graphs 237
9.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.2 What is Graph? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9.2.1 Vocabularies and Definitions of Graph . . . . . . . . . . . . . . . . . . . . . . . 238
9.3 Graph Storage Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
9.3.1 Adjacency Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
9.3.2 Adjacency List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
9.4 The Graph Abstract Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
9.5 Implementing a Graph in Rust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
9.5.1 The Word Ladder Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
9.6 Breadth First Search(BFS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.6.1 Implementing a BFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.6.2 Analysis of BFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
9.6.3 The Knight’s Tour Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
9.7 Depth First Search(DFS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
9.7.1 Implementing a DFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
9.7.2 Analysis of DFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.7.3 Topological Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.8 Strongly Connected Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
9.8.1 The BFS Strongly Connected Components Algorithm . . . . . . . . . . . . . . 278
9.8.2 The DFS Strongly Connected Components Algorithm . . . . . . . . . . . . . . 282
9.9 Shortest Path Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
9.9.1 Dijkstra’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
9.9.2 Implementing the Dijkstra’s Algorithm . . . . . . . . . . . . . . . . . . . . . . 285
9.9.3 Analysis of Dijkstra’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 288
9.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
10 Practices 289
10.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
10.2 Edit Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
10.2.1 The Hamming Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
10.2.2 The Levenshtein Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
10.3 Trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
10.4 Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10.4.1 The Bloom Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
10.4.2 The Cuckoo Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
10.5 Least Recently Used(LRU) Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
10.6 Consistent Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
10.7 Base58 Encode and Decode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10.8 Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
10.8.1 The Principles of Blockchain and Bitcoin . . . . . . . . . . . . . . . . . . . . . 322
10.8.2 A Primary Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
10.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
Chapter 1
Rust Basic
1.1 Objectives
• Install Rust and learn its toolchain
• Explore learning resources for Rust in various areas
• Review the fundamentals of the Rust programming language
1.3. LEARNING RESOURCES CHAPTER 1. RUST BASIC
$ rustup default nightly
After installation, you can use rustup to check the current version in use.
$ rustup toolchain list
stable-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu (default)
To switch between stable and nightly versions, please use the following command.
$ rustup default stable # nightly
1.4 Review
Similar to C/C++, Rust is a systems programming language. This means that concepts learned in
those languages can be applied to understand Rust better. However, Rust introduces unique concepts
like mutability, ownership, borrowing, and lifetimes, which can be both advantages and challenges.
The figure above gives readers a clear picture of the resource usage of various programming languages. Rust's energy consumption, time consumption, and memory consumption are all very favorable. While this comparison may not be completely accurate, the overall trend is clear: Rust is energy-saving and efficient. Given the critical point of climate change we currently face, using energy-saving and efficient languages like Rust to develop software is in line with the Carbon Peaking and Carbon Neutrality Goals. I believe it can be an important tool for enterprise transformation and look forward to the industry and society reaching a consensus.
Rust has a wide range of applications extending to command-line tools, DevOps tools, audio and video processing, game engines, search engines, blockchain, the Internet of Things, browsers, cloud-native computing, network servers, databases, and operating systems. Many universities and enterprises in China and abroad use Rust extensively. For example, Tsinghua University uses Rust in the rCore and zCore operating systems built by its students; other examples include ByteDance's Feishu, the remote desktop software RustDesk, PingCAP's TiDB database, the JS/TS runtime Deno, and Google's Fuchsia operating system.
1 //! This symbol, placed at the top of a module file, controls
2 //! the generation of library documentation and describes
3 //! the functionality of the entire module.
4 /// This symbol is placed above the object being described to
5 /// control the generation of library documentation and to
6 /// describe functions or structures.
Rust's documentation uses Markdown syntax, in which the # symbol marks section headings, so it appears frequently in doc comments.
1 //! Math mod <-- doc comment, describe mode
2 //!
3 /// # Add <-- doc comment, describe function, test case
4 /// This function sum the inputs
5 ///
6 /// # Example <-- test code, use case
7 /// use math::add;
8 /// assert_eq!(3, add(1, 2));
9 fn add(x: i32, y: i32) -> i32 {
10 // sum <-- regular comment
11 x + y
12 }
Naming conventions have been a topic of interest in programming, and Rust has its recommended
practices. Rust suggests using UpperCamelCase for class-level content and snake_case for value-level
content.
Item Convention
Crate snake_case
Type UpperCamelCase
Trait UpperCamelCase
Enum UpperCamelCase
Function snake_case
Method snake_case
Constructor new / with_more_details
Converter from_other_type
Macros snake_case!
Local variable snake_case
Static variable SCREAMING_SNAKE_CASE
Constants SCREAMING_SNAKE_CASE
Type Param A single uppercase letter in UpperCamelCase, like T, U, K
Lifetime lowercase, such as 'a, 'src, 'dest
In UpperCamelCase, initialisms and abbreviations of compound words count as one word. For ex-
ample, Usize should be used instead of USize. In snake_case or SCREAMING_SNAKE_CASE, words
should not be composed of single letters unless it is the last word. Therefore, btree_map should be
used instead of b_tree_map, and PI_2 instead of PI2. The following code exemplifies Rust’s naming
conventions, which this book follows, and readers are encouraged to follow as well.
1 // Enum
2 enum Result<T, E> {
3 Ok(T),
4 Err(E),
5 }
6
7 // Trait
8 pub trait From<T> {
9 fn from(value: T) -> Self;
10 }
11
12 // Struct
13 struct Rectangle {
14 height: i32,
15 width: i32,
16 }
17 impl Rectangle {
18 // constructor
19 fn new(height: i32, width: i32) -> Self {
20 Self { height, width }
21 }
22
23 // function
24 fn calc_area(&self) -> i32 {
25 self.height * self.width
26 }
27 }
28
29 // static and constant variables
30 static NAME: &str = "kew";
31 const AGE: i32 = 25;
32
33 // Macro definition
34 macro_rules! add {
35 ($a:expr, $b:expr) => {
36 {
37 $a + $b
38 }
39 }
40 }
41
42 // variable and macro call
43 let sum_of_nums = add!(1, 2);
As Rust’s popularity grows and its usage expands, a unified coding standard becomes necessary.
Currently, Professor Zhang Handong leads a good Rust Coding Standard, which readers can refer to.
1 // define a constant (similar to #define in C/C++)
2 const AGE: i32 = 1984;
3 // AGE = 1995; error, mutation not allowed
4
5 const NUM: f64 = 233.0;
6 // const NUM: f64 = 211.0; error, already defined
Variables are defined using let and can have a value that is mutable or immutable depending on
whether mut is used during its definition.
1 let x: f64 = 3.14; // let defines x, which cannot be mutated
2                    // but can be shadowed by a new let
3 // x = 6.28; error, x is immutable
4 let x: f64 = 2.71; // shadow x with a new binding
5
6 let mut y = 985; // let mut defines y which can be reassigned
7 // and mutated.
8 y = 996; // y is mutable
9 let y = 2019; // reassign y
Finally, static variables are defined using static and are mutable or immutable depending on whether mut is used in their definition.
1 static NAME: &str = "shieber"; // a static variable can be
2                                // used like a constant
3 // NAME = "kew"; error, NAME is immutable
4
5 static mut NUM: i32 = 100; // static mutable variable
6 unsafe {
7 NUM += 1; // NUM is mutable
8 println!("Num:{}",NUM);
9 }
The mut keyword is a constraint in Rust: a variable cannot be changed unless mut appears in front of it; otherwise, the variable is immutable. This differs from many other programming languages. Although static variables and constants look similar, they differ in practice: a constant is inlined at every place it is used, while a static variable is referenced and has exactly one global instance. Static variables defined with static mut must be accessed inside unsafe blocks to signal that they are not safe. It is recommended to use constants and ordinary variables instead, avoid static variables, and thereby avoid coding errors.
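The difference between the two can be observed directly. The following sketch (the names LIMIT and GREETING are this example's own, not from the book) shows that a constant is inlined at each use, while a static has a single fixed address for the whole program:

```rust
const LIMIT: i32 = 3;          // inlined at every use site
static GREETING: &str = "hi";  // a single global instance

fn main() {
    assert_eq!(LIMIT * 2, 6);
    // taking the address of a static twice yields the same pointer,
    // because only one instance exists
    let p1: *const &str = &GREETING;
    let p2: *const &str = &GREETING;
    assert_eq!(p1, p2);
    println!("{GREETING}, limit = {LIMIT}");
}
```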
Data types are the building blocks of a language, and Rust’s data types are similar to C, while some
are similar to Go. There are two types of basic data types in Rust: scalar and compound. Scalar types
represent a single value, such as integers, floating-point numbers, Boolean types, and character types.
Compound types combine multiple values into one, such as tuples and arrays, which are Rust’s two
native compound types.
Integers in Rust are numbers without a decimal part, classified into 12 types by whether they are signed or unsigned and by their length, denoted by the prefix i for signed types and u for unsigned types. A 64-bit machine can handle a 128-bit number by storing it in segments and processing it with multiple registers. isize and usize are integer types whose size matches the machine architecture: on a 64-bit machine they correspond to i64 and u64, and on a 32-bit machine to i32 and u32.
Size Signed Unsigned
8    i8     u8
16   i16    u16
32   i32    u32
64   i64    u64
128  i128   u128
arch isize  usize
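These sizes can be verified with std::mem::size_of; a minimal sketch:

```rust
use std::mem::size_of;

fn main() {
    assert_eq!(size_of::<i8>(), 1);     // 8 bits
    assert_eq!(size_of::<u128>(), 16);  // 128 bits
    // isize/usize match the pointer width of the machine
    assert_eq!(size_of::<usize>(), size_of::<&i32>());
    // a 128-bit value works even on a 64-bit machine
    let big: u128 = u64::MAX as u128 + 1;
    println!("2^64 = {big}");
}
```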
Floating-point numbers in Rust are numbers with a decimal part. There are two types, f32 and f64, both signed, with f64 being the default.
Size Signed
32 f32
64 f64 (default)
The boolean type in Rust is bool and has only two values, true and false, consistent with other programming languages.
The character type char in Rust is a primitive type. Characters are declared using single quotes, while string literals use double quotes. The variables c and c_str below have completely different types: a char is a four-byte Unicode scalar value, while a string literal is stored as a sequence of UTF-8 bytes.
1 // unicode scalar value
2 let c = 's';
3
4 // string literal (a string slice)
5 let c_str = "s";
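The size difference can be demonstrated with a short sketch:

```rust
fn main() {
    let c = 's';      // char: a 4-byte Unicode scalar value
    let c_str = "s";  // &str: a string slice of UTF-8 bytes
    assert_eq!(std::mem::size_of::<char>(), 4);
    assert_eq!(c.to_string(), c_str);
    // a char may hold any Unicode scalar value, not just ASCII
    let yu = '宇';
    assert_eq!(yu.len_utf8(), 3); // 3 bytes when encoded as UTF-8
    println!("{c} {yu} {c_str}");
}
```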
Tuples are a type of composite value that combines multiple values of other types. Once declared,
the length of a tuple cannot be increased or decreased. Tuples use parentheses to enclose the values,
separated by commas. To retrieve values from a tuple, pattern matching and dot notation can be used,
with indices starting at 0.
1 let tup1: (i8, f32, i64) = (-1, 2.33, 8000_0000);
2 // pattern match
3 let (x, y, z) = tup1;
4
5 let tup2 = (0, 100, 2.4);
6 let zero = tup2.0; // use symbol . to get value
7 let one_hundred = tup2.1;
In Rust, the let keyword is not only used to define variables; it can also destructure values through pattern matching. In the example above, x, y, and z obtain the values -1, 2.33, and 80000000, respectively. Defining a variable is itself a form of pattern matching, since let is essentially a pattern-matching operation. The unit type in Rust is written as an empty tuple (), and its only value, also written (), is returned implicitly by any expression that does not produce a value.
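Both ideas can be shown together; the helper function below is illustrative, not from the book:

```rust
// a function with no explicit return value returns the unit value ()
fn log(msg: &str) {
    println!("{msg}");
}

fn main() {
    // let destructures a tuple through pattern matching
    let (x, y, z) = (-1, 2.33, 8000_0000);
    println!("{x} {y} {z}");
    // an expression that produces no value yields ()
    let unit: () = log("done");
    assert_eq!(unit, ());
}
```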
Unlike tuples, arrays in Rust must have elements of the same type, and the length cannot be changed
once declared. If a mutable collection is needed, Vec can be used instead. It allows for dynamic resizing
and is the preferred option in most situations.
1 // define arrays
2 let genders = ["Female", "Male", "Bigender"];
3 let gender_f = genders[0]; // indice element
4
5 // [type; num] define an array
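As a contrast to fixed-length arrays, a minimal Vec sketch shows the dynamic resizing mentioned above:

```rust
fn main() {
    // arrays have a fixed length; Vec can grow and shrink at runtime
    let mut nums: Vec<i32> = Vec::new();
    nums.push(1);
    nums.push(2);
    nums.extend([3, 4]);       // append several elements at once
    assert_eq!(nums.len(), 4);
    nums.pop();                // remove the last element
    assert_eq!(nums, vec![1, 2, 3]);
    println!("{nums:?}");
}
```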
4 } else {
5 false
6 };
Another way is with an if let statement, which executes code by pattern matching the value on the
right.
1 let some_value = Some(100);
2 if let Some(value) = some_value {
3 println!("value: {value}");
4 } else {
5 println!("no value");
6 }
In addition to the if let matching statement, match can also be used to control code execution.
1 let a = 10;
2 match a {
3 0 => println!("0 == a"),
4 1..=9 => println!("1 <= a <= 9"),
5 _ => println!("10 <= a"),
6 }
Rust provides various loop types, including loop, while, and for in. Using the continue and break
keywords, you can jump and stop code execution on demand. The loop keyword controls repeated code
execution until a condition is met.
1 let mut val = 10;
2 let res = loop {
3 // break loop and return value
4 if val < 0 {
5 break val;
6 }
7
8 val -= 1;
9 if 0 == val % 2 {
10 continue;
11 }
12
13 println!("val = {val}");
14 }; // a semi-colon here
15
16 // dead loop
17 loop {
18 if res > 0 { break; }
19
20 println!("{res}");
21 } // no semi-colon here
In contrast, a while loop checks its condition before each iteration.
1 let mut num = 10;
2 while num > 0 {
3 println!("{}", num);
4 num -= 1;
5 }
6
7 let nums = [1,2,3,4,5,6];
8 let mut index = 0;
9 while index < 6 {
10 println!("val: {}", nums[index]);
11 index += 1;
12 }
Iterating through an array in Rust can be achieved through the for in loop, which is more convenient
than using index and length.
1 let nums = [1,2,3,4,5,6];
2
3 // iterate over array
4 for num in nums {
5 println!("val: {num}");
6 }
7
8 // iterate over array in reverse order
9 for num in nums.iter().rev() {
10 println!("val: {num}");
11 }
This approach is not only concise, but also reduces the chance of errors, as it avoids specifying the
array length explicitly. Additionally, iter().rev() can be used to traverse the array in reverse order.
The combination of while and let forms a pattern match, eliminating the need to write an explicit stop condition: the loop continues as long as the pattern matches the value on the right and stops automatically when it no longer does.
1 let mut v = vec![1,2,3,4,5,6];
2 while let Some(x) = v.pop() {
3 println!("{x}");
4 }
Overall, Rust provides many useful ways to control code flow, including match, if let, while let, and let combined with if expressions, which align with Rust's coding conventions and are recommended.
1 // main function
2 fn main() {
3 let res = add(1, 2);
4 println!("1 + 2 = {res}");
5 }
6
7 fn add(a: i32, b: i32) -> i32 {
8 a + b
9 }
It is important to note that function definitions may appear before or after the main function, and that the return value can be written as the final expression without a semicolon. This syntax is specific to Rust; adding a semicolon there would turn the expression into a statement, so the function would return the unit value instead. Functions in Rust modules are private by default and require the pub keyword to be exported to other programs.
In Rust, a program is composed of several parts:
• package/lib/crate/mod
• variable
• statement/expression
• function
• trait
• label
• comment
The following example shows the various elements of a program.
// rust_example.rs

// import from the standard library
use std::cmp::max;

// public module
pub mod math {
    // public function
    pub fn add(x: i32, y: i32) -> i32 { x + y }

    // private function
    fn is_zero(num: i32) -> bool { 0 == num }
}

// struct
#[derive(Debug)]
struct Circle {
    radius: f32, // radius
}

// implement a converter to convert f32 to Circle
impl From<f32> for Circle {
    fn from(radius: f32) -> Self {
        Self { radius }
    }
}

// comment: custom function
On the line let z = &mut x;, an error is reported, indicating that there can be only one mutable borrow of a variable at a time. This rule exists to prevent data races, such as two writers modifying the same data at the same time. Multiple immutable references may coexist, because readers do not affect the variable; only multiple mutable references (or a mutable reference alongside immutable ones) cause errors.
Now consider another situation. If the reader buys an electronic copy of this book, they can give a copy to a friend by duplicating it (please follow intellectual property laws), so both own a copy of the book. Extending this concept to Rust gives us "copy" and "clone".
fn main() {
    let x = "Shieber".to_string(); // create a string
    let y = x.clone();             // clone x
    println!("{x}, {y}");          // x and y both are valid
}
The clone function, as the word literally implies, makes a deep copy of the data to y. Borrowing is just
getting a valid pointer, which is fast, while cloning requires copying the data, which is less efficient and
doubles the memory consumption. If the reader tries to write the following without the clone function,
and finds that it compiles and runs without errors, does it not satisfy the ownership rule?
fn main() {
    let x = 10; // create an integer on the stack
    let y = x;
    println!("{x}, {y}"); // x and y both are valid. paradox?
}
In fact, let y = x; does not hand the 10 over to y; it automatically copies a new 10 for y, so that x and y each have their own 10 and there is no conflict with the ownership rules. Because such simple values live on the stack, Rust provides a trait for this kind of data called Copy, which allows fast copying without invalidating the old variable x. In Rust, integers, floating-point numbers, booleans, characters, and similar stack-only types all implement the Copy trait, so assigning such variables copies rather than moves. You could call clone here as well, but it is not necessary.
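Custom stack-only types can opt into the same behavior by deriving Copy. A minimal sketch (the Point type here is illustrative, not from the book):

```rust
// Copy requires Clone, so both are derived
#[derive(Debug, Copy, Clone)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied, not moved, because Point is Copy
    println!("{:?} {:?}", p1, p2); // p1 is still valid here
}
```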
As mentioned earlier, a reference is a valid pointer. However, it is also possible to write references that would be invalid, similar to dangling pointers in other programming languages.
fn dangle() -> &String {
    let s = "Shieber".to_string();
    &s
}

fn main() {
    let ref_to_nothing = dangle();
}
The above code fails to compile, with an error message similar to the following (abridged).
error[E0106]: missing lifetime specifier
--> dangle.rs:1:16
1 | fn dangle() -> &String {
| ^ expected named lifetime parameter
| help: function's return type contains a borrowed value,
| but there is no value for it to be borrowed from
| help: consider using the 'static lifetime
1 | fn dangle() -> &'static String {
| ~~~~~~~~
In fact, the problem can be found by analyzing either the code or the error message: the function returns an invalid reference.

fn dangle() -> &String { // <--- returns a reference to a String
    let s = "Shieber".to_string();
    &s                   // <--- returns a reference to s
}                        // <--- s is dropped here, and its memory is freed
From the ownership system's point of view, releasing s at the end of the function is legal, and returning the pointer &s seems fine; at worst the address it points to is stale. But s and &s are two different things: the ownership system checks data against its three rules, and it cannot know that the address &s points to has become invalid. So why does compilation fail? The error message says a lifetime specifier is missing. The dangling reference conflicts with the lifetime rules, and that is why the error is reported. In other words, even if the ownership checks pass, the code is still rejected if the lifetime checks do not.
In fact, every reference in Rust has its own lifetime, i.e., the scope for which the reference remains
valid. The ownership system does not guarantee that the data is absolutely valid, and requires a lifetime
to ensure validity. Most of the time the lifetime is implicit and can be inferred, just as most of the time the
type can be automatically inferred. When there is a reference in the scope, it is necessary to indicate the
lifetime to show the interrelationship, so that the reference actually used at runtime is absolutely valid.
fn main() {
    let a;                  // ----------+-- 'a: a's lifetime begins
    {                       //           |
        let b = 10;         // --+-- 'b  |   b's lifetime begins
        a = &b;             //   |       |
    }                       // --+       |   b's lifetime ends
                            //           |
    println!("a: {}", a);   // ----------+   a's lifetime ends
}
a references b, but a's lifetime 'a outlives b's lifetime 'b, so the compiler reports an error. For a to reference b legally, b's lifetime must last at least until the end of a's. By comparing lifetimes, Rust can find references that make no sense and thus avoid dangling-reference problems.
To guarantee that references are used legally, Rust requires every reference to carry a lifetime. Lifetimes are written as a single quote ' followed by a name, as in &'a T or &'a mut T. References in function signatures may also require explicit lifetime markers.
fn longest<'a>(x: &'a String, y: &'a String) -> &'a String {
    if x.len() < y.len() {
        y
    } else {
        x
    }
}
The angle brackets hold lifetime parameters, which are generic parameters: they are declared in the angle brackets after the function name and used in the argument list to state that both arguments and the returned reference live at least as long as 'a. Where Rust can infer lifetimes automatically (lifetime elision), the specifier can simply be omitted.
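Lifetime elision is why most signatures need no annotation: with a single reference parameter, the compiler assumes the returned reference borrows from it. A minimal sketch (the first_word function is illustrative, not from the book):

```rust
// Elided form; the compiler reads this as
// fn first_word<'a>(s: &'a str) -> &'a str
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i], // slice up to the first space
        None => s,          // no space: the whole string is one word
    }
}

fn main() {
    println!("{}", first_word("hello world")); // hello
}
```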
fn main() {
    // static lifetime: lives for the entire duration of the program
    let s: &'static str = "Shieber";

    let x = 10;
    let y = &x; // the lifetime of this borrow is inferred
    println!("y: {}", y);
}
In this section we learned about the ownership system, borrowing, cloning, scope rules, and lifetimes. The ownership system is a memory-management mechanism; borrowing and cloning are ways of using variables that extend the ownership system; and lifetimes complement the ownership system, solving problems such as dangling references that ownership alone cannot handle.
The outer_say_hello function above adds the generic constraint T: Greete, known as a trait bound, indicating that only a type T implementing Greete can call say_hello. The previous add function can be rewritten as below.
fn add<T: Addable>(x: T, y: T) -> T {
    x + y
}
The Addable trait is a constraint that ensures that only types implementing the Addable trait can be
added together. This allows any type that doesn’t meet the constraint to be reported as an error at compile
time rather than waiting until runtime.
Another way to write the trait constraint is by using the impl keyword, as shown in the following
example. Here, t must be a reference implementing the Greete trait.
fn outer_say_hello(t: &impl Greete) {
    t.say_hello();
}
Multiple trait bounds and multiple argument types can be combined using a comma and a plus sign.
fn some_func<T: trait1 + trait2, U: trait1>(x: T, y: U) {
    do_some_work();
}
To avoid cluttering the angle brackets with multiple trait bounds, Rust provides the where syntax, which moves the trait bounds out of the angle brackets.
// where clause
fn some_func<T, U>(x: T, y: U)
where T: trait1 + trait2,
      U: trait1,
{
    do_some_work();
}
Implementing the say_hello method of Greete for Student and Teacher allows each type to encapsulate its own unique properties. Student simply inherits the default behavior from Greete, while Teacher overrides it; although both call the same say_hello method, Student and Teacher display different states. Encapsulation, inheritance, and polymorphism are all present here, which suggests that impl and trait together provide class-like capabilities, and with them object-oriented programming.
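The pattern described above can be sketched as follows. This is an assumption about how the book's Greete, Student, and Teacher types look (their actual listings are not shown here); the greeting helper method is added purely for illustration:

```rust
trait Greete {
    // default implementation, "inherited" as-is by Student
    fn greeting(&self) -> String {
        "hello".to_string()
    }
    fn say_hello(&self) {
        println!("{}", self.greeting());
    }
}

struct Student;
impl Greete for Student {} // keeps the default greeting

struct Teacher;
impl Greete for Teacher {
    // overrides the default
    fn greeting(&self) -> String {
        "hello, class".to_string()
    }
}

fn main() {
    Student.say_hello(); // prints "hello"
    Teacher.say_hello(); // prints "hello, class"
}
```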
The match expression must be exhaustive: it has to cover every possible pattern. When you only care about one pattern, however, a full match is cumbersome. Fortunately, Rust generalizes the matching of match with the if let pattern. The following if let counts all bills larger than one dollar.
let mut greater_than_one = 0;
if let Cash::One = cash { // only match Cash::One
    println!("cash is one");
} else {
    greater_than_one += 1;
}
The add_val function captures the external variable val. Closures can capture external variables
by taking ownership, reference, or mutable reference, which Rust defines using three function traits:
FnOnce, FnMut, Fn.
• The FnOnce trait consumes the captured variable from the surrounding scope and moves it into the
closure when it is defined. This trait indicates that the closure can only be called once.
• The FnMut trait obtains a mutable borrowed value, allowing it to change the external variable.
• The Fn trait obtains an immutable borrowed value from its environment.
These traits correspond to the move, mutable reference, and reference implementation of the owner-
ship system. All closures can be called at least once, making them implement the FnOnce trait. Closures
that use mutable references but do not move variable ownership into the closure implement the FnMut
trait, while those that do not require mutable access to the variables implement Fn.
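A minimal sketch (not one of the book's listings) showing one closure from each trait family; the variable names are illustrative:

```rust
fn main() {
    // Fn: only reads val, so it captures by immutable reference
    let val = 10;
    let read = || val + 1;
    println!("{}", read());
    println!("{}", val); // val is still usable

    // FnMut: mutates count, so it captures by mutable reference
    let mut count = 0;
    let mut bump = || count += 1;
    bump();
    bump();
    println!("{}", count); // 2

    // FnOnce: moves s into the closure and gives it away,
    // so the closure can be called only once
    let s = String::from("hi");
    let consume = move || s;
    let out = consume();
    println!("{}", out);
    // consume(); // error: value moved in the previous call
}
```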
The move keyword is used to force the ownership of an external variable to be moved into a closure.
let val = vec![2]; // Vec is not Copy, so move really transfers ownership
let add_val = move |x: i32| x + val[0];
// println!("{:?}", val); error[E0382]: borrow of moved value: `val`

Note that for Copy types such as i32, move captures a copy instead, and the original variable remains usable.
In Rust, the for ... in loop used to iterate through an array works through iterators. An iterator hands each element of a collection to the processing logic in sequence, allowing specific operations to be performed on every item; it is responsible for stepping through the sequence and deciding when to stop.
A type becomes an iterator by implementing the Iterator trait, whose one required method is next(), returning the next element (or None when iteration is finished). Collections, in turn, provide methods such as iter() that return an iterator over their elements.
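Under the hood, implementing an iterator only requires next(). A minimal counter sketch (illustrative, not from the book):

```rust
struct Counter {
    count: u32,
}

impl Iterator for Counter {
    type Item = u32;

    // next() is the only required method: yield 1, 2, 3, then None
    fn next(&mut self) -> Option<u32> {
        if self.count < 3 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

fn main() {
    let v: Vec<u32> = (Counter { count: 0 }).collect();
    println!("{:?}", v); // [1, 2, 3]
}
```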
Depending on whether the data may be modified during iteration, there are three ways to obtain an iterator.

method        return type
iter()        an immutable iterator whose element type is &T
iter_mut()    a mutable iterator whose element type is &mut T
into_iter()   an iterator over owned values whose element type is T; the original data is consumed
In Rust, iterators can be classified into ”reentrant” and ”non-reentrant” types based on their behav-
ior towards the original data after iteration. A ”reentrant” iterator allows the original data to be used
again after iteration, while a ”non-reentrant” iterator consumes the original data. The iter(), iter_mut(),
and into_iter() methods are used to implement iterators for reading values, changing values, and taking
ownership of the original data, respectively.
let mut nums = vec![1,2,3,4,5,6];

// 1. iter: immutable
for num in nums.iter() {
    println!("num: {num}");
}
println!("{:?}", nums); // use nums after iter

// 2. iter_mut: mutable
for num in nums.iter_mut() {
    *num += 1;
}
println!("{:?}", nums); // use nums after iter_mut

// 3. into_iter: turn nums into an iterator, consuming nums
for num in nums.into_iter() {
    println!("num: {num}");
}
// println!("{:?}", nums); error: use of moved value "nums"
In addition to transferring ownership of the original data, iterators can also be consumed or regener-
ated. Consumers are special operations on an iterator that convert the iterator into a value of another type.
Examples of consumers include sum, collect, nth, find, next, and fold. They perform operations on the
iterator to obtain the final value. Producers, on the other hand, are adapters that traverse the iterator and
generate another iterator. Examples of adapters include take, skip, rev, filter, map, zip, and enumerate.
In Rust, an iterator itself is considered an adapter.
// adapter_consumer.rs

fn main() {
    let nums = vec![1,2,3,4,5,6];
    let nums_iter = nums.iter();
    let total = nums_iter.sum::<i32>(); // sum is a consumer

    let new_nums: Vec<i32> = (0..100).filter(|&n| 0 == n % 2)
                                     .collect(); // adapter
    println!("{:?}", new_nums);

    // calculate the sum of all numbers less than 1000
    // that are divisible by 3 or 5
    let sum = (1..1000).filter(|n| n % 3 == 0 || n % 5 == 0)
                       .sum::<u32>();
    // combine adapter and consumer
    println!("{sum}");
}
The code above combines adapters, closures, and consumers to find the sum of all integers less than 1000 that are divisible by 3 or 5, showcasing the power of functional programming: complex calculations expressed concisely and efficiently. By comparison, the imperative version below is longer and harder to follow. Using closures together with iterators, adapters, and consumers in a functional style is therefore recommended.
fn main() {
    let mut nums: Vec<u32> = Vec::new();
    for i in 1..1000 {
        if i % 3 == 0 || i % 5 == 0 {
            nums.push(i);
        }
    }

    let sum = nums.iter().sum::<u32>();
    println!("{sum}");
}
Functional programming is a programming paradigm that is distinct from other paradigms such as
imperative programming and declarative programming. Imperative programming is an abstraction of
computer hardware, involving variables, assignment statements, expressions, control statements, etc.
Structured programming and object-oriented programming, which are commonly used, fall under this
paradigm. The main focus is on the steps that the computer executes, instructing it on what to do step
by step.
In contrast, declarative programming expresses program logic as data structures, concentrating on
what to do rather than how to do it. SQL statements are an example of declarative programming. While
functional programming and declarative programming share a common idea of focusing on what to do,
functional programming is not limited to declarative programming. It is an abstraction of mathematics,
describing computation as the evaluation of expressions. In other words, a functional program is a mathematical expression. The functions in functional programming are mathematical functions, which map an independent variable to a dependent variable: y = f(x). The output of a function depends solely on its input arguments, not on any other state. For instance, the sin() function in mathematics computes the sine of x; as long as x is unchanged, the result is always the same, no matter how or when the function is called.
In Rust, Deref and Drop are the two most important traits for smart pointers. By implementing
Deref and Drop for a custom data type similar to Box, we can better understand the difference between
references and smart pointers.
// define a tuple struct
struct SBox<T>(T);
impl<T> SBox<T> {
    fn new(x: T) -> Self {
        Self(x)
    }
}

fn main() {
    let x = 10;
    let y = SBox::new(x);
    println!("x = {x}");
    // println!("y = {}", *y); error: SBox<{integer}> cannot be dereferenced
} // <--- x and y are dropped here
Here is an example implementation of Deref and Drop for SBox.
use std::ops::Deref;

// implement Deref for SBox
impl<T> Deref for SBox<T> {
    // associated type: the type that Deref targets
    type Target = T;
    fn deref(&self) -> &Self::Target {
        &self.0 // .0 is the first element of the tuple struct
    }
}

// implement Drop for SBox
impl<T> Drop for SBox<T> {
    fn drop(&mut self) {
        println!("SBox drop itself!"); // just print a message
    }
}
If Deref is not implemented, a value of our custom type cannot be dereferenced. Drop, for its part, customizes what happens when a value goes out of scope: Rust automatically releases a value when its owner leaves scope, and implementing Drop lets a smart pointer run extra cleanup at that moment, such as releasing any resources or raw allocations it manages. For types that manage memory or other resources by hand, implementing Drop correctly is crucial to avoid leaks.
fn main() {
    let x = 10;
    let y = SBox::new(x);
    println!("x = {x}");
    println!("y = {}", *y); // *y is equivalent to *(y.deref())
    // y.drop(); error[E0040]: explicit use of destructor method
} // <--- x and y are dropped here; y prints "SBox drop itself!"
Box stores its data on the heap, and after implementing Deref, it can be automatically dereferenced.
fn main() {
    let num = 10;              // num is stored on the stack
    let n_box = Box::new(num); // the boxed value is stored on the heap
    println!("n_box = {}", n_box); // n_box derefs to num automatically
    println!("{}", 10 == *n_box);  // n_box dereferenced manually
}
The ownership system rules state that a value can only have one owner at any given time, but in
some scenarios, we need values to have multiple owners. To address this, Rust provides the Rc smart
pointer. Rc is a shareable reference counting smart pointer that can produce multiple ownership values.
Reference counting means that it determines whether a value is still in use by keeping track of the number
of references to it, and if the reference count is zero, the value can be cleaned up.
[Figure: a linked list 3 -> 4 -> None, where node 3 is shared by a (1) and b (2) via Rc.]
In the figure, node 3 is shared by variables a(1) and b(2). Sharing works like turning off the lights in a classroom: only the last person to leave needs to switch them off. Likewise, the data behind an Rc is cleaned up only when its last user is done, and cloning an Rc simply increases the reference count, just as a new person entering the classroom increases the head count.
use std::rc::Rc;

fn main() {
    let one = Rc::new(1);
    // increase the reference count
    let one_1 = one.clone();
    // display the reference count
    println!("sc: {}", Rc::strong_count(&one_1));
}
Rc can be used for shared ownership in single-threaded environments only. For multithreaded en-
vironments, Rust provides Arc (atomic reference counting), a thread-safe version of Rc. It allocates a
T type value with shared ownership on the heap. Cloning an Arc increases the reference count while
producing a new Arc instance pointing to the same heap as the source Arc. Arc is immutable by default
and requires locking mechanisms like Mutex to modify it between multiple threads.
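The Arc-plus-Mutex combination described above can be sketched as follows (a minimal example, assuming four worker threads incrementing a shared counter):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc gives shared ownership across threads;
    // Mutex gives safe mutation of the shared value
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..4 {
        let counter = Arc::clone(&counter); // bump the reference count
        handles.push(thread::spawn(move || {
            *counter.lock().unwrap() += 1;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("count: {}", *counter.lock().unwrap()); // count: 4
}
```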
By default, Rc and Arc do not allow internal data modification. Rust provides two containers with
internal mutability, Cell and RefCell, for scenarios where modifying internal values is necessary. Internal
mutability is a Rust design pattern that allows data modification while having immutable references. Cell
provides methods for getting and changing internal values. For data types that implement Copy, the get
method is used to view the internal value, while the take method replaces the internal value with the
default value and returns the replaced value. For all data types, the replace method replaces and returns
the replaced value, while the into_inner method consumes Cell and returns the internal value.
// use_cell.rs

use std::cell::Cell;

struct Fields {
    regular_field: u8,
    special_field: Cell<u8>,
}

fn main() {
    let fields = Fields {
        regular_field: 0,
        special_field: Cell::new(1),
    };

    let value = 10;
    // fields.regular_field = value;
    // error[E0594]: cannot assign to immutable field

    fields.special_field.set(value);
    // although fields is immutable,
    // we can still change the value of special_field

    println!("special: {}", fields.special_field.get());
}
To enable modification of certain fields internally, Cell provides a backdoor to immutable struct
Fields.
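The other Cell accessors mentioned earlier can be sketched in a few lines (a minimal example, not from the book's listings):

```rust
use std::cell::Cell;

fn main() {
    let c = Cell::new(5);

    // take: swaps in the default value and returns the old one
    let old = c.take();
    println!("{} {}", old, c.get()); // 5 0

    // replace: swaps in a new value and returns the old one
    let prev = c.replace(7);
    println!("{}", prev); // 0

    // into_inner: consumes the Cell and returns the inner value
    println!("{}", c.into_inner()); // 7
}
```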
RefCell has similar characteristics to Cell, but instead of using get and set methods to modify in-
ternal data, it directly obtains a mutable reference to the data. For example, RefMut<_> can be used to
modify HashMap through RefCell. Here, shared_map obtains a RefMut<_> type map directly through
borrow_mut(), and then adds an element to the map through insert, which modifies shared_map.
// use_refcell.rs

use std::cell::{RefCell, RefMut};
use std::collections::HashMap;
use std::rc::Rc;

fn main() {
    let shared_map: Rc<RefCell<_>> =
        Rc::new(RefCell::new(HashMap::new()));
    {
        let mut map: RefMut<_> = shared_map.borrow_mut();
        map.insert("kew", 1);
        map.insert("shieber", 2);
        map.insert("mon", 3);
        map.insert("hon", 4);
    }
}
Now, let’s take a look at Cow (Copy on Write) Smart Pointer.
pub enum Cow<'a, B>
where B: ToOwned + ?Sized + 'a {
    Borrowed(&'a B),              // wraps a reference
    Owned(<B as ToOwned>::Owned), // wraps an owned value
}
The idea is that Cow accesses borrowed content immutably, and clones the data into an owned value only when mutation or ownership is required. Consider filtering all spaces out of a string; a first attempt might look like this.
// use_cow.rs
fn delete_spaces(src: &str) -> String {
    let mut dest = String::with_capacity(src.len());
    for c in src.chars() {
        if ' ' != c {
            dest.push(c);
        }
    }
    dest
}
While this code works, it is not optimal. First, we must decide whether the parameter should be &str or String. If the parameter were String and we hold a &str, we would have to clone it into a String before calling; and if we pass a String, it is moved into the function and the caller can no longer use it. Either way, the function builds and copies a new string. Yet if the input contains no spaces, it would be best to return it as-is without any copying. This is where Cow comes in handy: it reduces copying and improves efficiency.
// use_cow.rs
use std::borrow::Cow;

fn delete_spaces2<'a>(src: &'a str) -> Cow<'a, str> {
    if src.contains(' ') {
        let mut dest = String::with_capacity(src.len());
        for c in src.chars() {
            if ' ' != c { dest.push(c); }
        }
        // hand ownership of dest to the caller
        return Cow::Owned(dest);
    }
    Cow::Borrowed(src) // just borrow src
}
Here is an example that utilizes Cow.
// use_cow.rs
fn main() {
    let s = "i love you";
    let res = delete_spaces2(s); // contains spaces, so Cow::Owned is returned
    println!("{res}");
}
1.4.11 Exception
Rust's exception handling is unusual in that it has no try-catch blocks like other languages. This section is titled "Exception," but the author uses the term to cover failures, errors, and exceptions collectively. Rust handles them through four mechanisms: Option, Result, panic, and abort.
Option is used to handle possible failure cases, indicating whether the operation succeeded or failed
with Some and None. For instance, when getting a value that may not exist, the result may be None,
which should not cause an error but needs to be handled appropriately. Failure is different from error,
as it is an anticipated outcome that will not cause problems for the program. The definition of Option,
which we have already learned, is provided below.
enum Option<T> {
    Some(T),
    None,
}
Result is used to handle recoverable errors and represents success and failure. An error may not
necessarily cause the program to crash, but it needs to be specifically handled to allow the program to
continue executing. The definition of Result is as follows:
enum Result<T, E> {
    Ok(T),
    Err(E),
}
Opening a file that does not exist, accessing a file without permission, or attempting to convert a
non-numeric string to a number will result in Err(E).
use std::fs::File;
use std::io::ErrorKind;

let f = File::open("kw.txt");
let f = match f {
    Ok(file) => file,
    Err(err) => match err.kind() {
        ErrorKind::NotFound => match File::create("kw.txt") {
            Ok(fc) => fc,
            Err(e) => panic!("Error while creating file!"),
        },
        ErrorKind::PermissionDenied => panic!("No permission!"),
        other => panic!("Error while opening file"),
    },
};
In Rust, handling errors is crucial, and several mechanisms exist for it. One is the panic mechanism, which should be used only when no other solution is available. On an unrecoverable error, panic immediately stops execution so the programmer can identify the problem; by default it unwinds the stack and cleans up memory along the way. If you would rather not pay for unwinding on such errors, you can switch to the abort mechanism, which terminates at once and lets the operating system reclaim the memory.
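Switching from unwinding to aborting is a Cargo profile setting; a minimal sketch of the relevant Cargo.toml fragment:

```toml
# Cargo.toml: abort immediately on panic instead of unwinding,
# leaving cleanup to the operating system
[profile.release]
panic = "abort"
```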
Although error handling in Rust may look verbose when using match, there are more concise options
available. For example, you can use the unwrap or expect methods to handle errors.
use std::fs::File;

fn main() {
    let f = File::open("kew.txt").unwrap();
    let f = File::open("mon.txt").expect("Open file failed!");
    // expect gives a more specific error message than unwrap
}
If you only want to propagate the error and not handle it, you can use the ”?” operator. In this case,
the return type is Result, which returns String on success and io::Error on failure.
use std::fs::File;
use std::io;
use std::io::Read;

fn main() {
    let s = read_from_file();
    match s {
        Err(e) => println!("{}", e),
        Ok(s) => println!("{}", s),
    }
}

fn read_from_file() -> Result<String, io::Error> {
    let mut f = File::open("kew.txt")?; // if an error occurs, propagate it
    let mut s = String::new();
    f.read_to_string(&mut s)?;

    Ok(s)
}
When working with errors in Rust, you have to choose between using Option or Result depending on
the type of failure or error you are handling. Option is used for handling possible failures and represents
success or failure with Some or None. On the other hand, Result is used for handling recoverable errors
and represents success and failure with Ok or Err. It is essential to handle errors appropriately to avoid
program crashes. Finally, note that Option and Result are very similar, as Option<T> can be seen as
Result<T, ()>.
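The close relationship between the two types is visible in the standard conversion methods (a minimal sketch):

```rust
fn main() {
    let some: Option<i32> = Some(3);
    let none: Option<i32> = None;

    // ok_or converts an Option into a Result, supplying the error value
    println!("{:?}", some.ok_or(()));        // Ok(3)
    println!("{:?}", none.ok_or("missing")); // Err("missing")

    // ok() goes the other way, discarding the error
    let res: Result<i32, String> = Err("boom".to_string());
    println!("{:?}", res.ok()); // None
}
```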
1.4.12 Macro
Rust builds surprisingly little into the language itself; even everyday facilities such as printing come from the library, often as macros. Rust offers a set of powerful macros, including declarative and procedural macros, that can accomplish many tasks. Unlike C macros, Rust macros are far more powerful and widely used: the derive macro adds new functionality to structs, and commonly used facilities such as println!, vec!, and panic! are also macros.
There are two main categories of macros in Rust: declarative macros declared using ”macro_rules!”
and procedural macros, further divided into custom derive macros, function-like macros, and attribute
macros. We’ve already covered declarative and custom derive macros, so this section will still focus on
those. Readers can refer to related materials for other macro types.
A declarative macro is invoked as macro_name!(), macro_name![], or macro_name!{}: a macro name, an exclamation mark, and one of the three bracket types (), [], {}. Any of the brackets works; the choice is purely conventional, matching form to meaning. vec![] looks like an array literal, which is why it uses square brackets, while println!() looks like a function call and therefore uses parentheses.
macro_rules! macro_name {
    ($matcher) => {
        $code_body;
        return_value
    };
}
In a macro definition, $matcher captures syntax elements such as identifiers, literals, keywords, symbols, patterns, and expressions (or nothing at all). The $ symbol captures values, which are then used in $code_body for processing and possibly to produce a return value. For example, the following macros compute the left and right child nodes of a binary-tree parent node p, whose indices are 2p and 2p+1 respectively.
macro_rules! left_child {
    ($parent:ident) => {
        $parent << 1
    };
}

macro_rules! right_child {
    ($parent:ident) => {
        ($parent << 1) + 1
    };
}
When computing child nodes, the macros left_child!(p) and right_child!(p) can be used instead of the expressions 2 * p and 2 * p + 1, which simplifies the code and clarifies its meaning. The ident after the colon is a fragment specifier: it declares that $parent must be an identifier. Rust macros rely on these specifiers to parse their input correctly.
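Other fragment specifiers work the same way; for instance, expr captures any expression. A small illustrative macro (not from the book):

```rust
// expr captures a whole expression; the captured fragment behaves
// as if parenthesized, so square!(2 + 1) is (2 + 1) * (2 + 1)
macro_rules! square {
    ($e:expr) => {
        $e * $e
    };
}

fn main() {
    println!("{}", square!(4));     // 16
    println!("{}", square!(2 + 1)); // 9
}
```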
The second form of macros is called procedural macros because they are more like a process or
function. They take code as input and produce other code as output. Derive macros, a type of procedural
macro, add new functionality to code directly. The following example uses the ”derive” macro to add
the ”Clone” method to ”Student”, so that ”Student” can directly call ”clone()” to achieve copying. This
is a simplified writing of ”impl Clone for Student”.
#[derive(Clone)]
struct Student;
Multiple traits can also be placed in ”derive” simultaneously, allowing for implementation of various
functions.
#[derive(Debug, Clone, Copy)]
struct Student;
Macros are a complex topic and should only be used when necessary and when they simplify the
code. It’s not recommended to use macros for their own sake.
To organize a large project, it is recommended to follow this approach when structuring Rust libraries. The TiKV codebase is a good example to study.
Rust also offers many standard libraries for solving general programming tasks. By studying these
libraries, developers can deepen their understanding of Rust and find inspiration for solving real-world
problems. The standard libraries in Rust include:
alloc, any, array, ascii, backtrace, borrow, boxed, cell, char, clone, cmp, collections, convert, default, env, error, f32, f64, ffi, fmt, fs, future, hash, hint, i8, i16, i32, i64, i128, intrinsics, io, isize, iter, lazy, marker, mem, net, num, ops, option, os, panic, path, pin, prelude, primitive, process, ptr, raw, rc, result, slice, str, string, sync, task, thread, time, u8, u16, u32, u64, u128, usize, vec
The image shown here is a simplified dependency graph of Rust libraries. It was created to illustrate
how Rust as a language is built and how the language’s libraries are interrelated. These libraries are
considered the de facto standards of Rust, with some being fundamental and serving as dependencies for
other libraries.
An analysis of Rust’s standard libraries revealed that they can be divided into three layers: core,
alloc, and std. Each layer builds upon the previous one, with std being the top layer, alloc in the middle,
and core serving as the most critical layer. Core is responsible for defining and implementing various
core concepts in Rust’s basic syntax, such as variables, numbers, characters, and Boolean types. These
are the foundational concepts that beginners learn when studying Rust’s basics.
USAGE:
PasswdGenerator [OPTIONS] --seed <SEED>
OPTIONS:
-h, --help Prints help information
-l, --length <LENGTH> Length of password [default: 16]
-s, --seed <SEED> Seed to generate password
-V, --version Prints version information
The USAGE section provides instructions on how to use the tool, while the OPTIONS section
displays the control parameters.
shieber@kew $ PasswdGenerator -s wechat
wechat: GQnaoXobRwrgW21A
The output of the PasswdGenerator tool begins with a short seed, such as ”wechat,” which can be
customized with prefixes or suffixes to associate it with a specific account. The seed serves to indicate
that the password was generated specifically for that account.
shieber@kew $ PasswdGenerator -s wechat -l 20
wechat: GQnaoXobRwrgW21Ac2Pb
The length of the password is controlled by the ”-l” parameter. If this parameter is not included, the
password defaults to a length of 16, which is sufficient in the vast majority of cases.
To handle command-line arguments, the clap library can be used, as shown below.
use clap::Parser;
// combine seed and passwd
Once generated, the directory structure should look like the following.
.
|-- Cargo.toml
|-- encryptor
|   |-- Cargo.toml
|   |-- src
|       |-- lib.rs
|-- hash
|   |-- Cargo.toml
|   |-- src
|       |-- lib.rs
|-- main
    |-- Cargo.toml
    |-- src
        |-- main.rs

6 directories, 7 files
In each crate, the modules need to be exported for use by the main program. For example, in the
”hash” library, we implement the ”mersenne_hash” function in ”merhash.rs” and export it in ”lib.rs”.
Similarly, in the ”encryptor” library, we implement the password-generating function in ”password.rs”
and export it in ”lib.rs”. The ”lib.rs” file of each crate should only be used to export functions:
mersenne_hash is encapsulated in merhash.rs, and the password-generating function in password.rs.
The final directory looks like this:
.
|-- Cargo.toml
|-- encryptor
|   |-- Cargo.toml
|   |-- src
|       |-- lib.rs
|       |-- password.rs
|-- hash
|   |-- Cargo.toml
|   |-- src
|       |-- lib.rs
|       |-- merhash.rs
|-- main
    |-- Cargo.toml
    |-- src
        |-- main.rs

6 directories, 9 files
It is important to note that the ”hash” library does not depend on any external libraries and can
therefore be implemented first.
1 // hash/src/lib.rs
2
3 pub mod merhash; // export module merhash
4
5 #[cfg(test)] // test module
6 mod tests {
7 use crate::merhash::mersenne_hash;
8
9 #[test]
10 fn mersenne_hash_works() {
11 let seed = String::from("jdxjp");
12 let hash = mersenne_hash(&seed);
13 assert_eq!(2000375, hash);
14 }
15 }
Then, we implement merhash.
1 // hash/src/merhash.rs
2
3 //! Mersenne Hash
4 //!
5 /// calculate hash value with mersenne prime 127
6 ///
7 /// # Example
8 /// ```
9 /// use hash::merhash::mersenne_hash;
10 ///
11 /// let seed = "jdxjp";
12 /// let hash = mersenne_hash(&seed);
13 /// assert_eq!(2000375, hash);
14 /// ```
15 pub fn mersenne_hash(seed: &str) -> usize {
16 let mut hash: usize = 0;
17
18 for (i, c) in seed.chars().enumerate() {
19 hash += (i + 1) * (c as usize);
20 }
21
22 (hash % 127).pow(3) - 1
23 }
To ensure the code is well-documented and easy to maintain, document comments and module com-
ments are used throughout the project. Tests are also included in the document comments, which is very
convenient for testing and subsequent document maintenance.
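The doc-test example above can also be checked standalone. In this sketch the function body is copied inline so the snippet runs without the crate layout described earlier.

```rust
// Standalone check of mersenne_hash; the body is copied from
// hash/src/merhash.rs so this compiles as a single file.
fn mersenne_hash(seed: &str) -> usize {
    let mut hash: usize = 0;
    for (i, c) in seed.chars().enumerate() {
        hash += (i + 1) * (c as usize);
    }
    (hash % 127).pow(3) - 1
}

fn main() {
    // same value the unit test and doc test assert
    assert_eq!(2000375, mersenne_hash("jdxjp"));
    println!("hash(\"jdxjp\") = {}", mersenne_hash("jdxjp"));
}
```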
To complete the ”encryptor” library, we need the external ”base64” library and, for error handling,
the ”anyhow” library. We open the ”Cargo.toml” file under the ”encryptor” directory and add the
necessary dependencies under the ”[dependencies]” section: the ”anyhow” and ”base64” libraries, as
well as the ”hash” library that was previously written.
1 [package]
2 name = "encryptor"
3 authors = ["shieber"]
4 version = "0.1.0"
5 edition = "2021"
6
7 [dependencies]
8 anyhow = "1.0.56"
9 base64 = "0.13.0"
10 hash = { path = "../hash" }
51 passwd.push(nthc);
52 mer_hash /= crypto_len;
53 }
54
55 // combine seed and passwd
56 let interval = passwd.clone();
57 for c in seed.chars() {
58 passwd.push(c);
59 passwd += &interval;
60 }
61
62 // encode passwd to base64
63 passwd = encode(passwd);
64 passwd = passwd.replace("+", "*").replace("/", "*");
65
66 // length is not enough, use interval to fill
67 let interval = passwd.clone();
68 while passwd.len() < length {
69 passwd += &interval;
70 }
71
72 // return first length characters as password
73 Ok(format!("{}: {}", seed, &passwd[..length]))
74 }
In the ”Cargo.toml” of the main module, we introduce the necessary libraries; note that the package
name is ”PasswdGenerator”.
1 [package]
2 name = "PasswdGenerator"
3 authors = ["shieber"]
4 version = "0.1.0"
5 edition = "2021"
6
7 [dependencies]
8 anyhow = "1.0.56"
9 clap = { version= "3.1.6", features= ["derive"] }
10 encryptor = { path = "../encryptor"}
Finally, in the main function, we parse the command-line arguments and call the ”encryptor” library
to generate the password.
1 // passwdgenerate/src/main.rs
2
3 use anyhow::{bail, Result};
4 use clap::Parser;
5 use encryptor::password::generate_password;
6
7 /// A simple password generator for any account
8 #[derive(Parser, Debug)]
9 #[clap(version, about, long_about= None)]
10 struct Args {
11 /// Seed to generate password
12 #[clap(short, long)]
13 seed: String,
14
15 /// Length of password
16 #[clap(short, long, default_value_t = 16)]
17 length: usize,
18 }
19
20 fn main() -> Result<()> {
21 let args = Args::parse();
22
23 // seed must be longer than 4 characters
24 if args.seed.len() < 4 {
25 bail!("seed {} length must be >= 4", &args.seed);
26 }
27
28 let (seed, length) = (args.seed, args.length);
29 let passwd = generate_password(&seed[..], length);
30 match passwd {
31 Ok(val) => println!("{}", val),
32 Err(err) => println!("{}", err),
33 }
34
35 Ok(())
36 }
Once the program is complete, compile it with ”cargo build --release” to get the ”PasswdGenerator”
command-line tool in the ”target/release/” directory. You can put it in the ”/usr/local/bin” directory so
that it can be used anywhere on the system. Running ”cargo doc” also generates a ”doc” directory
containing detailed documentation about the project.
Through this small project, readers have become familiar with Rust code organization, module usage,
testing methods, and coding styles. Although this password generator is simple, it is completely sufficient
for personal use.
1.5 Summary
In this chapter, we provide an overview of Rust fundamentals. We begin by introducing the
installation of the Rust toolchain and summarizing some learning resources. We then review Rust basics,
covering variables, functions, ownership, lifetimes, generics, traits, smart pointers, error handling, and
the macro system.
After that, we present a comprehensive project that implements a command-line tool using Rust.
Although the content of this chapter is relatively basic, it is essential to understand these concepts to
build more complex projects in Rust. Readers who need further clarification are recommended to refer
to the Rust documentation.
By the end of this chapter, readers will have a general understanding of Rust. In the following
chapters, we will delve into learning data structures and algorithms.
Chapter 2
Computer Science
2.1 Objectives
• Understand the ideas of computer science.
• Understand the concept of abstract data types.
• Review the basics of the Rust programming language.
2.4. WHAT IS PROGRAMMING? CHAPTER 2. COMPUTER SCIENCE
Computer science focuses on the art of problem-solving, which is achieved through abstraction.
Abstraction allows us to consider problems and their solutions from a logical perspective, rather than
being bogged down by physical details. An everyday example of this is driving a car. As a driver, you
interact with the car to reach your destination, using functions provided by the car’s designer, such as the
key, engine, gears, brakes, accelerator, and steering wheel. These functions are interfaces that simplify
the driving experience. In contrast, a car mechanic has a different perspective. They understand the inner
workings of the car, including the engine, transmission, temperature control, and windshield wipers.
These details operate at the physical level and happen ”under the hood.”
Similarly, users interact with computers from a logical perspective, using programs such as email,
document editors, and media players. In contrast, computer scientists, programmers, and system admin-
istrators work at a lower level, understanding how the operating system works, how to configure network
protocols, and how to write control function scripts. The key concept behind both of these examples is
abstraction. By hiding the complex details of the system, an interface simplifies the user’s interaction
with the computer. Rust’s mathematical calculation functions are a good example of this.
1 // sin_cos_function.rs
2
3 fn main() {
4 let x: f32 = 1.0;
5 let y = x.sin();
6 let z = x.cos();
7 println!("sin(1) = {y}, cos(1) = {z}");
8 // sin(1) = 0.84147096, cos(1) = 0.5403023
9 }
We may not know how to calculate the sine and cosine values, but as long as we understand how to
use the function, that’s all we need. Someone has implemented the algorithm for calculating sine and
cosine, and we can trust that the function will return the correct result. This is the idea of a ”black box.”
The interface describes the function’s input and output, while the details of the algorithm are hidden
within.
(Diagram: the input x enters the black box x.sin(), which produces the output y.)
2.5. WHY STUDY DATA STRUCTURES AND ABSTRACT DATA TYPES? CHAPTER 2. COMPUTER SCIENCE
entities in the problem, with underlying primitive data types such as integer and floating-point data types
being the foundation for algorithm development.
The operations that can be performed on data are also described in the data type. For example, the
most fundamental operations for numbers are addition, subtraction, multiplication, and division. How-
ever, programming languages’ simple structures and data types may not always suffice to represent com-
plex solutions. To control this complexity, more reasonable data management methods (data structures)
and operation processes (algorithms) are required.
(Diagram: the user interacts with an interface to the shell, which in turn performs operations on the
kernel.)
An Abstract Data Type (ADT) is a logical description of how to view and operate on data. It provides
a level of abstraction where only the meaning of the data is of concern, not its final storage form. By
wrapping the data in this way, implementation details can be hidden. The user’s interaction with the
interface is an abstract operation, while the user and shell are abstract data types. Although the specific
operations are not known, understanding their interaction mechanism is still possible. This is the benefit
of abstract data types for algorithm design.
To implement abstract data types, primitive data types are used to build new data types from a physical
view, known as data structures. There are usually many different ways to implement an abstract data
type, but different implementations should present the same logical view. This allows programmers to
change implementation details without altering the interaction method, and users to continue to focus
on the problem.
The common logic for abstract data types includes creating new data types, retrieving data, adding,
deleting, searching, modifying, checking for empty data, and computing size. For example, to implement
a queue, its abstract data type logic should include at least: new(), is_empty(), len(), clear(), enqueue(),
and dequeue(). Once the specific operational logic is known, implementation becomes simple, and there
are various ways to implement it as long as these abstract operations exist.
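As a sketch of this idea, the queue's abstract operations can be written as a Rust trait with one possible Vec-backed implementation. The trait and type names here are illustrative, not a standard API.

```rust
// The queue ADT as a trait: only the abstract operations are visible.
pub trait Queue<T> {
    fn new() -> Self;
    fn is_empty(&self) -> bool;
    fn len(&self) -> usize;
    fn clear(&mut self);
    fn enqueue(&mut self, item: T);
    fn dequeue(&mut self) -> Option<T>;
}

// One possible physical view: a Vec where the front is index 0.
// dequeue is O(n) here; the point is the interface, not efficiency.
pub struct VecQueue<T> {
    data: Vec<T>,
}

impl<T> Queue<T> for VecQueue<T> {
    fn new() -> Self { Self { data: Vec::new() } }
    fn is_empty(&self) -> bool { self.data.is_empty() }
    fn len(&self) -> usize { self.data.len() }
    fn clear(&mut self) { self.data.clear() }
    fn enqueue(&mut self, item: T) { self.data.push(item) }
    fn dequeue(&mut self) -> Option<T> {
        if self.data.is_empty() { None } else { Some(self.data.remove(0)) }
    }
}

fn main() {
    let mut q: VecQueue<i32> = Queue::new();
    q.enqueue(1);
    q.enqueue(2);
    assert_eq!(q.dequeue(), Some(1)); // first in, first out
    assert_eq!(q.len(), 1);
    println!("queue works");
}
```

A different implementation (for example, one based on VecDeque) could replace VecQueue without changing any code that uses only the trait.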
solutions and those that do not, as well as problems that have solutions but require significant time or
resources. As computer scientists, we need to evaluate and compare different solutions to determine the
most appropriate one.
Learning from the problem-solving approaches of others is an efficient way to develop our skills. By
examining different solutions, we can gain insights into how various algorithm designs help us tackle
challenging problems. By studying various algorithms, we can uncover their core principles and develop
a universal algorithm that can be applied to similar problems in the future. However, for the same
problem, different people may provide different algorithm implementations. As we saw earlier in the
example of computing sine and cosine, there can be many different implementation versions. If one
algorithm can achieve the same result as another algorithm in less time, it is obviously the better option.
2.7 Summary
This chapter delves into the concepts of computer science thinking and abstract data types, providing
clear definitions and highlighting the roles of algorithms and data structures. By abstracting data types,
specific implementation and operation logic is removed, which clarifies the boundaries between data
structures and algorithms and significantly simplifies the process of algorithm design. In the forthcoming
chapters, we will continue to utilize abstract data types to design various data structures.
Chapter 3
Algorithm Analysis
3.1 Objectives
• Understanding the Importance of Algorithm Analysis.
• Learning to perform performance benchmark tests on Rust programs.
• Being able to use the Big O notation to analyze the complexity of algorithms.
• Understanding the Big O analysis results for Rust data structures such as arrays.
• Understanding how Rust data structure implementations affect algorithm analysis.
3.2. WHAT IS ALGORITHM ANALYSIS? CHAPTER 3. ALGORITHM ANALYSIS
When evaluating the two functions, the answer depends on the reader’s criteria. If readability is
the concern, then ”sum_of_n” is better. However, in this book, we focus more on the statements of the
algorithms themselves, and clean coding is not part of the discussion.
Algorithm analysis compares the amount of resources used by algorithms. An algorithm is better than
another algorithm if it uses fewer resources or is more efficient in using resources. When comparing the
”sum_of_n” and ”tik_tok” functions, they appear very similar in terms of resource usage. Identifying
the resources used for computation is crucial and requires considering both time and space perspectives.
• The space used by an algorithm is determined by its memory consumption, which is typically
influenced by the size and nature of the problem. However, some algorithms have special space
requirements.
• Algorithm execution time refers to the time it takes for an algorithm to execute all its steps.
For instance, the execution time of the ”sum_of_n” function can be analyzed through benchmark
testing. In Rust, the execution time of code can be measured by recording the system time before and
after function execution. The SystemTime type, available in std::time, captures the current system time
via its now() method, and its elapsed() method returns the time passed since that moment. By calling
now() at the start of function execution and elapsed() at the end, we obtain the function execution time.
1 // static_func_call.rs
2
3 use std::time::SystemTime;
4
5 fn sum_of_n(n: i64) -> i64 {
6 let mut sum: i64 = 0;
7 for i in 1..=n {
8 sum += i;
9 }
10 sum
11 }
12
13 fn main() {
14 for _i in 0..5 {
15 let now = SystemTime::now();
16 let _sum = sum_of_n(500000);
17 let duration = now.elapsed().unwrap();
18 let time = duration.as_millis();
19 println!("func used {time} ms");
20 }
21 }
To illustrate this, we executed the ”sum_of_n” function five times summing the first 500,000 integers,
and then five more times summing the first 1,000,000 integers. The results were as follows:
func used 10 ms
func used 6 ms
func used 6 ms
func used 6 ms
func used 6 ms
func used 17 ms
func used 12 ms
func used 12 ms
func used 12 ms
func used 12 ms
3.3. BIG-O NOTATION ANALYSIS CHAPTER 3. ALGORITHM ANALYSIS
After comparing the results of the two runs, we find that the time taken by the function is essentially
stable, about 6 milliseconds per execution for the first 500,000 integers. The first execution takes longer,
at 10 milliseconds, because the function requires initialization and preparation; the subsequent four
executions need no initialization, and their times are more representative, which is why multiple
executions are necessary. When calculating the sum of the first 1,000,000 integers, the first execution
again takes longer, and the subsequent executions take roughly twice the time needed for the first
500,000 integers. This indicates that the execution time of the algorithm is proportional to the size of
the calculation.
To further explore this idea, consider the following function, which also calculates the sum of the
first n integers, but uses the mathematical formula 1 + 2 + ··· + n = n(n + 1)/2 to compute it.
1 // static_func_call2.rs
2 fn sum_of_n2(n: i64) -> i64 {
3 n * (n + 1) / 2
4 }
The sum_of_n function in static_func_call.rs is modified to use this formula, and a benchmark test is
performed using three different values of n (100,000, 500,000, 1,000,000). Each calculation is repeated
five times, and the average execution time is recorded. The results are as follows:
func used 1396 ns
func used 1313 ns
func used 1341 ns
There are two important points to note in the output. First, the recorded execution time is in
nanoseconds, which is significantly shorter than in the previous examples. The execution times for all
three calculations are approximately 0.0013 milliseconds, orders of magnitude shorter than the previous
6 milliseconds. Second, the execution time is independent of n: as n increases, the calculation time
remains the same, indicating that the computation is almost unaffected by n.
This benchmark test indicates that the iterative solution sum_of_n performs more work because some
program steps are repeated, which is why it takes longer. Moreover, the execution time of the iterative
solution increases with n. Note also that running similar functions on different computers or in different
programming languages may yield different results. The figure of 1341 nanoseconds above was measured
on the machine used in this book, a Lenovo Legion R7000P with an 8-core, 16-thread AMD R7-4800H
CPU; older computers may take longer to execute sum_of_n2.
It is necessary to find a better way to describe the execution time of algorithms. Although benchmark
tests measure the actual time taken by a program to execute, they do not provide a useful measure because
the execution time depends on the specific machine, and there are also magnitude conversions between
milliseconds and nanoseconds. We require a measure that is independent of the program or computer
used, and can be used to compare the efficiency of different algorithm implementations. Big O notation
is a good method for measuring algorithmic efficiency in the field of algorithm analysis, analogous to
acceleration measuring the change in speed per second.
We use the function T to represent the total number of executions, measuring the algorithm's time
complexity, where T(n) = 1 + n. The parameter n represents the problem size, and T(n) is the time
required to solve a problem of size n. Using n to represent the problem size makes sense for the summing
function, as summing 100,000 integers is a larger problem than summing 1,000 integers, and thus takes
longer. Our goal is to demonstrate how the algorithm's execution time changes relative to the problem
size.
For measuring the algorithm's space complexity, we use the function S to represent the total memory
consumption, where S(n) = 2. The parameter n still represents the problem size, but S(n) is independent
of n. To analyze the performance of an algorithm, it is crucial to determine the primary components
of its time complexity function T(n) and space complexity function S(n), rather than merely counting
the number of operation steps and space usage. As the problem size grows, some components of these
functions become dominant. The fastest-growing part of these functions with n is represented by the
big O notation, denoted O(f(n)). The function f(n) signifies the primary part of T(n) or S(n).
For instance, T(n) = n + 1 in the previous example. As n increases, the constant 1 becomes less
significant, so we can approximate the running time of T(n) as O(T(n)) = O(n + 1) = O(n). While
the constant 1 is undoubtedly part of T(n), its contribution becomes negligible when n is large. However,
when T(n) = n³ + 1 and n is small (say n = 1), discarding the constant 1 is unreasonable, since it would
mean discarding a significant portion of the running time; for such small values of n, O(n³ + 1) is the
honest description. But as n increases, the 1 becomes less significant, and we can approximate T(n) as
O(T(n)) = O(n³).
For S(n), since it is a constant, O(S(n)) = O(2) = O(1). The big O notation only represents the
order of magnitude; therefore, although the actual constant in S(n) is 2, we write O(1).
Assuming an algorithm has a number of operational steps determined by T(n) = 6n² + 37n + 996.
When n is small, such as 1 or 2, the constant term 996 may seem to be the main part of the function.
However, as n grows larger, the significance of the n² term increases. In fact, as n becomes very large,
the other two terms become negligible in the final result. Thus, we can approximate T(n) by ignoring
the other terms and focusing only on 6n². Moreover, the coefficient 6 also becomes insignificant as n
grows larger. Thus, we say that T(n) has a complexity order of n², or O(n²).
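The dominance of the n² term can be seen numerically. This quick sketch evaluates T(n) = 6n² + 37n + 996 for growing n and shows the ratio T(n)/n² approaching the coefficient 6:

```rust
// Evaluate T(n) = 6n^2 + 37n + 996 and compare it with the n^2 term alone.
// As n grows, T(n)/n^2 converges to the leading coefficient 6.
fn main() {
    for n in [1u64, 10, 100, 1000, 10000] {
        let t = 6 * n * n + 37 * n + 996;
        let ratio = t as f64 / (n * n) as f64;
        println!("n = {:5}, T(n) = {:12}, T(n)/n^2 = {:.3}", n, t, ratio);
    }
}
```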
However, the complexity of some algorithms depends on the exact values of the data, rather than the
size of the problem. For such algorithms, their performance needs to be characterized based on best case,
worst case, or average case scenarios. Worst case refers to a specific dataset that leads to particularly poor
algorithm performance, while the same algorithm may have significantly different performance under
different datasets. In most cases, the efficiency of algorithm execution lies between the two extremes of
worst and best cases, i.e., the average case. Thus, it is important for programmers to understand these
differences and avoid being misled by a particular case.
In the study of algorithms, some common order functions appear repeatedly, as shown in the table
and figure below. To determine which of these functions is the main part, we need to observe how they
relate to each other as n grows larger.
The figure below illustrates the growth of various functions. At small values of n, it can be difficult to
discern which function dominates the others. However, as n increases, the relative sizes of the functions
become clear. It is worth noting that at n = 10, 2ⁿ already exceeds n³, though this case is not shown on
the graph. The figure also highlights the differences between orders of magnitude. Generally, for
n > 10, we have O(2ⁿ) > O(n³) > O(n²) > O(n log n) > O(n) > O(log n) > O(1). This
information is useful in algorithm design: if we determine that an algorithm has a complexity on the
order of O(2ⁿ), for example, we know that it is not practical and can look to optimize it for better
performance.
(Figure: growth curves of the common order functions 1, log n, n log n, n², n³, and 2ⁿ as n increases.)
The above analysis examines time and space complexity, but often time complexity is the primary
concern because space is typically difficult to optimize. For example, the space complexity of an input
array is limited from the outset, but different algorithms can result in significant differences in the time
required to run on the array. Moreover, as storage becomes cheaper and more abundant, time becomes
the most important factor because it cannot be increased. Therefore, most of the complexity analysis in
this book focuses on time complexity, unless otherwise specified.
The following code only employs addition, subtraction, multiplication, and division. We can use it
to apply our newly acquired understanding of algorithm complexity analysis.
1 // big_o_analysis.rs
2 fn main() {
3 let a = 1; let b = 2;
4 let c = 3; let n: i64 = 1000000; // i64 avoids overflow in i * i
5
6 for i in 0..n {
7 for j in 0..n {
8 let x = i * i;
9 let y = j * j;
10 let z = i * j;
11 }
12 }
13
14 for k in 0..n {
15 let w = a*b + 45;
16 let v = b*b;
17 }
18
19 let d = 996;
20 }
3.4. ANAGRAM DETECTION CHAPTER 3. ALGORITHM ANALYSIS
The code above has a time complexity of O(n²), which can be obtained by analyzing the code. First,
the time to perform the assignments to a, b, c, and n is a constant 4. The second term is 3n², which comes
from the execution of three statements n² times due to the nested iteration. The third term is 2n, as two
statements are iteratively executed n times. Finally, the fourth term is a constant 1, representing the final
assignment statement d = 996. Thus, the function that determines the number of operational steps is
T(n) = 4 + 3n² + 2n + 1 = 3n² + 2n + 5. By examining the highest power, we can see that the n² term
is the most significant, indicating that the time complexity of this code is O(n²).
16 vec_b.push(c);
17 }
18
19 // index: pos1, pos2
20 let mut pos1: usize = 0;
21 let mut pos2: usize;
22
23 // loop controlling
24 let mut is_anagram = true;
25 let mut found: bool;
26 while pos1 < s1.len() && is_anagram {
27 pos2 = 0;
28 found = false;
29 while pos2 < vec_b.len() && !found {
30 if vec_a[pos1] == vec_b[pos2] {
31 found = true;
32 } else {
33 pos2 += 1;
34 }
35 }
36
37 // replace a character with ' '
38 if found {
39 vec_b[pos2] = ' ';
40 } else {
41 is_anagram = false;
42 }
43
44 // tackle the next character of s1
45 pos1 += 1;
46 }
47
48 is_anagram
49 }
50
51 fn main() {
52 let s1 = "rust";
53 let s2 = "trus";
54 let result = anagram_solution2(s1, s2);
55 println!("s1 and s2 is anagram: {result}");
56 // s1 and s2 is anagram: true
57 }
Upon analyzing this algorithm, we observe that each character in s1 is compared to all characters in
s2, which leads to a maximum of n comparisons for each character in s1. As a result, the n positions in
vec_b will be visited once to match the characters from s1. The total number of visits can be expressed
as the sum of the integers from 1 to n.
1 + 2 + ··· + n = ∑_{i=1}^{n} i = n(n + 1)/2 = (1/2)n² + (1/2)n    (3.1)
The time complexity of this algorithm is O(n²), as the n² term dominates as n grows. This is much
better than the brute-force algorithm, which has a complexity of O(n!).
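For comparison, an anagram check based on counting character frequencies runs in O(n). This is a sketch of an alternative approach, not the algorithm analyzed above; it assumes ASCII input (a non-ASCII character would index out of bounds).

```rust
// O(n) anagram detection by counting ASCII character frequencies.
fn anagram_by_count(s1: &str, s2: &str) -> bool {
    if s1.len() != s2.len() {
        return false;
    }

    let mut counts = [0i32; 128]; // one counter per ASCII character
    for c in s1.chars() {
        counts[c as usize] += 1;
    }
    for c in s2.chars() {
        counts[c as usize] -= 1;
    }

    // anagrams cancel out exactly
    counts.iter().all(|&c| 0 == c)
}

fn main() {
    assert!(anagram_by_count("rust", "trus"));
    assert!(!anagram_by_count("rust", "rest"));
    println!("checks passed");
}
```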
3.5. PERFORMANCE OF RUST DATA STRUCTURES CHAPTER 3. ALGORITHM ANALYSIS
Rust’s String type is built on Vec, which allows a String to grow and change like a Vec. To use a
portion of a String, one can take a &str, a string slice that is easy to index. A &str borrows its data from
the String, so it cannot be used to modify the underlying data, which may be referenced elsewhere. In
Rust, strings that need to be modified use String, while borrowed, immutable views use &str.
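A minimal sketch of this distinction between String and &str:

```rust
fn main() {
    // String is growable, like Vec
    let mut owned = String::from("data ");
    owned.push_str("structures");

    // &str borrows an immutable view into the String
    let slice: &str = &owned[0..4];
    assert_eq!(slice, "data");

    println!("{owned}"); // prints "data structures"
}
```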
Vec is similar to lists in other languages, and VecDeque extends Vec to support inserting and removing
data at both ends of the sequence, making it a double-ended queue. LinkedList is a linked list and can be
used when a sequence of unknown size is desired.
The HashMap in Rust is similar to dictionaries in other languages, while BTreeMap is a B-tree
whose nodes contain data and pointers. It is often used to implement databases, file systems, and other
content storage applications. The HashSet and BTreeSet in Rust are similar to the Set collections in
other languages, which store each value only once. HashSet is implemented using HashMap as its
underlying data structure, while BTreeSet is implemented using BTreeMap. BinaryHeap is a priority
queue, storing a set of elements from which the maximum value can be extracted at any time.
Rust’s collection data types are highly efficient, as can be seen from the tables below that show the
performance of various data structures. The highest complexity of these data structures is O(n).
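A brief tour of these collection types, with illustrative values:

```rust
use std::collections::{BinaryHeap, HashMap, HashSet, VecDeque};

fn main() {
    // VecDeque: efficient insertion at both ends
    let mut dq: VecDeque<i32> = VecDeque::new();
    dq.push_back(2);
    dq.push_front(1);
    assert_eq!(dq.front(), Some(&1));

    // HashMap: key-value lookup
    let mut m = HashMap::new();
    m.insert("one", 1);
    assert_eq!(m.get("one"), Some(&1));

    // HashSet: stores each value only once
    let s: HashSet<i32> = [1, 1, 2].into_iter().collect();
    assert_eq!(s.len(), 2);

    // BinaryHeap: a max-priority queue
    let mut heap = BinaryHeap::from(vec![3, 1, 2]);
    assert_eq!(heap.pop(), Some(3));

    println!("all checks passed");
}
```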
3.6 Summary
This chapter introduced the big O notation analysis method for evaluating algorithm complexity,
which involves calculating the number of code execution steps and determining the maximum quantity
level. We then examined the complexity of Rust’s basic data types and collection data types. Upon
comparison and analysis, it becomes clear that Rust’s built-in scalar, composite, and collection data
types are highly efficient. Additionally, custom data structures implemented based on these collection
types can be efficient as well.
Chapter 4
Basic Data Structures
4.1 Objectives
• Understanding Abstract Data Types Vec, Stack, Queue, Deque, Linked List
• Being able to implement stack, queue, deque, and linked list using Rust
• Understanding the performance (complexity) of basic linear data structures
• Understanding prefix, infix, and postfix expression formats
• Using a stack to evaluate postfix expressions
• Using a stack to convert infix expressions to postfix expressions
• Being able to identify whether to use a stack, queue, deque, or linked list for a given problem
• Being able to implement abstract data types as linked lists using nodes and references
• Being able to compare the performance of self-implemented Vec with Rust’s built-in Vec.
4.3 Stack
A stack is a linear data structure that has numerous applications, such as in function calls and webpage
data recording. It is an ordered collection of data items where new items are added or removed from
the top, while the bottom refers to the opposite end. The bottom is significant because the items closest
to it have been stored for the longest time, and the most recently added item will be the first one to be
removed. This is known as the Last In First Out (LIFO) or First In Last Out (FILO) principle, which
means that newer items are closer to the top, while older items are closer to the bottom.
Stacks are ubiquitous in everyday life, like bricks on a construction site, books on a desk, and plates
in a restaurant. To access the bottom brick, book, or plate, we need to remove the items on top first. The
schematic diagram of a stack is shown below with some conceptual names.
(Diagram: a stack with ”Semiconductor” at the bottom, then ”Transistor”, ”Integrated Circuit”, and
”Computer” at the top.)
To understand the function of stacks, it is best to observe how they are formed and emptied. Suppose
we stack books on a clean desktop one by one; we are creating a stack. Now, if we remove a book, the
order of removal is opposite to the order in which the books were placed. The significance of stacks lies
in their ability to reverse the order of items, inserting and deleting in the opposite order. The following
diagram shows the process of creating and deleting data objects, with particular attention to the order of
the data.
(Diagram: items 1, 2, 3, 4 are pushed onto the stack in that order and popped off in the reverse order
4, 3, 2, 1.)
The ability to reverse the order of items makes the reversal property of stacks particularly useful, as
demonstrated in various computer applications. For instance, when browsing news on a web browser,
the back function is implemented using a stack. As the user browses web pages, they are stacked, with
the current page at the top and the first page at the bottom. Pressing the back button takes the user to the
previous page in the opposite order. Without the power of stacks, designing this back function would
be nearly impossible. This example highlights the importance of data structures in simplifying certain
functions, as choosing the right data structure can make a significant difference.
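As a sketch of that back function (the page names here are made up), visited pages are pushed as the user browses and popped when the back button is pressed:

```rust
fn main() {
    // a Vec used as a stack of visited pages
    let mut history = Vec::new();
    for page in ["home", "news", "article"] {
        history.push(page); // each visited page goes on top
    }
    // pressing back: pop the current page; the previous one is on top
    history.pop();
    assert_eq!(history.last(), Some(&"news"));
}
```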
// stack.rs

#[derive(Debug)]
struct Stack<T> {
    size: usize,  // stack size
    data: Vec<T>, // stack data
}

impl<T> Stack<T> {
    // initialize a stack
    fn new() -> Self {
        Self {
            size: 0,
            data: Vec::new()
        }
    }

    fn is_empty(&self) -> bool {
        0 == self.size
    }

    fn len(&self) -> usize {
        self.size
    }

    // clear the stack
    fn clear(&mut self) {
        self.size = 0;
        self.data.clear();
    }

    // push the item onto the tail of the Vec
    fn push(&mut self, val: T) {
        self.data.push(val);
        self.size += 1;
    }

    // decrease size by 1, then return the popped value
    fn pop(&mut self) -> Option<T> {
        if 0 == self.size {
            return None;
        }
        self.size -= 1;
        self.data.pop()
    }

    // return a reference to the top value
    fn peek(&self) -> Option<&T> {
        if 0 == self.size {
            return None;
        }
        self.data.get(self.size - 1)
    }

    fn peek_mut(&mut self) -> Option<&mut T> {
        if 0 == self.size {
            return None;
        }
        self.data.get_mut(self.size - 1)
    }

    // Implementing iteration for the stack
    // into_iter: the stack is consumed and becomes an iterator
    // iter: the stack is unmodified; yields an immutable iterator
    // iter_mut: the stack is unmodified; yields a mutable iterator
    fn into_iter(self) -> IntoIter<T> {
        IntoIter(self)
    }

    fn iter(&self) -> Iter<T> {
        let mut iterator = Iter { stack: Vec::new() };
        for item in self.data.iter() {
            iterator.stack.push(item);
        }
        iterator
    }

    fn iter_mut(&mut self) -> IterMut<T> {
        let mut iterator = IterMut { stack: Vec::new() };
        for item in self.data.iter_mut() {
            iterator.stack.push(item);
        }
        iterator
    }
}

// Implementation of the 3 iterators
struct IntoIter<T>(Stack<T>);
impl<T: Clone> Iterator for IntoIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<Self::Item> {
        if !self.0.is_empty() {
            self.0.size -= 1;
            self.0.data.pop()
        } else {
            None
        }
    }
}

struct Iter<'a, T: 'a> { stack: Vec<&'a T>, }
impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<Self::Item> {
        self.stack.pop()
    }
}

struct IterMut<'a, T: 'a> { stack: Vec<&'a mut T> }
impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;
    fn next(&mut self) -> Option<Self::Item> {
        self.stack.pop()
    }
}

fn main() {
    basic();
    peek();
    iter();

    fn basic() {
        let mut s = Stack::new();
        s.push(1); s.push(2); s.push(3);

        println!("size: {}, {:?}", s.len(), s);
        println!("pop {:?}, size {}", s.pop().unwrap(), s.len());
        println!("empty: {}, {:?}", s.is_empty(), s);

        s.clear();
        println!("{:?}", s);
    }

    fn peek() {
        let mut s = Stack::new();
        s.push(1); s.push(2); s.push(3);

        println!("{:?}", s);
        let peek_mut = s.peek_mut();
        if let Some(top) = peek_mut {
            *top = 4;
        }

        println!("top {:?}", s.peek().unwrap());
        println!("{:?}", s);
    }

    fn iter() {
        let mut s = Stack::new();
        s.push(1); s.push(2); s.push(3);

        let sum1 = s.iter().sum::<i32>();
        let mut addend = 0;
        for item in s.iter_mut() {
            *item += 1;
            addend += 1;
        }

        let sum2 = s.iter().sum::<i32>();
        println!("{sum1} + {addend} = {sum2}");
        assert_eq!(9, s.into_iter().sum::<i32>());
    }
}
After executing the code, we obtain the following results.
size: 3, Stack { size: 3, data: [1, 2, 3] }
pop 3, size 2
empty: false, Stack { size: 2, data: [1, 2] }
Stack { size: 0, data: [] }
Stack { size: 3, data: [1, 2, 3] }
top 4
Stack { size: 3, data: [1, 2, 4] }
6 + 3 = 9
To solve the parenthesis matching problem, a deeper understanding of brackets and their matching is
needed. When processing symbols from left to right, the nearest left starting bracket ’(’ must match the
next right closing symbol ’)’ (as shown in Figure 4.3). In addition, the first left starting bracket processed
must wait until it matches the last right closing bracket. Ending brackets match starting brackets in the
opposite order, from inside to outside. This is a problem that can be solved using a stack.
( ( ) ( ( ) ) ( ) )    (the first opening bracket matches the last closing bracket)
Matching parentheses is crucial in computer programs, since brackets and nesting are prevalent in code and determine the order of operations. The challenge is to write an algorithm that reads a string of symbols from left to right and determines whether its brackets match.
The implementation of the algorithm for matching parentheses using a stack is quite simple since it
only involves push, pop, and judgment operations on the stack. To start, process the bracket string from
left to right with an empty stack. If the symbol is a left starting symbol, push it onto the stack; if it is an
ending symbol, pop the top element of the stack and match these two symbols. If they match, continue to
process the next bracket until the string is completely processed. At the end of the processing, the stack
should be empty; if not, it indicates that there are unmatched brackets. Here is a Rust implementation of
the bracket matching program.
// par_checker1.rs

fn par_checker1(par: &str) -> bool {
    // collect the characters into a Vec
    let mut char_list = Vec::new();
    for c in par.chars() {
        char_list.push(c);
    }

    let mut index = 0;
    let mut balance = true; // whether the string is balanced
    let mut stack = Stack::new();
    while index < char_list.len() && balance {
        let c = char_list[index];

        if '(' == c { // '(' : push it onto the stack
            stack.push(c);
        } else { // ')' : check whether the stack is empty
            if stack.is_empty() { // empty stack: unmatched
                balance = false;
            } else {
                let _r = stack.pop();
            }
        }

        index += 1;
    }

    // parentheses matched: balanced and an empty stack
    balance && stack.is_empty()
}

fn main() {
    let sa = "()(())";
    let sb = "()((()";
    let res1 = par_checker1(sa);
    let res2 = par_checker1(sb);
    println!("{sa} balanced:{res1}, {sb} balanced:{res2}");
    // ()(()) balanced:true, ()((() balanced:false
}
While the above example only involves matching the parentheses '()', there are actually three commonly used types of brackets: '()', '[]', and '{}'. These different types of left and right brackets are often nested together. In Rust, for example, square brackets '[]' are used for indexing, curly brackets '{}' for formatting output, and parentheses '()' for function parameters, tuples, mathematical expressions, and more. Mixed, nested symbols are fine as long as each symbol maintains its own opening and closing relationship.
{ { ( [ ] [ ] ) } ( ) }
[ [ { { ( ( ) ) } } ] ]
[ ] [ ] [ ] ( ) { }
All of the parentheses shown above are matched. On the contrary, the following expression is not
matched.
( } [ ]
( ( ( ) ] ) )
[ { ( ) ]
To handle three types of brackets, the previous checking program, par_checker1 (which can only detect '()'), needs to be extended. However, the algorithm process remains the same.
Each left starting parenthesis is pushed onto the stack, waiting for the matching right ending parenthesis
to appear. When an ending parenthesis appears, the program checks whether the types of parentheses
match. If the two parentheses do not match, then the string does not match. If the entire string has been
processed and the stack is empty, then the parentheses expression matches.
To detect whether the types of parentheses match, a new function called par_match() has been added.
This function can detect the three commonly used types of parentheses. The detection principle is very
simple: the program arranges the parentheses in order and checks whether their indices match.
// par_checker2.rs

// check whether the left and right symbols match
fn par_match(open: char, close: char) -> bool {
    let opens = "([{";
    let closers = ")]}";
    opens.find(open) == closers.find(close)
}

fn par_checker2(par: &str) -> bool {
    let mut char_list = Vec::new();
    for c in par.chars() {
        char_list.push(c);
    }

    let mut index = 0;
    let mut balance = true;
    let mut stack = Stack::new();
    while index < char_list.len() && balance {
        let c = char_list[index];
        // check all 3 opening symbols simultaneously
        if '(' == c || '[' == c || '{' == c {
            stack.push(c);
        } else {
            if stack.is_empty() {
                balance = false;
            } else {
                // determine whether the pair matches
                let top = stack.pop().unwrap();
                if !par_match(top, c) {
                    balance = false;
                }
            }
        }
        index += 1;
    }
    balance && stack.is_empty()
}

fn main() {
    let sa = "(){}[]";
    let sb = "(){)[}";
    let res1 = par_checker2(sa);
    let res2 = par_checker2(sb);
    println!("{sa} balanced:{res1}, {sb} balanced:{res2}");
    // (){}[] balanced:true, (){)[} balanced:false
}
The current implementation can handle different types of bracket matching problems. However, if
the input expression contains non-bracket characters, the program will fail to work properly.
(a+b)(c*d)func()
The apparent complexity of the problem can be deceiving, because the actual issue is still detecting matching brackets; non-bracket characters can simply be skipped during processing. In the example above, ignoring the non-bracket characters leaves only the brackets, the string ()()(). The problem is therefore the same as before, and the code can be modified to detect matching brackets even in strings containing non-bracket characters. The following implementation is modified from par_checker2.rs.
// par_checker3.rs

fn par_checker3(par: &str) -> bool {
    let mut char_list = Vec::new();
    for c in par.chars() {
        char_list.push(c);
    }

    let mut index = 0;
    let mut balance = true;
    let mut stack = Stack::new();
    while index < char_list.len() && balance {
        let c = char_list[index];
        // opening symbols are pushed; closing symbols are matched;
        // any non-bracket character is simply skipped
        if '(' == c || '[' == c || '{' == c {
            stack.push(c);
        } else if ')' == c || ']' == c || '}' == c {
            if stack.is_empty() {
                balance = false;
            } else {
                let top = stack.pop().unwrap();
                if !par_match(top, c) {
                    balance = false;
                }
            }
        }
        index += 1;
    }
    balance && stack.is_empty()
}
The function shown below takes a decimal parameter and uses Rust’s modulus operator (%) to extract
the remainder, which is then pushed onto a stack. The parameter is repeatedly divided by 2 until it reaches
0, and the program constructs the binary representation of the original value.
// divide_by_two.rs

fn divide_by_two(mut dec_num: u32) -> String {
    // save the remainders in a stack
    let mut rem_stack = Stack::new();

    // push the remainders onto the stack
    while dec_num > 0 {
        let rem = dec_num % 2;
        rem_stack.push(rem);
        dec_num /= 2;
    }

    // pop elements off the stack to form the binary string
    let mut bin_str = "".to_string();
    while !rem_stack.is_empty() {
        let rem = rem_stack.pop().unwrap().to_string();
        bin_str += &rem;
    }
    bin_str
}

fn main() {
    let num = 10;
    let bin_str: String = divide_by_two(num);
    println!("{num} = b{bin_str}");
    // 10 = b1010
}
This algorithm for converting from decimal to binary can be extended to perform conversions be-
tween any two bases commonly used in computer science, such as binary, octal, and hexadecimal. For
example, the decimal number 233(10) corresponds to 351(8) in octal and e9(16) in hexadecimal.
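These correspondences can be verified with Rust's built-in format specifiers before we write the converter ourselves:

```rust
fn main() {
    // 233 in octal and hexadecimal, via std formatting
    assert_eq!(format!("{:o}", 233), "351");
    assert_eq!(format!("{:x}", 233), "e9");
    println!("233 = 351(8) = e9(16)");
}
```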
To make the function more general, it can be modified to accept a predetermined conversion base,
replacing division by 2 with division by the conversion base. The remainders are still pushed onto a stack until the value being converted reaches 0. However, for bases greater than 10, remainders of 10 or more will occur, so it is best to represent each of them as a single character. In the base_converter function, we choose A-F to represent 10-15, but lowercase letters a-f or other character sequences could also be used.
// base_converter.rs

fn base_converter(mut dec_num: u32, base: u32) -> String {
    // digits holds the character form of each value (especially 10-15)
    let digits = ['0', '1', '2', '3', '4', '5', '6', '7',
                  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'];
    let mut rem_stack = Stack::new();

    // push the remainders onto the stack
    while dec_num > 0 {
        let rem = dec_num % base;
        rem_stack.push(rem);
        dec_num /= base;
    }

    // pop elements off the stack to form the converted string
    let mut base_str = "".to_string();
    while !rem_stack.is_empty() {
        let rem = rem_stack.pop().unwrap() as usize;
        base_str += &digits[rem].to_string();
    }
    base_str
}

fn main() {
    let num1 = 10;
    let num2 = 43;
    let bin_str: String = base_converter(num1, 2);
    let hex_str: String = base_converter(num2, 16);
    println!("{num1} = b{bin_str}, {num2} = x{hex_str}");
    // 10 = b1010, 43 = x2B
}
A computer can avoid confusing the order of operations by using fully parenthesized expressions, in which every operator is enclosed in a pair of parentheses that states the order of operations. However, since computers process data from left to right, they may have difficulty jumping between inner and outer parentheses to evaluate an expression like (A + (B * C)). Humans do this intuitively, but computers are rigid and require explicit instructions, so even tasks that seem easy for us can be complex for them.
A more intuitive method for computers is to separate the operator and operands by moving the opera-
tor outside the parentheses. This creates prefix and postfix expressions, which can be distinguished from
the original infix expression. In prefix expressions, operators come before the operands, and in postfix
expressions, operators come after the corresponding operands. Calculations are performed by taking the
operator and operands, calculating the result as the current value, and then proceeding to subsequent
operations until the expression is fully calculated.
For example, the infix expression A + B can be written as + A B in prefix notation or A B + in postfix
notation. By following the rules for each notation, readers can calculate more complex expressions in
prefix or postfix notation and confirm the correct result.
To analyze complex expressions, readers must understand the importance of parentheses in express-
ing the order of operations. Even small changes in the placement of parentheses can result in vastly
different expressions. While infix expressions require parentheses to disambiguate the order of opera-
tions, prefix and postfix expressions do not need them. This is because the order of operations is entirely
determined by the position of operators in these notations. As a result, prefix and postfix expressions
can clearly express the calculation logic without ambiguity, ensuring that calculations based on them are
error-free. The table below shows some examples of expressions in the three notations.

    Infix             Prefix            Postfix
    A + B * C         + A * B C         A B C * +
    ( A + B ) * C     * + A B C         A B + C *
    A + B - C         - + A B C         A B + C -
Prefix and postfix expressions may appear harder to compute at first. For instance, in the prefix expression + A * B C (for A + B * C), the product B * C must be computed before it can be added to A, yet the + is encountered first and has to wait for its second operand. Stacks resolve this by reversing the order of operations: pushing the operands and operators onto separate stacks from left to right reverses their order, as shown in Figure (4.5). This enables such prefix and postfix expressions to be computed efficiently and accurately.
operator stack (bottom to top): + *        operand stack (bottom to top): A B
To calculate expressions in prefix notation, the operator and operands are pushed onto separate stacks
from left to right. To compute the expression, the top operator is popped from the operator stack, and the
two top operands are popped from the operand stack. The operator is applied to these operands to obtain
a result, which is then pushed back onto the operand stack. This process is repeated until the operator
stack is empty. Finally, the top value in the operand stack is the result of the expression. This method
is as efficient as fully bracketed expressions since the computer only needs to push and pop from the
stacks, and does not need to handle parentheses, as shown in Figure (4.6).
pop '*' with B and C, push B*C;  then pop '+' with A and B*C, giving A + B*C
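A minimal sketch of this two-stack procedure, with integers standing in for the symbolic operands (the function name is ours; as described above, it suits expressions such as + A * B C and is not a fully general prefix evaluator):

```rust
// evaluate a prefix expression such as "+ 4 * 5 6" (i.e. 4 + 5 * 6)
fn eval_prefix(expr: &str) -> i32 {
    let mut op_stack: Vec<&str> = Vec::new();
    let mut num_stack: Vec<i32> = Vec::new();

    // push operators and operands onto separate stacks, left to right
    for token in expr.split_whitespace() {
        match token {
            "+" | "-" | "*" | "/" => op_stack.push(token),
            _ => num_stack.push(token.parse().unwrap()),
        }
    }

    // repeatedly pop one operator and its two operands,
    // pushing the result back, until no operators remain
    while let Some(op) = op_stack.pop() {
        let b = num_stack.pop().unwrap();
        let a = num_stack.pop().unwrap();
        let res = match op {
            "+" => a + b,
            "-" => a - b,
            "*" => a * b,
            _ => a / b,
        };
        num_stack.push(res);
    }
    num_stack.pop().unwrap()
}

fn main() {
    assert_eq!(eval_prefix("+ 4 * 5 6"), 34); // 4 + 5 * 6 = 34
}
```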
Similarly, postfix expressions can also be calculated using stacks. Only one stack is required to per-
form the calculation. For example, the postfix expression for A + B * C, which is A B C * +, is calculated
by pushing A, B, and C onto the stack. When the * operator is encountered, the top two operands, B and
C, are popped, multiplied, and pushed back onto the stack. Then, when the + operator is encountered,
the top two operands, A and BC, are popped, added, and pushed back onto the stack, resulting in the
final result of A + BC. This approach demonstrates the efficiency of stack-based computations, which
can handle expressions in any notation without the need for parentheses.
( A + ( B * C ) ) = + A * B C
( A + ( B * C ) ) = A B C * +
The inner subexpression of (A + (B * C)) is (B * C); by moving the multiplication symbol to the position of its left bracket and deleting the matching pair of brackets, the subexpression is converted to prefix form. Similarly, moving the addition operator to the position of its left bracket and deleting the matching brackets yields the complete prefix expression + A * B C. Moving each operator to the position of its right bracket instead yields the complete postfix expression.
To convert an expression, regardless of whether it is a prefix or postfix expression, the expression
must first be converted into a fully bracketed expression based on the order of operations. Once the
expression is fully bracketed, the operator inside the brackets can be moved to the position of the left or
right bracket to achieve the desired notation. For example, the more complex expression (A + B) * C
- (D + E) / (F + G) can be converted to a prefix or postfix expression using this method. Although the
result may be complex for humans, computers can handle this process with ease.
(A + B) * C - (D + E) / (F + G)
prefix:  - * + A B C / + D E + F G
postfix: A B + C * D E + F G + / -
Obtaining a fully parenthesized expression is a difficult task, and it requires modifying the string by
moving and deleting characters. Therefore, this method is not universal enough. A more convenient
approach is to handle operators separately. When converting an infix expression, if the operands are
not considered, they maintain their original relative positions, and only the operators change position.
Therefore, it is not necessary to change the position of operands when encountering them; instead, only
operators need to be handled when they are encountered. However, operators have priorities and often
reverse the order. The characteristic of reversing the order is inherent to the stack data structure, which
can be used to store operators.
To convert an infix expression to a postfix expression, we can use a stack to store operators while
scanning the expression from left to right. When an operand is encountered, it is added to the postfix
expression. If an operator is encountered, we compare its priority with the operator at the top of the stack.
If the current operator has higher priority, it is pushed onto the stack. If the current operator has lower
or equal priority, we pop operators from the stack and add them to the postfix expression until we reach
an operator with lower priority or an opening parenthesis. When we encounter a closing parenthesis, we
pop operators from the stack and add them to the postfix expression until we reach the corresponding
opening parenthesis, which is then discarded. The result is a postfix expression that can be evaluated
efficiently.
For example, to convert the infix expression ( A + B ) * C, we scan it from left to right. The opening parenthesis is pushed onto the stack. The operand A is added to the postfix expression. The operator + has higher priority than the '(' on top of the stack, so it is pushed. The operand B is added to the postfix expression. At the closing parenthesis, operators are popped until the matching '(': the + is added to the postfix expression and the '(' is discarded. The operator * is then pushed onto the now-empty stack, the operand C is added to the postfix expression, and at the end of the scan the * is popped and appended. The resulting postfix expression is A B + C *.
In the same way, we can convert the infix expression A * B + C * D to the postfix expression A B *
C D * +, as shown in the figure below.
A * B + C * D  ->  A B * C D * +
(operator stack while scanning: *, then + after the * is popped, then + *, then emptied at the end)
To convert an infix expression to postfix expression, we can use a HashMap named ”prec” to store the
priority of operators. This HashMap maps each operator to an integer that is used to compare the priority
of operators. The priority of parentheses is assigned the lowest value so that any operator compared with
them has higher priority. Operators are limited to ”+-*/”, while operands are defined as uppercase letters
A − Z or digits 0 − 9.
// infix_to_postfix.rs

use std::collections::HashMap;

fn infix_to_postfix(infix: &str) -> Option<String> {
    // check the parentheses first
    if !par_checker3(infix) {
        return None;
    }

    // set the priority of all symbols
    let mut prec = HashMap::new();
    prec.insert("(", 1); prec.insert(")", 1);
    prec.insert("+", 2); prec.insert("-", 2);
    prec.insert("*", 3); prec.insert("/", 3);

    // ops: saves operators, postfix: saves the postfix expression
    let mut ops = Stack::new();
    let mut postfix = Vec::new();
    for token in infix.split_whitespace() {
        // operands (A-Z, 0-9) go straight
        // into the postfix expression
        if ("A" <= token && token <= "Z") ||
           ("0" <= token && token <= "9") {
            postfix.push(token);
        } else if "(" == token {
            // opening symbol: push it onto the stack
            ops.push(token);
        } else if ")" == token {
            // closing symbol: pop until the matching "("
            let mut top = ops.pop().unwrap();
            while top != "(" {
                postfix.push(top);
                top = ops.pop().unwrap();
            }
        } else {
            // pop operators of higher or equal priority
            while (!ops.is_empty()) &&
                  (prec[ops.peek().unwrap()]
                      >= prec[token]) {
                postfix.push(ops.pop().unwrap());
            }
            ops.push(token);
        }
    }

    // pop any remaining operators off the stack
    while !ops.is_empty() {
        postfix.push(ops.pop().unwrap())
    }
    // join the tokens to create the postfix expression
    let mut postfix_str = "".to_string();
    for c in postfix {
        postfix_str += &c.to_string();
        postfix_str += " ";
    }

    Some(postfix_str)
}

fn main() {
    let infix = "( A + B ) * ( C + D )";
    let postfix = infix_to_postfix(infix);
    match postfix {
        Some(val) => { println!("{infix} -> {val}"); },
        None => {
            println!("{infix} isn't a correct infix string");
        },
    }
    // ( A + B ) * ( C + D ) -> A B + C D + *
}
When calculating postfix expressions, special attention must be given to the ”-” and ”/” operators.
Unlike the ”+” and ”*” operators, the order of operands matters for ”-” and ”/”. For example, A / B and
B / A, A - B and B - A are completely different and cannot be handled like ”+”, and ”*”. Assuming the
postfix expression is a space-separated string with operators ”+-*/” and operands as integers, the output
is also an integer. The following are the algorithm steps for calculating postfix expressions:
1. Create an empty stack named op_stack.
2. Split the string into a list of symbols.
3. Scan the symbol list from left to right. If the symbol is an operand, convert it from a character
to an integer and push the value onto op_stack. If the symbol is an operator, pop op_stack twice. The
second pop is the first operand, and the first pop is the second operand. Perform the arithmetic operation
and push the result back onto the operand stack.
4. When the entire input expression has been processed, the result is on the stack. Pop op_stack to
get the final result of the operation.
4 5 6 * +  ->  push 4, push 5, push 6; '*' pops 6 and 5, pushes 30; '+' pops 30 and 4, pushes 34
The specific steps for evaluating postfix expressions are shown in the above figure, and below is the
implementation code.
// postfix_eval.rs

fn postfix_eval(postfix: &str) -> Option<i32> {
    // the expression needs at least two operands and
    // one operator, plus two spaces to separate them
    if postfix.len() < 5 { return None; }

    let mut ops = Stack::new();
    for token in postfix.split_whitespace() {
        // string slices can be compared directly
        if "0" <= token && token <= "9" {
            ops.push(token.parse::<i32>().unwrap());
        } else {
            // for subtraction and division, the order matters
            let op2 = ops.pop().unwrap();
            let op1 = ops.pop().unwrap();
            let res = do_calc(token, op1, op2);
            ops.push(res);
        }
    }
    // the value remaining in the stack is the result
    Some(ops.pop().unwrap())
}

// perform the arithmetic operations
fn do_calc(op: &str, op1: i32, op2: i32) -> i32 {
    if "+" == op {
        op1 + op2
    } else if "-" == op {
        op1 - op2
    } else if "*" == op {
        op1 * op2
    } else if "/" == op {
        if 0 == op2 {
            panic!("ZeroDivisionError: Invalid operation!");
        }
        op1 / op2
    } else {
        panic!("OperatorError: Invalid operator: {:?}", op);
    }
}

fn main() {
    let postfix = "1 2 + 1 2 + *";
    let res = postfix_eval(postfix);
    match res {
        Some(val) => println!("res = {val}"),
        None => println!("{postfix} isn't a valid postfix"),
    }
    // res = 9
}
4.4 Queue
A queue is an ordered collection of items that has a front and a rear end. New items are added at the
rear, and items are removed from the front. Each element added to the queue moves towards the front
until it becomes the next item to be removed. This type of ordering is known as First In First Out (FIFO).
A stack, in contrast, uses Last In First Out (LIFO) ordering.
Queues are commonly used in everyday situations such as long lines of people waiting to board a
train or bus, or at self-service restaurants. Queues have limited behavior because they have only one
entrance and one exit. It is not possible to cut in line or leave early; one must wait for their turn. While
it is true that real-life queues and queue data structures may allow cutting in line, this discussion does
not consider that possibility.
Operating systems use queues as a data structure to control processes. Multiple different queues
are used in scheduling algorithms to prioritize executing programs as quickly as possible and service as
many users as possible. When typing on a keyboard, a delay may occur before characters appear on the
screen. This is because the keystrokes are placed in a buffer similar to a queue.
enqueue (rear) ->  4 2 1 2 7 8 3 5 0 1  -> dequeue (front)
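Rust's standard library already offers such a FIFO container, VecDeque; here is a small sketch of a keystroke buffer with made-up keystrokes:

```rust
use std::collections::VecDeque;

fn main() {
    let mut buffer: VecDeque<char> = VecDeque::new();
    // keystrokes enter the buffer at the rear
    for key in ['h', 'i', '!'] {
        buffer.push_back(key);
    }
    // the screen consumes them from the front, in FIFO order
    assert_eq!(buffer.pop_front(), Some('h'));
    assert_eq!(buffer.pop_front(), Some('i'));
    assert_eq!(buffer.pop_front(), Some('!'));
}
```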
// queue.rs

#[derive(Debug)]
struct Queue<T> {
    cap: usize,   // capacity
    data: Vec<T>, // data container
}

impl<T> Queue<T> {
    fn new(size: usize) -> Self {
        Self {
            cap: size,
            data: Vec::with_capacity(size),
        }
    }

    fn is_empty(&self) -> bool {
        0 == self.len()
    }

    fn is_full(&self) -> bool {
        self.len() == self.cap
    }

    fn len(&self) -> usize {
        self.data.len()
    }

    fn size(&self) -> usize {
        self.len()
    }

    fn clear(&mut self) {
        self.data = Vec::with_capacity(self.cap);
    }

    // add an item at the rear (index 0 of the Vec)
    fn enqueue(&mut self, val: T) -> Result<(), String> {
        if self.len() == self.cap {
            return Err("No space available".to_string());
        }
        self.data.insert(0, val);
        Ok(())
    }

    // remove an item from the front (the end of the Vec)
    fn dequeue(&mut self) -> Option<T> {
        if self.len() > 0 {
            self.data.pop()
        } else {
            None
        }
    }

    // Implementing iteration for the queue
    // into_iter: the queue is consumed and becomes an iterator
    // iter: the queue is unmodified; yields an immutable iterator
    // iter_mut: the queue is unmodified; yields a mutable iterator
    fn into_iter(self) -> IntoIter<T> {
        IntoIter(self)
    }

    fn iter(&self) -> Iter<T> {
        let mut iterator = Iter { stack: Vec::new() };
        for item in self.data.iter() {
            iterator.stack.push(item);
        }
        iterator
    }

    fn iter_mut(&mut self) -> IterMut<T> {
        let mut iterator = IterMut { stack: Vec::new() };
        for item in self.data.iter_mut() {
            iterator.stack.push(item);
        }
        iterator
    }
}

// Implementation of the 3 iterators
struct IntoIter<T>(Queue<T>);
impl<T: Clone> Iterator for IntoIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<Self::Item> {
        if !self.0.is_empty() {
            Some(self.0.data.remove(0))
        } else {
            None
        }
    }
}

struct Iter<'a, T: 'a> { stack: Vec<&'a T>, }
impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<Self::Item> {
        if 0 != self.stack.len() {
            Some(self.stack.remove(0))
        } else {
            None
        }
    }
}

struct IterMut<'a, T: 'a> { stack: Vec<&'a mut T> }
impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;
    fn next(&mut self) -> Option<Self::Item> {
        if 0 != self.stack.len() {
            Some(self.stack.remove(0))
        } else {
            None
        }
    }
}
fn main() {
    basic();
    iter();

    fn basic() {
        let mut q = Queue::new(4);
        let _r1 = q.enqueue(1); let _r2 = q.enqueue(2);
        let _r3 = q.enqueue(3); let _r4 = q.enqueue(4);
        if let Err(error) = q.enqueue(5) {
            println!("Enqueue error: {error}");
        }
        if let Some(data) = q.dequeue() {
            println!("dequeue data: {data}");
        } else {
            println!("empty queue");
        }
        println!("empty: {}, len: {}", q.is_empty(), q.len());
        println!("full: {}", q.is_full());
        println!("q: {:?}", q);
        q.clear();
        println!("{:?}", q);
    }

    fn iter() {
        let mut q = Queue::new(4);
        let _r1 = q.enqueue(1); let _r2 = q.enqueue(2);
        let _r3 = q.enqueue(3); let _r4 = q.enqueue(4);
        let sum1 = q.iter().sum::<i32>();
        let mut addend = 0;
        for item in q.iter_mut() {
            *item += 1;
            addend += 1;
        }
        let sum2 = q.iter().sum::<i32>();
        println!("{sum1} + {addend} = {sum2}");
        println!("sum = {}", q.into_iter().sum::<i32>());
    }
}
Here is the result of the execution.
Enqueue error: No space available
dequeue data: 1
empty: false, len: 3
full: false
q: Queue { cap: 4, data: [4, 3, 2] }
Queue { cap: 4, data: [] }
10 + 4 = 14
sum = 14
The children in the circle: Mon, Tom, Kew, Lisa, Marry, Bob.
The game of hot potato is comparable to the Josephus problem, a legendary story recounted by the
historian Flavius Josephus. Josephus and his comrades were trapped by the Roman army in a cave and
chose to die instead of becoming Roman slaves. They formed a circle and counted clockwise to the eighth
person, who was then killed. This continued until only one person was left alive. As a mathematician,
Josephus figured out where he should sit to be the last person alive and ultimately joined the Roman
side. While there are variations of this story, the fundamental ideas of both the hot potato game and the
Josephus problem can be simulated using a queue.
To implement the hot potato game, the program accepts multiple names of children and a constant
num to set the count of how many children to skip before removing one. Assuming the child holding the
hot potato is always at the front of the queue, they leave the queue and re-enter at the back, effectively
passing the potato to the next child who must be at the front of the queue. After num rounds of dequeuing
and enqueuing, the child at the front of the queue is permanently removed. The process repeats until only
one name remains.
The queue model for the Hot Potato game is clearly illustrated in the two figures above. Here is the
implementation of the game based on this model.
// hot_potato.rs

fn hot_potato(names: Vec<&str>, num: usize) -> &str {
    // initialize the queue and enqueue the names
    let mut q = Queue::new(names.len());
    for name in names {
        let _nm = q.enqueue(name);
    }

    while q.size() > 1 {
        // dequeue and enqueue the names,
        // which simulates passing the potato
        for _i in 0..num {
            let name = q.dequeue().unwrap();
            let _rm = q.enqueue(name);
        }

        // after num dequeue/enqueue cycles,
        // remove one person
        let _rm = q.dequeue();
    }

    q.dequeue().unwrap()
}
1 fn main() {
2 let name = vec!["Mon","Tom","Kew","Lisa","Marry","Bob"];
3 let survivor = hot_potato(name, 8);
4 println!("The survivor is {survivor}");
5 // The survivor is Marry
6 }
It is worth noting that the count value used here is 8, which is greater than the number of people in
the queue (6). This is not a problem, because the queue behaves like a circle that wraps from the tail
back to the head until the count is reached. Therefore, someone is always dequeued in the end.
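For comparison, the same game can be sketched with the standard library's VecDeque; this is an illustrative alternative, not the book's Queue type, and the function name is an assumption.

```rust
use std::collections::VecDeque;

// Hot potato sketched with the standard library's VecDeque.
fn hot_potato_vd(names: Vec<&str>, num: usize) -> &str {
    let mut q: VecDeque<&str> = names.into_iter().collect();
    while q.len() > 1 {
        // Pass the potato num times: the front child moves to the back.
        for _ in 0..num {
            let name = q.pop_front().unwrap();
            q.push_back(name);
        }
        // The child now at the front is removed from the game.
        q.pop_front();
    }
    q.pop_front().unwrap()
}

fn main() {
    let names = vec!["Mon", "Tom", "Kew", "Lisa", "Marry", "Bob"];
    println!("The survivor is {}", hot_potato_vd(names, 8));
    // The survivor is Marry
}
```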
4.5 Deque
A deque is also a linear data structure with two ends, the front and the rear. Unlike a queue, a deque
allows items to be added and removed at both ends. This flexibility makes it a hybrid linear structure
that can function as both a stack and a queue.
Although a deque shares similarities with both a stack and a queue, it does not enforce LIFO or FIFO
ordering. The order of adding and removing data determines whether it acts like a stack or a queue. It is
important to note that a deque should not be used as a replacement for either a stack or a queue, as each
data structure has its own unique properties and is designed for specific computational purposes. The
image below provides an example of a deque.
(Figure: a deque, with items added and removed at both the front and the rear.)
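The both-ends behavior can be sketched quickly with the standard library's VecDeque; the book implements its own Deque later in this section, so this is only an illustration.

```rust
use std::collections::VecDeque;

// Demonstrate adding and removing at both ends of a deque.
fn main() {
    let mut d: VecDeque<i32> = VecDeque::new();
    d.push_front(2); // add at the front
    d.push_front(4);
    d.push_back(7); // add at the rear
    d.push_back(8);
    // d now holds [4, 2, 7, 8] from front to rear
    assert_eq!(d.pop_front(), Some(4)); // remove from the front
    assert_eq!(d.pop_back(), Some(8)); // remove from the rear
}
```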
99 // Implementing the three iterators
100 struct IntoIter<T>(Deque<T>);
101 impl<T: Clone> Iterator for IntoIter<T> {
102 type Item = T;
103 fn next(&mut self) -> Option<Self::Item> {
104 // the wrapped deque (tuple field 0) is not empty
105 if !self.0.is_empty() {
106 Some(self.0.data.remove(0))
107 } else {
108 None
109 }
110 }
111 }
112
113 struct Iter<'a, T: 'a> { stack: Vec<&'a T>, }
114 impl<'a, T> Iterator for Iter<'a, T> {
115 type Item = &'a T;
116 fn next(&mut self) -> Option<Self::Item> {
117 if 0 != self.stack.len() {
118 Some(self.stack.remove(0))
119 } else {
120 None
121 }
122 }
123 }
124
125 struct IterMut<'a, T: 'a> { stack: Vec<&'a mut T> }
126 impl<'a, T> Iterator for IterMut<'a, T> {
127 type Item = &'a mut T;
128 fn next(&mut self) -> Option<Self::Item> {
129 if 0 != self.stack.len() {
130 Some(self.stack.remove(0))
131 } else {
132 None
133 }
134 }
135 }
136
137 fn main() {
138 basic();
139 iter();
140
141 fn basic() {
142 let mut d = Deque::new(4);
143 let _r1 = d.add_front(1);
144 let _r2 = d.add_front(2);
145 let _r3 = d.add_rear(3);
146 let _r4 = d.add_rear(4);
147
148 if let Err(error) = d.add_front(5) {
149 println!("add_front error: {error}");
150 }
(Figure: the characters r, u, s, t stored in a deque.)
To check for palindromes using a deque, we start by processing the input string from left to right and
adding each character to the rear of the deque. At this point, the front of the deque holds the first character
of the string, while the rear holds the last character. We can then use the deque’s feature of dequeuing
from both ends to compare the front and rear characters. If they match, we continue dequeuing the front
and rear characters until either all the characters are used up, leaving an empty deque, or a deque of size
1 is left. In both cases, the string is a palindrome. Any other situation indicates that the string is not a
palindrome. Below is the code implementation of palindrome detection using a deque.
1 // palindrome_checker.rs
2 fn palindrome_checker(pal: &str) -> bool {
3 let mut d = Deque::new(pal.len());
4 for c in pal.chars() {
5 let _r = d.add_rear(c);
6 }
7
8 let mut is_pal = true;
9 while d.size() > 1 && is_pal {
10 let head = d.remove_front();
11 let tail = d.remove_rear();
12 if head != tail {
13 is_pal = false;
14 }
15 }
16 is_pal
17 }
18
19 fn main() {
20 let pal = "rustsur";
21 let is_pal = palindrome_checker(pal);
22 println!("{pal} is palindrome string: {is_pal}");
23 // rustsur is palindrome string: true
24
25 let pal = "panda";
26 let is_pal = palindrome_checker(pal);
27 println!("{pal} is palindrome string: {is_pal}");
28 // panda is palindrome string: false
29 }
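As a side note, the same check can be sketched with iterators over the characters, without an explicit deque; this is an alternative illustration, not the book's implementation.

```rust
// Palindrome check by comparing the character sequence with its reverse.
fn is_palindrome(s: &str) -> bool {
    let chars: Vec<char> = s.chars().collect();
    chars.iter().eq(chars.iter().rev())
}

fn main() {
    println!("{}", is_palindrome("rustsur")); // true
    println!("{}", is_palindrome("panda")); // false
}
```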
4.6 LinkedList
An ordered collection of data items is important for maintaining the relative position of data and for
efficient indexing. Arrays and linked lists both collect data in an ordered manner and preserve relative
positions, making them suitable for implementing ordered data types. For example, Rust's Vec is
implemented on top of a dynamically sized array. This section focuses on linked lists, which offer some
unique advantages.
Unlike arrays, linked lists do not require elements to be stored in contiguous memory. Each item in
the collection holds a reference to the next item, so items can be placed anywhere in memory without
reserving a single contiguous block, which makes growing the collection flexible. To use a linked list,
we must explicitly keep track of the position of the first item; once we know it, we can reach the
second item, and so on, until the end of the entire linked list.
(Figure: a linked list with an external Head reference linking the items 20, 10, 40, and 30.)
Usually, a linked list provides a reference to the head of the list, and the last item's next link is set to
empty (None), marking the tail. With this setup, we can traverse the list from head to tail by following
the references between items, which makes linked lists an efficient basis for implementing other ordered
data types.
(Figure: a node with a data field holding 20 and a next field holding a pointer p.)
In Rust, None is used to represent the absence of a next node, both in a Node and in the linked list
itself. The new function initializes a node with its next field set to None. Explicitly representing the
empty link as None is good practice and avoids the dangling pointers common in languages like
C/C++.
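A minimal sketch of such a node and a head-to-tail traversal follows; the book's full Node definition appears in linked_list.rs below, so this fragment only illustrates the idea.

```rust
// A node holds its element and an optional link to the next node;
// None marks the end of the list.
#[derive(Debug)]
struct Node<T> {
    elem: T,
    next: Option<Box<Node<T>>>,
}

fn main() {
    // Build the two-node list 20 -> 30 by hand.
    let tail = Node { elem: 30, next: None };
    let head = Node { elem: 20, next: Some(Box::new(tail)) };

    // Follow the next references from the head until None.
    let mut curr = Some(&head);
    while let Some(node) = curr {
        print!("{} ", node.elem);
        curr = node.next.as_deref();
    }
    println!();
}
```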
(Figure: a linked list whose head links the items 40, 20, 80, and 30 in order.)
The following code shows the implementation of the linked list, with the node link defined as Link
for code clarity.
1 // linked_list.rs
2
3 // The node link uses a Box pointer, whose size is known,
4 // because memory can only be allocated for a type of known size.
5 type Link<T> = Option<Box<Node<T>>>;
6
59
60 // peek_mut gets a mutable reference
61 fn peek_mut(&mut self) -> Option<&mut T> {
62 self.head.as_mut().map(|node| &mut node.elem )
63 }
64
65 // Implementation of iteration for the linked list.
66 // into_iter: makes the linked list an iterator
67 // by consuming it
68 // iter: returns an immutable iterator without modifying
69 // the linked list
70 // iter_mut: returns a mutable iterator without modifying
71 // the linked list
72 fn into_iter(self) -> IntoIter<T> {
73 IntoIter(self)
74 }
75
76 fn iter(&self) -> Iter<T> {
77 Iter { next: self.head.as_deref() }
78 }
79
80 fn iter_mut(&mut self) -> IterMut<T> {
81 IterMut { next: self.head.as_deref_mut() }
82 }
83 }
84
85 // Implementation of three iterations
86 struct IntoIter<T>(List<T>);
87 impl<T> Iterator for IntoIter<T> {
88 type Item = T;
89 fn next(&mut self) -> Option<Self::Item> {
90 // (List<T>) tuple's 0th item
91 self.0.pop()
92 }
93 }
94
95 struct Iter<'a, T: 'a> { next: Option<&'a Node<T>> }
96 impl<'a, T> Iterator for Iter<'a, T> {
97 type Item = &'a T;
98 fn next(&mut self) -> Option<Self::Item> {
99 self.next.map(|node| {
100 self.next = node.next.as_deref();
101 &node.elem
102 })
103 }
104 }
105
106 struct IterMut<'a, T: 'a> { next: Option<&'a mut Node<T>> }
107 impl<'a, T> Iterator for IterMut<'a, T> {
108 type Item = &'a mut T;
109 fn next(&mut self) -> Option<Self::Item> {
110 self.next.take().map(|node| {
163 }
164
165 fn iter_test() {
166 let mut list = List::new();
167 list.push(1); list.push(2); list.push(3);
168
169 let mut iter = list.iter();
170 assert_eq!(iter.next(), Some(&3));
171 assert_eq!(iter.next(), Some(&2));
172 assert_eq!(iter.next(), Some(&1));
173 assert_eq!(iter.next(), None);
174 println!("iter test Ok!");
175 }
176
177 fn iter_mut_test() {
178 let mut list = List::new();
179 list.push(1); list.push(2); list.push(3);
180
181 let mut iter = list.iter_mut();
182 assert_eq!(iter.next(), Some(&mut 3));
183 assert_eq!(iter.next(), Some(&mut 2));
184 assert_eq!(iter.next(), Some(&mut 1));
185 assert_eq!(iter.next(), None);
186 println!("iter_mut test Ok!");
187 }
188 }
18 }
19 }
20 }
21
22 // Linked list stack
23 #[derive(Debug, Clone)]
24 struct LStack<T> {
25 size: usize,
26 top: Link<T>, // Top controls the entire stack
27 }
28
29 impl<T: Clone> LStack<T> {
30 fn new() -> Self {
31 Self {
32 size: 0,
33 top: None
34 }
35 }
36
37 fn is_empty(&self) -> bool {
38 0 == self.size
39 }
40
41 fn len(&self) -> usize {
42 self.size
43 }
44
45 fn clear(&mut self) {
46 self.size = 0;
47 self.top = None;
48 }
49
50 // take out the node on the top, leaving an empty space
51 // that could be filled later
52 fn push(&mut self, val: T) {
53 let mut node = Node::new(val);
54 node.next = self.top.take();
55 self.top = Some(Box::new(node));
56 self.size += 1;
57 }
58
59 fn pop(&mut self) -> Option<T> {
60 self.top.take().map(|node| {
61 let node = *node;
62 self.top = node.next;
63 self.size -= 1;
64 node.data
65 })
66 }
67
68 // Return a reference to the data in the linked list stack
69 fn peek(&self) -> Option<&T> {
70 self.top.as_ref().map(|node| &node.data)
71 }
72
73 fn peek_mut(&mut self) -> Option<&mut T> {
74 self.top.as_deref_mut().map(|node| &mut node.data)
75 }
76
77 // into_iter: consumes the linked list stack
78 // iter: leaves the stack unchanged
79 // iter_mut: leaves the stack unchanged, items mutable
80 fn into_iter(self) -> IntoIter<T> {
81 IntoIter(self)
82 }
83
84 fn iter(&self) -> Iter<T> {
85 Iter { next: self.top.as_deref() }
86 }
87
88 fn iter_mut(&mut self) -> IterMut<T> {
89 IterMut { next: self.top.as_deref_mut() }
90 }
91 }
92
93 // Implement three iterations
94 struct IntoIter<T: Clone>(LStack<T>);
95 impl<T: Clone> Iterator for IntoIter<T> {
96 type Item = T;
97 fn next(&mut self) -> Option<Self::Item> {
98 self.0.pop()
99 }
100 }
101
102 struct Iter<'a, T: 'a> { next: Option<&'a Node<T>> }
103 impl<'a, T> Iterator for Iter<'a, T> {
104 type Item = &'a T;
105 fn next(&mut self) -> Option<Self::Item> {
106 self.next.map(|node| {
107 self.next = node.next.as_deref();
108 &node.data
109 })
110 }
111 }
112
113 struct IterMut<'a, T: 'a> { next: Option<&'a mut Node<T>> }
114 impl<'a, T> Iterator for IterMut<'a, T> {
115 type Item = &'a mut T;
116 fn next(&mut self) -> Option<Self::Item> {
117 self.next.take().map(|node| {
118 self.next = node.next.as_deref_mut();
119 &mut node.data
120 })
121 }
122 }
123
124 fn main() {
125 basic();
126 iter();
127
128 fn basic() {
129 let mut s = LStack::new();
130 s.push(1); s.push(2); s.push(3);
131
132 println!("empty: {:?}", s.is_empty());
133 println!("top: {:?}, size: {}", s.peek(), s.len());
134 println!("pop: {:?}, size: {}", s.pop(), s.len());
135
136 let peek_mut = s.peek_mut();
137 if let Some(data) = peek_mut {
138 *data = 4
139 }
140 println!("top {:?}, size {}", s.peek(), s.len());
141
142 println!("{:?}", s);
143 s.clear();
144 println!("{:?}", s);
145 }
146
147 fn iter() {
148 let mut s = LStack::new();
149 s.push(1); s.push(2); s.push(3);
150
151 let sum1 = s.iter().sum::<i32>();
152 let mut addend = 0;
153 for item in s.iter_mut() {
154 *item += 1;
155 addend += 1;
156 }
157 let sum2 = s.iter().sum::<i32>();
158 println!("{sum1} + {addend} = {sum2}");
159
160 assert_eq!(9, s.into_iter().sum::<i32>());
161 }
162 }
Here is the result of the execution.
empty: false
top: Some(3), size: 3
pop: Some(3), size: 2
top Some(4), size 2
LStack { size: 2, top: Some(Node { data: 4,
next: Some(Node { data: 1, next: None }) }) }
LStack { size: 0, top: None }
6 + 3 = 9
4.7 Vec
In this chapter, we have demonstrated how the basic data type Vec can be used to implement various
abstract data types, including stacks, queues, deques, and linked lists. Vec is a powerful yet simple data
container that offers mechanisms for data collection and various operations, which is why we repeatedly
use it as the underlying data structure for implementing other data types. It is similar to Python’s List and
is very convenient to use. However, not every programming language includes a container like Vec,
and in some cases a Vec-like data container must be implemented by the programmer.
Finally, it is worth noting that while Vec is a powerful data structure, it may not always be the best
choice for a particular use case. Programmers should carefully consider the requirements of their project
and choose the appropriate data structure accordingly.
Since references play a crucial role in this linked-list implementation, it is necessary to maintain a
reference to the first node. The list is created with None, indicating that it does not yet reference any
content, as shown in Figure (4.19). The head of the list refers to the first node, which stores the address
of the next node. Note that the LVec itself does not contain any node objects, only a reference to the
first node of the linked structure.
(Figure: an LVec whose list head links the items 90, 80, 70, and 60 in order.)
To add a new item to the linked list, the only entry point is through the head of the list. As all other
nodes can only be accessed by following the next link from the first node, the most efficient way to add a
new node is to add it at the head of the list. This approach makes the new item the first element,
with the existing items linked after it.
While the implementation of Vec presented here is unordered, it is possible to implement an ordered
Vec using a data comparison function with total or partial orders. The following LVec provides only a
portion of the functionality of the standard library Vec, and print_lvec is used to print its data items.
1 // lvec.rs
2
3 use std::fmt::Debug;
4
5 #[derive(Debug)]
6 struct Node<T> {
7 elem: T,
8 next: Link<T>,
9 }
10
11 type Link<T> = Option<Box<Node<T>>>;
12
13 impl<T> Node<T> {
14 fn new(elem: T) -> Self {
15 Self {
16 elem: elem,
17 next: None
18 }
19 }
20 }
21
22 // LinkedList Vec definition
23 #[derive(Debug)]
24 struct LVec<T> {
25 size: usize,
26 head: Link<T>,
27 }
28
29 impl<T: Copy + Debug> LVec<T> {
30 fn new() -> Self {
31 Self { size: 0, head: None }
32 }
33
34 fn is_empty(&self) -> bool {
35 0 == self.size
36 }
37
38 fn len(&self) -> usize {
39 self.size
40 }
41
42 fn clear(&mut self) {
43 self.size = 0;
44 self.head = None;
45 }
46
47 fn push(&mut self, elem: T) {
48 let node = Node::new(elem);
49 if self.is_empty() {
50 self.head = Some(Box::new(node));
51 } else {
52 let mut curr = self.head.as_mut().unwrap();
53
54 // find the last node in the list
55 for _i in 0..self.size-1 {
56 curr = curr.next.as_mut().unwrap();
57 }
58
59 // insert the new data after the last node
60 curr.next = Some(Box::new(node));
61 }
62 self.size += 1;
63 }
64
65 // append another LVec to the end of this one
66 fn append(&mut self, other: &mut Self) {
67 while let Some(node) = other.head.as_mut().take() {
68 self.push(node.elem);
69 other.head = node.next.take();
70 }
71 other.clear();
72 }
73
74 fn insert(&mut self, mut index: usize, elem: T) {
75 if index >= self.size { index = self.size; }
76
77 // three cases for inserting a new node
78 let mut node = Node::new(elem);
79 if self.is_empty() { // LVec is empty
80 self.head = Some(Box::new(node));
81 } else if index == 0 { // insert at the beginning of the list
82 node.next = self.head.take();
83 self.head = Some(Box::new(node));
84 } else { // insert in the middle of the list
85 let mut curr = self.head.as_mut().unwrap();
86 for _i in 0..index - 1 { // find the insert position
87 curr = curr.next.as_mut().unwrap();
88 }
89 node.next = curr.next.take();
90 curr.next = Some(Box::new(node));
91 }
92 self.size += 1;
93 }
94
95 fn pop(&mut self) -> Option<T> {
96 if self.size < 1 {
97 return None;
98 } else {
99 self.remove(self.size - 1)
100 }
101 }
102
103 fn remove(&mut self, index: usize) -> Option<T> {
104 if index >= self.size { return None; }
105
106 // two cases for deleting a node
107 let mut node;
108 if 0 == index {
109 node = self.head.take().unwrap();
110 self.head = node.next.take();
111 } else { // find the node to be deleted and fix the links
112 let mut curr = self.head.as_mut().unwrap();
113 for _i in 0..index - 1 {
114 curr = curr.next.as_mut().unwrap();
115 }
116 node = curr.next.take().unwrap();
117 curr.next = node.next.take();
118 }
119 self.size -= 1;
120
121 Some(node.elem)
122 }
123
124 fn into_iter(self) -> IntoIter<T> {
125 IntoIter(self)
126 }
127
128 fn iter(&self) -> Iter<T> {
129 Iter { next: self.head.as_deref() }
130 }
131
132 fn iter_mut(&mut self) -> IterMut<T> {
It’s important to note that LVec is a linked list with n nodes, and any method that requires traversing
nodes, such as insert, push, pop, remove, etc., has a time complexity of O(n). Although on average
it may only require traversing half as many nodes, in the worst case, every node in the list must be
processed.
4.8 Summary
This chapter focused on several linear data structures: the stack, the queue, the double-ended queue
(deque), the linked list, and Vec.
A stack maintains last-in-first-out (LIFO) ordering and has basic operations such as push, pop, and
is_empty. Stacks are useful for designing algorithms that parse and evaluate expressions, and their
reversal property is useful for implementing operating system function calls, undo and back features
in applications, and more. Prefix, infix, and postfix expressions can all be handled with a stack, but
computers do not typically evaluate infix expressions directly.
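As a reminder of how a stack evaluates a postfix expression, here is a minimal sketch; the function name and token handling are illustrative, not from the chapter's listings.

```rust
// Evaluate a whitespace-separated postfix expression using a Vec as
// the stack: numbers are pushed, operators pop two operands.
fn eval_postfix(expr: &str) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for tok in expr.split_whitespace() {
        if let Ok(n) = tok.parse::<i32>() {
            stack.push(n);
        } else {
            let b = stack.pop()?; // right operand
            let a = stack.pop()?; // left operand
            let r = match tok {
                "+" => a + b,
                "-" => a - b,
                "*" => a * b,
                "/" => a / b,
                _ => return None,
            };
            stack.push(r);
        }
    }
    stack.pop()
}

fn main() {
    println!("{:?}", eval_postfix("2 3 4 * +")); // Some(14)
}
```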
A queue maintains first-in-first-out (FIFO) ordering and has basic operations such as enqueue,
dequeue, and is_empty. Queues are very useful for scheduling system tasks and for building timed-task
simulations.
A double-ended queue (deque) is a data structure that allows mixed stack and queue behavior. Its
basic operations include is_empty, add_front, add_rear, remove_front, and remove_rear. Although a
deque can be used as a stack or a queue, it is recommended to use each structure for its intended
purpose.
A linked list is a collection of items in which each item holds a position relative to the others. The
linked list maintains this logical order itself, so the items need not be stored in physically contiguous
order. Modifying the head of a linked list is a special case.
Vec is a data container that comes with Rust, and the default implementation uses dynamic arrays.
In this chapter, however, we use a linked list for this purpose.
Chapter 5
Recursion
5.1 Objectives
• Understanding simple recursive solutions
• Learning how to write programs using recursion
• Understanding and applying the three laws of recursion
• Understanding recursion as a form of iteration
• Formulating problems into recursive solutions
• Understanding how computers implement recursion
5.2 What Is Recursion?
The process of carrying out these small additions can be described with elementary-school knowledge
of fully parenthesized expressions, of which there are, of course, two forms, as shown below.
2 + 1 + 7 + 4 + 5 = ((((2 + 1) + 7) + 4) + 5)
sum = ((((2 + 1) + 7) + 4) + 5)
sum = (((3 + 7) + 4) + 5)
sum = ((10 + 4) + 5)
sum = (14 + 5)
sum = 19
(5.1)
2 + 1 + 7 + 4 + 5 = (2 + (1 + (7 + (4 + 5))))
sum = (2 + (1 + (7 + (4 + 5))))
sum = (2 + (1 + (7 + 9)))
sum = (2 + (1 + 16))
sum = (2 + 17)
sum = 19
Overall, recursion provides an elegant and effective solution to seemingly difficult problems, al-
lowing for efficient computation and solving programming challenges even in languages with limited
capabilities.
The expression can be correctly parenthesized using either of the two forms shown above. Following
the priority implied by the innermost parentheses, we can treat the parenthesized expression as a
sequence of small additions. The pattern of the parts and of the whole expression is entirely recursive,
and we can simulate it without using while or for loops.
If we read the second calculation from bottom to top, the first line is 19, followed by (2 + 17), and
then (2 + (1 + 16)). The total sum is the sum of the first term and the sum of the remaining terms,
and that remainder can be decomposed in exactly the same way. We can express this mathematically
as follows:
Sum(nums) = First(nums) + Sum(restR(nums)) (5.2)
This is the calculation method for the second form of parenthesization; the first form has an analogous
method. Here First(nums) and Last(nums) return the first and last elements of the array, while
restR(nums) and restL(nums) return all elements except the first or the last one, respectively.
We can implement both recursive methods in Rust. The nums_sum1 function adds nums[0] to the sum
of the remaining terms, while nums_sum2 adds the last term to the sum of all terms before it. Both
implementations have the same time and space complexity and differ only in the direction of
decomposition.
1 // nums_sum12.rs
2
3 // Form : Sum(nums) = First(nums) + Sum(restR(nums))
4 fn nums_sum1(nums: &[i32]) -> i32 {
5 if 1 == nums.len() {
6 nums[0]
7 } else {
8 let first = nums[0];
9 first + nums_sum1(&nums[1..])
10 }
11 }
12
13 // Form : Sum(nums) = Last(nums) + Sum(restL(nums))
14 fn nums_sum2(nums: &[i32]) -> i32 {
15 if 1 == nums.len() {
16 nums[0]
17 } else {
18 let last = nums[nums.len() - 1];
19 nums_sum2(&nums[..nums.len() - 1]) + last
20 }
21 }
22
23 fn main() {
24 let nums = [2,1,7,4,5];
25 let sum1 = nums_sum1(&nums);
26 let sum2 = nums_sum2(&nums);
27 println!("sum1 = {sum1}, sum2 = {sum2}");
28 // sum1 = 19, sum2 = 19
29
30 let nums = [-1,7,1,2,5,4,10,100];
31 let sum1 = nums_sum1(&nums);
32 let sum2 = nums_sum2(&nums);
33 println!("sum1 = {sum1}, sum2 = {sum2}");
34 // sum1 = 128, sum2 = 128
35 }
The crucial parts of the code are the if and else branches. The condition 1 == nums.len() marks the
turning point of the function: at this point the value is returned directly, without further calculation.
In the else branch, the function calls itself, which has the effect of removing the parentheses layer by
layer and computing their values. This is the essence of recursion: a function calls itself until it
reaches a base case.
sum(2,1,7,4,5) = 2 + sum(1,7,4,5)
sum(1,7,4,5) = 1 + sum(7,4,5)
sum(7,4,5) = 7 + sum(4,5)
sum(4,5) = 4 + sum(5)
sum(5) = 5
23 }
24
25 numstr
26 }
27
28 fn main() {
29 let num = 100;
30 let sb = num2str_stk(100, 2);
31 let so = num2str_stk(100, 8);
32 let sh = num2str_stk(100, 16);
33 println!("{num} = b{sb}, o{so}, x{sh}");
34 // 100 = b1100100, o144, x64
35 }
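Only the tail of num2str_stk survives above. For reference, here is a self-contained recursive sketch of the same base conversion; the function name and structure are assumptions, not the book's stack-based listing.

```rust
// Recursive base conversion for bases 2 through 16. The base case is a
// single digit; otherwise convert the quotient first, then append the
// digit for the remainder.
fn num2str_rec(num: u32, base: u32) -> String {
    let digits = ["0", "1", "2", "3", "4", "5", "6", "7",
                  "8", "9", "a", "b", "c", "d", "e", "f"];
    if num < base {
        digits[num as usize].to_string()
    } else {
        num2str_rec(num / base, base) + digits[(num % base) as usize]
    }
}

fn main() {
    let num = 100;
    println!("{num} = b{}, o{}, x{}",
        num2str_rec(num, 2), num2str_rec(num, 8), num2str_rec(num, 16));
    // 100 = b1100100, o144, x64
}
```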
To solve this problem recursively, we need to determine the base case. Suppose a Tower of Hanoi with
three pegs (left, middle, and right) and five disks on the left peg. We know how to move a single disk
to the right peg; this is the base case. If we knew how to move the top n-1 disks to the middle peg, we
could move the bottom disk to the right peg and then move those n-1 disks from the middle peg onto
it. If we do not know how to move n-1 disks, we apply the same reasoning to n-2 disks, and so on,
until we are left with moving a single disk. This chain of assumptions is an abstraction of the
disk-moving process.
In summary, we can organize the above operation process into the following algorithm to solve the
Tower of Hanoi puzzle recursively.
1 Move height-1 disks to the middle peg, using the target peg as a helper.
2 Move the last disk to the target peg.
3 Move height-1 disks from the middle peg to the target peg, using the starting peg as a helper.
To solve the Tower of Hanoi, the three laws of recursion must be followed while ensuring that a larger
disk never rests on a smaller one. The base case is a single disk, which can be moved directly to its
destination. Steps 1 and 3 then reduce the height of the tower, moving toward the base case. The
solution can be implemented in a few lines of Rust using recursion, as shown below.
1 // hanoi.rs
2
3 // p: pole
4 fn hanoi(height:u32, src_p:&str, des_p:&str, mid_p:&str) {
5 if height >= 1 {
6 hanoi(height - 1, src_p, mid_p, des_p);
7 println!(
8 "move disk[{height}] from {src_p} to {des_p}");
9 hanoi(height - 1, mid_p, des_p, src_p);
10 }
11 }
12
13 fn main() {
14 hanoi(1, "A", "B", "C");
15 hanoi(2, "A", "B", "C");
16 hanoi(3, "A", "B", "C");
17 hanoi(4, "A", "B", "C");
18 hanoi(5, "A", "B", "C");
19 hanoi(6, "A", "B", "C");
20 }
To get a feel for the process, one can simulate the moves with pen and paper, following the output of
the hanoi function for heights 1, 2, 3, and 4.
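The number of moves printed grows quickly: a tower of height h requires 2^h - 1 moves. A small helper, not from the book, makes this easy to check.

```rust
// Moving h disks takes the moves for h-1 disks twice, plus one move
// for the bottom disk: moves(h) = 2 * moves(h - 1) + 1 = 2^h - 1.
fn hanoi_moves(height: u32) -> u64 {
    if height == 0 {
        0
    } else {
        2 * hanoi_moves(height - 1) + 1
    }
}

fn main() {
    for h in 1..=6 {
        println!("height {h}: {} moves", hanoi_moves(h));
    }
}
```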
5.3 Tail Recursion
9 }
10
11 fn nums_sum4(sum: i32, nums: &[i32]) -> i32 {
12 if 1 == nums.len() {
13 sum + nums[0]
14 } else {
15 nums_sum4(sum + nums[nums.len() - 1],
16 &nums[..nums.len() - 1])
17 }
18 }
19
20 fn main() {
21 let nums = [2,1,7,4,5];
22 let sum1 = nums_sum3(0, &nums);
23 let sum2 = nums_sum4(0, &nums);
24 println!("sum1 is {sum1}, sum2 is {sum2}");
25 // sum1 is 19, sum2 is 19
26 }
Ultimately, the implementation of recursive programs depends on the programmer’s ability to write
clear and efficient code. If tail recursion does not cause stack overflow and is easy to understand, it can
be a useful tool in algorithm optimization.
(Figure: the values 2, 1, 7, 4, 5 from the running sum example.)
5.4 Dynamic Programming
To implement this approach, the rec_mc1 function calculates the number of bills required for a given
amount of change, trying each available denomination. Line 8 checks whether the change amount
exactly equals one of the denominations, which serves as a base case. Otherwise, the function
recursively calls itself with the change amount reduced by each usable denomination, adding one to
the count at each call to track the total number of bills in the final solution.
1 // rec_mc1.rs
2
3 fn rec_mc1(cashes: &[u32], amount: u32) -> u32 {
4 // Start from the worst case: making the entire
5 // amount out of 1-yuan bills.
6 let mut min_cashes = amount;
7
8 if cashes.contains(&amount) {
9 return 1;
10 } else {
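The rest of this listing fell on a page lost in extraction. A self-contained reconstruction of the whole function follows; the recursive loop is inferred from the surrounding discussion and may differ in detail from the book's original.

```rust
// Reconstructed sketch of rec_mc1: minimum number of bills for a
// given amount of change, by plain recursion (no caching).
fn rec_mc1(cashes: &[u32], amount: u32) -> u32 {
    // Worst case: make the whole amount with 1-yuan bills.
    let mut min_cashes = amount;
    if cashes.contains(&amount) {
        return 1;
    } else {
        // Try every denomination not larger than the amount.
        for c in cashes.iter().filter(|&&c| c <= amount) {
            let cashe_num = 1 + rec_mc1(cashes, amount - c);
            if cashe_num < min_cashes {
                min_cashes = cashe_num;
            }
        }
    }
    min_cashes
}

fn main() {
    let cashes = [1, 5, 10, 20, 50];
    // Warning: exponential running time; keep the amount small.
    println!("need refund {} cashes", rec_mc1(&cashes, 31));
}
```

Because every call retries all usable denominations, the running time grows exponentially with the amount, which is exactly the problem the memoized rec_mc2 below addresses.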
To address this issue, one solution is to store previously calculated results to avoid redundant cal-
culations. One approach is to store the current minimum number of bills in a list and check this list
before calculating a new minimum value. If the result already exists, use the stored value instead of
recalculating it. This technique is an example of trading space for time in algorithm design.
1 // rc_mc2.rs
2
3 fn rec_mc2(cashes: &[u32], amount: u32,
4 min_cashes: &mut [u32]) -> u32
5 {
6 // Start from the worst case: making the entire
7 // amount out of 1-yuan bills.
8 let mut min_cashe_num = amount;
9
10 if cashes.contains(&amount) {
11 // Collects denominations that match the
12 // current change value.
13 min_cashes[amount as usize] = 1;
14 return 1;
15 } else if min_cashes[amount as usize] > 0 {
16 // If the change amount already has the minimum number
17 // of coins required for change, return directly.
18 return min_cashes[amount as usize];
19 } else {
20 for c in cashes.iter()
21 .filter(|&&c| c <= amount)
22 .collect::<Vec<&u32>>() {
23
24 let cashe_num = 1 + rec_mc2(cashes,
25 amount - c,
26 min_cashes);
27
28 // Updates the minimum number of coins
29 // required for change.
30 if cashe_num < min_cashe_num {
31 min_cashe_num = cashe_num;
32 min_cashes[amount as usize] = min_cashe_num;
33 }
34 }
35 }
36
37 min_cashe_num
38 }
39
40 fn main() {
41 let amount = 90u32;
42 let cashes: [u32; 5] = [1,5,10,20,50];
43 let mut min_cashes: [u32; 91] = [0; 91];
44 let cashe_num = rec_mc2(&cashes, amount, &mut min_cashes);
45 println!("need refund {cashe_num} cashes");
46 // need refund 3 cashes
47 }
The rec_mc2 version is much less time-consuming because it uses min_cashes to store intermediate
results. It is important to note that, although this section discusses dynamic programming, both
programs presented so far are recursive. The second one merely saves intermediate values during
recursion, a memoization (caching) technique that trades memory for computation time.
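The iterative dynamic-programming listing discussed below was lost to a page break; here is a minimal sketch consistent with the memoized version above and with the dp_rec_mc_show listing that follows. The names mirror the book's style but are assumptions.

```rust
// Bottom-up dynamic programming: compute the minimum number of bills
// for every amount from 1 up to the target, reusing earlier results.
fn dp_rec_mc(cashes: &[u32], amount: u32, min_cashes: &mut [u32]) -> u32 {
    for denm in 1..=amount {
        // Worst case: the whole amount in 1-yuan bills.
        let mut min_cashe_num = denm;
        for c in cashes.iter().filter(|&&c| c <= denm) {
            let cashe_num = 1 + min_cashes[(denm - c) as usize];
            if cashe_num < min_cashe_num {
                min_cashe_num = cashe_num;
            }
        }
        min_cashes[denm as usize] = min_cashe_num;
    }
    min_cashes[amount as usize]
}

fn main() {
    let amount = 90u32;
    let cashes = [1, 5, 10, 20, 50];
    let mut min_cashes = [0u32; 91];
    let cashe_num = dp_rec_mc(&cashes, amount, &mut min_cashes);
    println!("need refund {cashe_num} cashes");
    // need refund 3 cashes
}
```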
The iterative dynamic programming code is much more concise than the previous recursive versions
and reduces the use of the stack. However, it is important to note that just because a problem can be
solved using recursion does not mean it is the best solution.
While the dynamic programming algorithm finds the minimum number of bills required, it does not
indicate which denominations of bills are used. To obtain this information, a table, cashes_used, can be
added to record the denominations and quantities of bills used. The algorithm can be extended by adding
the denomination of the last bill used for each amount to the cashes_used table and then continuously
finding the last bill used for the previous amount until the end.
1 // dp_rc_mc_show.rs
2
3 // The algorithm uses cashes_used to collect
4 // the denominations of coins used.
5 fn dp_rec_mc_show(cashes: &[u32], amount: u32,
6 min_cashes: &mut [u32],
7 cashes_used: &mut [u32]) -> u32 {
8 for denm in 1..=amount {
9 let mut min_cashe_num = denm ;
10 // With a minimum denomination of 1 yuan
11 let mut used_cashe = 1;
12 for c in cashes.iter()
13 .filter(|&c| *c <= denm)
14 .collect::<Vec<&u32>>() {
15 let index = (denm - c) as usize;
16 let cashe_num = 1 + min_cashes[index];
17 if cashe_num < min_cashe_num {
18 min_cashe_num = cashe_num;
19 used_cashe = *c;
20 }
21 }
22 // update the minimum number of coins
23 // required for each amount
24 min_cashes[denm as usize] = min_cashe_num;
25 cashes_used[denm as usize] = used_cashe;
26 }
27
28 min_cashes[amount as usize]
29 }
30
31 // prints the denominations of coins used
32 fn print_cashes(cashes_used: &[u32], mut amount: u32) {
33 while amount > 0 {
34 let curr = cashes_used[amount as usize];
35 println!("${curr}");
36 amount -= curr;
37 }
38 }
39
40 fn main() {
41 let amount = 81u32; let cashes = [1,5,10,20,50];
42 let mut min_cashes: [u32; 82] = [0; 82];
43 let mut cashes_used: [u32; 82] = [0; 82];
It is important to note that a problem that can be solved with dynamic programming may not neces-
sarily be solvable with recursion as they have different requirements.
5.5 Summary
In this chapter, we covered both recursive and iterative algorithms. Recursive algorithms must satisfy
three laws, and while recursion can sometimes replace iteration, it is not always the optimal solution.
Recursion can be a natural way to express a problem, and tail recursion is an optimization technique that
can reduce stack usage. Dynamic programming is a useful approach for solving optimization problems
by breaking down large problems into smaller ones and gradually constructing larger solutions. While
recursion solves problems by breaking them down, dynamic programming works by building up from
small to large problems.
Chapter 6
Searching
6.1 Objectives
• Be able to implement sequential search and binary search algorithms.
• Understand the concept of using hash tables as a search technique.
• Use Vec to implement a HashMap data structure.
6.3 The Sequential Search
data items in slices are ordered, and can be accessed in order, making these data structures linear as well.
Similarly, the stacks, queues, and linked lists that we have previously studied are also linear. Based on
this same linear logic inherent in the physical world, a natural search technique is linear search, which
is also known as sequential search.
Linear search works by starting at the first item in the slice and moving from one item to another
in order until the target item is found or the entire slice is traversed. If the item being searched for is
not found after traversing the entire slice, it means that the item does not exist. The following figure
illustrates how this search works.
Figure: sequential search over [58, 26, 92, 19, 72, 33, 44, 56, 20, 66], beginning at the first item and moving right.
Of course, sequential search can also return the specific position of the search item, or return None
if it is not found.
// sequential_search_pos.rs

fn sequential_search_pos(nums: &[i32], num: i32) -> Option<usize> {
    let mut pos: usize = 0;
    let mut found = false;
    while pos < nums.len() && !found {
        if num == nums[pos] {
            found = true;
        } else {
            pos += 1;
        }
    }

    if found { Some(pos) } else { None }
}

fn main() {
    let num = 8;
    let nums = [9,3,7,4,1,6,2,8,5];
    match sequential_search_pos(&nums, num) {
        Some(pos) => println!("{num}'s index: {pos}"),
        None => println!("nums does not contain {num}"),
    }
    // 8's index: 7
}
∑_{i=1}^{n} O(i)/n = O(n/2) = O(n)    (6.1)
When the set becomes large, the complexity of sequential search on a random sequence is O(n), since the constant factor 1/2 can be ignored. However, if the data items are sorted in ascending order, and the target is still equally likely to be at any of the n positions, the search can be improved: when the target item does not exist, the scan can stop early. For example, when searching for the target item 50, the comparison proceeds in order until 56 is reached. At this point it is certain that 50 cannot appear later, because every item after 56 is larger than 56, so the algorithm stops searching.
Figure: ordered sequential search over [19, 20, 26, 33, 44, 56, 58, 66, 72, 92]; a search for 50 stops as soon as it reaches 56.
The sequential search algorithm for sorted data sets is shown below. It optimizes the search with a stop variable so that the scan halts as soon as it passes the position where the target would be, saving time.
// ordered_sequential_search.rs

fn ordered_sequential_search(nums: &[i32], num: i32) -> bool {
    let mut pos = 0;
    let mut found = false;
    // stop early once the scan passes where num would be
    let mut stop = false;

    while pos < nums.len() && !found && !stop {
        if num == nums[pos] {
            found = true;
        } else if num < nums[pos] {
            // data is ordered, so num cannot appear later
            stop = true;
        } else {
            pos += 1;
        }
    }

    found
}

fn main() {
    let nums = [1,3,8,10,15,32,44,48,50,55,60,62,64];
    let num = 44;
    let found = ordered_sequential_search(&nums, num);
    println!("nums contains {num}: {found}");
    // nums contains 44: true

    let num = 49;
    let found = ordered_sequential_search(&nums, num);
    println!("nums contains {num}: {found}");
    // nums contains 49: false
}
In the case of sorted data, if the target item is not in the set and is less than the first item, only one comparison is needed to determine that it is absent. In the worst case, the target must still be compared with all n items, giving n comparisons, and the average number of comparisons remains n/2, so the complexity stays at O(n). Even so, searching sorted data is better in practice than searching unordered data, because most searches follow the average case, and on sorted data the average search for an absent item finishes about twice as fast. This is one reason sorting has always been a crucial topic in computer science. The comparison counts of sequential search for unordered and ordered data sets are summarized in the table below.

               item present                      item not present
unordered      best 1, worst n, average n/2      n
ordered        best 1, worst n, average n/2      best 1, worst n, average n/2

6.4 The Binary Search

Figure: the sorted data set 19 20 26 33 44 56 58 66 72 92.
Binary search is a search algorithm that is used on sorted data sets. It is faster than sequential search
since it divides the data set into two parts using low, mid, and high to control the range of the search. If
the middle item is the one being searched for, the search is completed. If not, the ordered property of the
sorted set is used to eliminate half of the remaining items. By repeating the comparison and omission
process, the target item is found relatively quickly.
To implement binary search, we set low and high to the far left and far right, respectively. For
instance, if we want to find 60, we compare it with the middle value of 44. If it is greater than 44, we
move low to 56 and mid to 66. We then compare it with 66 and find that it is less than 66, so we move
high to 58 and mid to 56. This process continues until we reach the target item.
// binary_search.rs

fn binary_search1(nums: &[i32], num: i32) -> bool {
    if nums.is_empty() { return false; }

    let mut low = 0;
    let mut high = nums.len() - 1;
    let mut found = false;

    // note: <= not <
    while low <= high && !found {
        // low + high could overflow, so use subtraction instead
        let mid: usize = low + ((high - low) >> 1);

        if num == nums[mid] {
            found = true;
        } else if num < nums[mid] {
            // num < nums[mid], drop the right half of the data;
            // guard against usize underflow when mid is 0
            if mid == 0 { break; }
            high = mid - 1;
        } else {
            // num > nums[mid], drop the left half of the data
            low = mid + 1;
        }
    }

    found
}

fn main() {
    let nums = [1,3,8,10,15,32,44,48,50,55,60,62,64];

    let target = 3;
    let found = binary_search1(&nums, target);
    println!("nums contains {target}: {found}");
    // nums contains 3: true

    let target = 63;
    let found = binary_search1(&nums, target);
    println!("nums contains {target}: {found}");
    // nums contains 63: false
}
Binary search satisfies the three laws of recursion and can be implemented using recursion. However, a recursive implementation keeps slicing the dataset (discarding the mid item) and adds a stack frame for each call, which carries a risk of stack overflow. Therefore, it is generally recommended to implement binary search iteratively.
// binary_search.rs

fn binary_search2(nums: &[i32], num: i32) -> bool {
    // base case 1: target does not exist
    if nums.is_empty() { return false; }

    let mid: usize = nums.len() >> 1;

    // base case 2: target exists
    if num == nums[mid] {
        true
    } else if num < nums[mid] {
        // shrink the problem to the left half
        binary_search2(&nums[..mid], num)
    } else {
        // shrink the problem to the right half
        binary_search2(&nums[mid+1..], num)
    }
}

fn main() {
    let nums = [1,3,8,10,15,32,44,48,50,55,60,62,64];

    let target = 3;
    let found = binary_search2(&nums, target);
    println!("nums contains {target}: {found}");
    // nums contains 3: true

    let target = 63;
    let found = binary_search2(&nums, target);
    println!("nums contains {target}: {found}");
    // nums contains 63: false
}
In summary, binary search is a fast and intuitive search algorithm that works best on sorted data sets.
It can be implemented using recursion but an iterative approach is generally preferred to avoid stack
overflow risk.
While binary search may seem efficient, it’s not worth sorting and using binary search when n is very
small. In such cases, sequential search may be more efficient. Additionally, sorting large datasets for
binary search can be time-consuming and memory-intensive, making sequential search a more efficient
option. However, binary search is well-suited for datasets that are neither too large nor too small, making
it an ideal choice for many practical scenarios.
x = (27 − 1)(13 − 0) / (35 − 1) + 0 = 9    (6.4)
Starting with nums[9], whose value 28 is greater than 27, we use it as the new upper bound. Since the index of 28 is 9, we continue the interpolation search over the range [0, 8].
x = (27 − 1)(8 − 0) / (27 − 1) + 0 = 8    (6.5)
Finally, we arrive at nums[8], which has a value of 27, and the algorithm ends.
// interpolation_search.rs

fn interpolation_search(nums: &[i32], target: i32) -> bool {
    if nums.is_empty() {
        return false;
    }

    let mut low = 0usize;
    let mut high = nums.len() - 1;
    loop {
        let low_val = nums[low];
        let high_val = nums[high];

        // in these cases, target isn't in nums
        if target < low_val || target > high_val {
            return false;
        }
        // the range has collapsed to a single value
        if low_val == high_val {
            return low_val == target;
        }

        // estimate the position by linear interpolation
        let pos = low
            + ((target - low_val) as usize * (high - low))
            / ((high_val - low_val) as usize);

        if nums[pos] == target {
            return true;
        } else if nums[pos] < target {
            // target lies in the right part
            low = pos + 1;
        } else {
            // target lies in the left part
            high = pos - 1;
        }
    }
}

fn main() {
    let nums = [1,9,10,15,16,17,19,23,27,28,29,30,32,35];

    let target = 27;
    let found = interpolation_search(&nums, target);
    println!("nums contains {target}: {found}");
    // nums contains 27: true

    let target = 25;
    let found = interpolation_search(&nums, target);
    println!("nums contains {target}: {found}");
    // nums contains 25: false
}
to check the values at positions 2^1, 2^2, and 2^3, which are 4, 7, and 15, all of which are less than 22. Checking position 2^4 = 16 is beyond the range, so the upper bound is the last index, 14.
1 // exponential_search.rs
2
3 fn exponential_search(nums: &[i32], target: i32) -> bool {
4 let size = nums.len();
5 if size == 0 { return false; }
6
7 // find the upper bound
8 let mut high = 1usize;
9 while high < size && nums[high] < target {
10 high <<= 1;
11 }
12 // use the half of the upper bound as the lower bound
13 let low = high >> 1;
14
15 // use binary_search method implemented previously
16 binary_search(&nums[low..size.min(high+1)], target)
17 }
18
19 fn main() {
20 let nums = [1,9,10,15,16,17,19,23,27,28,29,30,32,35];
21 let target = 27;
22 let found = exponential_search(&nums, target);
23 println!("nums contains {target}: {found}");
24 // nums contains 27: true
25 }
Note that in this implementation of exponential search, the lower bound at line 13 is half of high; it could also be set to 0, but doing so may reduce efficiency.
The complexity of exponential search is divided into two parts: finding the upper bound for dividing the search range, and performing binary search. The complexity of finding the upper bound is related to the target position i, and its complexity is O(log(i)). The complexity of binary search is O(log(n)), where n is the length of the search range. The length of the search range is high − low = 2^log(i) − 2^(log(i)−1) = 2^(log(i)−1), so its complexity is O(log(2^(log(i)−1))) = O(log(i)). Therefore, the total complexity is O(log(i) + log(i)) = O(log(i)).
6.5 The Hash Search
Figure: an empty hash table with 11 slots, numbered 0 through 10, each initialized to None.
The mapping between data items and their corresponding slots in the hash table is accomplished
through a hash function. The hash function takes any item in the collection and returns a specific slot
name, a process known as hashing. For example, if there are integer items [24, 61, 84, 41, 56, 31], and a
hash table with a capacity of 11 slots, each item’s location in the hash table can be obtained by inputting
it into the hash function. A simple hash function uses modulo: any number modulo 11 leaves a remainder between 0 and 10, so every item always maps to one of the 11 slots.
Figure: the hash table after inserting the six items; slot 1 holds 56, slot 2 holds 24, slot 6 holds 61, slot 7 holds 84, slot 8 holds 41, and slot 9 holds 31.
Once the hash values are calculated, the items can be inserted into their corresponding slots, as depicted in the figure above. Currently, 6 of the 11 slots in the table are occupied, resulting in a load factor of λ = 6/11. The load factor is a useful metric to evaluate the hash table, particularly when it needs to
store a large number of items. A high load factor indicates that there is limited space for additional items,
necessitating resizing the table. In languages like Rust and Go, resizing occurs automatically when the
load factor exceeds a particular threshold, preparing for future data insertion.
Although the data stored in the hash table is unsorted, the hash function enables the calculation of
the slot of a data item regardless of its disorderliness. For instance, to check if the number 56 exists
in the table, its hash value can be computed (hash(56) = 1), and slot 1 can be examined to locate 56,
indicating its presence in the table. The search operation has a complexity of O(1), making hash search
very efficient. However, collisions may occur and must be resolved, or the hash table cannot be used. For example, when 97 is added, its hash value hash(97) = 9 points to slot 9, which is already occupied by 31, so a collision occurs that must be resolved.
Another hash algorithm is the ”mid-square” method. The item is squared first, and then the middle
portion of the square is extracted as the value for which the remainder is computed. For example, if the
number is 36, its square is 1296, and taking the middle portion, 29, and then computing the remainder
of 11 yields hash(29) = 7, indicating that 36 should be stored in slot 7.
When storing a string, the remainder can also be calculated based on the ASCII values of its char-
acters. The string ”rust” has four characters with ASCII values of [114, 117, 115, 116], and their sum
is 462. Computing the remainder results in hash(462) = 0. Other strings, such as ”Java” with ASCII
values [74, 97, 118, 97], may also be tried. Its sum is 386, and taking the remainder gives hash(386) =
1, indicating that ”Java” should be stored in slot 1, as shown in the figure below.
Figure: the string "rust" stored in slot 0 and "Java" stored in slot 1 of the hash table.
Multiplying each character's value by its position, with the index starting from 1 so that the first character also contributes a positive weight, makes the hash sensitive to character order. Below is an example calculation and code snippet.
// hash.rs
fn hash2(astr: &str, size: usize) -> usize {
    let mut sum = 0;
    for (i, c) in astr.chars().enumerate() {
        sum += (i + 1) * (c as usize);
    }
    sum % size
}

fn main() {
    let (s1, s2, size) = ("rust", "Rust", 11);
    let p1 = hash2(s1, size);
    let p2 = hash2(s2, size);
    println!("{s1} in slot {p1}, {s2} in slot {p2}");
    // rust in slot 2, Rust in slot 3
}
Efficiency is crucial when designing a hash function to ensure that it does not become the bottleneck
of the system. If the hash function is too complex, it could break the O(1) complexity.
Figure: the hash table before inserting 35; its home slot 2 is already occupied by 24.
When inserting 35, its position should be slot 2, but we find that slot 2 already contains 24. Therefore,
we start searching for an empty slot from slot 2 and find that slot 3 is empty, so we insert 35 there.
Figure: 35 placed in slot 3, the first empty slot after its home slot 2.
When we insert 47, we find that slot 3 has the value 35, so we search for the next empty slot and find
that slot 4 is empty, so we insert 47 there.
Figure: 47 placed in slot 4 after finding its home slot 3 occupied by 35.
To search a hash table built with open addressing, the same probing method used during construction must be applied. For instance, when searching for item 56, whose hash is 1, we find it in slot 1 and return true. When searching for item 35, whose hash is 2, slot 2 holds another item, 24, because of the earlier collision. To avoid wrongly returning false, the probe must continue sequentially until it finds 35, reaches an empty slot, or wraps around to the starting slot, and only then return the result.
However, linear probing tends to cause data clustering: when multiple collisions land on the same hash slot, the conflicting items fill the following slots, forcing later values away from their intended positions, and the resulting sequential probes cost more than O(1). To reduce clustering, open addressing can be extended to skip several slots instead of probing the very next one. Probing every third slot on a conflict, for example, spreads the colliding items more evenly across the table, which noticeably alleviates clustering, as shown in the figure below.
Figure: probing with a skip of 3; conflicting items are spread across the table instead of clustering in adjacent slots.
When adding item 35, the collision at slot 2 makes the probe advance three slots at a time, dispersing the conflict; the later item 47 can then be inserted into its home slot 3 without any conflict.
Rehashing is the process of finding an alternative slot in a hash table after a collision occurs. This
process involves calculating a new hash value using a specified skip size, which must ensure that all
slots in the table can eventually be accessed. To guarantee this, it is advisable to use a prime number as
the table size, as exemplified by the use of 11 in the accompanying example.
Another approach to resolving conflicts is the chain method. This method involves setting up a linked
list to store data items for each conflicting position, as shown in Figure (6.6). When searching, conflicts
are resolved by sequentially searching the chain, which has a complexity of O(n). If the data in the
conflict chain is sorted, a binary search can be utilized to achieve a complexity of O(log2(n)). If the
chain becomes too long, it can be transformed into a red-black tree to enhance its structural stability. In
many programming languages, the chain method is the default implementation for resolving conflicts in
hash table data structures.
Figure 6.6: resolving conflicts with the chain method; all items that hash to the same slot are chained under it, e.g. 33, 99, and 66 (hash 0) share one chain, as do 73, 51 (hash 7) and 42, 97, 64 (hash 9).
In the actual implementation here, the HashMap is made up of two separate Vecs: one storing the keys (called slot) and the other storing the values (called data). Keys are saved starting from 1, and the default value 0 in the slot Vec marks an empty slot. The HashMap is wrapped in a struct, with a cap field added to control the capacity.
1 // hashmap.rs
2
3 #[derive(Debug, Clone, PartialEq)]
4 struct HashMap <T> {
5 cap: usize, // capacity
6 slot: Vec<usize>, // store data address (index)
7 data: Vec<T>, // store elements
8 }
The rehash function in Rust can be implemented using a linear search method that adds 1, which
is simple and easy to implement. The initial size of the HashMap is set to 11, but it can also be set to
other prime numbers, such as 13, 17, 19, 23, 29, etc. Below is the complete implementation code for a
HashMap in Rust.
1 // hashmap.rs
2
3 impl<T: Clone + PartialEq + Default> HashMap<T> {
4 fn new(cap: usize) -> Self {
5 // Initialize slot and data
6 let mut slot = Vec::with_capacity(cap);
7 let mut data = Vec::with_capacity(cap);
8 for _i in 0..cap{
9 slot.push(0);
10 data.push(Default::default());
11 }
12
13 HashMap { cap, slot, data }
14 }
15
16 fn len(&self) -> usize {
17 let mut len = 0;
18 for &d in self.slot.iter() {
19 // If slot is not empty, then increase len by 1
20 if 0 != d {
21 len += 1;
22 }
23 }
24 len
25 }
26
27 fn is_empty(&self) -> bool {
28 let mut empty = true;
29 for &d in self.slot.iter() {
30 if 0 != d {
31 empty = false;
32 break;
33 }
34 }
35 empty
36 }
37
38 fn clear(&mut self) {
39 let mut slot = Vec::with_capacity(self.cap);
40 let mut data = Vec::with_capacity(self.cap);
41 for _i in 0..self.cap{
42 slot.push(0);
43 data.push(Default::default());
44 }
45
46 self.slot = slot;
47 self.data = data;
48 }
49
50 fn hash(&self, key: usize) -> usize {
51 key % self.cap
52 }
53
54 fn rehash(&self, pos: usize) -> usize {
55 (pos + 1) % self.cap
56 }
57
58 fn insert(&mut self, key: usize, value: T) {
59 if 0 == key { panic!("Error: key must > 0"); }
60
61 let pos = self.hash(key);
62 if 0 == self.slot[pos] {
63 // If the slot is empty, insert directly
64 self.slot[pos] = key;
65 self.data[pos] = value;
66 } else {
67 // If the slot is not empty, then
68 // find next available position
69 let mut next = self.rehash(pos);
70 while 0 != self.slot[next]
71 && key != self.slot[next] {
72 next = self.rehash(next);
73
74 // If the slot is full, exit
75 if next == pos {
76 println!("Error: slot is full!");
77 return;
78 }
79 }
80
81 // Insert the data in the found slot
82 if 0 == self.slot[next] {
83 self.slot[next] = key;
84 self.data[next] = value;
85 } else {
86 self.data[next] = value;
87 }
88 }
89 }
90
91 fn remove(&mut self, key: usize) -> Option<T> {
92 if 0 == key { panic!("Error: key must > 0"); }
93
94 let pos = self.hash(key);
95 if 0 == self.slot[pos] {
96 // If the slot is empty, return None
97 None
98 } else if key == self.slot[pos] {
99 // If found the same key,
204
205 struct IterMut<'a, T: 'a> { stack: Vec<&'a mut T>, }
206 impl<'a, T> Iterator for IterMut<'a, T> {
207 type Item = &'a mut T;
208 fn next(&mut self) -> Option<Self::Item> {
209 self.stack.pop()
210 }
211 }
212
213 fn main() {
214 basic();
215 iter();
216
217 fn basic() {
218 let mut hmap = HashMap::new(11);
219 hmap.insert(2,"dog");
220 hmap.insert(3,"tiger");
221 hmap.insert(10,"cat");
222
223 println!("empty: {}, size: {:?}",
224 hmap.is_empty(), hmap.len());
225 println!("contains key 2: {}", hmap.contains(2));
226
227 println!("key 3: {:?}", hmap.get(3));
228 let val_ptr = hmap.get_mut(3).unwrap();
229 *val_ptr = "fish";
230 println!("key 3: {:?}", hmap.get(3));
231 println!("remove key 3: {:?}", hmap.remove(3));
232 println!("remove key 3: {:?}", hmap.remove(3));
233
234 hmap.clear();
235 println!("empty: {}, size: {:?}",
236 hmap.is_empty(), hmap.len());
237 }
238
239 fn iter() {
240 let mut hmap = HashMap::new(11);
241 hmap.insert(2,"dog");
242 hmap.insert(3,"tiger");
243 hmap.insert(10,"cat");
244
245 for item in hmap.iter() {
246 println!("val: {item}");
247 }
248
249 for item in hmap.iter_mut() {
250 *item = "fish";
251 }
252
253 for item in hmap.iter() {
254 println!("val: {item}");
255 }
256 }
257 }
Here is the code execution output.
empty: false, size: 3
contains key 2: true
key 3: Some("tiger")
key 3: Some("fish")
remove key 3: Some("fish")
remove key 3: None
empty: true, size: 0
val: cat
val:
val:
val:
val:
val:
val:
val: tiger
val: dog
val:
val:
val: fish
val: fish
val: fish
val: fish
val: fish
val: fish
val: fish
val: fish
val: fish
val: fish
val: fish
6.6 Summary
This chapter covers several search algorithms, namely sequential search, binary search, interpolation
search, exponential search and hash search. Sequential search is a straightforward algorithm that has a
complexity of O(n). Binary search, on the other hand, is a fast algorithm that cuts the data set in half each
time, but it requires the data to be sorted, with a complexity of O(log2(n)). Other search algorithms,
such as interpolation search and exponential search, build on binary search and are suitable for different
types of data distribution. Hash search is a highly efficient O(1) search algorithm that uses a HashMap.
However, it is crucial to consider that hash tables are prone to collisions and need appropriate measures,
such as open addressing and chaining, to resolve them. Sorting is a helpful technique that speeds up
search algorithms, and in the next chapter, we will delve into sorting algorithms.
Chapter 7
Sorting
7.1 Objectives
• Learn sorting algorithms.
• Be able to implement the ten basic sorting algorithms in Rust.
To evaluate sorting algorithms, it is important to consider stability: whether the relative order of equal elements in the original collection is preserved in the sorted result. For example, if the amount 5 appears twice in a collection and sorting swaps the two occurrences, this can cause issues for algorithms that rely on the original sequence of equal keys, such as deduction operations that must be applied in their original order.
There are numerous sorting algorithms available for collections, but there are ten basic types of
sorting algorithms that serve as the foundation for most of them. These include bubble sort, quick sort,
selection sort, heap sort, insertion sort, shell sort, merge sort, counting sort, bucket sort, and radix sort.
Many improved algorithms have been derived from these ten basic algorithms. In this article, we will
explain several of these improved algorithms, including new bubble sort, cocktail sort, comb sort, binary
insertion sort, flash sort, and Tim sort.
7.3 The Bubble Sort

Figure: the first pass of bubble sort; the current maximum 92 moves toward the right end one comparison at a time:
92 84 66 56 44 31 72 19 24
84 92 66 56 44 31 72 19 24
84 66 92 56 44 31 72 19 24
84 66 56 92 44 31 72 19 24
84 66 56 44 92 31 72 19 24
84 66 56 44 31 92 72 19 24
84 66 56 44 31 72 92 19 24
84 66 56 44 31 72 19 92 24
84 66 56 44 31 72 19 24 92
The figure above traces the maximum value as it keeps moving toward the right, just like a bubble rising to the top. Bubble sort involves frequent swap operations, which are commonly used auxiliary operations in comparison-based sorting. In Rust, Vec and slices provide a built-in swap() method, but the swap can also be written out explicitly.
// swap via a temporary variable
let temp = data[i];
data[i] = data[j];
data[j] = temp;
Some programming languages provide a convenient way to swap values without using temporary
variables, such as: data[i], data[j] = data[j], data[i]. Although this feature still uses variables internally,
it operates on two variables simultaneously, as shown in the figure below.
Figure: the values 47 and 84 exchanged in place.
In this chapter, we only deal with sets of numbers to simplify the algorithm design. Therefore, we
can use a Vec to implement bubble sort.
// bubble_sort.rs

fn bubble_sort1(nums: &mut [i32]) {
    if nums.len() < 2 {
        return;
    }

    for i in 1..nums.len() {
        for j in 0..nums.len()-i {
            if nums[j] > nums[j+1] {
                nums.swap(j, j+1);
            }
        }
    }
}
Here is an example of bubble sort.
fn main() {
    let mut nums = [54,26,93,17,77,31,44,55,20];
    bubble_sort1(&mut nums);
    println!("sorted nums: {:?}", nums);
    // sorted nums: [17, 20, 26, 31, 44, 54, 55, 77, 93]
}
It is worth noting that, instead of a double for-loop, we can also use a while-loop to drive the outer pass of bubble sort.
// bubble_sort.rs

fn bubble_sort2(nums: &mut [i32]) {
    if nums.len() < 2 { return; }
    let mut len = nums.len() - 1;

    while len > 0 {
        for i in 0..len {
            if nums[i] > nums[i+1] {
                nums.swap(i, i+1);
            }
        }
        // the largest item is in place, shrink the range
        len -= 1;
    }
}
1 + 2 + ... + (n − 1) = n^2/2 − n/2    (7.1)

This means that the time complexity of bubble sort is O(n^2/2 − n/2) = O(n^2).
Although both bubble sort algorithms presented above achieve sorting, even a sorted set requires
continuous comparisons and swapping of data items. To optimize the algorithm, we can add a variable
to control whether comparisons should continue and exit directly when encountering a sorted set.
// bubble_sort.rs

fn bubble_sort3(nums: &mut [i32]) {
    if nums.len() < 2 { return; }
    // compare controls whether to continue comparing
    let mut compare = true;
    let mut len = nums.len() - 1;

    while len > 0 && compare {
        compare = false;
        for i in 0..len {
            if nums[i] > nums[i+1] {
                // data is still unordered, keep comparing
                nums.swap(i, i+1);
                compare = true;
            }
        }

        len -= 1;
    }
}

fn main() {
    let mut nums = [54,26,93,17,77,31,44,55,20];
    bubble_sort3(&mut nums);
    println!("sorted nums: {:?}", nums);
    // sorted nums: [17, 20, 26, 31, 44, 54, 55, 77, 93]
}
Bubble sort compares adjacent elements, starting from the first number and exchanging positions based on their relative size, so elements are only swapped while scanning from left to right. Can we also bubble from right to left? Yes, and this bidirectional variant is called cocktail sort. In the right-to-left pass it bubbles the smallest remaining element to the front. While it slightly optimizes bubble sort, its time complexity is still O(n^2), though it approaches O(n) when the sequence is already sorted.
// cocktail_sort.rs

fn cocktail_sort(nums: &mut [i32]) {
    if nums.len() <= 1 { return; }

    // bubble controls the sort process
    let mut bubble = true;
    let len = nums.len();
    for i in 0..(len >> 1) {
        if bubble {
            bubble = false;
            // bubble from left to right
            for j in i..(len - i - 1) {
                if nums[j] > nums[j+1] {
                    nums.swap(j, j+1);
                    bubble = true;
                }
            }
            // bubble from right to left
            for j in (i+1..=(len - i - 1)).rev() {
                if nums[j] < nums[j-1] {
                    nums.swap(j-1, j);
                    bubble = true;
                }
            }
        } else {
            break;
        }
    }
}

fn main() {
    let mut nums = [1,3,2,8,3,6,4,9,5,10,6,7];
    cocktail_sort(&mut nums);
    println!("sorted nums {:?}", nums);
    // sorted nums [1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10]
}
In contrast to bubble sort, comb sort can compare items with a distance greater than 1. Comb sort
starts by setting the gap to the length of the array and decreasing it by a fixed ratio in each iteration of
the loop, typically by multiplying it by 0.8, which is the most effective ratio determined by the original
author through experimentation. When the gap reaches 1, comb sort degenerates into bubble sort. Comb sort aims to move inverted values toward their final positions as early as possible and keeps the items a gap apart in order, much like combing hair, where the gap resembles the spacing between the teeth of a comb. Comb sort has a time complexity of O(n log n), a space complexity of O(1), and is an unstable sorting algorithm.
// comb_sort.rs

fn comb_sort(nums: &mut [i32]) {
    if nums.len() <= 1 { return; }
    let mut i;
    let mut gap: usize = nums.len();

    // first phase: shrink the gap until the data is basically ordered
    while gap > 0 {
        gap = (gap as f32 * 0.8) as usize;
        i = gap;
        while i < nums.len() {
            if nums[i-gap] > nums[i] {
                nums.swap(i-gap, i);
            }
            i += 1;
        }
    }

    // second phase: a final bubble pass puts every element in place;
    // exchange controls the process
    let mut exchange = true;
    while exchange {
        exchange = false;
        i = 0;
        while i < nums.len() - 1 {
            if nums[i] > nums[i+1] {
                nums.swap(i, i+1);
                exchange = true;
            }
            i += 1;
        }
    }
}
fn main() {
    let mut nums = [1,2,8,3,4,9,5,6,7];
    comb_sort(&mut nums);
    println!("sorted nums {:?}", nums);
    // sorted nums [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
Bubble sort requires careful handling of boundary indices such as i, j, i+1, and j+1, which are easy to get wrong. In 2021, a new sorting algorithm [12] was published that needs no boundary-index handling at all. It is intuitive and resembles bubble sort at first glance, but it actually behaves more like insertion sort, and although it reads like a descending sort, it sorts in ascending order.
1 // CantBelieveItCanSort.rs
2
3 fn cbic_sort1(nums: &mut [i32]) {
4 for i in 0..nums.len() {
5 for j in 0..nums.len() {
6 if nums[i] < nums[j] { nums.swap(i, j); }
7 }
8 }
9 }
10
11 fn main() {
12 let mut nums = [54,32,99,18,75,31,43,56,21,22];
13 cbic_sort1(&mut nums);
14 println!("sorted nums {:?}", nums);
15 // sorted nums [18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
16 }
Of course, it can also be implemented as a descending sort by changing the less than symbol to a
greater than symbol.
1 // CantBelieveItCanSort.rs
2
3 fn cbic_sort2(nums: &mut [i32]) {
4 for i in 0..nums.len() {
5 for j in 0..nums.len() {
6 if nums[i] > nums[j] {
7 nums.swap(i, j);
8 }
9 }
10 }
11 }
1 fn main() {
2 let mut nums = [54,32,99,18,75,31,43,56,21,22];
3 cbic_sort2(&mut nums);
4 println!("sorted nums {:?}", nums);
5 // sorted nums [99, 75, 56, 54, 43, 32, 31, 22, 21, 18]
6 }
This algorithm uses only two for loops, and the index values do not need to be handled separately.
While it resembles the definition of bubble sort, it is not actually a bubble sort algorithm.
7.4 The Quick Sort
In the following figure, the pivot value 84 does not necessarily end up near the middle of the sorted
collection; it would be more efficient to choose a value closer to the middle, such as 56, as the pivot.
However, choosing a good pivot value is not the focus here.
84 92 66 56 44 31 72 19 24
To implement quicksort, two markers need to be set for comparison after selecting the pivot value
(the dark gray value). The left and right markers should be located at the far left and far right extremes
of the collection, except for the pivot value.
84 92 66 56 44 31 72 19 24
leftmark rightmark
84 92 66 56 44 31 72 19 24
leftmark rightmark
The goal of partitioning is to move the items that are on the wrong side of the pivot value. By comparing
the values at the left and right markers with the pivot value and moving smaller values toward the left
and larger values toward the right, a roughly sorted collection can be reached quickly through repeated
swaps.
24 92 66 56 44 31 72 19 84
leftmark rightmark
24 92 66 56 44 31 72 19 84
leftmark rightmark
24 92 66 56 44 31 72 19 84
leftmark rightmark
To begin partitioning, move the left marker to the right until a value greater than or equal to the pivot
value is found. Then move the right marker to the left until a value less than or equal to the pivot value
is found. If the left marker has not yet crossed the right marker, swap the values at the two markers. In
this case, 84 and 24 satisfy the condition, so the values are swapped directly. Repeat this process until
the left and right markers cross each other.
After crossing, compare the values of the left and right indices. If the right is less than the left, swap
the right index value with the pivot value. Otherwise, swap the left index value with the pivot value.
The right index value serves as the splitting point, dividing the set into two intervals.
Recursively call quicksort on the left and right intervals until the sorting is completed. It is important
to note that the pivot value does not necessarily have to be the value in the middle of the collection, but
it should end up at or near the middle of the final sorted collection for the fastest sorting speed.
24 19 31 44 56 66 72 92 84
leftmark rightmark
<56 >56
24 19 31 44 56 66 72 92 84
Once quicksort has been executed on both the left and right sides, the sorting process is complete. If
the set's length is less than or equal to one, it is already sorted, and the function can return directly. To
implement quicksort, we use a dedicated partition function that selects the first item (or a random value
from the data set) as the pivot value.
1 // quick_sort.rs
2
3 fn quick_sort1(nums: &mut [i32], low: usize, high: usize) {
4 if low < high {
5 let split = partition(nums, low, high);
6
7 // guard against index underflow when split <= 1
8 if split > 1 {
9 quick_sort1(nums, low, split - 1);
10 }
11
12 quick_sort1(nums, split + 1, high);
13 }
14 }
15
16 fn partition(nums:&mut[i32], low:usize,high:usize) -> usize {
17 // left marker and right marker
18 let mut lm = low;
19 let mut rm = high;
20
21 loop {
22 // left mark move to right gradually
23 while lm <= rm && nums[lm] <= nums[low] {
24 lm += 1;
25 }
26
27 // right mark move to left gradually
28 while lm <= rm && nums[rm] >= nums[low] {
29 rm -= 1;
30 }
31
32 // once lm > rm, return and exchange data between
33 // position lm and rm
34 if lm > rm {
35 break;
36 } else {
37 nums.swap(lm, rm);
38 }
39 }
40
41 nums.swap(low, rm);
42
43 rm
44 }
45
46 fn main() {
47 let mut nums = [54,26,93,17,77,31,44,55,20];
48 let high = nums.len() - 1;
49 quick_sort1(&mut nums, 0, high);
50 println!("sorted nums: {:?}", nums);
51 // sorted nums: [17, 20, 26, 31, 44, 54, 55, 77, 93]
52 }
However, it’s also possible to implement quicksort directly through the recursive method without
using the partition function.
1 // quick_sort.rs
2
3 fn quick_sort2(nums: &mut [i32], low: usize, high: usize) {
4 if low >= high {
5 return;
6 }
7
8 // left marker and right marker
9 let mut lm = low;
10 let mut rm = high;
11 while lm < rm {
12 // right marker move to left gradually
13 while lm < rm && nums[low] <= nums[rm] {
14 rm -= 1;
15 }
16
17 // left marker move to right gradually
18 while lm < rm && nums[lm] <= nums[low] {
19 lm += 1;
20 }
21
22 // exchange data between position lm and rm
23 nums.swap(lm, rm);
24 }
25
26 // exchange data between position low and lm
27 nums.swap(low, lm);
28
29 if lm > 1 {
30 quick_sort2(nums, low, lm - 1);
31 }
32
33 quick_sort2(nums, rm + 1, high);
34 }
35
36 fn main() {
37 let mut nums = [54,26,93,17,77,31,44,55,20];
38 let high = nums.len() - 1;
39 quick_sort2(&mut nums, 0, high);
40 println!("sorted nums: {:?}", nums);
41 // sorted nums: [17, 20, 26, 31, 44, 54, 55, 77, 93]
42
43 let mut nums = [1000,-1,2,-20,89,64,0,99,73];
44 let high = nums.len() - 1;
45 quick_sort2(&mut nums, 0, high);
46 println!("sorted nums: {:?}", nums);
47 // sorted nums: [-20, -1, 0, 2, 64, 73, 89, 99, 1000]
48 }
To partition a set of length n in quicksort, there will be log(n) levels of partitioning if the split always
falls in the middle. At each level, finding the splitting point requires checking the pivot value against
each of the n items, resulting in a complexity of O(nlog(n)). In the worst case, when the splitting point
is far from the middle, the set is repeatedly split into parts of size 1 and n - 1, n times over, leading to a
complexity of O(n^2).
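The worst case can be made unlikely by choosing a pivot closer to the median. A common heuristic is median-of-three; the sketch below is a hypothetical helper (not part of the book's quick_sort1) that moves the median of the first, middle, and last items to position low, so a partition routine that pivots on nums[low] can be reused unchanged.

```rust
// Median-of-three: sort the first, middle, and last items in place,
// then move the median to position `low` so it serves as the pivot.
fn median_of_three(nums: &mut [i32], low: usize, high: usize) {
    let mid = low + (high - low) / 2;
    if nums[mid] < nums[low] { nums.swap(mid, low); }
    if nums[high] < nums[low] { nums.swap(high, low); }
    if nums[high] < nums[mid] { nums.swap(high, mid); }
    // now nums[low] <= nums[mid] <= nums[high]
    nums.swap(low, mid); // place the median at the pivot position
}

fn main() {
    let mut nums = [9, 1, 5, 3, 7];
    median_of_three(&mut nums, 0, 4);
    // the median of {9, 5, 7} is 7, now at index 0
    println!("{:?}", nums); // [7, 1, 5, 3, 9]
}
```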
Quicksort relies on recursion, and excessive recursion depth degrades its performance. To overcome this
limitation, introsort switches to heap sort once the recursion depth exceeds log(n), and for small inputs
(n < 20) it switches to insertion sort. This hybrid algorithm achieves the high performance of quicksort
on typical datasets while maintaining O(nlog(n)) performance in the worst case. Introsort is the built-in
sorting algorithm of the C++ standard library.
Quicksort divides the array to be sorted into two areas. However, if there are many duplicate elements,
quicksort will compare them repeatedly, wasting work. To address this, the array can be divided into
three areas (three-way quicksort). Duplicate elements are placed in the third area, and only the other two
areas are sorted: a repeated value is chosen as the pivot, values less than the pivot go into the left area,
values greater than the pivot into the right area, and values equal to the pivot into the middle area.
Three-way quicksort is then recursively called on the left and right areas.
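A common way to realize this three-way partition is the Dutch national flag scheme. The following is a sketch under that scheme, not the book's implementation; the name quick_sort3 and the choice of the middle element as the pivot are illustrative assumptions.

```rust
// Three-way quicksort: partition into < pivot, == pivot, > pivot,
// then recurse only on the outer two areas.
fn quick_sort3(nums: &mut [i32]) {
    if nums.len() <= 1 { return; }
    let pivot = nums[nums.len() / 2];
    let (mut lt, mut i, mut gt) = (0, 0, nums.len());
    while i < gt {
        if nums[i] < pivot {
            nums.swap(i, lt); lt += 1; i += 1;
        } else if nums[i] > pivot {
            gt -= 1; nums.swap(i, gt);
        } else {
            i += 1; // equal to pivot: leave in the middle area
        }
    }
    quick_sort3(&mut nums[..lt]); // left area
    quick_sort3(&mut nums[gt..]); // right area
}

fn main() {
    let mut nums = [5, 2, 5, 1, 5, 9, 5, 3, 5];
    quick_sort3(&mut nums);
    println!("sorted nums: {:?}", nums);
    // sorted nums: [1, 2, 3, 5, 5, 5, 5, 5, 9]
}
```

Every element equal to the pivot settles into the middle area in a single pass, so duplicates are never compared again.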
7.5 The Insertion Sort
84 92 66 56 44 31 72 19 24 84 ordered
84 92 66 56 44 31 72 19 24 still ordered
66 84 92 56 44 31 72 19 24 insert 66
56 66 84 92 44 31 72 19 24 insert 56
44 56 66 84 92 31 72 19 24 insert 44
31 44 56 66 84 92 72 19 24 insert 31
31 44 56 66 72 84 92 19 24 insert 72
19 31 44 56 66 72 84 92 24 insert 19
19 24 31 44 56 66 72 84 92 insert 24
1 // insertion_sort.rs
2 fn insertion_sort(nums: &mut [i32]) {
3 if nums.len() < 2 { return; }
4 for i in 1..nums.len() {
5 let mut pos = i;
6 let curr = nums[i];
7 while pos > 0 && curr < nums[pos-1] {
8 // move element to right
9 nums[pos] = nums[pos-1];
10 pos -= 1;
11 }
12
13 // insert element: curr
14 nums[pos] = curr;
15 }
16 }
17 fn main() {
18 let mut nums = [54,32,99,18,75,31,43,56,21];
19 insertion_sort(&mut nums);
20 println!("sorted nums: {:?}", nums);
21 // sorted nums: [18, 21, 31, 32, 43, 54, 56, 75, 99]
22 }
The insertion sort requires comparing each new element with the already sorted elements one by
one. However, the binary search algorithm discussed in Chapter 6 can efficiently locate the position of
an element in a sorted subsequence. Therefore, binary search can be utilized for acceleration.
1 // binary_insertion_sort.rs
2
3 fn binary_insertion_sort(nums: &mut [i32]) {
4 let mut temp;
5 let mut left;
6 let mut mid;
7 let mut right;
8
9 for i in 1..nums.len() {
10 left = 0;
11
12 // Sorted array boundaries
13 right = i - 1;
14
15 // Data to be sorted
16 temp = nums[i];
17
18 // Binary search finds the position of temp
19 while left <= right {
20 mid = (left + right) >> 1;
21 if temp < nums[mid] {
22 // To prevent right = 0 - 1 case
23 if 0 == mid {
24 break;
25 }
26 right = mid - 1;
27 } else {
28 left = mid + 1;
29 }
30 }
31
32 // Move data back
33 for j in (left..=i-1).rev() {
34 nums.swap(j, j+1);
35 }
36
37 // Insert temp into the empty space
38 if left != i {
39 nums[left] = temp;
40 }
41 }
42 }
43
44 fn main() {
45 let mut nums = [1,3,2,8,6,4,9,7,5,10];
46 binary_insertion_sort(&mut nums);
47 println!("sorted nums: {:?}", nums);
48 // sorted nums: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
49 }
7.6 The Shell Sort
84 92 66 56 44 31 72 19 24
84 92 66 56 44 31 72 19 24
84 92 66 56 44 31 72 19 24
To understand Shell sort, consider the figure above. The set in the figure has nine items. If a gap of
three is used, there will be a total of three subsets, each with three items of the same color. These separated
elements can be considered as connected together, so insertion sort can be used to sort elements of the
same color.
After sorting the subsets, the overall set is still unordered, as shown in the figure below. Although the
overall set is unordered, it is not completely unordered. Subsets of the same color are ordered. By sorting
the entire set using insertion sort, the set can be completely sorted quickly. The number of insertion sort
moves is small at this point because adjacent items are in their own subsets’ ordered positions, and these
adjacent items are almost ordered. Therefore, only a few insertion moves are needed to complete the
sorting.
56 19 24 72 44 31 84 92 66
In Shell sort, the increment is the key feature, and different increments can be used to determine the
number of subsets. The gap value is adjusted continuously in the implementation of Shell sort to achieve
sorting.
1 // shell_sort.rs
2
3 fn shell_sort(nums: &mut [i32]) {
4 // Internal function for insertion sort
5 // with an element exchange distance of gap
6 fn ist_sort(nums: &mut [i32], start: usize, gap: usize) {
7 let mut i = start + gap;
8
9 while i < nums.len() {
10 let mut pos = i;
11 let curr = nums[pos];
12
13 while pos >= gap && curr < nums[pos - gap] {
14 nums[pos] = nums[pos - gap];
15 pos -= gap;
16 }
17
18 nums[pos] = curr;
19 i += gap;
20 }
21 }
22
23 // shrink the gap in every loop until it reaches 1
24 let mut gap = nums.len() / 2;
25 while gap > 0 {
26 for start in 0..gap {
27 ist_sort(nums, start, gap);
28 }
29
30 gap /= 2;
31 }
32 }
The following are use cases.
1 fn main() {
2 let mut nums = [54,32,99,18,75,31,43,56,21,22];
3 shell_sort(&mut nums);
4 println!("sorted nums: {:?}", nums);
5 // sorted nums: [18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
6
7 let mut nums = [1000,-1,2,-20,89,64,0,99,73];
8 shell_sort(&mut nums);
9 println!("sorted nums: {:?}", nums);
10 // sorted nums: [-20, -1, 0, 2, 64, 73, 89, 99, 1000]
11 }
Shell sort divides the original set into smaller subsets and applies insertion sort to each subset. The
subsets are formed by selecting an item every few items, with the distance between them called the gap.
The separated elements of a subset are treated as if they were adjacent, and insertion sort is used to sort
each subset. The final insertion pass in Shell sort needs far fewer moves than a plain insertion sort
because the collection has already been pre-sorted by the earlier increments, which makes the overall
sorting very efficient.
Although the complexity analysis of Shell sort is slightly more complicated, it is roughly between
O(n) and O(n^2). By choosing the gap values according to the formula 2^k - 1 (1, 3, 7, 15, 31, ...), the
complexity of Shell sort is approximately O(n^(3/2)), which is very fast. Additionally, binary search can
be used to further improve Shell sort, just as with insertion sort; however, the index calculations in the
binary search must account for the gap value. Readers are encouraged to try implementing this improved
algorithm.
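As one illustration, the 2^k - 1 gap sequence can be dropped into the Shell sort skeleton above. The function name shell_sort_hibbard and the inlined gapped insertion sort are assumptions of this sketch, not code from the book.

```rust
// Shell sort using the gap sequence 2^k - 1 (1, 3, 7, 15, 31, ...).
fn shell_sort_hibbard(nums: &mut [i32]) {
    // collect the gaps 1, 3, 7, 15, ... that are smaller than the length
    let mut gaps: Vec<usize> = Vec::new();
    let mut k = 1;
    while (1usize << k) - 1 < nums.len() {
        gaps.push((1usize << k) - 1);
        k += 1;
    }

    // gapped insertion sort, from the largest gap down to 1
    for &gap in gaps.iter().rev() {
        for i in gap..nums.len() {
            let curr = nums[i];
            let mut pos = i;
            while pos >= gap && curr < nums[pos - gap] {
                nums[pos] = nums[pos - gap];
                pos -= gap;
            }
            nums[pos] = curr;
        }
    }
}

fn main() {
    let mut nums = [54,32,99,18,75,31,43,56,21,22];
    shell_sort_hibbard(&mut nums);
    println!("sorted nums: {:?}", nums);
    // sorted nums: [18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
}
```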
7.7 The Merge Sort
56 19 24 72 44 31 84 92 66
56 19 24 72 44 31 84 92 66
56 19 24 72 44 31 84 92 66
56 19 24 72 44 31 84 92 66
19 56 24 72 31 44 84 66 92
19 56 24 72 31 44 84 66 92
19 24 56 72 31 44 84 66 92
19 24 31 44 56 72 84 66 92
19 24 31 44 56 66 72 84 92
Merge sort breaks the set down into base cases of one or two elements, which are easy to compare
directly. The merge process then combines the smallest sorted subsequences two by two until the whole
set is sorted. Even if the set cannot be divided evenly, performance is unaffected, as the difference is at
most one element. The merge operation itself is simple: each subsequence is already sorted, so only one
comparison is needed at a time, and the resulting sequence is always ordered. Below is a Rust
implementation of merge sort, consisting of two recursive sorts and one merge operation.
1 // merge_sort.rs
2
3 fn merge_sort(nums: &mut [i32]) {
4 if nums.len() > 1 {
5 let mid = nums.len() >> 1;
6 merge_sort(&mut nums[..mid]); // sort the first half
7 merge_sort(&mut nums[mid..]); // sort the last half
8 merge(nums, mid); // merge all
9 }
10 }
11
12 fn merge(nums: &mut [i32], mid: usize) {
13 let mut i = 0; // mark element in first half of data
14 let mut k = mid; // mark element in last half of data
15 let mut temp = Vec::new();
16
17 for _j in 0..nums.len() {
18 if k == nums.len() || i == mid {
19 break;
20 }
21
22 // put into a temp collection
23 if nums[i] < nums[k] {
24 temp.push(nums[i]);
25 i += 1;
26 } else {
27 temp.push(nums[k]);
28 k += 1;
29 }
30 }
31
32 // make sure all remaining data is handled
33 if i < mid && k == nums.len() {
34 for j in i..mid {
35 temp.push(nums[j]);
36 }
37 } else if i == mid && k < nums.len() {
38 for j in k..nums.len() {
39 temp.push(nums[j]);
40 }
41 }
42
43 // put temp data back to nums, finish sort
44 for j in 0..nums.len() {
45 nums[j] = temp[j];
46 }
47 }
48
49 fn main() {
50 let mut nums = [54,32,99,22,18,75,31,43,56,21];
51 merge_sort(&mut nums);
52 println!("sorted nums: {:?}", nums);
53 // sorted nums: [18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
54 }
To analyze the time complexity of merge sort, we can divide the sorting process into two parts: splitting
and merging. As we learned in the binary search section, a set of n items can be halved log2(n) times.
Each level of merging places at most n items onto the sorted list, giving a complexity of O(n) per level.
Combining the recursive splitting and the merging, the performance of merge sort is O(nlog2(n)).
The space complexity of merge sort is relatively high at O(n). To reduce the cost, merge sort can be
combined with insertion sort: use insertion sort directly when the length is below a certain threshold
(such as 64) and merge sort when it is above. This algorithm, called insertion merge sort, improves on
plain merge sort to some extent. The implementation details are left to the reader.
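One possible sketch of that idea follows; the cut-off of 64 and the name insert_merge_sort are assumptions of this sketch, and the merge step mirrors the merge function shown earlier.

```rust
// Hybrid sort: insertion sort below a threshold, merge sort above it.
fn insert_merge_sort(nums: &mut [i32]) {
    const THRESHOLD: usize = 64; // assumed cut-off, tunable

    if nums.len() <= THRESHOLD {
        // small slice: plain insertion sort, no extra space
        for i in 1..nums.len() {
            let curr = nums[i];
            let mut pos = i;
            while pos > 0 && curr < nums[pos - 1] {
                nums[pos] = nums[pos - 1];
                pos -= 1;
            }
            nums[pos] = curr;
        }
    } else {
        let mid = nums.len() / 2;
        insert_merge_sort(&mut nums[..mid]);
        insert_merge_sort(&mut nums[mid..]);

        // merge the two sorted halves through a temporary Vec
        let mut temp = Vec::with_capacity(nums.len());
        let (mut i, mut k) = (0, mid);
        while i < mid && k < nums.len() {
            if nums[i] < nums[k] { temp.push(nums[i]); i += 1; }
            else { temp.push(nums[k]); k += 1; }
        }
        temp.extend_from_slice(&nums[i..mid]);
        temp.extend_from_slice(&nums[k..]);
        nums.copy_from_slice(&temp);
    }
}

fn main() {
    let mut nums: Vec<i32> = (0..100).rev().collect();
    insert_merge_sort(&mut nums);
    println!("{}", nums.windows(2).all(|w| w[0] <= w[1])); // true
}
```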
84 92 66 56 44 31 72 19 24
84 24 66 56 44 31 72 19 92
19 24 66 56 44 31 72 84 92
19 24 31 56 44 66 72 84 92
19 24 31 44 56 66 72 84 92
7.9 The Heap Sort
19
24 31
44 56 66 72
84 92
Heap sort is a sorting algorithm built on the heap data structure. It repeatedly takes the top element,
moves it to the end, and rebuilds the heap, which accomplishes the sorting. It is a selection-style sort
with worst-case, best-case, and average time complexity of O(nlog2(n)), and it is an unstable sort. As
shown in the figure above, the heap is similar to a linked structure in which each node has multiple links.
The nodes in the heap are numbered by layer, and if this logical structure is mapped to an array, it looks
like the figure below, where the first position, index 0, is occupied by a placeholder 0.
0 19 24 31 44 56 66 72 84 92
0 1 2 3 4 5 6 7 8 9
Heaps can be represented not only with trees but also with arrays or Vecs (as shown in the figure
above). In fact, using arrays or Vecs to represent heaps is more in line with the literal meaning of the
word "heap," which refers to a collection of things gathered together. Note that our index starts from 1,
which allows the indices of the left and right child nodes to be expressed as 2i and 2i+1, making them
easy to calculate.
To satisfy the requirements of a binary tree’s node relationships, a heap represented by an array
should meet the following criteria:
• Max heap: arr[i] >= arr[2i] and arr[i] >= arr[2i+1]
• Min heap: arr[i] <= arr[2i] and arr[i] <= arr[2i+1]
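These criteria can be checked mechanically. The helper below is a hypothetical validator (not from the book) for the max-heap case, using the same 1-based indexing with a placeholder at index 0.

```rust
// Check the max-heap property for an array whose real data starts
// at index 1 (index 0 holds a placeholder, as in the figure above).
fn is_max_heap(arr: &[i32]) -> bool {
    if arr.len() <= 1 { return true; }
    let len = arr.len() - 1; // number of real elements
    for i in 1..=len {
        // arr[i] >= arr[2i] and arr[i] >= arr[2i+1] must hold
        if 2 * i <= len && arr[i] < arr[2 * i] { return false; }
        if 2 * i + 1 <= len && arr[i] < arr[2 * i + 1] { return false; }
    }
    true
}

fn main() {
    // placeholder 0 at index 0, heap data from index 1
    println!("{}", is_max_heap(&[0, 92, 84, 72, 44, 56, 31, 66, 19, 24])); // true
    println!("{}", is_max_heap(&[0, 19, 24, 31, 44, 56, 66, 72, 84, 92])); // false
}
```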
Heap sort first builds the unsorted sequence into a min heap, so the minimum value of the entire
sequence sits at the root of the heap. Swap it with the last element of the sequence, and the last position
now holds the minimum value. That value is no longer considered part of the heap, and the remaining
n-1 elements are rebuilt into a heap, producing a new minimum value. Swap this minimum to the end of
the new heap, and two values are sorted. Repeat this process until the entire sequence is sorted. A min
heap produces a descending order, while a max heap produces an ascending order.
To better illustrate the heap sort process, a figure is provided below. The dark gray color represents
the minimum element, which has been replaced by 92 and is no longer part of the heap. When 92 is at the
top of the heap, it is no longer a min heap, so a new min heap is constructed to make the minimum value
24 at the top of the heap. Then, 24 is swapped with the last element in the heap, and the second-to-last
element becomes dark gray. Note that the dotted line indicates that this element no longer belongs to
the heap. Continuing to swap, the dark gray subsequence gradually fills the entire heap in reverse order
from the last level, achieving a reverse sorting from largest to smallest. To achieve sorting from smallest
to largest, a max heap should be constructed, and the corresponding logic in the min heap should be
modified.
92
24 31
44 56 66 72
84 19
24
44 31
84 56 66 72
92 19
31
44 66
84 56 92 72
24 19
51 } else {
52 left
53 };
54
55 // swap data if child node is greater than parent node
56 if nums[child] > nums[parent] {
57 nums.swap(parent, child);
58 }
59
60 // update parent and child relationship
61 parent = child;
62 }
63 }
64
65 fn main() {
66 let mut nums = [0,54,32,99,18,75,31,43,56,21,22];
67 heap_sort(&mut nums);
68 println!("sorted nums: {:?}", nums);
69 // sorted nums: [0, 18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
70 }
Heap sort runs in O(nlog(n)). The time is mainly spent in two parts: building the heap and adjusting
it n times. Building the heap processes n elements with a complexity of O(n). The longest path for each
adjustment runs from the root to a leaf, which is the height of the heap, log(n). Therefore, the n
adjustments take O(nlog(n)), and the total time complexity is O(nlog(n)). Macros are used here to obtain
node indexes; functions could be used as well. Heap sort and selection sort are similar in that both
repeatedly find the maximum or minimum value in a set.
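The whole procedure can also be sketched compactly with 0-based indexing, where the children of node i sit at 2i+1 and 2i+2. This is an independent sketch under those assumptions, not the book's macro-based listing; a max heap yields ascending order.

```rust
// Restore the heap property downward from `parent`, treating
// nums[..end] as the heap.
fn sift_down(nums: &mut [i32], mut parent: usize, end: usize) {
    loop {
        let left = 2 * parent + 1;
        if left >= end { break; }
        // pick the greater of the two children
        let child = if left + 1 < end && nums[left + 1] > nums[left] {
            left + 1
        } else {
            left
        };
        if nums[child] > nums[parent] {
            nums.swap(parent, child);
            parent = child;
        } else {
            break;
        }
    }
}

fn heap_sort(nums: &mut [i32]) {
    let len = nums.len();
    if len <= 1 { return; }
    // build a max heap from the last parent down to the root
    for i in (0..len / 2).rev() {
        sift_down(nums, i, len);
    }
    // repeatedly move the top to the end and restore the heap
    for end in (1..len).rev() {
        nums.swap(0, end);
        sift_down(nums, 0, end);
    }
}

fn main() {
    let mut nums = [54, 32, 99, 18, 75, 31, 43, 56, 21, 22];
    heap_sort(&mut nums);
    println!("sorted nums: {:?}", nums);
    // sorted nums: [18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
}
```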
7.10 The Bucket Sort
1 // bucket_sort.rs
2
3 struct Bucket<H, T> {
4 hasher: H, // hasher: a function, received when called
5 values: Vec<T>, // values: a container for data
6 }
To help readers understand bucket sort, the figure below illustrates the process of hashing data into
buckets, sorting within each bucket, and merging the sorted data to obtain the final ordered set.
80 92 66 56 44 31 70 20 24
0 1 2 3 4 5 6 7 8
80 70 20 66 56 31 92 44 24
20 70 80 31 56 66 92 24 44
20 24 31 44 56 66 70 80 92
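A minimal sketch of this process follows, assuming non-negative integers and a simple value-divided-by-width hash rather than the book's generic Bucket structure; the name bucket_sort and the fixed bucket count are illustrative assumptions.

```rust
// A minimal bucket sort sketch: distribute values into buckets by
// value / bucket_width, sort each bucket, then concatenate.
// Assumes all values are non-negative.
fn bucket_sort(nums: &mut [i32], bucket_count: usize) {
    if nums.len() <= 1 || bucket_count == 0 { return; }
    let max = *nums.iter().max().unwrap();
    let width = max as usize / bucket_count + 1;

    let mut buckets: Vec<Vec<i32>> = vec![Vec::new(); bucket_count];
    for &v in nums.iter() {
        buckets[v as usize / width].push(v);
    }

    let mut i = 0;
    for bucket in buckets.iter_mut() {
        bucket.sort(); // any inner sort works here
        for &v in bucket.iter() {
            nums[i] = v;
            i += 1;
        }
    }
}

fn main() {
    let mut nums = [80, 92, 66, 56, 44, 31, 70, 20, 24];
    bucket_sort(&mut nums, 3);
    println!("sorted nums: {:?}", nums);
    // sorted nums: [20, 24, 31, 44, 56, 66, 70, 80, 92]
}
```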
7.11 The Counting Sort
counter = [0, 0, 0, 0, 0, 0, 0, 0, 0]
Then scan nums and calculate the index by subtracting minV from the current value. For example,
when scanning 0, the index is 0 - 0 = 0, so the value at index 0 of counter is incremented by 1. At this
point, counter is [1,0,0,0,0,0,0,0,0]. Continue scanning until the final value of counter is obtained.
[1, 2, 0, 1, 2, 2, 0, 2, 1]
When traversing the counter set, if a value at a certain index is not 0, write the corresponding index
value into nums, and decrement the value in counter. For example, if the value at the first position 0 of
counter is 1, indicating that there is one 0 in nums, write it into nums. Continuing, the value at index 1
is 2, indicating that there are two 1s in nums, write them into nums. Finally, nums is sorted as:
[0, 1, 1, 3, 4, 4, 5, 5, 7, 7, 8]
The sorting is completed simply by scanning the counter set; the process involves no comparisons,
exchanges, or similar operations, which makes it very fast. Here is an implementation of counting
sort:
1 // counting_sort.rs
2
3 fn counting_sort(nums: &mut [usize]) {
4 if nums.len() <= 1 {
5 return;
6 }
7
8 // bucket number is the maximum value in nums plus 1
9 let max_bkt_num = 1 + nums.iter().max().unwrap();
10
11 // save the number of each value in nums
12 let mut counter = vec![0; max_bkt_num];
13 for &v in nums.iter() {
14 counter[v] += 1;
15 }
16
17 // write data back to original nums slice
18 let mut j = 0;
19 for i in 0..max_bkt_num {
20 while counter[i] > 0 {
21 nums[j] = i;
22 counter[i] -= 1;
23 j += 1;
24 }
25 }
26 }
27
28 fn main() {
29 let mut nums = [54,32,99,18,75,31,43,56,21,22];
30 counting_sort(&mut nums);
31 println!("sorted nums: {:?}", nums);
32 // sorted nums: [18, 21, 22, 31, 32, 43, 54, 56, 75, 99]
33 }
7.12 The Radix Sort
1 // radix_sort.rs
2
3 fn radix_sort(nums: &mut [usize]) {
4 if nums.len() <= 1 { return; }
5
6 // Find the largest number, which has the most digits.
7 let max_num = match nums.iter().max() {
8 Some(&x) => x,
9 None => return,
10 };
11
12 // Find the power of 2 that is greater than or equals to
13 // the length of nums as the bucket size. For example,
14 // the closest and greater power of 2 to 10 is 2^4 = 16,
15 // the closest and greater power of 2 to 17 is 2^5 = 32.
16 let radix = nums.len().next_power_of_two();
17
18 // The variable 'digit' is the place value of the
19 // digit group currently being processed.
20 // With radix r, successive groups correspond to
21 // place values 1, r, r^2, r^3, and so on.
22 // Counting starts from the lowest group, so it is 1.
23 let mut digit = 1;
24 while digit <= max_num {
25 // Calculate the position of the data in the bucket.
26 let index_of = |x| x / digit % radix;
27 let mut counter = vec![0; radix];
28 for &x in nums.iter() {
29 counter[index_of(x)] += 1;
30 }
31 for i in 1..radix {
32 counter[i] += counter[i-1];
33 }
34
35 // sorting
36 for &x in nums.to_owned().iter().rev() {
37 counter[index_of(x)] -= 1;
38 nums[counter[index_of(x)]] = x;
39 }
40
41 // move on to the next digit group
42 digit *= radix;
43 }
44 }
45
46 fn main() {
47 let mut nums = [0,54,32,99,18,75,31,43,56,21,22,100];
48 radix_sort(&mut nums);
49 println!("sorted nums: {:?}", nums);
50 // sorted nums: [0, 18, 21, 22, 31, 32, 43, 54, 56, 75, 99, 100]
51 }
To sort by the same digit, a stable sort is used because it preserves the result of the previous round: sorting
by the tens digit preserves the order established by the ones digit, and sorting by the hundreds digit
preserves that of the tens digit. Radix sort can also use binary as the radix to handle any non-negative
integer sequence. Assuming the maximum integer in the sequence is 64 bits, the time complexity is
O(64n); although comparison-based algorithms have a time complexity of O(nlog(n)), they can still be
faster in practice because the constant factor 64 is large. With binary, k = 2 is the smallest possible radix
and the number of digits d is the largest, giving a time complexity of O(nd) with a smaller space
complexity of O(n + k). Conversely, using the maximum value as the radix makes k = maxV the largest
and the number of digits the smallest, giving a time complexity of O(nd) but an increased space
complexity of O(n + k); in that case, radix sort degenerates into counting sort.
In summary, these three non-comparison sorts (counting sort, bucket sort, and radix sort) are
interconnected. Counting sort is a special case of bucket sort, and radix sort degenerates into counting
sort when it uses the minimum number of digits. Bucket sort suits evenly distributed elements, counting
sort requires a small difference between maxV and minV, and radix sort can only handle non-negative
integers, with maxV and minV as close as possible. Therefore, these three sorts are best suited to small
amounts of data, ideally fewer than 10,000 items.
9 10 2 3 4 5 1 8 7 6
To sort the set of elements, TimSort divides them into runs, or partitions. These runs can be seen as
individual computational units. TimSort iterates through the elements, placing them into different runs
while merging runs according to certain rules; the merging continues until only one run remains, which
is the sorted result. To set an appropriate partition size, TimSort uses the minrun parameter: no partition
may have fewer elements than minrun. If a partition has fewer elements than minrun, it is extended to
minrun elements using insertion sort and then merged.
9 10 2 3 4 5 1 8 7 6
5: A < B + C; A > C, B merges C 6: A < B + C; C > A, B merges A
| | | |
C -> | [xxxxxx] | C -> | [xxxxxxxxxxx] |
| | | |
B -> | [xxxxxxxx] | B -> | [xxxxxxxxxxxxx] |
| | | |
A -> | [xxxxxxxxxxxxx] | A -> | [xxxxxxxxx] |
| | | |
The figure depicts the merging of three ordered blocks with a minimum run of 3. The characters A,
B, and C represent the run blocks [xx], while the vertical lines represent the temporary stack used to merge
these blocks. To decide whether and how to merge the blocks, the lengths of the three blocks are
compared. The ideal merged states are cases 1 and 3, and the other four merging scenarios are used to
approach these two cases.
The condition A > B + C and B > C is necessary to ensure efficient merging. For example, merging
the block [6,7,8] directly with [0,1,2,3,4,9,5] would leave two blocks with a large difference in length,
making the merge inefficient. In contrast, blocks of similar length merge very quickly. Therefore, in this
case the blocks are not merged immediately but processed separately and merged in reverse order at the
end: [6,7,8] with [5], followed by [0,1,2,3,4,9] with [5,6,7,8].
0 1 2 3 4 9 6 7 8 5
To simplify the sorting process, a minimum run length is set, below which binary insertion sort is
used. If the list length is less than 64, minrun is simply set to the length itself. Otherwise, a value between
32 and 64 is chosen for minrun such that k = n/minrun is less than or equal to a power of 2. Here k
represents the number of remaining blocks after scanning and processing, and their lengths must be
sorted from largest to smallest. This enables merging from the tail, producing longer blocks and faster
merging, akin to binary search. Hence, k is required to be less than or equal to a power of 2.
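One common way to compute minrun is the scheme used in CPython's list sort: keep the six most significant bits of n and add one if any lower bit is set. The sketch below follows that scheme; the function name is an assumption.

```rust
// Compute minrun: keep the top 6 bits of n and add 1 if any of the
// shifted-out bits were set, giving a value in 32..=64
// (or n itself when n < 64).
fn min_run_length(mut n: usize) -> usize {
    let mut r = 0;
    while n >= 64 {
        r |= n & 1; // remember whether any low bit was set
        n >>= 1;
    }
    n + r
}

fn main() {
    println!("{}", min_run_length(63));   // 63: short lists sorted whole
    println!("{}", min_run_length(64));   // 32
    println!("{}", min_run_length(2048)); // 32: exactly a power of two
    println!("{}", min_run_length(2049)); // 33
}
```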
1 2 4 7 8 23 19 16 14 13 12 10 20 18 17 15 11 9 0 5 6 1 3 21 22
2 2 4 7 8 23 19 16 14 13 12 10 20 18 17 15 11 9 0 5 6 1 3 21 22
3 2 4 7 8 23 19 16 14 13 12 10 20 18 17 15 11 9 0 5 6 1 3 21 22
4 2 4 7 8 23 10 12 13 14 16 19 20 18 17 15 11 9 0 5 6 1 3 21 22
5 2 4 7 8 10 12 13 14 16 19 23 20 18 17 15 11 9 0 5 6 1 3 21 22
6 2 4 7 8 10 12 13 14 16 19 23 20 18 17 15 11 9 0 5 6 1 3 21 22
7 2 4 7 8 10 12 13 14 16 19 23 9 11 15 17 18 20 0 5 6 1 3 21 22
8 2 4 7 8 10 12 13 14 16 19 23 9 11 15 17 18 20 0 5 6 1 3 21 22
9 2 4 7 8 10 12 13 14 16 19 23 9 11 15 17 18 20 0 1 3 5 6 21 22
10 2 4 7 8 10 12 13 14 16 19 23 9 11 15 17 18 20 0 1 3 5 6 21 22
11 2 4 7 8 10 12 13 14 16 19 23 9 11 15 17 18 20 0 1 3 5 6 21 22
12 2 4 7 8 10 12 13 14 16 19 23 0 1 3 5 6 9 11 15 17 18 20 21 22
13 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
To illustrate the block expansion and merging mechanism, diagrams are presented above. For this
example, minrun = 5, and each row represents an operation cycle. The leftmost column indicates the
cycle number, and the square box contains the elements to be sorted.
The TimSort algorithm sorts a given set of data in several rounds of processing. In the first round,
the algorithm obtains the data to be sorted from its parameters. In the second round, it searches for
partitions and identifies a partition of length equal to the minimum run value, the ordered prefix
[2,4,7,8,23]. In the third round, it searches for the next partition and finds an inversion
[19,16,14,13,12,10], which is corrected to ascending order as [10,12,13,14,16,19] in the fourth
round.
In the fifth round, the algorithm compares the length of the current partition to that of the pre-
vious partition and merges them using insertion sort if the current partition is longer. The result is
[2,4,7,8,10,12,13,14,16,19,23], a new partition that is a combination of the two previous partitions. In
the sixth round, it finds the next partition [20,18,17,15,11,9], detects an inversion, and corrects it to be in
order in the seventh round. It then compares its length to the previous partition and continues searching
for the next partition since it is shorter.
In the eighth round, the algorithm finds a new partition [0,5,6] that is shorter than the minimum run
value and expands it using insertion sort. It then compares its length to the previous two partitions and
determines that they satisfy certain conditions, which means that they do not need to be merged. This
process continues until all the partitions are complete, which occurs after ten rounds.
In the eleventh round, the algorithm starts merging the smaller partitions from the end to the begin-
ning([0,1,3,5,6] and [21, 22]). In the twelfth round, it merges the two remaining partitions [9,11,15,17,18,20]
and [0,1,3,5,6,21,22], which are of similar length. Finally, in the thirteenth round, the algorithm performs
the final merge to obtain the sorted output.
To implement TimSort, the process can be broken down into several steps. First, the original data
list and the minimum number of elements required for merging (MIN_MERGE) need to be prepared.
Next, the starting position and length of each run are obtained through the partition process. Since
merge sorting involves temporary stacks and the processing of two runs, the data related to the sorting
task is kept in a structure. Finally, the actual merging proceeds from the small partitions at the end
toward the beginning, merging longer partitions with shorter ones and keeping partition lengths
decreasing so that merges stay cheap.
1 // tim_sort_without_gallop.rs
2
3 // The minimum length of a sequence involved in merging,
4 // shorter than which insertion sort is used.
5 const MIN_MERGE: usize = 64;
6
7 // Sorting state structure
8 struct SortState<'a> {
9 list: &'a mut [i32],
10 runs: Vec<Run>, // store all run(s)
11 pos: usize,
12 }
13
14 // Define the Run entity to save the starting index and
15 // interval length of the run in the list.
16 #[derive(Debug, Copy, Clone)]
17 struct Run {
18 pos: usize,
19 len: usize,
20 }
21
22
23 // Get the length of the next run; a descending run is
24 // reversed in place so that every run is ascending
25 fn count_run(list: &mut [i32]) -> usize {
26     let (descending, pos) = find_run(list);
27     if descending {
28         list[..pos].reverse();
29     }
30     pos
31 }
32
33 // Determine whether the relationship between list[i]
34 // and list[i+1] is ascending or descending and
35 // return the inflection point index
36 fn find_run(list: &[i32]) -> (bool, usize) {
37     let len = list.len();
38     if len < 2 {
39         return (false, len);
40     }
41
42     let mut pos = 1;
43     if list[1] < list[0] {
44         // descending: list[i+1] < list[i]
45         while pos < len - 1 && list[pos + 1] < list[pos] {
46             pos += 1;
47         }
48         (true, pos + 1)
49     } else {
50         // ascending: list[i+1] >= list[i]
51         while pos < len - 1 && list[pos + 1] >= list[pos] {
52             pos += 1;
53         }
54         (false, pos + 1)
55     }
56 }
Next, to sort the SortState, you will need to implement a constructor and a sort function. Finally, when
the length of the partitions does not meet the requirements, partition merging needs to be performed using
merge sort.
1 // tim_sort_without_gallop.rs
2
3 impl<'a> SortState<'a> {
4 fn new(list: &'a mut [i32]) -> Self {
5 SortState {
6 list: list,
7 runs: Vec::new(),
8 pos: 0,
9 }
10 }
11
12 fn sort(&mut self) {
13 let len = self.list.len();
14 // calculate the minrun
15 let minrun = calc_minrun(len);
16
17 while self.pos < len {
18 let pos = self.pos;
19 let mut run_len = count_run(self.list
20 .split_at_mut(pos)
21 .1);
22
23 // check if the remaining number of elements is
24 // less than minrun, if so,
25 // let run_minlen = len - pos
26 let run_minlen = if minrun > len - pos {
27 len - pos
28 } else {
29 minrun
30 };
31
32 // If the run is very short, extend its length
33 // to run_minlen, and the extended run needs to
34 // be sorted, so use binary insertion sort
35 if run_len < run_minlen {
36 run_len = run_minlen;
37 let left = self.list
38 .split_at_mut(pos).1
39 .split_at_mut(run_len).0;
40 binary_insertion_sort(left);
41 }
42
43 // Stack the runs, with each run having a
44 // different length
45 self.runs.push(Run {
46 pos: pos,
47 len: run_len,
48 });
49
50 // Find the next run position
51 self.pos += run_len;
52
53 // Merge runs that do not conform to the
54 // A > B + C and B > C rules
55 self.merge_collapse();
56 }
57
58
59 // Forcefully merge all remaining runs from the top
60 // of the stack until only one run remains,
61 // completing the tim_sort sort
62 self.merge_force_collapse();
63 }
64
65 // Merge runs to satisfy A > B + C and B > C
66 // If A <= B + C, merge B with the shorter of A and C
67 // If only A and B, and A <= B, merge A and B
68 fn merge_collapse(&mut self) {
69 let runs = &mut self.runs;
70 while runs.len() > 1 {
71 let n = runs.len() - 2;
72
67 } else {
68 self.list[self.dest_pos]
69 = self.temp[self.first_pos];
70 self.first_pos += 1;
71 }
72 self.dest_pos += 1;
73 }
74 }
75 }
76
77 // clear temp stack
78 impl<'a> Drop for MergeLo<'a> {
79 fn drop(&mut self) {
80 unsafe {
81 // put remaining data of temp into list (high part)
82 if self.first_pos < self.first_len {
83 for i in 0..(self.first_len - self.first_pos) {
84 self.list[self.dest_pos + i]
85 = self.temp[self.first_pos + i];
86 }
87 }
88
89 // set the length of temp to 0
90 self.temp.set_len(0);
91 }
92 }
93 }
94
95 // merge B and C as a new run block
96 fn merge_hi(
97 list: &mut [i32],
98 first_len: usize,
99 second_len: usize)
100 {
101 unsafe {
102 let mut state = MergeHi::new(list,
103 first_len,
104 second_len);
105 state.merge();
106 }
107 }
108
109 impl<'a> MergeHi<'a> {
110 unsafe fn new(
111 list: &'a mut [i32],
112 first_len: usize,
113 second_len: usize) -> Self
114 {
115 let mut ret_val = MergeHi {
116 first_pos: first_len as isize - 1,
117 second_pos: second_len as isize - 1,
118 dest_pos: list.len() as isize - 1,// from the tail
171 }
172 }
173 }
Here is the main function of TimSort.
1 // timSort entry point
2
3 fn tim_sort(list: &mut [i32]) {
4 if list.len() < MIN_MERGE {
5 binary_insertion_sort(list);
6 } else {
7 let mut sort_state = SortState::new(list);
8 sort_state.sort();
9 }
10 }
Here is an example of TimSort.
1 fn main() {
2 let mut nums: Vec<i32> = vec![
3 2, 4, 7, 8, 23, 19, 16, 14, 13, 12, 10, 20,
4 18, 17, 15, 11, 9, -1, 5, 6, 1, 3, 21, 40,
5 22, 39, 38, 37, 36, 35, 34, 33, 24, 30, 31, 32,
6 25, 26, 27, 28, 29, 41, 42, 43, 44, 45, 46, 47,
7 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
8 60, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70,
9 61, 62, 63, 64, 65, 66, 67, 68, 69, 95, 94, 93,
10 92, 91, 90, 85, 82, 83, 84, 81, 86, 87, 88, 89,
11 ];
12 tim_sort(&mut nums);
13 println!("sorted nums: {:?}", nums);
14 }
sorted nums: [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
The TimSort implementation above only works for i32 values, but it can be extended to support
other types through the use of generics. Additionally, during
merging, some data may already be sorted, but the implemented TimSort algorithm still goes through
each comparison one by one. However, merging can be accelerated through a strategy called gallop-
ing. The TimSort implementation in this book’s source code in tim_sort_without_gallop.rs is the non-
accelerated version, but an accelerated version of Timsort has also been implemented in tim_sort.rs.
Interested readers can refer to and compare the differences between the two algorithms.
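The galloping idea itself can be sketched independently of the book's code. When one run keeps winning comparisons, TimSort switches to an exponential (doubling) search to find how many elements can be copied over in one batch instead of comparing one by one. The function below is a hedged illustration of only that search step; the name `gallop_right` and its details are this sketch's own, not taken from tim_sort.rs:

```rust
// Exponential ("galloping") search: count how many leading
// elements of the sorted slice `list` are <= `key`.
// The real TimSort also tracks a win streak before
// switching into this mode; that part is omitted here.
fn gallop_right(key: i32, list: &[i32]) -> usize {
    let len = list.len();
    if len == 0 || key < list[0] {
        return 0;
    }
    // Double the probe offset until list[hi] exceeds key
    let mut hi = 1;
    while hi < len && list[hi] <= key {
        hi = (hi * 2 + 1).min(len);
    }
    let mut lo = hi / 2;
    // Binary search for the boundary inside [lo, hi)
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if list[mid] <= key {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

fn main() {
    let run = [1, 2, 3, 5, 8, 13, 21, 34];
    // All elements <= 8 could be copied in one batch
    println!("{}", gallop_right(8, &run)); // 5
}
```

The doubling phase costs O(log d) probes to overshoot the boundary at distance d, and the binary search then pins it down exactly, which is why galloping pays off when one run contributes long stretches of elements.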
Some readers may wonder how TimSort knows in advance that the data to be sorted contains sorted
blocks. In reality, the algorithm cannot know this; it simply exploits the partial order that real-world
data tends to have, a property its author observed. In physics, there is a concept of entropy [13] that refers to the degree of disorder in a physical
system. The more disorderly a system is, the greater the entropy, and conversely, the more ordered it
is, the smaller the entropy. In most cases, things have a certain degree of order. For example, living
organisms maintain internal order against entropy and only become disordered at death. Another example is the
principle of locality of reference [14]: when data is read from a hard drive, the surrounding data is
loaded into memory as well, because it is likely to be accessed next. These two phenomena are natural
laws that are consistent with statistical principles, and Tim wrote TimSort based on these laws. This
illustrates the importance of data structures and how different understandings can lead to very different
algorithms.
7.14 Summary
In this chapter, we learned about ten different sorting algorithms. Bubble sort, selection
sort, and insertion sort are O(n²) algorithms, while most other sorting algorithms have a complexity of
O(n log₂(n)). Selection sort is an improvement on bubble sort, shell sort improves on insertion sort, heap
sort improves on selection sort, and quicksort and merge sort both use the divide-and-conquer approach.
All of these sorts are based on comparison, but there are also non-comparison sorts that rely solely
on numerical patterns for sorting, such as bucket sort, counting sort, and radix sort. These sorts have a
complexity of approximately O(n) and are suitable for sorting small amounts of data. Counting sort is
a special case of bucket sort, and radix sort is a multi-round bucket sort that can degrade into counting
sort.
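As a reminder of how a non-comparison sort exploits numeric patterns, here is a minimal counting-sort sketch for small non-negative integers. The value bound `max_val` is an assumption of this example, not something the chapter's code defines:

```rust
// Counting sort: tally occurrences of each value, then
// rebuild the slice in ascending order.
// O(n + k) for n values in the range 0..=k.
fn counting_sort(nums: &mut [usize], max_val: usize) {
    // One counter per possible value
    let mut counter = vec![0; max_val + 1];
    for &v in nums.iter() {
        counter[v] += 1;
    }
    // Write each value back as many times as it was seen
    let mut i = 0;
    for (v, &cnt) in counter.iter().enumerate() {
        for _ in 0..cnt {
            nums[i] = v;
            i += 1;
        }
    }
}

fn main() {
    let mut nums = [5, 3, 8, 1, 3, 0];
    counting_sort(&mut nums, 8);
    println!("{:?}", nums); // [0, 1, 3, 3, 5, 8]
}
```

No element is ever compared to another; the positions follow purely from the values, which is why the technique only works when the value range k is small relative to n.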
In addition to the basic sorting algorithms, we also learned about improved versions of some algo-
rithms, particularly the TimSort algorithm, which is an efficient and stable hybrid sorting algorithm. Its
improved version is already the default sorting algorithm for many languages and platforms.
The table below summarizes the various sorting algorithms, which readers can compare and under-
stand on their own to deepen their understanding.
Chapter 8
Trees
8.1 Objectives
• Understanding trees and their usage.
• Implementing a priority queue using a binary heap.
• Understanding binary search trees and balanced binary trees.
• Implementing binary search trees and balanced binary trees.
A tree is a data structure that expands upon linear structures, such as a sequence of items a, b, c,
d, e, by connecting multiple data items.
Like a natural tree, this data structure has a root, branches, and leaves, which are interconnected.
Trees are used in various fields of computer science, such as operating systems, graphics, databases, and
computer networks. Throughout the rest of the text, we will refer to this data structure as a "tree."
8.2 What is a Tree?
Before delving into trees, it's essential to understand some common tree examples, such as a biological classification tree. From this diagram, we can see the specific location of the entry "People" (lower left),
which is very helpful for studying relationships and properties.
This type of tree is hierarchical, with well-defined structures comprising seven levels: kingdom,
phylum, class, order, family, genus, and species. The highest level represents the most abstract concept,
while the lowest level represents the most specific. By starting from the root of the tree and follow-
ing the arrows to the bottom, we can obtain a complete path that indicates the full name of a species.
Organisms can find their place on this tree of life, displaying their relative relationships. For instance,
the complete name of the "human" species is "animal kingdom-chordate phylum-mammal class-primate
order-hominidae family-homo genus-homo sapiens species."
Each node’s set of child nodes is independent of another node’s set of child nodes, which makes the
relationships between nodes clear. This property also means that modifying one node’s child nodes will
not affect other nodes. It is especially useful when trees are used as data storage containers, as it allows
tools to modify data on specific nodes while keeping other data unchanged.
Lastly, each leaf node is unique, and there is only one unique path from the root to each leaf node. This
property makes data storage efficient since the unique path can be used as a storage path. The file system
in our computers is based on an improved tree structure. The file system tree and biological classification
tree have many similarities, and the path from the root directory to any subdirectory uniquely identifies
that subdirectory and all files within it. If you’re familiar with Unix-like operating systems, you should
recognize paths like /root, /home/user, and /etc as nodes on a tree, where / is the root node.
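The claim that a path uniquely identifies a node can be illustrated by splitting a path into its components, each one level below the previous (a small illustration, not from the book's source):

```rust
fn main() {
    // Each component of a Unix path is one level down
    // from the root "/" in the file system tree
    let path = "/home/user/docs";
    let nodes: Vec<&str> = path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    println!("{:?}", nodes); // ["home", "user", "docs"]
}
```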
Trees can also be used to represent web page files, which are collections of resources that have a
hierarchical structure. For example, the web page data of Google.cn's search interface shows that the
HTML tag elements are also arranged hierarchically.
1 <html lang="zh">
2 <head>
3 <meta charset="utf-8">
4 <title>Google</title>
5 <style>
6 html { background: #fff; margin: 0 1em; }
7 body { font: .8125em/1.5 arial, sans-serif; }
8 </style>
9 </head>
10 <body>
11 <div>
12 <h1><a href="http://www.google.com.hk/webhp?hl=zh-CN">
13 <strong id="target">google.com.hk</strong></a></h1>
14 <p>Please save our website</p>
15 </div>
16 <ul>
17 <li><a href="http://translate.google.cn/">translate</a></li>
18 </ul>
19 </body>
20 </html>
In web page files, the HTML root encapsulates all the other elements, as illustrated in the diagram
below. This hierarchical structure allows web pages to be displayed in a consistent manner across dif-
ferent devices and browsers. By using trees to represent web page files, we can efficiently store and
retrieve data and modify individual elements without affecting the entire page’s layout.
This structure is straightforward and avoids the complexity of nested arrays for accessing elements.
The key now is how to define tree nodes. One feasible method is to use a struct to define a node.
1 use std::cmp::{max, Ordering::*};
2 use std::fmt::{Debug, Display};
3
4 // Binary tree child node link
5 type Link<T> = Option<Box<BinaryTree<T>>>;
6
7 // Binary tree definition
8 #[derive(Debug, Clone)]
9 struct BinaryTree<T> {
10     key: T,
11     left: Link<T>,
12     right: Link<T>,
13 }
50
51 // calculate the depth of a tree
52 fn depth(&self) -> usize {
53 let mut left_depth = 1;
54 if let Some(left) = &self.left {
55 left_depth += left.depth();
56 }
57
58 let mut right_depth = 1;
59 if let Some(right) = &self.right {
60 right_depth += right.depth();
61 }
62
63 // return the max depth
64 max(left_depth, right_depth)
65 }
66 }
To manipulate the data of a binary tree node, you need to implement methods for accessing the left
and right child nodes, setting and getting the root node value, and modifying node values. Additionally,
methods for determining the existence of a node value and finding the maximum and minimum node
values can also be helpful.
1 // binary_tree.rs
2
3 impl<T: Clone + Ord + ToString + Debug> BinaryTree<T> {
4 // get left subtree
5 fn get_left(&self) -> Link<T> {
6 self.left.clone()
7 }
8
9 fn get_right(&self) -> Link<T> {
10 self.right.clone()
11 }
12
13 // get and set key
14 fn get_key(&self) -> T {
15 self.key.clone()
16 }
17
18 fn set_key(&mut self, key: T) {
19 self.key = key;
20 }
21
22 // find min/max key in the tree
23 fn min(&self) -> Option<&T> {
24 match self.left {
25 None => Some(&self.key),
26 Some(ref node) => node.min(),
27 }
28 }
29
30 fn max(&self) -> Option<&T> {
31 match self.right {
32 None => Some(&self.key),
33 Some(ref node) => node.max(),
34 }
35 }
36
37 // determine whether a key is in the tree
38 fn contains(&self, key: &T) -> bool {
39 match &self.key.cmp(key) {
40 Equal => true,
41 Greater => {
42 match &self.left {
43 Some(left) => left.contains(key),
44 None => false,
45 }
46 },
47 Less => {
48 match &self.right {
49 Some(right) => right.contains(key),
50 None => false,
51 }
52 },
53 }
54 }
55 }
To store data in the tree, we follow certain rules based on the tree’s structure. These rules are as
follows:
• If the current symbol is (, a new node is added as the left child node, and we descend to that node.
• If the current symbol is one of ”+”, ”-”, ”*”, or ”/”, we set the root value to that symbol, add a new
right child node, and descend to the right child node.
• If the current symbol is a number, we set the root value to that number and return to the parent
node.
• If the current symbol is ), we return to the parent node of the current node.
To convert the mathematical expression (1 + (2 * 3)) into the tree shown in the figure, we can use
the data storage rules defined for the tree. The specific steps are as follows:
(1) Create the root node.
(2) Read the symbol (, create a new left child node, and descend to the node.
(3) Read the symbol 1, set the node value to 1, and return to the parent node.
(4) Read the symbol +, set the node value to +, create a new right child node, and descend to the
node.
(5) Read the symbol (, create a new left child node, and descend to the node.
(6) Read the symbol 2, set the node value to 2, and return to the parent node.
(7) Read the symbol *, set the node value to *, create a new right child node, and descend to the
node.
(8) Read the symbol 3, set the node value to 3, and return to the parent node.
(9) Read the symbol ), return to the parent node.
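The effect of these steps can be reproduced with a small recursive parser. The sketch below uses its own minimal `Expr` type rather than the chapter's BinaryTree, and assumes single-digit numbers and fully parenthesized input:

```rust
// Minimal expression-tree node for this sketch
#[derive(Debug)]
enum Expr {
    Num(i32),
    Op(char, Box<Expr>, Box<Expr>),
}

// Parse a fully parenthesized expression like "(1+(2*3))".
// Assumes single-digit numbers and operators + - * /.
fn parse(chars: &[char], pos: &mut usize) -> Expr {
    let c = chars[*pos];
    *pos += 1;
    if c == '(' {
        let left = parse(chars, pos);  // left operand
        let op = chars[*pos];          // operator at the root
        *pos += 1;
        let right = parse(chars, pos); // right operand
        *pos += 1;                     // skip ')'
        Expr::Op(op, Box::new(left), Box::new(right))
    } else {
        Expr::Num(c.to_digit(10).unwrap() as i32)
    }
}

fn main() {
    let chars: Vec<char> = "(1+(2*3))".chars().collect();
    let mut pos = 0;
    let tree = parse(&chars, &mut pos);
    println!("{:?}", tree);
    // Op('+', Num(1), Op('*', Num(2), Num(3)))
}
```

Recursion replaces the explicit "descend / return to parent" moves of the rule list: entering a recursive call descends into a child, and returning from it climbs back to the parent.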
Using a tree allows for the storage of the arithmetic expression while maintaining the structural in-
formation of the data. In fact, programming languages use trees to store all code during compilation and
generate abstract syntax trees. By analyzing the functionality of each part of the syntax tree, intermedi-
ate code is generated, optimized, and then final code is generated. If you understand the principles of
compilation, this should be familiar to you.
Pre-order traversal is the first method: it visits the root node first, then the left subtree,
and finally the right subtree. To traverse a tree this way, visit the root node, then recursively
pre-order traverse the left subtree, followed by the right subtree. For example, in the book tree
described above, the pre-order traversal algorithm visits the table of contents first, followed by chapter
1, section 1, section 2, and so on. Once chapter 1 is finished, the algorithm returns to the table of contents
and recursively calls pre-order traversal on chapter 2, visiting chapter 2, section 1, section 2, and so on.
The pre-order traversal algorithm is simple and can be implemented as an internal method or an
external function.
1 // binary_tree.rs
2
3 impl<T: Clone + Ord + ToString + Debug> BinaryTree<T> {
4 fn preorder(&self) {
5 println!("key: {:?}", &self.key);
6 match &self.left {
7 Some(node) => node.preorder(),
8 None => (),
9 }
10 match &self.right {
11 Some(node) => node.preorder(),
12 None => (),
13 }
14 }
15 }
16
17 // pre-order: implemented externally [by recursion]
18 fn preorder<T: Clone + Ord + ToString + Debug>(bt: Link<T>) {
19 if !bt.is_none() {
20 println!("key: {:?}", bt.as_ref().unwrap().get_key());
21 preorder(bt.as_ref().unwrap().get_left());
22 preorder(bt.as_ref().unwrap().get_right());
23 }
24 }
The post-order traversal starts from the left subtree, followed by the right subtree, and finally the root
node.
1 // binary_tree.rs
2
3 impl<T: Clone + Ord + ToString + Debug> BinaryTree<T> {
4 fn postorder(&self) {
5 match &self.left {
6 Some(node) => node.postorder(),
7 None => (),
8 }
9 match &self.right {
10 Some(node) => node.postorder(),
11 None => (),
12 }
13 println!("key: {:?}", &self.key);
14 }
15 }
16
17 // post-order: implemented externally [by recursion]
18 fn postorder<T: Clone + Ord + ToString + Debug>(bt: Link<T>) {
19 if !bt.is_none() {
20 postorder(bt.as_ref().unwrap().get_left());
21 postorder(bt.as_ref().unwrap().get_right());
22 println!("key: {:?}", bt.as_ref().unwrap().get_key());
23 }
24 }
The in-order traversal starts from the left subtree, followed by the root node, and finally the right
subtree.
1 // binary_tree.rs
2
3 impl<T: Clone + Ord + ToString + Debug> BinaryTree<T> {
4 fn inorder(&self) {
5 if self.left.is_some() {
6 self.left.as_ref().unwrap().inorder();
7 }
8 println!("key: {:?}", &self.key);
9 if self.right.is_some() {
10 self.right.as_ref().unwrap().inorder();
11 }
12 }
13 }
14
15 // in-order: implemented externally [by recursion]
16 fn inorder<T: Clone + Ord + ToString + Debug>(bt: Link<T>) {
17 if !bt.is_none() {
18 inorder(bt.as_ref().unwrap().get_left());
19 println!("key: {:?}", bt.as_ref().unwrap().get_key());
20 inorder(bt.as_ref().unwrap().get_right());
21 }
22 }
To evaluate an arithmetic expression like (1 + (2 * 3)) that is stored in a tree, we need to retrieve
the operators and operands in the correct order. Pre-order traversal starts from the root node, so it is
not suitable for computing expressions: we need to start from the leaf nodes. Post-order traversal, on
the other hand, retrieves the data in the correct order, as it starts from the left subtree, followed by
the right subtree, and finally the root node. By first retrieving the values of the left and right
child nodes and then the operator at the root, we can perform one operation, save the result in the
position of the operator, and continue in this way until we calculate the final value.
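The post-order evaluation described above can be sketched as follows. The `Expr` node type here is this sketch's own, not the chapter's BinaryTree:

```rust
// Minimal expression-tree node for this sketch
enum Expr {
    Num(i32),
    Op(char, Box<Expr>, Box<Expr>),
}

// Post-order evaluation: compute both subtrees first,
// then apply the operator stored at the root.
fn eval(e: &Expr) -> i32 {
    match e {
        Expr::Num(n) => *n,
        Expr::Op(op, l, r) => {
            let (lv, rv) = (eval(l), eval(r));
            match op {
                '+' => lv + rv,
                '-' => lv - rv,
                '*' => lv * rv,
                _ => lv / rv,
            }
        }
    }
}

fn main() {
    // The tree for (1 + (2 * 3))
    let tree = Expr::Op('+',
        Box::new(Expr::Num(1)),
        Box::new(Expr::Op('*',
            Box::new(Expr::Num(2)),
            Box::new(Expr::Num(3)))));
    println!("{}", eval(&tree)); // 7
}
```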
Additionally, in-order traversal can be used on the tree that saves the arithmetic expression (1 + (2
* 3)) to retrieve the original expression 1 + 2 * 3. However, since the tree does not save parentheses,
the recovered expression is only in the correct order, but the priority may not be correct. To include
parentheses in the output, we can modify the in-order traversal.
1 // binary_tree.rs
2
3 impl<T: Clone + Ord + ToString + Debug> BinaryTree<T> {
4 // form an expression: [internal implementation]
5 fn iexp(&self) -> String {
6 let mut exp = "".to_string();
7
8 exp += "(";
9 let exp_left = match &self.left {
10 Some(left) => left.iexp(),
11 None => "".to_string(),
12 };
13 exp += &exp_left;
14
15 exp += &self.get_key().to_string();
16
17 let exp_right = match &self.right {
18 Some(right) => right.iexp(),
19 None => "".to_string(),
20 };
21 exp += &exp_right;
22 exp += ")";
23
24 exp
25 }
26 }
27
28 // form an expression: [external implementation]
29 fn oexp<T>(bt: Link<T>) -> String
30 where T: Clone + Ord + ToString + Debug + Display
31 {
32 let mut exp = "".to_string();
33 if !bt.is_none() {
34 exp = "(".to_string() +
35 &oexp(bt.as_ref().unwrap().get_left());
36 exp += &bt.as_ref().unwrap().get_key().to_string();
37 exp += &(oexp(bt.as_ref().unwrap().get_right()) + ")");
38 }
39
40 exp
41 }
In addition to the three traversal methods discussed earlier, there is another method called level-order
traversal that visits the nodes layer by layer. As we have previously implemented the queue data structure
required for level-order traversal, we can directly use it to implement this traversal method.
1 // binary_tree.rs
2
3 impl<T: Clone + Ord + ToString + Debug> BinaryTree<T> {
4 fn levelorder(&self) {
5 let size = self.size();
6 let mut q = Queue::new(size);
7
8 // enqueue the root node
9 let _r = q.enqueue(Box::new(self.clone()));
10 while !q.is_empty() {
11 // dequeue the first node, and output its value
12 let front = q.dequeue().unwrap();
13 println!("key: {:?}", front.get_key());
14
15 // enqueue the left/right child node
16 match front.get_left() {
17 Some(left) => {
18 let _r = q.enqueue(left);
19 },
20 None => {},
21 }
22
23 match front.get_right() {
24 Some(right) => {
25 let _r = q.enqueue(right);
26 },
27 None => {},
28 }
29 }
30 }
31 }
32
33 // level-order: implemented externally [by recursion]
34 fn levelorder<T: Clone + Ord + ToString + Debug>(bt: Link<T>) {
35 if bt.is_none() { return; }
36
37 let size = bt.as_ref().unwrap().size();
38 let mut q = Queue::new(size);
39
40 let _r = q.enqueue(bt.as_ref().unwrap().clone());
41 while !q.is_empty() {
42 // dequeue the first node, and print its value
43 let front = q.dequeue().unwrap();
44 println!("key: {:?}", front.get_key());
45
46 match front.get_left() {
47 Some(left) => {
48 let _r = q.enqueue(left);
49 },
50 None => {},
51 }
52
53 match front.get_right() {
54 Some(right) => {
55 let _r = q.enqueue(right);
56 },
57 None => {},
58 }
59 }
60 }
Here are some usage examples of binary trees.
1 // binary_tree.rs
2
3 fn main() {
4 basic();
5 order();
6
7 fn basic() {
8 let mut bt = BinaryTree::new(10usize);
9
10 let root = bt.get_key();
11 println!("root key: {:?}", root);
12
13 bt.set_key(11usize);
14 let root = bt.get_key();
15 println!("root key: {:?}", root);
16
17 bt.insert_left_tree(2usize);
18 bt.insert_right_tree(18usize);
19
20 println!("left child: {:#?}", bt.get_left());
21 println!("right child: {:#?}", bt.get_right());
22
23 println!("min key: {:?}", bt.min().unwrap());
24 println!("max key: {:?}", bt.max().unwrap());
25
26 println!("tree nodes: {}", bt.size());
27 println!("tree leaves: {}", bt.leaf_size());
28 println!("tree internals: {}", bt.none_leaf_size());
29 println!("tree depth: {}", bt.depth());
30 println!("tree contains '2': {}", bt.contains(&2));
31 }
32
33 fn order() {
34 let mut bt = BinaryTree::new(10usize);
35 bt.insert_left_tree(2usize);
36 bt.insert_right_tree(18usize);
37
38 println!("internal pre-in-post-level order");
39 bt.preorder();
40 bt.inorder();
41 bt.postorder();
42 bt.levelorder();
43
44 let nk = Some(Box::new(bt.clone()));
45 println!("outside pre-in-post-level order");
46 preorder(nk.clone());
47 inorder(nk.clone());
48 postorder(nk.clone());
49 levelorder(nk.clone());
50
51 println!("internal exp: {}", bt.iexp());
52 println!("outside exp: {}", oexp(nk));
53 }
54 }
root key: 10
root key: 11
left child: Some(
BinaryTree {
key: 2,
left: None,
right: None,
},
)
right child: Some(
BinaryTree {
key: 18,
left: None,
right: None,
},
)
min key: 2
max key: 18
tree nodes: 3
tree leaves: 2
tree internals: 1
tree depth: 2
tree contains '2': true
internal pre-in-post-level order:
key: 10
key: 2
key: 18
key: 2
key: 10
key: 18
key: 2
key: 18
key: 10
key: 10
key: 2
key: 18
outside pre-in-post-level order:
key: 10
key: 2
key: 18
key: 2
key: 10
key: 18
key: 2
key: 18
key: 10
key: 10
key: 2
key: 18
internal exp: ((2)10(18))
outside exp: ((2)10(18))
To simplify the descriptions of the three traversal orders, we can use abbreviations. Pre-order traversal
can be abbreviated as "rt-l-r" to indicate that we first visit the root, then the left subtree, and finally the
right subtree. Combining the three traversal orders, we have "rt-l-r" for pre-order traversal, "l-rt-r" for
in-order traversal, and "l-r-rt" for post-order traversal. It's worth noting that there can also be a "rt-r-l"
traversal order, but this is simply the mirror image of pre-order traversal. Since left and right are relative,
"l-rt-r" and "r-rt-l" can be regarded as mirror images of each other. Similarly, "r-l-rt" is the mirror image
of post-order traversal, and "r-rt-l" is the mirror image of in-order traversal. The table below summarizes
the traversal methods for easy reference:
8.3 Binary Heap
If a binary heap ’h’ has already been created as a priority queue, the table below shows the results of
the heap operations performed, with the top item of the heap on the right. The priority value of an item
is its value, with smaller values being higher priority and therefore appearing on the right.
To illustrate, let's consider a heap [0,5,9,11,14,18,19,21,33,17,27] stored using a Vec: 5 is the root
of the corresponding tree, 9 and 11 are its children, and so on down the levels. As the parent and child
nodes are stored in a linear data structure, their relationship is straightforward to compute. If a node is
at index p, then its left child node
is at 2p, and its right child node is at 2p + 1. Here, p starts from index 1, and index 0 is not used for data,
so it is set to 0 as a placeholder.
value: 0  5  9  11  14  18  19  21  33  17  27
index: 0  1  2   3   4   5   6   7   8   9  10
For instance, the index of 5 is p = 1, and its left child node is at index 2p = 2, where the value is 9.
Therefore, 9 is the left child node of 5 in the tree structure. Similarly, the parent node of any child node
is located at index p/2. For example, if p = 2 for 9, then its parent node is at index 2/2=1, and if p = 3
for the right child node 11, then its parent node is at index 3/2 = 1 (the division result is rounded down).
Hence, computing the parent node of any child node only requires the expression p/2, while child nodes
are computed using 2p and 2p+1. We previously defined macros for computing parent and child node
indices, and we will continue to use them for calculation purposes.
1 // binary_heap.rs
2
3 // calculate parent node index
4 macro_rules! parent {
5 ($child:ident) => {
6 $child >> 1
7 };
8 }
9
10 // calculate left child node index
11 macro_rules! left_child {
12 ($parent:ident) => {
13 $parent << 1
14 };
15 }
16
17 // calculate right child node index
18 macro_rules! right_child {
19 ($parent:ident) => {
20 ($parent << 1) + 1
21 };
22 }
To begin with, the binary heap is defined as a data structure that includes a field representing the size
of the heap. The size field does not include the first data item 0, which is considered a placeholder. The
data saved in the heap is assumed to be i32. When initializing the heap, data is present at index 0, but
the size is set to 0.
1 // binary_heap.rs
2
3 // Binary heap definition
4 // Implement Debug and Clone Trait
5 #[derive(Debug, Clone)]
6 struct BinaryHeap {
7 size: usize, // data count
8 data: Vec<i32>, // data container
9 }
10
11 impl BinaryHeap {
12 fn new() -> Self {
13 BinaryHeap {
14 size: 0,
15 data: vec![0] // first 0 not count in total
16 }
17 }
18
19 fn size(&self) -> usize {
20 self.size
21 }
22
23 fn is_empty(&self) -> bool {
24 0 == self.size
25 }
26
27 // Get the minimum data in the heap
28 fn min(&self) -> Option<i32> {
29 if 0 == self.size {
30 None
31 } else {
32 // Some(self.data[1].clone());
33 // clone would be needed for a generic type
34 Some(self.data[1])
35 }
36 }
37 }
When adding data to the heap, appending it at the end may violate the heap-order property, so the
data needs to be moved up to restore it.
As each data item is added, the size of the heap is increased, and the new data is moved up as
necessary to restore the heap order.
1 // binary_heap.rs
2
3 impl BinaryHeap {
4 // Add a data to the end and adjust the heap
5 fn push(&mut self, val: i32) {
6 self.data.push(val);
7 self.size += 1;
8 self.move_up(self.size);
9 }
10
11 // smaller data moves up.
12 // c(child, current), p(parent)
13 fn move_up(&mut self, mut c: usize) {
14 loop {
15 // calculate the parent index of current node
16 let p = parent!(c);
17 if p <= 0 {
18 break;
19 }
20
21 // If the current node's data is smaller than
22 // the parent node's data, swap them
23 if self.data[c] < self.data[p] {
24 self.data.swap(c, p);
25 }
26
27 // The parent node becomes the current node
28 c = p;
29 }
30 }
31 }
To retrieve the minimum value from the heap, three cases need to be considered: when there is no
data in the heap, return None; when there is only one data item, pop it directly; when there are multiple
data items, swap the top and end data of the heap, adjust the heap, and then return the minimum value at
the end. The move_down function is used to move elements down to maintain balance, and the min_child
function is used to find the minimum child node.
1 // binary_heap.rs
2
3 impl BinaryHeap {
4 // pop out the top value
5 fn pop(&mut self) -> Option<i32> {
6 if 0 == self.size {
7 // no data, return None
8 None
9 } else if 1 == self.size {
10 self.size -= 1;
11
12 self.data.pop()
13 } else {
14 // swap data and then adjust the heap
15 self.data.swap(1, self.size);
16 let val = self.data.pop();
17 self.size -= 1;
18 self.move_down(1);
19
20 val
21 }
22 }
23
24 // bigger data move down
25 fn move_down(&mut self, mut c: usize) {
26 loop {
27 let lc = left_child!(c);
28 if lc > self.size { break; }
29
30 // the index of the minimum child node of the current node
31 let mc = self.min_child(c);
32 if self.data[c] > self.data[mc] {
33 self.data.swap(c, mc);
34 }
35
36 // the minimum child node becomes the current node
37 c = mc;
38 }
39 }
40
41 // Calculate the index of the minimum child node
42 fn min_child(&self, c: usize) -> usize {
43 let (lc, rc) = (left_child!(c), right_child!(c));
44
45 if rc > self.size {
46 // right child node is out of range,
47 // left child node is the minimum child node
48 lc
49 } else if self.data[lc] < self.data[rc] {
50 // left child node is smaller than right child node
51 lc
52 } else {
53 // right child node is smaller than left child node
54 rc
55 }
56 }
57 }
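Putting push and pop together, the heap can be exercised end to end: popping repeatedly must return the stored values in ascending order. The following condensed sketch inlines the index arithmetic and mirrors the 1-based layout used above (a compressed restatement for illustration, not the book's exact code):

```rust
// A condensed 1-based min-heap; slot 0 of `data` is unused.
struct MinHeap { size: usize, data: Vec<i32> }

impl MinHeap {
    fn new() -> Self { Self { size: 0, data: vec![0] } }

    fn push(&mut self, val: i32) {
        self.data.push(val);
        self.size += 1;
        let mut c = self.size;
        while c > 1 {                       // move the new value up
            let p = c >> 1;
            if self.data[c] < self.data[p] { self.data.swap(c, p); }
            c = p;
        }
    }

    fn pop(&mut self) -> Option<i32> {
        if self.size == 0 { return None; }
        self.data.swap(1, self.size);       // move the last value to the top
        let val = self.data.pop();
        self.size -= 1;
        let mut c = 1;
        loop {                              // move the top value down
            let (lc, rc) = (c << 1, (c << 1) + 1);
            if lc > self.size { break; }
            let mc = if rc > self.size || self.data[lc] < self.data[rc] { lc } else { rc };
            if self.data[c] > self.data[mc] { self.data.swap(c, mc); }
            c = mc;
        }
        val
    }
}

fn main() {
    let mut h = MinHeap::new();
    for v in [3, 1, 4, 1, 5, 9, 2, 6] { h.push(v); }
    let mut out = Vec::new();
    while let Some(v) = h.pop() { out.push(v); }
    // Repeated pops yield the values in ascending order.
    assert_eq!(out, vec![1, 1, 2, 3, 4, 5, 6, 9]);
    println!("{:?}", out);
}
```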
The process of deleting the minimum element from the heap involves taking out the element at the
top of the heap and moving the last element to the top. The top element is not the minimum value at
this point and does not meet the heap definition, so the heap needs to be rebuilt. To rebuild the heap,
the top element needs to be moved down using the move_down function. This function uses a macro to
calculate whether to swap with the left or right child node based on the node index, eventually swapping
the top element with its minimum child node until the heap is restored.
(Figure: popping the minimum — the last element 18 is moved to the top of the heap and then moved down, swapping with its smaller children 7 and 9 until the heap property is restored.)
Finally, data can be added to the heap in batches. For instance, a slice of data [0,5,4,3,1,2] can be
added to the heap at once to avoid frequent calls to the push function. The original data in the heap can
either be kept unchanged while adding the slice data one by one or deleted before adding the slice data.
The latter approach results in a new heap that includes only the data in the slice.
(Figure: three snapshots of building a min-heap from the slice data — successive move_down adjustments gradually restore the heap property.)
To implement the binary min heap, we define the build_new and build_add functions, as shown
below.
1 // binary_heap.rs
2
3 impl BinaryHeap {
4 // build a new heap
5 fn build_new(&mut self, arr: &[i32]) {
6 // delete all data
7 for _i in 0..self.size {
8 let _rm = self.data.pop();
9 }
10
11 // add new data
12 for &val in arr {
13 self.data.push(val);
14 }
15
16 // change the size
17 self.size = arr.len();
18
19 // adjust the heap to make it a min-heap
20 let size = self.size;
21 let mut p = parent!(size);
22 while p > 0 {
23 self.move_down(p);
24 p -= 1;
25 }
26 }
27
28 // add slice data one by one
29 fn build_add(&mut self, arr: &[i32]) {
30 for &val in arr {
31 self.push(val);
32 }
33 }
34 }
With these functions, we have completed the construction of a binary min heap. The entire process
should be easy to understand, and based on this, one can write the code for a binary max heap as well.
Here’s an example of using a binary heap.
1 // binary_heap.rs
2
3 fn main() {
4     let mut bh = BinaryHeap::new();
5     let nums = [-1,0,2,3,4];
6     bh.push(10); bh.push(9);
7     bh.push(8); bh.push(7); bh.push(6);
8
9     bh.build_add(&nums);
10     println!("empty: {:?}", bh.is_empty());
11     println!("min: {:?}", bh.min());
12     println!("size: {:?}", bh.size());
13     println!("pop min: {:?}", bh.pop());
14
15     bh.build_new(&nums);
16     println!("size: {:?}", bh.size());
17     println!("pop min: {:?}", bh.pop());
18 }
The following are the outputs after execution.
empty: false
min: Some(-1)
size: 10
pop min: Some(-1)
size: 5
pop min: Some(-1)
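For production code it is worth knowing that Rust's standard library already provides a binary heap: `std::collections::BinaryHeap` is a max-heap, and wrapping keys in `std::cmp::Reverse` turns it into a min-heap. By the same symmetry, the max-heap exercise suggested above can reuse the min-heap logic with the comparison flipped:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn main() {
    // Max-heap: pop returns the largest element first.
    let mut maxh: BinaryHeap<i32> = [3, 1, 4, 1, 5].into_iter().collect();
    assert_eq!(maxh.pop(), Some(5));

    // Min-heap via Reverse: pop returns the smallest element first.
    let mut minh: BinaryHeap<Reverse<i32>> =
        [3, 1, 4, 1, 5].into_iter().map(Reverse).collect();
    assert_eq!(minh.pop(), Some(Reverse(1)));
    println!("std heaps ok");
}
```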
(Figure: the binary search tree formed by inserting 70, 31, 93, 14, 73, 94, 23.)
In the example above, 70 is the root node, 31 is smaller and becomes the left node, and 93 is larger
and becomes the right node. Then 14 is inserted, which is smaller than 70 and descends to 31, and as
it is smaller than 31, it becomes the left node of 31. Similar steps are taken to insert other data, and
finally, a binary search tree is formed. The inorder traversal of the tree is [14, 23, 31, 70, 73, 93, 94],
which is sorted from small to large, so the binary search tree can also be used to sort data. Using inorder
traversal, we obtain the ascending sorting result, and using the mirrored inorder traversal, i.e., the
"right-root-left" traversal, we obtain the descending sorting result.
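The sort-by-insertion property can be demonstrated with the standard library's `BTreeSet` — an ordered set backed by a B-tree rather than a plain BST, but with the same in-order guarantee:

```rust
use std::collections::BTreeSet;

fn main() {
    // Insert in arbitrary order (the keys from the figure above).
    let keys = [70, 31, 93, 14, 73, 94, 23];
    let tree: BTreeSet<i32> = keys.into_iter().collect();

    // In-order iteration yields ascending order ...
    let asc: Vec<i32> = tree.iter().copied().collect();
    assert_eq!(asc, vec![14, 23, 31, 70, 73, 93, 94]);

    // ... and reverse ("right-root-left") iteration yields descending order.
    let desc: Vec<i32> = tree.iter().rev().copied().collect();
    assert_eq!(desc, vec![94, 93, 73, 70, 31, 23, 14]);
    println!("{:?}", asc);
}
```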
To implement the binary search tree, we define it as a struct BST (BinarySearchTree), which includes
key values and links to left and right child nodes. Following the definition of the abstract data type, the
implementation of the binary search tree is shown below.
1 // bst.rs
2 use std::cmp::{max, Ordering::*};
3 use std::fmt::Debug;
4
5 // Binary search tree node link
6 type Link<T,U> = Option<Box<BST<T,U>>>;
7
8 // Definition of binary search tree
9 #[derive(Debug,Clone)]
10 struct BST<T,U> {
11 key: Option<T>,
12 val: Option<U>,
13 left: Link<T,U>,
14 right: Link<T,U>,
15 }
16
17 impl<T,U> BST<T,U>
18 where T: Copy + Ord + Debug,
19 U: Copy + Debug
20 {
21 fn new() -> Self {
22 Self {
23 key: None,
24 val: None,
25 left: None,
26 right: None,
27 }
28 }
29
30 fn is_empty(&self) -> bool {
31 self.key.is_none()
32 }
33
34 fn size(&self) -> usize {
35 self.calc_size(0)
36 }
37
38 // Recursively count the number of nodes
39 fn calc_size(&self, mut size: usize) -> usize {
40 if self.key.is_none() { return size; }
41
42 // Add current node count to total node count 'size'
43 size += 1;
44 // Count left and right child nodes
45 if !self.left.is_none() {
46 size = self.left.as_ref().unwrap().calc_size(size);
47 }
48 if !self.right.is_none() {
49 size = self.right
50 .as_ref().unwrap().calc_size(size);
51 }
52
53 size
54 }
55
56 // Count leaf nodes
57 fn leaf_size(&self) -> usize {
58 // If both left and right are empty,
59 // current node is a leaf node, return 1
60 if self.left.is_none() && self.right.is_none() {
61 return 1;
62 }
63
64 // Count leaf nodes in left and right subtree
65 let left_leaf = match &self.left {
66 Some(left) => left.leaf_size(),
67 None => 0,
68 };
69 let right_leaf = match &self.right {
70 Some(right) => right.leaf_size(),
71 None => 0,
72 };
73
74 // total sum of leaf nodes
75 left_leaf + right_leaf
76 }
77
78 // Count non-leaf nodes
79 fn none_leaf_size(&self) -> usize {
80 self.size() - self.leaf_size()
81 }
82
83 // Calculate tree depth
84 fn depth(&self) -> usize {
85 let mut left_depth = 1;
86 if let Some(left) = &self.left {
87 left_depth += left.depth();
88 }
89
90 let mut right_depth = 1;
91 if let Some(right) = &self.right {
92 right_depth += right.depth();
93 }
94
95 max(left_depth, right_depth)
96 }
97
98 fn insert(&mut self, key: T, val: U) {
99 // If no data, insert directly
100 if self.key.is_none() {
101 self.key = Some(key);
102 self.val = Some(val);
103 } else {
104 match &self.key {
105 Some(k) => {
106 // If key exists, update val
107 if key == *k {
108 self.val = Some(val);
109 return;
110 }
111
112 // If no same key found,
113 // Find the subtree to insert new node
114 let child = if key < *k {
115 &mut self.left
116 } else {
117 &mut self.right
118 };
119
120 // Recursively go down the tree
121 // until insertion
122 match child {
123 Some(ref mut node) => {
124 node.insert(key, val);
125 },
126 None => {
127 let mut node = BST::new();
63 }
64 }
65
66 // External implementation
67 fn preorder<T, U>(bst: Link<T,U>)
68 where T: Copy + Ord + Debug,
69 U: Copy + Debug
70 {
71 if !bst.is_none() {
72 println!("key: {:?}, val: {:?}",
73 bst.as_ref().unwrap().key.unwrap(),
74 bst.as_ref().unwrap().val.unwrap());
75 preorder(bst.as_ref().unwrap().get_left());
76 preorder(bst.as_ref().unwrap().get_right());
77 }
78 }
79
80 fn inorder<T, U>(bst: Link<T,U>)
81 where T: Copy + Ord + Debug,
82 U: Copy + Debug
83 {
84 if !bst.is_none() {
85 inorder(bst.as_ref().unwrap().get_left());
86 println!("key: {:?}, val: {:?}",
87 bst.as_ref().unwrap().key.unwrap(),
88 bst.as_ref().unwrap().val.unwrap());
89 inorder(bst.as_ref().unwrap().get_right());
90 }
91 }
92
93 fn postorder<T, U>(bst: Link<T,U>)
94 where T: Copy + Ord + Debug,
95 U: Copy + Debug
96 {
97 if !bst.is_none() {
98 postorder(bst.as_ref().unwrap().get_left());
99 postorder(bst.as_ref().unwrap().get_right());
100 println!("key: {:?}, val: {:?}",
101 bst.as_ref().unwrap().key.unwrap(),
102 bst.as_ref().unwrap().val.unwrap());
103 }
104 }
105
106 fn levelorder<T, U>(bst: Link<T,U>)
107 where T: Copy + Ord + Debug,
108 U: Copy + Debug
109 {
110 if bst.is_none() { return; }
111
112 let size = bst.as_ref().unwrap().size();
113 let mut q = Queue::new(size);
114 let _r = q.enqueue(bst.as_ref().unwrap().clone());
36 bst.inorder();
37 bst.preorder();
38 bst.postorder();
39 bst.levelorder();
40 println!("outside inorder, preorder, postorder: ");
41 let nk = Some(Box::new(bst.clone()));
42 inorder(nk.clone());
43 preorder(nk.clone());
44 postorder(nk.clone());
45 levelorder(nk.clone());
46 }
47 }
The following are outputs after execution.
bst is empty: false
bst size: 8
bst leaves: 4
bst internals: 4
bst depth: 4
min key: Some(4), min val: Some('a')
max key: Some(11), max val: Some('h')
bst contains 5: true
key: 5, val: 'b'
internal inorder, preorder, postorder:
key: 4, val: 'a'
key: 5, val: 'b'
key: 6, val: 'c'
key: 7, val: 'd'
key: 8, val: 'e'
key: 9, val: 'f'
key: 10, val: 'g'
key: 11, val: 'h'
key: 8, val: 'e'
key: 6, val: 'c'
key: 5, val: 'b'
key: 4, val: 'a'
key: 7, val: 'd'
key: 10, val: 'g'
key: 9, val: 'f'
key: 11, val: 'h'
key: 4, val: 'a'
key: 5, val: 'b'
key: 7, val: 'd'
key: 6, val: 'c'
key: 9, val: 'f'
key: 11, val: 'h'
key: 10, val: 'g'
key: 8, val: 'e'
key: 8, val: 'e'
key: 6, val: 'c'
key: 10, val: 'g'
key: 5, val: 'b'
(Figure: the binary search tree after inserting key 76 as the right child of 73.)
Deleting a node in a binary search tree is a complex operation. The first step is to find the node to be
deleted, which may not exist in the tree. Once the node is found, we need to determine if it has children,
which can be one of three cases: no children, one child, or two children.
If the node is a leaf node with no children, we can simply remove its reference from its parent node.
If it has one child, we modify the parent node’s reference to point directly to the child node. The most
challenging case is when the node has two children. In this case, we find the minimum node in the
right subtree, called the successor node, and replace the node to be deleted with the successor node. The
successor node may also have children, and we must adjust their relationships accordingly. The specific
situations are shown in the following figure, where the dashed box represents the node k to be deleted,
and the binary tree obtained after deleting the node is shown on the right.
(Figure: before-and-after trees for the three deletion cases — the dashed box marks the node k to be deleted: deleting a leaf node, deleting a node with one child, and deleting a node with two children.)
Deleting a leaf node is the simplest case, while deleting an internal node with one child is straight-
forward. The most difficult case is when the node has two children, as this requires adjusting the rela-
tionships of multiple nodes.
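As an exercise sketch, a recursive remove over an `Option<Box<Node>>` tree can cover the three cases directly. This standalone version uses a simplified key-only node type rather than the chapter's `BST<T,U>`:

```rust
// Simplified BST node for sketching deletion; not the chapter's BST<T,U>.
struct Node { key: i32, left: Option<Box<Node>>, right: Option<Box<Node>> }

fn insert(t: &mut Option<Box<Node>>, key: i32) {
    match t {
        None => *t = Some(Box::new(Node { key, left: None, right: None })),
        Some(n) => {
            if key < n.key { insert(&mut n.left, key); }
            else if key > n.key { insert(&mut n.right, key); }
        }
    }
}

// Remove `key`, covering the three cases: leaf, one child, two children.
fn remove(t: &mut Option<Box<Node>>, key: i32) {
    let n = match t {
        None => return,          // key not present
        Some(n) => n,
    };
    if key < n.key { remove(&mut n.left, key); return; }
    if key > n.key { remove(&mut n.right, key); return; }

    // Found: detach the node and re-link according to its children.
    let mut node = t.take().unwrap();
    match (node.left.take(), node.right.take()) {
        (None, None) => {}                    // leaf: parent link becomes None
        (Some(l), None) => *t = Some(l),      // one child: promote it
        (None, Some(r)) => *t = Some(r),
        (Some(l), Some(r)) => {               // two children: use the successor
            node.left = Some(l);
            node.right = Some(r);
            // Successor = minimum of the right subtree.
            let mut succ = node.right.as_ref().unwrap();
            while let Some(next) = succ.left.as_ref() { succ = next; }
            let sk = succ.key;
            node.key = sk;                    // copy the successor's key up
            remove(&mut node.right, sk);      // delete the (at most one-child) successor
            *t = Some(node);
        }
    }
}

fn inorder(t: &Option<Box<Node>>, out: &mut Vec<i32>) {
    if let Some(n) = t {
        inorder(&n.left, out);
        out.push(n.key);
        inorder(&n.right, out);
    }
}

fn main() {
    let mut t = None;
    for k in [70, 31, 93, 14, 73, 94, 23, 76] { insert(&mut t, k); }
    remove(&mut t, 93); // internal node with two children
    let mut out = Vec::new();
    inorder(&t, &mut out);
    assert_eq!(out, vec![14, 23, 31, 70, 73, 76, 94]);
    println!("{:?}", out);
}
```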
2^0 + 2^1 + ... + 2^i + ... + 2^h = n
h = log2(n)                                                  (8.1)
Using the properties of a balanced binary tree, we can approximate the maximum path length as h = log2(n).
Thus, the time complexity of contains() is O(log2(n)). Insertion, deletion, and modification are all based
on search, because the element must be located before processing can continue; once it is found, the
remaining work takes constant time, so their cost is dominated by the search. Therefore, the performance
of the contains(), insert(), and get() methods is O(log2(n)), and the height h of the tree is the limiting
factor. If the inserted data is already sorted, the binary tree degenerates into a linear linked list, and
contains(), insert(), and remove() all drop to O(n). The figure below illustrates this scenario.
(Figure: inserting the sorted keys 10, 20, 30, 40, 50 degenerates the tree into a linked list.)
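The degeneration is easy to measure: inserting already-sorted keys into a plain, unbalanced BST produces a right spine whose depth equals the number of keys. A minimal sketch with a simplified node type:

```rust
// Simplified unbalanced BST node, for measuring depth only.
struct Node { key: i32, left: Option<Box<Node>>, right: Option<Box<Node>> }

fn insert(t: &mut Option<Box<Node>>, key: i32) {
    match t {
        None => *t = Some(Box::new(Node { key, left: None, right: None })),
        Some(n) => {
            if key < n.key { insert(&mut n.left, key) }
            else { insert(&mut n.right, key) }
        }
    }
}

fn depth(t: &Option<Box<Node>>) -> usize {
    match t {
        None => 0,
        Some(n) => 1 + depth(&n.left).max(depth(&n.right)),
    }
}

fn main() {
    // Sorted insertion: every key goes right, so depth == number of keys.
    let mut sorted = None;
    for k in [10, 20, 30, 40, 50] { insert(&mut sorted, k); }
    assert_eq!(depth(&sorted), 5);

    // A bushier insertion order stays shallow.
    let mut mixed = None;
    for k in [30, 10, 40, 20, 50] { insert(&mut mixed, k); }
    assert_eq!(depth(&mixed), 3);
    println!("sorted depth = {}, mixed depth = {}", depth(&sorted), depth(&mixed));
}
```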
In this section, we did not implement the remove() function because it is not part of the abstract data
type definition, and binary search trees are primarily used for data insertion and search, not deletion.
However, interested readers can try to implement the remove() function as an exercise.
To improve performance by reducing the tree height, we can convert the binary tree into a multi-way
tree such as B-trees and B+ trees. These trees have many child nodes, resulting in a short tree height
and fast queries. They are commonly used in implementing databases and file systems. For instance,
the MySQL database uses B+ trees to store data, with nodes sized as 16 KB memory pages. If each row is
1 KB, one leaf node can store 16 rows. When a node stores an index with a bigint key, the key takes 8 bytes
and the page pointer takes 6 bytes, 14 bytes in total, so one node can hold approximately 16 * 1024 / 14 =
1170 index entries. With a height of 3, a B+ tree can therefore address around 1170 * 1170 * 16 = 21,902,400
entries, enough for about 20 million rows, while retrieving any row touches only a few pages. This explains
why database queries are fast. Readers interested in this topic can read MySQL-related books to learn
more.
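The fan-out arithmetic behind this estimate can be checked directly (the 16 KB page, 8-byte key, and 6-byte pointer figures are those quoted above):

```rust
fn main() {
    let page_bytes = 16 * 1024;          // 16 KB page size quoted above
    let entry_bytes = 8 + 6;             // bigint key + page pointer
    let fanout = page_bytes / entry_bytes;
    assert_eq!(fanout, 1170);            // ~1170 index entries per internal page

    let rows_per_leaf = 16;              // 16 KB page / 1 KB rows
    let capacity = fanout * fanout * rows_per_leaf;
    assert_eq!(capacity, 21_902_400);    // a height-3 tree addresses ~20 million rows
    println!("fanout = {}, capacity = {}", fanout, capacity);
}
```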
a balanced binary search tree that can automatically maintain balance. It is named after its inventors:
G.M. Adelson-Velskii and E.M. Landis.
The AVL tree is also an ordinary binary search tree, but with a difference in the way the tree operations
are performed. AVL trees use a balance factor to determine if the tree is balanced during operations. The
balance factor is the difference in height between the left and right subtrees of a node, and it is defined
as:
balanceFactor = height(leftSubtree) − height(rightSubtree)    (8.2)
Given this balance factor definition, if the balance factor is greater than zero, then the left subtree is
heavy. If it is less than zero, then the right subtree is heavy, and if it is zero, then the tree is balanced. To
implement an AVL tree efficiently, balance factors of -1, 0, and 1 are all considered balanced because
the difference in height between the left and right subtrees is only 1 in these cases, which is essentially
balanced. Once a node’s balance factor is outside this range, such as 2 or -2, the tree needs to be rotated
to maintain balance. The following figure illustrates the case of unbalanced left and right subtrees, and
each node’s balance factor is indicated by its value.
(Figure: trees annotated with each node's balance factor; a root factor of -2 marks a right-heavy tree that must be rotated.)
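Computing a balance factor is just a difference of subtree heights; a small sketch over a simplified node type (keys omitted, since only the shape matters here):

```rust
// Shape-only tree node for computing heights and balance factors.
struct Node { left: Option<Box<Node>>, right: Option<Box<Node>> }

fn height(t: &Option<Box<Node>>) -> i32 {
    match t {
        None => 0,
        Some(n) => 1 + height(&n.left).max(height(&n.right)),
    }
}

// balanceFactor = height(leftSubtree) - height(rightSubtree)
fn balance_factor(n: &Node) -> i32 {
    height(&n.left) - height(&n.right)
}

fn leaf() -> Option<Box<Node>> {
    Some(Box::new(Node { left: None, right: None }))
}

fn main() {
    // Root with an empty left subtree and a right subtree of height 2:
    // its balance factor is -2, outside {-1, 0, 1}, so a rotation is needed.
    let root = Node {
        left: None,
        right: Some(Box::new(Node { left: leaf(), right: None })),
    };
    let bf = balance_factor(&root);
    assert_eq!(bf, -2);
    assert!(bf < -1, "right-heavy beyond the tolerated range");
    println!("balance factor = {}", bf);
}
```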
By analyzing the total number of nodes in the tree, we can derive a formula for the number of nodes
in a tree of height h, as follows: For a tree of height 0, there is only one node; for a tree of height 1, there
are 2 nodes; for a tree of height 2, there are 4 nodes; for a tree of height 3, there are 7 nodes, and so on.
Remarkably, this formula resembles the Fibonacci sequence. With the number of nodes in the tree,
we can obtain the height formula for an AVL tree using the Fibonacci equation, where the ith Fibonacci number is defined as:
F_0 = 0
F_1 = 1                                                      (8.4)
F_i = F_{i-1} + F_{i-2}
To calculate the number of nodes in an AVL tree, we can use the following formula (with the sequence shifted so that F_0 = 1).

N_h = F_{h+2} − 1                                            (8.5)
As i increases, the ratio F_i/F_{i-1} approaches the golden ratio Φ = (1 + √5)/2, so Φ can be used to
represent F_i, approximated as F_i = Φ^i/√5.

N_h = Φ^h/√5 + 1                                             (8.6)
Based on these formulas, we can derive a height formula for AVL trees.
log(N_h − 1) = log(Φ^h/√5)
log(N_h − 1) = h·log(Φ) − (1/2)·log(5)
h = (log(N_h − 1) + (1/2)·log(5)) / log(Φ)                   (8.7)
h ≈ 1.44·log(N_h)
In this formula, h is the height of the AVL tree and N_h is the number of nodes. This implies that
the height of an AVL tree is at most 1.44 times the logarithm of the number of nodes, and the search
complexity is O(log N), which is highly efficient.
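The 1.44·log bound can be checked numerically from the recurrence for the minimum number of nodes in an AVL tree of height h, N(h) = N(h−1) + N(h−2) + 1, with N(0) = 1 and N(1) = 2 as counted above:

```rust
fn main() {
    // Minimum node counts for AVL trees of height 0, 1, 2, ...
    let mut n = vec![1u64, 2];
    for h in 2..=20 {
        let next = n[h - 1] + n[h - 2] + 1;
        n.push(next);
    }
    assert_eq!(n[2], 4); // matches the counts in the text
    assert_eq!(n[3], 7);

    // For every height, h stays within 1.44 * log2(N(h)).
    for (h, &nodes) in n.iter().enumerate().skip(2) {
        let bound = 1.44 * (nodes as f64).log2();
        assert!((h as f64) <= bound, "h = {}, bound = {}", h, bound);
    }
    println!("min nodes per height: {:?}", &n[..8]);
}
```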
13 key: T,
14 left: AvlTree<T>, // left subtree
15 right: AvlTree<T>, // right subtree
16 bfactor: i8, // balance factor
17 }
To implement the AVL tree, we need to add two functions: insert and rebalance. To compare node
data, we also bring the Ordering enum into scope. Additionally, we need the replace and max
functions to move values and calculate the tree height.
1 // avl.rs
2
3 use std::cmp::{max, Ordering::*};
4 use std::fmt::Debug;
5 use std::mem::replace;
6 use AvlTree::*;
7
8 impl<T> AvlTree<T> where T : Clone + Ord + Debug {
9 // new tree is Empty
10 fn new() -> AvlTree<T> {
11 Null
12 }
13
14 fn insert(&mut self, key: T) -> (bool, bool) {
15 let ret = match self {
16 Null => {
17 // If there is no node, insert directly
18 let node = AvlNode {
19 key: key,
20 left: Null,
21 right: Null,
22 bfactor: 0,
23 };
24 *self = Tree(Box::new(node));
25
26 (true, true)
27 },
28 Tree(ref mut node) => match node.key.cmp(&key) {
29 // Compare the value of the node and determine
30 // which side to insert from
31 // inserted: whether insertion is performed
32 // deepened: whether the depth is increased
33
34 // If they are equal, no insertion is needed
35 Equal => (false, false),
36 // node value is smaller, insert to the right
37 Less => {
38 let (inserted, deepened)
39 = node.right.insert(key);
40 if deepened {
41 let ret = match node.bfactor {
42 -1 => (inserted, false),
43 0 => (inserted, true),
96 (-1,1)
97 };
98
99 // Rotate and update the balance factor
100 self.rotate_right();
101 self.node().right.node().bfactor = a;
102 self.node().bfactor = b;
103 } else if lbf == 1 {
104 let (a, b) = match self.node()
105 .left.node()
106 .right.node()
107 .bfactor
108 {
109 -1 => (1, 0),
110 0 => (0, 0),
111 1 => (0,-1),
112 _ => unreachable!(),
113 };
114
115 // First rotate left, then rotate right
116 // finally update the balance factor
117 self.node().left.rotate_left();
118 self.rotate_right();
119 self.node().right.node().bfactor = a;
120 self.node().left.node().bfactor = b;
121 self.node().bfactor = 0;
122 } else {
123 unreachable!()
124 }
125 },
126 // If the right subtree is heavy
127 2 => {
128 let rbf=self.node().right.node().bfactor;
129 if rbf == 1 || rbf == 0 {
130 let (a, b) = if rbf == 1 {
131 (0, 0)
132 } else {
133 (1,-1)
134 };
135
136 self.rotate_left();
137 self.node().left.node().bfactor = a;
138 self.node().bfactor = b;
139 } else if rbf == -1 {
140 let (a, b) = match self.node()
141 .right.node()
142 .left.node()
143 .bfactor
144 {
145 1 => (-1,0),
146 0 => (0, 0),
147 -1 => (0, 1),
(Figure: a left rotation turns the right-heavy chain A -> B -> C (balance factors -2, -1, 0) into a balanced tree with root B and children A and C; symmetrically, a right rotation turns the left-heavy chain A -> B -> C (balance factors 2, 1, 0) into a balanced tree rooted at B.)
To perform rotations on a subtree, we can use left and right rotation rules. However, sometimes after
performing one rotation, the balance may still be lost in the opposite direction.
(Figure: a left rotation on A (balance factor -2), whose right child C has balance factor 1, yields a tree rooted at C with balance factor 2 — unbalanced in the opposite direction, so a double rotation is needed.)
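A rotation only re-links three subtrees. The following sketch performs a left rotation on an enum tree modeled loosely on the chapter's AvlTree, using mem::replace to move subtrees out as the book's rotate functions do; balance-factor bookkeeping is omitted:

```rust
use std::mem::replace;

#[derive(Debug, PartialEq)]
enum Tree {
    Null,
    Node(i32, Box<Tree>, Box<Tree>), // key, left, right
}
use Tree::*;

fn leaf(k: i32) -> Tree { Node(k, Box::new(Null), Box::new(Null)) }

// Left rotation: the right child becomes the new root, and its old
// left subtree becomes the old root's right subtree.
fn rotate_left(t: &mut Tree) {
    let old = replace(t, Null);
    if let Node(k, l, r) = old {
        if let Node(rk, rl, rr) = *r {
            *t = Node(rk, Box::new(Node(k, l, rl)), rr);
        } else {
            *t = Node(k, l, Box::new(Null)); // no right child: nothing to rotate
        }
    }
}

fn main() {
    // Right-heavy chain A(1) -> B(2) -> C(3) ...
    let mut t = Node(
        1,
        Box::new(Null),
        Box::new(Node(2, Box::new(Null), Box::new(leaf(3)))),
    );
    rotate_left(&mut t);
    // ... becomes B(2) with children A(1) and C(3).
    assert_eq!(t, Node(2, Box::new(leaf(1)), Box::new(leaf(3))));
    println!("{:?}", t);
}
```

A right rotation is the mirror image, swapping the roles of the left and right subtrees.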
32 *right.left_subtree() = n;
33 *self = right;
34 }
35
36 fn rotate_right(&mut self) {
37 let mut n = replace(self, Null);
38 let mut left = replace(n.left_subtree(), Null);
39 let left_right = replace(left.right_subtree(), Null);
40 *n.left_subtree() = left_right;
41 *left.right_subtree() = n;
42 *self = left;
43 }
44 }
To maintain the balance of the tree, we can perform rotation operations. However, we also need
to implement methods to obtain important information about the tree, such as the number of nodes,
node values, tree height, minimum and maximum values, and node queries. This requires implementing
methods such as size, leaf_size, depth, node, min, max, and contains for the balanced binary tree.
1 // avl.rs
2
3 impl<T> AvlTree<T> where T : Ord {
4 // Calculate the nodes number: number of left and right
5 // child nodes + root node, calculated recursively
6 fn size(&self) -> usize {
7 match self {
8 Null => 0,
9 Tree(n) => 1 + n.left.size() + n.right.size(),
10 }
11 }
12
13 // Calculate the leaf nodes number: calculated recursively
14 fn leaf_size(&self) -> usize {
15 match self {
16 Null => 0,
17 Tree(node) => {
18 if node.left == Null && node.right == Null {
19 return 1;
20 }
21 let left_leaf = match node.left {
22 Null => 0,
23 _ => node.left.leaf_size(),
24 };
25 let right_leaf = match node.right {
26 Null => 0,
27 _ => node.right.leaf_size(),
28 };
29 left_leaf + right_leaf
30 },
31 }
32 }
33
34 // Calculate the number of non-leaf nodes
87 Greater => {
88 match &n.left {
89 Null => false,
90 _ => n.left.contains(key),
91 }
92 },
93 Less => {
94 match &n.right {
95 Null => false,
96 _ => n.right.contains(key),
97 }
98 },
99 }
100 },
101 }
102 }
103 }
In addition to these methods, we can implement the four traversal methods commonly used for bi-
nary trees: inorder, preorder, postorder, and levelorder. These traversal methods allow us to visit and
manipulate each node in the tree in a specific order, which can be useful in various scenarios such as
printing the tree, searching for a specific node, or computing some statistics about the tree.
1 // avl.rs
2
3 impl<T> AvlTree<T> where T: Ord + Debug {
4 // Internal implementation of preorder, inorder, postorder,
5 // and level-order traversal
6 fn preorder(&self) {
7 match self {
8 Null => (),
9 Tree(node) => {
10 println!("key: {:?}", node.key);
11 node.left.preorder();
12 node.right.preorder();
13 },
14 }
15 }
16
17 fn inorder(&self) {
18 match self {
19 Null => (),
20 Tree(node) => {
21 node.left.inorder();
22 println!("key: {:?}", node.key);
23 node.right.inorder();
24 },
25 }
26 }
27
28 fn postorder(&self) {
29 match self {
30 Null => (),
31 Tree(node) => {
32 node.left.postorder();
33 node.right.postorder();
34 println!("key: {:?}", node.key);
35 },
36 }
37 }
38
39 fn levelorder(&self) {
40 let size = self.size();
41 let mut q = Queue::new(size);
42
43 let _r = q.enqueue(self);
44 while !q.is_empty() {
45 let front = q.dequeue().unwrap();
46 match front {
47 Null => (),
48 Tree(node) => {
49 println!("key: {:?}", node.key);
50 let _r = q.enqueue(&node.left);
51 let _r = q.enqueue(&node.right);
52 },
53 }
54 }
55 }
56 }
57
58 // External implementation of preorder, inorder, postorder,
59 // and level-order traversal
60 fn preorder<T: Clone + Ord + Debug>(avl: &AvlTree<T>) {
61 match avl {
62 Null => (),
63 Tree(node) => {
64 println!("key: {:?}", node.key);
65 preorder(&node.left);
66 preorder(&node.right);
67 },
68 }
69 }
70
71 fn inorder<T: Clone + Ord + Debug>(avl: &AvlTree<T>) {
72 match avl {
73 Null => (),
74 Tree(node) => {
75 inorder(&node.left);
76 println!("key: {:?}", node.key);
77 inorder(&node.right);
78 },
79 }
80 }
81
82 fn postorder<T: Clone + Ord + Debug>(avl: &AvlTree<T>) {
83 match avl {
84 Null => (),
85 Tree(node) => {
86 postorder(&node.left);
87 postorder(&node.right);
88 println!("key: {:?}", node.key);
89 },
90 }
91 }
92
93 fn levelorder<T: Clone + Ord + Debug>(avl: &AvlTree<T>) {
94 let size = avl.size();
95 let mut q = Queue::new(size);
96 let _r = q.enqueue(avl);
97 while !q.is_empty() {
98 let front = q.dequeue().unwrap();
99 match front {
100 Null => (),
101 Tree(node) => {
102 println!("key: {:?}", node.key);
103 let _r = q.enqueue(&node.left);
104 let _r = q.enqueue(&node.right);
105 },
106 }
107 }
108 }
109
110 fn main() {
111 basic();
112 order();
113
114 fn basic() {
115 let mut t = AvlTree::new();
116 for i in 0..5 { let (_r1, _r2) = t.insert(i); }
117
118 println!("empty:{},size:{}",t.is_empty(),t.size());
119 println!("leaves:{},depth:{}",t.leaf_size(),t.depth());
120 println!("internals:{}",t.none_leaf_size());
121 println!("min-max key:{:?}-{:?}",t.min(), t.max());
122 println!("contains 9:{}",t.contains(&9));
123 }
124
125 fn order() {
126 let mut avl = AvlTree::new();
127 for i in 0..5 { let (_r1, _r2) = avl.insert(i); }
128
129 println!("internal pre-in-post-level order");
130 avl.preorder(); avl.inorder();
131 avl.postorder(); avl.levelorder();
132 println!("outside pre-in-post-level order");
133 preorder(&avl); inorder(&avl);
134 postorder(&avl); levelorder(&avl);
135 }
136 }
The following are the outputs after execution.
empty:false,size:5
leaves:3,depth:3
internals:2
min-max key:Some(0)-Some(4)
contains 9:false
internal pre-in-post-level order
key: 1
key: 0
key: 3
key: 2
key: 4
key: 0
key: 1
key: 2
key: 3
key: 4
key: 0
key: 2
key: 4
key: 3
key: 1
key: 1
key: 0
key: 3
key: 2
key: 4
outside pre-in-post-level order
key: 1
key: 0
key: 3
key: 2
key: 4
key: 0
key: 1
key: 2
key: 3
key: 4
key: 0
key: 2
key: 4
key: 3
key: 1
key: 1
key: 0
key: 3
key: 2
key: 4
8.6 Summary
This chapter introduced us to trees, an efficient data structure that enables us to implement a wide
range of useful algorithms. Trees find extensive applications in storage, networking, and other domains.
Throughout this chapter, we completed the following tasks using trees:
• Parsed and evaluated expressions.
• Implemented binary heaps as priority queues.
• Implemented binary trees, binary search trees, and balanced binary search trees.
In the previous chapters, we learned about several abstract data types used to implement mapping
relationships (Maps), including ordered tables, hash tables, binary search trees, and balanced binary
search trees. The table below compares the worst-case performance of various operations supported
by these data types. While red-black trees are an improvement over AVL trees, their complexity and
performance remain similar to AVL trees, with only differences in coefficients. For further information,
readers are encouraged to consult relevant resources.
Chapter 9
Graphs
9.1 Objectives
• Understand the concept and storage format of graphs.
• Implement several graph data structures in Rust.
• Learn two important graph search algorithms.
• Use graphs to solve various real-world problems.
The figure above illustrates a graph with multiple nodes and connections, which demonstrates the
vertical and horizontal relationships between different items. For example, calculus and linear algebra
have no direct connection, but they can jointly incubate neural networks, forming a horizontal relationship.
Derivatives, on the other hand, are part of calculus and contribute to its construction, and this forms a
vertical relationship.
Graphs are versatile data structures that can be used to represent various real-world phenomena, such
as flight maps, social network graphs, and course planning graphs. In this chapter, we will learn about
the concept and storage format of graphs, and how to implement them in Rust. We will also explore two
essential graph search algorithms that can be applied to solve different problems. Research on graphs,
their various applications, and their algorithms belongs to a specialized discipline called graph theory.
(Figure: a directed graph with vertices V0-V5 and weighted edges, e.g. V0 -> V1 with weight 5 and V0 -> V5 with weight 2.)
In addition to vertices, edges, and weights, paths are used to represent the order in which vertices are
connected. A path is a vertex sequence, such as (v, w, x, y, z). A graph that contains a cycle, such as
V5 -> V2 -> V3 -> V5, is called a cyclic graph. If there are no cycles in a directed graph, it is called a
directed acyclic graph, or DAG. Many important problems can be represented using DAGs.
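Whether a directed graph is acyclic can be checked with a depth-first search that tracks the vertices on the current recursion path. The sketch below encodes the figure's graph as an edge list (vertex Vi is index i) and confirms the V5 -> V2 -> V3 -> V5 cycle:

```rust
// DFS cycle check: a directed graph has a cycle iff DFS reaches
// a vertex that is still on the current recursion path.
fn has_cycle(n: usize, edges: &[(usize, usize)]) -> bool {
    let mut adj = vec![Vec::new(); n];
    for &(u, v) in edges { adj[u].push(v); }

    fn dfs(u: usize, adj: &Vec<Vec<usize>>,
           done: &mut Vec<bool>, on_path: &mut Vec<bool>) -> bool {
        if on_path[u] { return true; }   // back edge: cycle found
        if done[u] { return false; }     // already fully explored
        on_path[u] = true;
        for &v in &adj[u] {
            if dfs(v, adj, done, on_path) { return true; }
        }
        on_path[u] = false;
        done[u] = true;
        false
    }

    let (mut done, mut on_path) = (vec![false; n], vec![false; n]);
    (0..n).any(|u| dfs(u, &adj, &mut done, &mut on_path))
}

fn main() {
    // The graph from the figure: V5 -> V2 -> V3 -> V5 forms a cycle.
    let cyclic = [(0, 1), (0, 5), (1, 2), (2, 3), (3, 4),
                  (3, 5), (4, 0), (5, 2), (5, 4)];
    assert!(has_cycle(6, &cyclic));

    // A small DAG: no path ever returns to its starting vertex.
    let dag = [(0, 1), (1, 2), (0, 2)];
    assert!(!has_cycle(3, &dag));
    println!("cycle checks ok");
}
```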
     V0   V1   V2   V3   V4   V5
V0         5                   2
V1              4
V2                   9
V3                        7    3
V4    1
V5              1         8
Although the adjacency matrix is easy to understand and analyze for small graphs, it suffers from
the drawback of being very sparse. Most of the cells in the matrix are empty, leading to a significant
waste of space. For a graph with N vertices, an adjacency matrix requires N^2 storage. Therefore, the
adjacency matrix is not an efficient method to store large graphs.
V0 -> V1, V5
V1 -> V2
V2 -> V3
V3 -> V4, V5
V4 -> V0
V5 -> V2, V4
An efficient way to store graphs is the adjacency list, shown above. In this approach, an array stores all
the vertices in the graph, and each vertex maintains a linked list connecting it to its adjacent vertices.
By walking a vertex's list, one can see exactly which vertices it is connected to, similar to the chaining
method HashMap uses to resolve collisions.
The adjacency list stores the linked vertices in a HashMap-like structure rather than a plain array
because edges carry weights that must be recorded alongside each neighbor, while an array could only
hold the vertices themselves. Graphs implemented with adjacency lists are compact, waste no memory,
and are convenient to store. The importance of basic data structures is evident here: the HashMap
implemented in previous chapters is directly useful for implementing adjacency lists.
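The shape of the structure can be sketched with the standard HashMap before the full implementation: each vertex key maps to a Vec of (neighbor, weight) pairs. Vertex names and weights below follow the figure, though the exact edge set is illustrative:

```rust
use std::collections::HashMap;

fn main() {
    // Adjacency list: vertex -> Vec of (neighbor, weight) pairs.
    let mut graph: HashMap<&str, Vec<(&str, i32)>> = HashMap::new();
    let edges = [("V0", "V1", 5), ("V0", "V5", 2), ("V1", "V2", 4),
                 ("V2", "V3", 9), ("V5", "V4", 8)];
    for (from, to, wt) in edges {
        graph.entry(from).or_insert_with(Vec::new).push((to, wt));
        graph.entry(to).or_insert_with(Vec::new); // ensure every vertex appears
    }

    // Only existing edges consume memory, unlike an adjacency matrix.
    assert_eq!(graph.len(), 6);
    assert_eq!(graph["V0"].len(), 2);
    let w = graph["V0"].iter().find(|(n, _)| *n == "V1").map(|(_, w)| *w);
    assert_eq!(w, Some(5));
    println!("adjacency list holds {} vertices", graph.len());
}
```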
9
10 impl Graph {
11 fn new(nodes: usize) -> Self {
12 Self {
13 nodes,
14 graph: vec![vec![Edge::new(); nodes]; nodes],
15 }
16 }
17
18 fn is_empty(&self) -> bool {
19 0 == self.nodes
20 }
21
22 fn len(&self) -> usize { self.nodes }
23
24 // add an edge and set the edge property to true
25 fn add_edge(&mut self, n1: &Vertex, n2: &Vertex) {
26 if n1.id < self.nodes && n2.id < self.nodes {
27 self.graph[n1.id][n2.id] = Edge::set_edge();
28 } else {
29 println!("Error, vertex beyond the graph");
30 }
31 }
32 }
33
34 fn main() {
35 let mut g = Graph::new(4);
36 let n1 = Vertex::new(0,"n1");let n2 = Vertex::new(1,"n2");
37 let n3 = Vertex::new(2,"n3");let n4 = Vertex::new(3,"n4");
38 g.add_edge(&n1,&n2); g.add_edge(&n1,&n3);
39 g.add_edge(&n2,&n3); g.add_edge(&n2,&n4);
40 g.add_edge(&n3,&n4); g.add_edge(&n3,&n1);
41 println!("{:#?}", g);
42 println!("graph empty: {}", g.is_empty());
43 println!("graph nodes: {}", g.len());
44 }
The following are the outputs after execution.
Graph {
nodes: 4,
graph: [
[
Edge { edge: false, },
Edge { edge: true, },
Edge { edge: true, },
Edge { edge: false, },
],
[
Edge { edge: false, },
Edge { edge: false, },
Edge { edge: true, },
Edge { edge: true, },
],
[
Edge { edge: true, },
Edge { edge: false, },
Edge { edge: false, },
Edge { edge: true, },
],
[
Edge { edge: false, },
Edge { edge: false, },
Edge { edge: false, },
Edge { edge: false, },
],
],
}
graph empty: false
graph nodes: 4
Next, we will implement an adjacency list graph using HashMap. Since vertices are the core elements
and edges are the relationships between vertices, we need to create a data structure Vertex to represent
the vertex element. For Vertex, we need to define operations such as creating a new vertex, getting the
value of the vertex itself, adding adjacent vertices, getting all adjacent vertices, and getting the weights
of adjacent vertices. The neighbors variable is used to store all adjacent vertices of the current vertex.
1 // graph_adjlist.rs
2
3 use std::hash::Hash;
4 use std::collections::HashMap;
5
6 // Definition of Vertex
7 #[derive(Debug, Clone)]
8 struct Vertex<T> {
9 key: T,
10 neighbors: Vec<(T, i32)>, // store adjacent vertices
11 }
12
13 impl<T: Clone + PartialEq> Vertex<T> {
14 fn new(key: T) -> Self {
15 Self {
16 key: key,
17 neighbors: Vec::new()
18 }
19 }
20
21 // Check if a point is adjacent to the current point
22 fn adjacent_key(&self, key: &T) -> bool {
23 for (nbr, _wt) in self.neighbors.iter() {
24 if nbr == key {
25 return true;
26 }
27 }
28
29 false
30 }
31
32 fn add_neighbor(&mut self, nbr: T, wt: i32) {
33 self.neighbors.push((nbr, wt));
34 }
35
36 // Get the set of adjacent points
37 fn get_neighbors(&self) -> Vec<&T> {
38 let mut neighbors = Vec::new();
39 for (nbr, _wt) in self.neighbors.iter() {
40 neighbors.push(nbr);
41 }
42
43 neighbors
44 }
45
46 // Return the edge weight to the adjacent point
47 fn get_nbr_weight(&self, key: &T) -> &i32 {
48 for (nbr, wt) in self.neighbors.iter() {
49 if nbr == key {
50 return wt;
51 }
52 }
53
54 &0
55 }
56 }
Graph is a data structure used to implement graphs, which includes a HashMap that maps vertex
names to vertex objects.
1 // graph_adjlist.rs
2
3 // Definition of Graph
4 #[derive(Debug, Clone)]
5 struct Graph <T> {
6 vertnums: u32, // count of vertices
7 edgenums: u32, // count of edges
8 vertices: HashMap<T, Vertex<T>>,
9 }
10
11 impl<T: Hash + Eq + PartialEq + Clone> Graph<T> {
12 fn new() -> Self {
13 Self {
14 vertnums: 0,
15 edgenums: 0,
16 vertices: HashMap::<T, Vertex<T>>::new(),
17 }
18 }
19
20 fn is_empty(&self) -> bool { 0 == self.vertnums }
21
74 self.edgenums -= 1;
75 }
76 }
77 }
78
79 old_vertex
80 }
81
82 fn add_edge(&mut self, from: &T, to: &T, wt: i32) {
83 // If the point doesn't exist, add it first
84 if !self.contains(from) {
85 let _fv = self.add_vertex(from);
86 }
87 if !self.contains(to) {
88 let _tv = self.add_vertex(to);
89 }
90
91 // Add an edge
92 self.edgenums += 1;
93 self.vertices.get_mut(from)
94 .unwrap()
95 .add_neighbor(to.clone(), wt);
96 }
97
98 // Determine if two vertices are adjacent
99 fn adjacent(&self, from: &T, to: &T) -> bool {
100 self.vertices.get(from).unwrap().adjacent_key(to)
101 }
102 }
Using Graph, we can create the vertices V0-V5 and their edges as shown in the graph (9.1).
1 // graph_adjlist.rs
2
3 fn main() {
4 let mut g = Graph::new();
5
6 for i in 0..6 { g.add_vertex(&i); }
7 println!("graph empty: {}", g.is_empty());
8
9 let vertices = g.vertex_keys();
10 for vtx in vertices { println!("Vertex: {:#?}", vtx); }
11
12 g.add_edge(&0,&1,5); g.add_edge(&0,&5,2);
13 g.add_edge(&1,&2,4); g.add_edge(&2,&3,9);
14 g.add_edge(&3,&4,7); g.add_edge(&3,&5,3);
15 g.add_edge(&4,&0,1); g.add_edge(&4,&4,8);
16 println!("vert nums: {}", g.vertex_num());
17 println!("edge nums: {}", g.edge_num());
18 println!("contains 0: {}", g.contains(&0));
19
20 let vertex = g.get_vertex(&0).unwrap();
21 println!("key: {}, to nbr 1 weight: {}",
22 vertex.key, vertex.get_nbr_weight(&1));
23
24 let keys = vertex.get_neighbors();
25 for nbr in keys { println!("neighbor: {nbr}"); }
26
27 for (nbr, wt) in vertex.neighbors.iter() {
28 println!("0 neighbor: {nbr}, weight: {wt}");
29 }
30
31 let res = g.adjacent(&0, &1);
32 println!("0 adjacent to 1: {res}");
33 let res = g.adjacent(&3, &2);
34 println!("3 adjacent to 2: {res}");
35
36 let rm = g.remove_vertex(&0).unwrap();
37 println!("remove vertex: {}", rm.key);
38 println!("left vert nums: {}", g.vertex_num());
39 println!("left edge nums: {}", g.edge_num());
40 println!("contains 0: {}", g.contains(&0));
41 }
The following is the output after execution.
graph empty: false
Vertex: 3
Vertex: 4
Vertex: 1
Vertex: 2
Vertex: 5
Vertex: 0
vert nums: 6
edge nums: 8
contains 0: true
key: 0, to nbr 1 weight: 5
neighbor: 1
neighbor: 5
0 neighbor: 1, weight: 5
0 neighbor: 5, weight: 2
0 adjacent to 1: true
3 adjacent to 2: false
remove vertex: 0
left vert nums: 5
left edge nums: 5
contains 0: false
a b c d e f g
---------------------------------------
FOOL FOOL FOOL FOOL FOOL FOOL FOOL
POOL FOIL FOIL COOL COOL FOUL FOUL
POLL FAIL FAIL POOL POOL FOIL FOIL
POLE FALL FALL POLL POLL FALL FALL
PALE PALL PALL POLE PALL FALL FALL
SALE PALE PALE PALE PALE PALL PALL
SAGE PAGE SALE SALE SALE POLL POLL
SAGE SAGE SAGE SAGE PALE PALE
PAGE SALE
SAGE SAGE
Our goal is to use graph algorithms to find the shortest path between the starting and ending words,
as shown in column a of the figure above. To achieve this, we first convert the words into vertices and
link together the words that can be transformed into each other. If two words differ by only one letter, we
create a bidirectional edge between them, as shown in the figure below. Finally, we use a graph search
algorithm to find the shortest path between the starting and ending vertices.
(Figure: a word-ladder graph linking fool, pool, cool, poll, page, fail and fall.)
To create a graph model for the word ladder problem, there are various methods available. When
dealing with a list of words with the same length, a vertex can be created for each word in the list. To
connect the words, each word in the list must be compared to the others. If two words being compared
have only one letter different, an edge can be created between them in the graph. This method is feasible
for small word lists, but for larger lists such as the CET-4 vocabulary list with 3000 words or the CET-6
list with 5000 words, the comparison is inefficient, requiring millions of comparisons.
An alternative approach is to consider the letter positions of the words and group them into buckets of similar patterns. For instance, if we remove the first letter of POPE and keep the rest, ”_OPE”, then every four-letter word ending in ”OPE” matches this pattern and is collected into the same set as POPE. The same process is applied to the pattern ”P_PE”, collecting all the words that match it, and so on for the remaining positions, as shown in the figure below.
9.6. BREADTH FIRST SEARCH(BFS) CHAPTER 9. GRAPHS
_OPE    P_PE    PO_E    POP_
POPE    POPE    POPE    POPE
ROPE    PIPE    POLE    POPS
NOPE    PAPE    PORE
HOPE            POSE
LOPE            POKE
COPE
This solution can be implemented using a HashMap: each pattern serves as a key, and its value is the set of words matching that pattern. Once the word sets are established, the graph can be created: create a vertex for each word in the HashMap, and then create edges between all vertices found under the same key. Once the graph is implemented, the ladder search task can be completed.
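A minimal sketch of this bucketing idea (the `patterns` helper is our own name; the book's `build_word_graph` later performs the same grouping inside the graph builder):

```rust
use std::collections::HashMap;

// Generate the wildcard patterns for one word:
// each position in turn is replaced by '_'
fn patterns(word: &str) -> Vec<String> {
    (0..word.len())
        .map(|i| format!("{}_{}", &word[..i], &word[i + 1..]))
        .collect()
}

fn main() {
    // Bucket a few words under their shared patterns
    let mut buckets: HashMap<String, Vec<&str>> = HashMap::new();
    for w in ["POPE", "ROPE", "POLE"] {
        for p in patterns(w) {
            buckets.entry(p).or_default().push(w);
        }
    }
    // "_OPE" collects POPE and ROPE; "PO_E" collects POPE and POLE
    println!("{:?}", buckets.get("_OPE"));
}
```

Each word of length m lands in exactly m buckets, so building the buckets is linear in the total number of letters rather than quadratic in the number of words.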
The BFS algorithm explores the edges in the graph to find all vertices in G (assuming the distance
between points is 1) that have a path starting from a given vertex s. It does this by first finding all vertices
that are one unit away from s, then all vertices that are two units away, and so on, until all vertices are
found. The search is performed layer by layer, which is why it is called breadth-first search. The points
connected to the starting vertex are considered as one layer, put into a queue first, and then the algorithm
finds the next layer of connecting points for those in the queue and repeats the process until the search
is complete.
During the search, vertices can be colored to indicate their status. Initially, all vertices are white.
As the search progresses, vertices connected to the current search vertex are set to gray. The algorithm
checks each gray vertex, and if it is not the target value, it is set to black, and the search continues for
white vertices connected to the gray vertex. This process continues until the search task is completed or
the entire graph search is complete. This search method is similar to the garbage collection mechanism
of some languages, such as the tricolor garbage collection in the Go language.
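The layer-by-layer search described above can be sketched with a VecDeque over a plain adjacency list. Here `None` plays the role of a white vertex, and a recorded distance marks it as seen; this is our own simplification of the coloring scheme, not the book's linked-list implementation below:

```rust
use std::collections::VecDeque;

// Breadth-first search over an adjacency list, recording the
// layer (distance from the start) of every reachable vertex
fn bfs_layers(adj: &[Vec<usize>], start: usize) -> Vec<Option<usize>> {
    let mut dist = vec![None; adj.len()]; // None plays the role of "white"
    let mut queue = VecDeque::new();
    dist[start] = Some(0);
    queue.push_back(start);
    while let Some(u) = queue.pop_front() {
        for &v in &adj[u] {
            if dist[v].is_none() {           // a white vertex: first visit
                dist[v] = Some(dist[u].unwrap() + 1);
                queue.push_back(v);          // enqueue for the next layer
            }
        }
    }
    dist
}

fn main() {
    // Undirected edges 0-1, 0-2, 1-3, 2-3, 3-4
    let adj = vec![vec![1, 2], vec![0, 3], vec![0, 3], vec![1, 2, 4], vec![3]];
    // Vertices 1 and 2 are in layer 1, vertex 3 in layer 2, vertex 4 in layer 3
    println!("{:?}", bfs_layers(&adj, 0));
}
```

Because vertices are dequeued in the order they were discovered, the first time a vertex is reached is along a shortest path from the start.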
1 // bfs.rs
2
3 use std::rc::Rc;
4 use std::cell::RefCell;
5
6 // Node has multiple shared links,
7 // Box cannot be shared, only Rc can be shared.
8 // Rc is immutable, RefCell with internal mutability
9 // is used to wrap it.
10 type Link = Option<Rc<RefCell<Node>>>;
11
12 // Definition for a Node
13 struct Node {
14 data: usize,
15 next: Link,
16 }
17
18 impl Node {
19 fn new(data: usize) -> Self {
20 Self {
21 data: data,
22 next: None
23 }
24 }
25 }
26
27 // Definition for a Graph
28 struct Graph {
29 first: Link,
30 last: Link,
31 }
32
33 impl Graph {
34 fn new() -> Self {
35 Self { first: None, last: None }
36 }
37
38 fn is_empty(&self) -> bool {
39 self.first.is_none()
40 }
41
42 fn get_first(&self) -> Link {
43 self.first.clone()
44 }
45
46 // Print node
47 fn print_node(&self) {
48 let mut curr = self.first.clone();
49 while let Some(val) = curr {
50 print!("[{}]", &val.borrow().data);
51 curr = val.borrow().next.clone();
52 }
53
54 print!("\n");
55 }
56
57 // Insert node, RefCell is used to borrow_mut to modify
58 fn insert(&mut self, data: usize) {
59 let node = Rc::new(RefCell::new(Node::new(data)));
60
61 if self.is_empty() {
62 self.first = Some(node.clone());
63 self.last = Some(node);
64 } else {
65 self.last.as_mut()
66 .unwrap()
67 .borrow_mut()
68 .next = Some(node.clone());
69 self.last = Some(node);
70 }
71 }
72 }
The code above shows the implementation of the graph. The following code uses this graph to
implement the basic breadth-first search algorithm. The build_graph function constructs the graph and
encapsulates it into a tuple, which is then saved to a vector. The second value of the tuple indicates
whether the node has been visited, with 0 indicating not visited and 1 indicating visited.
1 // bfs.rs
2
3 // Build the graph based on data
4 fn build_graph(data: [[usize;2];20]) -> Vec<(Graph, usize)> {
5 let mut graphs: Vec<(Graph, usize)> = Vec::new();
6 for _ in 0..9 { graphs.push((Graph::new(), 0)); }
7 for i in 1..9 {
8 for j in 0..data.len() {
9 if data[j][0] == i {
10 graphs[i].0.insert(data[j][1]);
11 }
12 }
13 print!("[{i}]->");
14 graphs[i].0.print_node();
15 }
16 graphs
17 }
18
19 fn bfs(graph: Vec<(Graph, usize)>) {
20 let mut gp = graph;
21 let mut nodes = Vec::new();
22 gp[1].1 = 1;
23 let mut curr = gp[1].0.get_first().clone();
24
25 // Print the graph
26 print!("1->");
This algorithm uses Vec as a queue to search for the shortest transformation path in a word ladder.
In addition to printing the values of each node’s connections, it also prints all nodes in the order they
are searched. To better represent node colors, an enum representing colors needs to be defined and node
colors need to be updated accordingly. A distance value needs to be added to the node to represent the
distance from the starting point in order to calculate the shortest distance.
1 // word_ladder.rs
2
3 // Enum for colors used to determine
4 // if a node has been searched
5 #[derive(Clone, Debug, PartialEq)]
6 enum Color {
7 White, // White: not yet explored
8 Gray, // Gray: currently being explored
9 Black, // Black: exploration complete
10 }
11
12 // Definition for a Vertex
13 #[derive(Debug, Clone)]
14 struct Vertex<T> {
15 color: Color,
16 distance: u32, // Minimum distance from the starting node
17 // i.e. minimum number of transformations
18 key: T,
19 neighbors: Vec<(T, u32)>, // store all adjacent vertices
20 }
21
22 impl<T: Clone + PartialEq> Vertex<T> {
23 fn new(key: T) -> Self {
24 Self {
25 color: Color::White,
26 distance: 0,
27 key: key,
28 neighbors: Vec::new(),
29 }
30 }
31
32 fn add_neighbor(&mut self, nbr: T, wt: u32) {
33 self.neighbors.push((nbr, wt));
34 }
35
36 // Get adjacent nodes
37 fn get_neighbors(&self) -> Vec<&T> {
38 let mut neighbors = Vec::new();
39
40 for (nbr, _wt) in self.neighbors.iter() {
41 neighbors.push(nbr);
42 }
43
44 neighbors
45 }
46 }
The Vertex above, which carries a color and a distance, is more complex than the node used in the basic BFS algorithm. To add nodes to the queue, the Queue implementation from Chapter 4 is introduced. The graph definition for solving the word ladder problem includes the number of vertices, the number of edges, and a HashMap for storing vertex keys and their structs.
1 // word_ladder.rs
2
3 use std::collections::HashMap;
4 use std::hash::Hash;
5
6 // Definition of Graph
7 #[derive(Debug, Clone)]
8 struct Graph<T> {
9 vertnums: u32,
10 edgenums: u32,
11 vertices: HashMap<T, Vertex<T>>,
12 }
13
14 impl<T: Hash + Eq + PartialEq + Clone> Graph<T> {
15 fn new() -> Self {
16 Self {
17 vertnums: 0,
18 edgenums: 0,
19 vertices: HashMap::<T, Vertex<T>>::new(),
20 }
21 }
22
23 fn contains(&self, key: &T) -> bool {
24 for (nbr, _vertex) in self.vertices.iter() {
25 if nbr == key { return true; }
26 }
27
28 false
29 }
30
31 // add a vertex
32 fn add_vertex(&mut self, key: &T) -> Option<Vertex<T>> {
33 let vertex = Vertex::new(key.clone());
34 self.vertnums += 1;
35 self.vertices.insert(key.clone(), vertex)
36 }
37
38 // add an edge
39 fn add_edge(&mut self, from: &T, to: &T, wt: u32) {
40 // Add node if it doesn't exist
41 if !self.contains(from) {
42 let _fvert = self.add_vertex(from);
43 }
44 if !self.contains(to) {
45 let _tvert = self.add_vertex(to);
46 }
47
48 self.edgenums += 1;
49 self.vertices
50 .get_mut(from)
51 .unwrap()
52 .add_neighbor(to.clone(), wt);
53 }
54 }
With this graph, a word ladder graph can be constructed by arranging the words according to their
patterns.
1 // word_ladder.rs
2
3 // Construct the graph based on words and patterns
4 fn build_word_graph(words: Vec<&str>) -> Graph<String> {
5 let mut hmap: HashMap<String,Vec<String>> = HashMap::new();
6
7 // Build a word-pattern HashMap
8 for word in words {
9 for i in 0..word.len() {
10 let pattn = word[..i].to_string()
11 + "_"
12 + &word[i + 1..];
13 if hmap.contains_key(&pattn) {
14 hmap.get_mut(&pattn)
15 .unwrap()
16 .push(word.to_string());
17 } else {
18 hmap.insert(pattn, vec![word.to_string()]);
19 }
20 }
21 }
22
23 // Bidirectional edges with a weight of 1
24 let mut word_graph = Graph::new();
25 for word in hmap.keys() {
26 for w1 in &hmap[word] {
27 for w2 in &hmap[word] {
28 if w1 != w2 {
29 word_graph.add_edge(w1, w2, 1);
30 }
31 }
32 }
33 }
34
35 word_graph
36 }
The BFS algorithm is then used to search for the shortest path. Note that although three colors are
defined, only white nodes will be added to the queue, so gray nodes can be set to black or left unset.
This BFS function constructs the breadth-first search tree, and the search queue is emptied when the
algorithm finds the result ”sage”.
1 // word_ladder.rs
2
3 // Word ladder graph - breadth-first search
4 fn word_ladder(g: &mut Graph<String>,
5 start: Vertex<String>,
6 end: Vertex<String>,
7 len: usize) -> u32
8 {
9 // Check if the starting point exists
10 if !g.vertices.contains_key(&start.key) {
11 return 0;
12 }
13
14 if !g.vertices.contains_key(&end.key) {
15 return 0;
16 }
17
18 // Prepare the queue and add the starting point
19 let mut vertex_queue = Queue::new(len);
20 let _r = vertex_queue.enqueue(start);
21
22 while vertex_queue.len() > 0 {
23 // Dequeue the node
24 let curr = vertex_queue.dequeue().unwrap();
25
26 for nbr in curr.get_neighbors() {
27 // Clone to avoid conflicts with data in the graph
28 // Vertices in the Graph are wrapped with RefCell
29 // and do not need to be cloned
30 let mut nbv = g.vertices.get(nbr).unwrap().clone();
31
32 if end.key != nbv.key {
33 // Only white nodes can be added to the queue;
34 // other colors have already been processed
35 if Color::White == nbv.color {
36 // Update the node's color and distance and
37 // add it to the queue
38 nbv.color = Color::Gray;
39 nbv.distance = curr.distance + 1;
40
41 // The color and distance of the node
42 // in the graph also need to be updated
43 g.vertices.get_mut(nbr)
44 .unwrap()
45 .color = Color::Gray;
46 g.vertices.get_mut(nbr)
47 .unwrap()
48 .distance = curr.distance + 1;
49
50 // Add white nodes to the queue
51 let _r = vertex_queue.enqueue(nbv);
52 }
53 } else {
54 // Since the neighbor of curr contains end,
55 // one more transformation is enough
56 return curr.distance + 1;
57 }
58 }
59 }
60
61 0
62 }
63
64 fn main() {
65 let words = [
66 "FOOL", "COOL", "POOL", "FOUL", "FOIL", "FAIL", "FALL",
67 "POLL", "PALL", "POLE", "PALE", "SALE", "PAGE", "SAGE",
68 ];
69
70 let len = words.len();
71 let mut g = build_word_graph(words);
72
73 // The starting node is added to the queue and is
74 // being explored, so its color changes to gray
75 g.vertices.get_mut("FOOL").unwrap().color = Color::Gray;
76
77 // Retrieve the first and last points
78 let start = g.vertices.get("FOOL").unwrap().clone();
79 let end = g.vertices.get("SAGE").unwrap().clone();
80
81 // Calculate the minimum number of transformations,
82 // which is the distance.
83 let distance = word_ladder(&mut g, start, end, len);
84 println!("the shortest distance: {distance}");
85 // the shortest distance: 6
86 }
During the search process, the breadth-first search tree is constructed, and the node coloring during
the search process is shown in the figure. At the beginning, all nodes adjacent to ”fool”, including ”pool”,
”foil”, ”foul”, and ”cool”, are taken, and these nodes are added to the queue for searching.
(Figure: the breadth-first search tree after expanding fool.)
When bfs checks the node ”cool”, it finds that it is gray, indicating that there is a shorter path to
”cool”. When checking ”pool”, a new node ”poll” is added.
(Figure: the search tree after poll and fail are added.)
The next vertex on the queue is ”foil”, and the only new node that can be added to the tree is ”fail”.
When bfs continues to process the queue, the next two nodes do not add any new points to the queue.
(Figure: the completed breadth-first search tree from fool down through poll, fail, pole, pall, pope, pale, page and sale to sage, alongside the search queue.)
Finally, the shortest path is the number of levels from ”fool” to ”sage”, which is 6. Readers can test this code with different words.
To represent the Knight’s tour problem as a graph, each square on the chessboard is represented as a
node, and each legal move of the knight is represented as an edge in the graph. The knight is represented
as a yellow square in the figure below.
(Figure: a 5 × 5 board with squares numbered 0–24; from square 12 the knight can move to squares 1, 3, 5, 9, 15, 19, 21 and 23, the movable vertices.)
To construct a complete graph of size n × n, the knight’s eight possible moves must be considered.
By traversing the entire graph, a move list can be created for each position, all of which can be converted
into edges in the graph.
1 // knight_tour.rs
2
3 use std::collections::HashMap;
4 use std::hash::Hash;
5 use std::fmt::Display;
6
7 // board width
8 const BDSIZE: u32 = 8;
9
10 // Color enum used to determine if a node has been searched
11 #[derive(Debug, Clone, PartialEq)]
12 enum Color {
13 White, // White: not yet explored
14 Gray, // Gray: currently being explored
15 }
16
17 // Definition of Vertex
18 #[derive(Debug, Clone)]
19 struct Vertex<T> {
20 key: T,
21 color: Color,
22 neighbors: Vec<T>,
23 }
24
25 impl<T: PartialEq + Clone> Vertex<T> {
26 fn new(key: T) -> Self {
27 Self {
28 key: key,
29 color: Color::White,
30 neighbors: Vec::new(),
31 }
32 }
33
34 fn add_neighbor(&mut self, nbr: T) {
35 self.neighbors.push(nbr);
36 }
37
38 fn get_neighbors(&self) -> Vec<&T> {
39 let mut neighbors = Vec::new();
40 for nbr in self.neighbors.iter() {
41 neighbors.push(nbr);
42 }
43 neighbors
44 }
45 }
Once all the movable edges of each node have been found, the Knight’s tour graph can be constructed.
1 // knight_tour.rs
2
3 // Knight's tour graph definition
4 #[derive(Debug, Clone)]
5 struct Graph<T> {
6 vertnums: u32,
7 edgenums: u32,
8 vertices: HashMap<T, Vertex<T>>,
9 }
10
11 impl<T: Eq + PartialEq + Clone + Hash> Graph<T> {
12 fn new() -> Self {
13 Self {
14 vertnums: 0,
15 edgenums: 0,
16 vertices: HashMap::<T, Vertex<T>>::new(),
17 }
18 }
19
20 fn add_vertex(&mut self, key: &T) -> Option<Vertex<T>> {
21 let vertex = Vertex::new(key.clone());
22 self.vertnums += 1;
23 self.vertices.insert(key.clone(), vertex)
24 }
25
26 fn add_edge(&mut self, src: &T, des: &T) {
27 if !self.vertices.contains_key(src) {
28 let _fv = self.add_vertex(src);
29 }
30 if !self.vertices.contains_key(des) {
31 let _tv = self.add_vertex(des);
32 }
33
34 self.edgenums += 1;
35 self.vertices.get_mut(src)
36 .unwrap()
37 .add_neighbor(des.clone());
38 }
39 }
40
41 // Destination coordinates that can be moved to
42 fn legal_moves(x: u32, y: u32, bdsize: u32) -> Vec<(u32, u32)> {
43 // The knight moves in an ”L” shape; its horizontal and
44 // vertical coordinates increase or decrease
45 // accordingly in eight directions
46 let move_offsets = [
47 (-1, 2), ( 1, 2),
48 (-2, 1), ( 2, 1),
49 (-2, -1), ( 2, -1),
50 (-1, -2), ( 1, -2),
51 ];
52
53 // A closure function to check if the new coordinate
54 // is valid (within the chessboard range)
55 let legal_pos = |a: i32, b: i32| { a >= 0 && a < b };
56
57 let mut legal_positions = Vec::new();
58 for (x_offset, y_offset) in move_offsets.iter() {
59 let new_x = x as i32 + x_offset;
60 let new_y = y as i32 + y_offset;
61
62 // Check the coordinate and add it to
63 // the set of movable points
64 if legal_pos(new_x, bdsize as i32)
65 && legal_pos(new_y, bdsize as i32) {
66 legal_positions.push((new_x as u32, new_y as u32));
67 }
68 }
69
70 // Return the set of movable points
71 legal_positions
72 }
73
74 // Build the graph of movable paths
75 fn build_knight_graph(bdsize: u32) -> Graph<u32> {
76 // A closure function to calculate the point value
77 // Point range is [0, 63]
78 let calc_point = |row: u32, col: u32, size: u32| {
79 (row % size) * size + col
80 };
81
82 // Set edges between points
83 let mut knight_graph = Graph::new();
84 for row in 0..bdsize {
85 for col in 0..bdsize {
86 let dests = legal_moves(row, col, bdsize);
87 for des in dests {
88 let src_p = calc_point(row, col, bdsize);
89 let des_p = calc_point(des.0, des.1, bdsize);
90 knight_graph.add_edge(&src_p, &des_p);
91 }
92 }
93 }
94
95 knight_graph
96 }
The search algorithm used to solve the Knight’s tour problem is called depth-first search. While
breadth-first search explores vertices on the same level as widely as possible, depth-first search explores
multiple levels of the tree as deeply as possible. Various strategies can be employed using depth-first
search to solve the problem. The first strategy prohibits nodes from being visited multiple times, while
the second strategy allows nodes to be visited multiple times during tree construction. When the depth-first search algorithm encounters a dead end (i.e., a point with no movable points), it backtracks to the previous deepest point and continues the exploration.
1 // depth: length of the path traveled,
2 // curr: current node, path: saves visited points
3 fn knight_tour<T>(
4 kg: &mut Graph<T>,
5 curr: Vertex<T>,
6 path: &mut Vec<String>,
7 depth: u32) -> bool
8 where T: Eq + PartialEq + Clone + Hash + Display
9 {
10 // Add the string value of the current node to path
11 path.push(curr.key.to_string());
12
13 let mut done = false;
14 if depth < BDSIZE * BDSIZE - 1 {
15 let mut i = 0;
16 let nbrs = curr.get_neighbors();
17
18 // Knight travels between neighboring points
19 while i < nbrs.len() && !done {
20 // Avoid multiple mutable references
21 let nbr = kg.vertices.get(nbrs[i]).unwrap().clone();
22
23 if Color::White == nbr.color {
24 // Update the corresponding point color to gray
25 kg.vertices.get_mut(nbrs[i])
26 .unwrap()
27 .color = Color::Gray;
28 // Search for the next suitable point
29 done = knight_tour(kg, nbr, path, depth + 1);
30 if !done {
31 // If not found, remove the current point
32 // from path and restore the color of the
33 // corresponding point to white
34 let _rm = path.pop();
35 kg.vertices.get_mut(nbrs[i])
36 .unwrap()
37 .color = Color::White;
38 }
39 }
40
41 // Explore the next neighboring point
42 i += 1;
43 }
44 } else {
45 done = true;
46 }
47
48 done
49 }
50
51 fn main() {
52 // Build the knight's tour graph
53 let mut kg: Graph<u32> = build_knight_graph(BDSIZE);
54
55 // Choose a starting point and update the color
56 // of the corresponding point in the graph
57 let point = 0;
58 kg.vertices.get_mut(&point).unwrap().color = Color::Gray;
59 let start = kg.vertices.get(&point).unwrap().clone();
60
9.7. DEPTH FIRST SEARCH(DFS) CHAPTER 9. GRAPHS
(Figure: DFS starts at vertex A, discovery time 1; B through F are still unexplored.)
Next, we visit vertex B, which is adjacent to vertices C and D. Vertex C leads us to the end of a branch, so it can be colored black and its finish time set to 4. We then return to vertex B and continue exploring D, which leads us to vertex E. Vertex E has two adjacent vertices, B and F. Since B has already been visited, we skip it and visit vertex F.
(Figures: B is discovered at time 2 and C at time 3; C finishes at time 4; then D, E and F are discovered at times 5, 6 and 7.)
Vertex F has only one adjacent vertex, C, which is already colored black, so the algorithm has reached the end of the branch. It must therefore backtrack and continue searching until it encounters a gray vertex or exits.
(Figures: backtracking assigns the finish times F 7/8, E 6/9, D 5/10 and B 2/11.)
Eventually, the algorithm backtracks to the original vertex and exits the search, having obtained the visited path: A → B → D → E → F .
(Figure: the final discovery/finish times: A 1/12, B 2/11, C 3/4, D 5/10, E 6/9, F 7/8.)
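The discovery/finish times in these figures can be reproduced with a small recursive DFS. The function name `dfs_times` and the index-based adjacency list are our own simplifications, not the book's linked-list graph:

```rust
// Recursive DFS recording discovery/finish times for each vertex
// (vertices A..F are indexed 0..5; 0 in disc means still white)
fn dfs_times(
    adj: &[Vec<usize>], u: usize, clock: &mut u32,
    disc: &mut [u32], fin: &mut [u32],
) {
    *clock += 1;
    disc[u] = *clock;               // discovery time: the vertex turns gray
    for &v in &adj[u] {
        if disc[v] == 0 {           // only descend into white vertices
            dfs_times(adj, v, clock, disc, fin);
        }
    }
    *clock += 1;
    fin[u] = *clock;                // finish time: the vertex turns black
}

fn main() {
    // A->B, B->C, B->D, D->E, E->B, E->F, F->C
    let adj = vec![vec![1], vec![2, 3], vec![], vec![4], vec![1, 5], vec![2]];
    let (mut disc, mut fin) = (vec![0; 6], vec![0; 6]);
    let mut clock = 0;
    dfs_times(&adj, 0, &mut clock, &mut disc, &mut fin);
    for (i, name) in ["A", "B", "C", "D", "E", "F"].iter().enumerate() {
        println!("{name}: {}/{}", disc[i], fin[i]);
    }
}
```

Running this prints exactly the times shown in the final figure: A 1/12, B 2/11, C 3/4, D 5/10, E 6/9, F 7/8.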
48 }
49
50 // insert data
51 fn insert(&mut self, data: usize) {
52 let node = Rc::new(RefCell::new(Node::new(data)));
53 if self.is_empty() {
54 self.first = Some(node.clone());
55 self.last = Some(node);
56 } else {
57 self.last.as_mut()
58 .unwrap()
59 .borrow_mut()
60 .next = Some(node.clone());
61 self.last = Some(node);
62 }
63 }
64 }
65
66 // build a graph
67 fn build_graph(data: [[usize;2];20]) -> Vec<(Graph, usize)> {
68 let mut graphs: Vec<(Graph, usize)> = Vec::new();
69 for _ in 0..9 {
70 graphs.push((Graph::new(), 0));
71 }
72
73 for i in 1..9 {
74 for j in 0..data.len() {
75 if data[j][0] == i {
76 graphs[i].0.insert(data[j][1]);
77 }
78 }
79
80 print!("[{i}]->");
81 graphs[i].0.print_node();
82 }
83
84 graphs
85 }
86
87 fn dfs(graph: Vec<(Graph, usize)>) {
88 let mut gp = graph;
89 let mut nodes: Vec<usize> = Vec::new();
90 let mut temp: Vec<usize> = Vec::new();
91
92 gp[1].1 = 1;
93 let mut curr = gp[1].0.get_first().clone();
94
95 // print graph
96 print!("1->");
97 while let Some(val) = curr {
98 nodes.insert(0,val.borrow().data);
99 curr = val.borrow().next.clone();
100 }
101
102 // print dfs graph
103 loop {
104 if 0 == nodes.len() {
105 break;
106 } else {
107 let data = nodes.pop().unwrap();
108 if 0 == gp[data].1 { // Unvisited
109 // Change state to visited
110 gp[data].1 = 1;
111 print!("{data}->");
112
113 // Add node to temp and perform depth-first
114 // search on it
115 let mut curr = gp[data].0.get_first().clone();
116 while let Some(val) = curr {
117 temp.push(val.borrow().data);
118 curr = val.borrow().next.clone();
119 }
120
121 while !temp.is_empty(){
122 nodes.push(temp.pop().unwrap());
123 }
124 }
125 }
126 }
127
128 println!("");
129 }
130
131 fn main() {
132 let data = [
133 [1,2],[2,1],[1,3],[3,1],[2,4],[4,2],[2,5],
134 [5,2],[3,6],[6,3],[3,7],[7,3],[4,5],[5,4],
135 [6,7],[7,6],[5,8],[8,5],[6,8],[8,6]
136 ];
137 let gp = build_graph(data);
138 dfs(gp);
139 }
Here is the output that displays the adjacent nodes of each vertex and the deepest path.
[1]->[2][3]
[2]->[1][4][5]
[3]->[1][6][7]
[4]->[2][5]
[5]->[2][4][8]
[6]->[3][7][8]
[7]->[3][6]
[8]->[5][6]
1->2->4->5->8->6->3->7->
Topological sorting is a variant of depth-first search that produces a linear ordering of all vertices in a
directed acyclic graph. This resulting order can be used to indicate event priorities, set project schedules,
generate priority graphs for database queries, and so on. The basic ideas of topological sorting are as
follows:
• Call dfs() on the graph g to explore the vertices depth-first.
• As each vertex finishes, push it onto a stack, so the first vertex to finish ends up at the bottom.
• Pop the stack to obtain the result of the topological sort.
Following these steps, we can transform a graph into a linear relationship, such as the course plan
mentioned above, which can be converted into a topological sequence.
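A compact sketch of these three steps on an index-based graph (our own simplification; cycle detection, which the course scheduler below adds, is omitted here):

```rust
// Topological sort by DFS: push each vertex only after all of its
// successors are finished; reversing the finish order gives the result
fn topo_sort(adj: &[Vec<usize>]) -> Vec<usize> {
    fn visit(adj: &[Vec<usize>], u: usize, seen: &mut [bool], stack: &mut Vec<usize>) {
        seen[u] = true;
        for &v in &adj[u] {
            if !seen[v] {
                visit(adj, v, seen, stack);
            }
        }
        stack.push(u); // u finishes after all of its successors
    }

    let mut seen = vec![false; adj.len()];
    let mut stack = Vec::new();
    for u in 0..adj.len() {
        if !seen[u] {
            visit(adj, u, &mut seen, &mut stack);
        }
    }
    stack.reverse(); // reversed finish order is a topological order
    stack
}

fn main() {
    // Dependencies: 0 -> 1 -> 3 and 0 -> 2 -> 3
    let adj = vec![vec![1, 2], vec![3], vec![3], vec![]];
    println!("{:?}", topo_sort(&adj));
}
```

Every edge u → v guarantees that v finishes before u, so u always appears before v in the reversed finish order.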
The final topological sort result is presented below. Courses with the same color can be taken in any
order, while courses with different colors must be taken in the order specified by the topological sort.
To implement the graph algorithm for course scheduling, we first define an enumeration of colors to
represent the state of node exploration.
1 // course_topological_sort.rs
2 use std::collections::HashMap;
3 use std::hash::Hash;
4 use std::fmt::Display;
5
6 // Color enum used to determine if a node has been searched
7 #[derive(Debug, Clone, PartialEq)]
8 enum Color {
9 White, // White: not yet explored
10 Gray, // Gray: currently being explored
11 Black, // Black: exploration complete
12 }
Then, we define the node and graph as we have done many times before.
1 // course_topological_sort.rs
2
3 // course vertex definition
4 #[derive(Debug, Clone)]
5 struct Vertex<T> {
6 key: T,
7 color: Color,
8 neighbors: Vec<T>,
9 }
10
63 .add_neighbor(des.clone());
64
65 // add edge
66 if !self.edges.contains_key(src) {
67 let _eg = self.edges
68 .insert(src.clone(), Vec::new());
69 }
70
71 self.edges.get_mut(src).unwrap().push(des.clone());
72 }
73 }
With the graph in place, we can start building the course dependency graph. When exploring all
course nodes in the graph, we use the color attribute to indicate whether the node has been visited or not.
The schedule variable stores the result of the course topological sorting. To prevent scheduling errors,
such as circular dependencies, we add a ”has_circle” variable here to control the search process. When
encountering a cycle, the program exits, indicating that the input data is incorrect.
1 // course_topological_sort.rs
2
3 // Build the course dependency graph.
4 fn build_course_graph<T>(pre_requisites: Vec<Vec<T>>) -> Graph<T>
5 where T: Eq + PartialEq + Clone + Hash
6 {
7 // Create edge relationships for dependent courses.
8 let mut course_graph = Graph::new();
9 for v in pre_requisites.iter() {
10 let prev = v.first().unwrap();
11 let last = v.last().unwrap();
12 course_graph.add_edge(prev, last);
13 }
14
15 course_graph
16 }
17
18 // Course scheduling.
19 fn course_scheduling<T>(
20 cg: &mut Graph<T>,
21 course: Vertex<T>,
22 schedule: &mut Vec<String>,
23 has_circle: &mut bool)
24 where T: Eq + PartialEq + Clone + Hash + Display
25 {
26 // Clone to prevent conflicts with mutable references.
27 let edges = cg.edges.clone();
28
29 // Explore dependent courses.
30 let dependencies = edges.get(&course.key);
31
32 if dependencies.is_some() {
33 for dep in dependencies.unwrap().iter() {
34 let course = cg.vertices.get(dep)
35 .unwrap().clone();
36 if Color::White == course.color {
37 cg.vertices.get_mut(dep)
38 .unwrap()
39 .color = Color::Gray;
40
41 course_scheduling(cg, course,
42 schedule, has_circle);
43
44 // Exit when encountering a cycle.
45 if *has_circle {
46 return;
47 }
48 } else if Color::Gray == course.color {
49 *has_circle = true;
50 return;
51 }
52 }
53 }
54
55 // Change node color and add to schedule.
56 cg.vertices.get_mut(&course.key)
57 .unwrap()
58 .color = Color::Black;
59 schedule.push(course.key.to_string());
60 }
61
62 fn find_topological_order<T>(
63 course_num: usize,
64 pre_requisites: Vec<Vec<T>>)
65 where T: Eq + PartialEq + Clone + Hash + Display
66 {
67 // Build the course dependency graph.
68 let mut cg = build_course_graph(pre_requisites);
69
70 // Get all course nodes into "courses".
71 let vertices = cg.vertices.clone();
72 let mut courses = Vec::new();
73 for key in vertices.keys() {
74 courses.push(key);
75 }
76
77 // Save feasible course schedules.
78 let mut schedule = Vec::new();
79
80 // Determine if there is a cycle.
81 let mut has_circle = false;
82
83 // Perform topological sorting of courses.
84 for i in 0..course_num {
85 let course = cg.vertices.get(&courses[i])
86 .unwrap()
87 .clone();
88
89 // Only explore courses that are not explored
90 // and do not have cycles.
91 if !has_circle && Color::White == course.color {
92 // Change the color of the course node to indicate
93 // that it is currently being explored.
94 cg.vertices.get_mut(&courses[i])
95 .unwrap()
96 .color = Color::Gray;
97
98 course_scheduling(&mut cg, course,
99 &mut schedule, &mut has_circle);
100 }
101 }
102
103 if !has_circle {
104 println!("{:#?}", schedule);
105 }
106 }
107
108 fn main() {
109 let course_num = 7;
110
111 // Build course dependency relationships.
112 let mut pre_requisites = Vec::<Vec<&str>>::new();
113 pre_requisites.push(vec!["calculus", "function"]);
114 pre_requisites.push(vec!["calculus", "derivation"]);
115 pre_requisites.push(vec!["linear algebra", "equation"]);
116 pre_requisites.push(vec!["CNN", "calculus"]);
117 pre_requisites.push(vec!["CNN", "probability theory"]);
118 pre_requisites.push(vec!["CNN", "linear algebra"]);
119
120 // Find the topological sorting result, which is a
121 // reasonable course learning order.
122 find_topological_order(course_num, pre_requisites);
123 }
Here is a feasible course learning order.
[
"function",
"derivation",
"calculus",
"equation",
"linear algebra",
"probability theory",
"CNN",
]
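The DFS-based sort above detects cycles while it schedules. The same problem is often solved iteratively with Kahn's algorithm, which repeatedly removes vertices of in-degree zero. Below is a minimal self-contained sketch, not part of the book's listing; the input format matches the course pairs above (course first, prerequisite second), and the course names are illustrative.

```rust
use std::collections::{HashMap, VecDeque};

// Kahn's algorithm: repeatedly take vertices whose in-degree is 0.
// Each pair is [course, prerequisite], so we add an edge from the
// prerequisite to the course and count prerequisites as in-degree.
fn kahn_order<'a>(pre_requisites: &[Vec<&'a str>]) -> Option<Vec<&'a str>> {
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    let mut indeg: HashMap<&str, usize> = HashMap::new();
    for pair in pre_requisites {
        let (course, dep) = (pair[0], pair[1]);
        indeg.entry(dep).or_insert(0);
        *indeg.entry(course).or_insert(0) += 1;
        adj.entry(dep).or_default().push(course);
    }

    // Start from all vertices that depend on nothing.
    let mut queue: VecDeque<&str> = indeg.iter()
        .filter(|(_, &d)| d == 0)
        .map(|(&k, _)| k)
        .collect();

    let mut order = Vec::new();
    while let Some(v) = queue.pop_front() {
        order.push(v);
        for &next in adj.get(v).into_iter().flatten() {
            let d = indeg.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 { queue.push_back(next); }
        }
    }

    // Scheduling fewer vertices than exist means a cycle remains.
    if order.len() == indeg.len() { Some(order) } else { None }
}

fn main() {
    let prereqs = vec![
        vec!["calculus", "function"],
        vec!["calculus", "derivation"],
        vec!["CNN", "calculus"],
    ];
    if let Some(order) = kahn_order(&prereqs) {
        println!("{order:?}");
    }
}
```

Unlike the recursive version, cycle detection falls out of the bookkeeping: no extra has_circle flag is needed.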
In addition to course selection, cooking can also be abstracted as a topological sort. For example,
consider making pancakes, as shown in the figure below. The recipe is simple: 1 egg, 1 cup of pancake
mix, 1 tablespoon of oil, and 3/4 cup of milk. To make pancakes, you must first turn on the stove and
heat the pan. Then mix all the ingredients together with a spoon and pour the mixture into the pan. When
bubbles begin to form, flip the pancake over until both sides are golden brown. Before eating, you can
add sauce.
By using the topological sort algorithm, the above figure can be simplified into a standard cooking
step diagram. In this diagram, steps with the same color can be taken in any order. This step diagram can
help kitchen novices make pancakes successfully. However, making them taste good is another matter
altogether.
[Figure: pancake step diagram — nodes: 1 tablespoon olive oil, 1 egg, 3/4 cup of milk, Mix, Heat the sauce, Heat the pan, Pour in 1/4 cup, Flip the bottom, Enjoy; steps with the same color may be taken in any order]
Both cooking and course selection can be abstracted as process arrangements, and therefore, the same
code can theoretically be used to handle them. The implementation of the cooking process topological
sort algorithm is the same as the course planning topological sort algorithm. Therefore, we will not
include most of the code here, but it can be found in the source code files of this book.
1 // cooking_topological_sort.rs
2
3 // Implementation omitted as it is identical to
4 // course_topological_sort
5 fn main() {
6 let operation_num = 9;
7
8 // Construct a cooking process dependency relationship
9 let mut pre_requisites = Vec::<Vec<&str>>::new();
10 pre_requisites.push(vec!["Mix", "3/4 cup of milk"]);
11 pre_requisites.push(vec!["Mix", "1 egg"]);
12 pre_requisites.push(vec!["Mix", "1 tablespoon olive oil"]);
13 pre_requisites.push(vec!["Pour in 1/4 cup", "Mix"]);
14 pre_requisites.push(vec!["Pour in 1/4 cup",
15 "Heat the pan"]);
16 pre_requisites.push(vec!["Flip the bottom until golden",
17 "Pour in 1/4 cup"]);
18 pre_requisites.push(vec!["Enjoy",
19 "Flip the bottom until golden"]);
20 pre_requisites.push(vec!["Enjoy", "Heat the sauce"]);
21
22 // Find a feasible topological result
23 // which is a reasonable cooking order
24 find_topological_order(operation_num, pre_requisites);
25 }
Here are two feasible cooking sequences.
[
"3/4 cup of milk",
"1 egg",
"1 tablespoon olive oil",
"Mix",
"Heat the pan",
"Pour in 1/4 cup",
"Flip the bottom until golden",
"Heat the sauce",
"Enjoy",
]
[
"Heat the pan",
"3/4 cup of milk",
"1 egg",
"1 tablespoon olive oil",
"Mix",
"Pour in 1/4 cup",
"Flip the bottom until golden",
"Heat the sauce",
"Enjoy",
]
9.8 Strongly Connected Components

[Figure: a graph of linked websites, companies, and universities — vertices include Google, Baidu, Tencent, Facebook, NetEase, Yahoo, Apple, Tesla, Alibaba, OpenAI, Walmart, MSRA, Microsoft, Amazon, CNN, Stanford University, Harvard University, Tsinghua University, Peking University]
One significant feature of these graphs is that certain vertices have an exceptionally high number of
edges. For example, Google is connected to nearly every website globally, while websites like Baidu
and Tencent have numerous links within China. However, many vertices have very few edges, or just
one, forming clusters of interconnected nodes. Such a cluster is a connected component; if every node
in the component can reach every other node in a finite number of steps, it is a strongly connected
component. A directed graph is strongly connected if and only if every two vertices in it are mutually
reachable, which means a strongly connected graph must contain cycles, much like nested loops. A
directed graph may contain several such maximal subgraphs, each of which is a strongly connected
component. Figure (9.11) illustrates strongly connected components in a directed graph.
[Figure 9.11: a directed graph with vertices A, B, C, D, E, F, G, H, L (left) and its condensation (right); its three strongly connected components are {A, B, E, D, G}, {F, H, L}, and {C}]
A strongly connected component C of a graph G, in which any two vertices v and w are mutually
reachable, can be contracted to a single point in a simplified graph (the tricolored areas in the figure).
To identify strongly connected components, Tarjan's algorithm, which is based on depth-first search,
is commonly used.
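Tarjan's algorithm can be sketched compactly over a numeric adjacency list. The following is a minimal illustration rather than the book's implementation; vertex numbers stand in for names, and the example graph is invented.

```rust
// Tarjan's strongly connected components: one DFS assigns each
// vertex a discovery index and a low-link (smallest index reachable
// from its subtree); a vertex whose low-link equals its own index
// is the root of a component, which is popped off the stack.
fn tarjan_scc(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    struct State {
        index: usize,
        idx: Vec<Option<usize>>,
        low: Vec<usize>,
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        comps: Vec<Vec<usize>>,
    }
    fn dfs(v: usize, adj: &[Vec<usize>], s: &mut State) {
        s.idx[v] = Some(s.index);
        s.low[v] = s.index;
        s.index += 1;
        s.stack.push(v);
        s.on_stack[v] = true;
        for &w in &adj[v] {
            match s.idx[w] {
                None => { // tree edge: recurse, then adopt low-link
                    dfs(w, adj, s);
                    s.low[v] = s.low[v].min(s.low[w]);
                }
                Some(iw) if s.on_stack[w] => { // back edge
                    s.low[v] = s.low[v].min(iw);
                }
                _ => {} // cross edge to a finished component
            }
        }
        if Some(s.low[v]) == s.idx[v] {
            let mut comp = Vec::new();
            loop {
                let w = s.stack.pop().unwrap();
                s.on_stack[w] = false;
                comp.push(w);
                if w == v { break; }
            }
            s.comps.push(comp);
        }
    }
    let n = adj.len();
    let mut s = State {
        index: 0,
        idx: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        comps: Vec::new(),
    };
    for v in 0..n {
        if s.idx[v].is_none() { dfs(v, adj, &mut s); }
    }
    s.comps
}

fn main() {
    // 0 -> 1 -> 2 -> 0 form one component; 3 and 4 stand alone.
    let adj = vec![vec![1], vec![2, 3], vec![0], vec![4], vec![]];
    let comps = tarjan_scc(&adj);
    println!("{} strongly connected components", comps.len());
    // 3 strongly connected components
}
```

Each vertex and edge is visited once, so the running time is linear in the size of the graph.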
Applying the strongly connected component algorithm to a connected graph produces three trees
that correspond to three connected components, as shown in the figure below. The independent area C
consists of only itself as a node, while F, L, and H form one connected component, and A, E, G, D, and
B form another connected component. The three trees obtained by the strongly connected component
algorithm represent the three connected components. By shifting the problem from the node level to the
connected component level, the algorithm simplifies the problem and facilitates subsequent analysis and
processing.
[Figure: the three DFS trees produced by the algorithm — one spanning A, E, G, D, B; one spanning F, L, H; and the single node C]
8 enum Color {
9 White, // White: not yet explored
10 Gray, // Gray: currently being explored
11 }
12
13 // Definition of cities
14 #[derive(Debug, Clone)]
15 struct Vertex<T> {
16 key: T,
17 color: Color,
18 neighbors: Vec<T>,
19 }
20 impl<T: PartialEq + Clone> Vertex<T> {
21 fn new(key: T) -> Self {
22 Self {
23 key: key,
24 color: Color::White,
25 neighbors: Vec::new(),
26 }
27 }
28
29 fn add_neighbor(&mut self, nbr: T) {
30 self.neighbors.push(nbr);
31 }
32
33 fn get_neighbors(&self) -> Vec<&T> {
34 let mut neighbors = Vec::new();
35 for nbr in self.neighbors.iter() {
36 neighbors.push(nbr);
37 }
38 neighbors
39 }
40 }
41
42 // Definition of province
43 #[derive(Debug, Clone)]
44 struct Graph<T> {
45 vertnums: u32,
46 edgenums: u32,
47 vertices: HashMap<T, Vertex<T>>,
48 edges: HashMap<T, Vec<T>>,
49 }
50
51 impl<T: Eq + PartialEq + Clone + Hash> Graph<T> {
52 fn new() -> Self {
53 Self {
54 vertnums: 0,
55 edgenums: 0,
56 vertices: HashMap::<T, Vertex<T>>::new(),
57 edges: HashMap::<T, Vec<T>>::new(),
58 }
59 }
60
61 fn add_vertex(&mut self, key: &T) -> Option<Vertex<T>> {
62 let vertex = Vertex::new(key.clone());
63 self.vertnums += 1;
64 self.vertices.insert(key.clone(), vertex)
65 }
66
67 fn add_edge(&mut self, src: &T, des: &T) {
68 if !self.vertices.contains_key(src) {
69 let _fv = self.add_vertex(src);
70 }
71 if !self.vertices.contains_key(des) {
72 let _tv = self.add_vertex(des);
73 }
74
75 // add the destination as a neighbor (one edge)
76 self.edgenums += 1;
77 self.vertices.get_mut(src)
78 .unwrap()
79 .add_neighbor(des.clone());
80 // add an edge
81 if !self.edges.contains_key(src) {
82 let _ = self.edges.insert(src.clone(), Vec::new());
83 }
84 self.edges.get_mut(src).unwrap().push(des.clone());
85 }
86 }
Given the graph and node definitions, we can build a city connection graph. We continually change
the color of each city to gray until a group of cities is fully explored, at which point we find a strongly
connected component, i.e., a province.
1 // find_province_num_bfs.rs
2
3 // Build the city connection graph
4 fn build_city_graph<T>(connected: Vec<Vec<T>>) -> Graph<T>
5 where T: Eq + PartialEq + Clone + Hash {
6 // Set edges between nodes with relationships
7 let mut city_graph = Graph::new();
8 for v in connected.iter() {
9 let src = v.first().unwrap();
10 let des = v.last().unwrap();
11 city_graph.add_edge(src, des);
12 }
13
14 city_graph
15 }
16
17 fn find_province_num_bfs<T>(connected: Vec<Vec<T>>) -> u32
18 where T: Eq + PartialEq + Clone + Hash {
19 let mut cg = build_city_graph(connected);
20
21 // Get keys of all main city nodes
10 connected.push(vec!["Chengdu", "Yibin"]);
11 connected.push(vec!["Zigong", "Chengdu"]);
12
13 connected.push(vec!["Guangzhou", "Shenzhen"]);
14 connected.push(vec!["Guangzhou", "Dongguan"]);
15 connected.push(vec!["Guangzhou", "Zhuhai"]);
16 connected.push(vec!["Guangzhou", "Zhongshan"]);
17 connected.push(vec!["Guangzhou", "Shantou"]);
18 connected.push(vec!["Guangzhou", "Foshan"]);
19 connected.push(vec!["Guangzhou", "Zhanjiang"]);
20 connected.push(vec!["Shenzhen", "Guangzhou"]);
21
22 connected.push(vec!["Wuhan", "Jingzhou"]);
23 connected.push(vec!["Wuhan", "Yichang"]);
24 connected.push(vec!["Wuhan", "Xiangyang"]);
25 connected.push(vec!["Wuhan", "Jingmen"]);
26 connected.push(vec!["Wuhan", "Xiaogan"]);
27 connected.push(vec!["Wuhan", "Huanggang"]);
28 connected.push(vec!["Jingzhou", "Wuhan"]);
29
30 // Find all strongly connected components, there are three
31 // provinces: Sichuan, Guangdong, and Hubei
32 let province_num = find_province_num_bfs(connected);
33 println!("province number: {province_num}");
34 // province number: 3
35 }
The time complexity of this algorithm is O(n²) because we need to process n city nodes, and each
city node may have a relationship with all remaining cities in the graph (when there is only one province).
Breadth-first search uses a queue that can hold up to all n city nodes, so the space complexity is O(n).
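The counting itself can be sketched independently of the book's graph type. Below is a minimal BFS component counter over illustrative city pairs; it treats every connection as bidirectional, which matches the idea of a province as a mutually reachable group.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Count connected groups with BFS: every unvisited city starts a
// new group, and BFS marks everything reachable from it.
fn count_groups(connected: &[(&str, &str)]) -> usize {
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in connected {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }

    let mut seen: HashSet<&str> = HashSet::new();
    let mut groups = 0;
    for &city in adj.keys() {
        if seen.contains(city) { continue; }
        groups += 1; // a new, unexplored component
        let mut queue = VecDeque::from([city]);
        seen.insert(city);
        while let Some(c) = queue.pop_front() {
            for &nbr in &adj[c] {
                if seen.insert(nbr) { queue.push_back(nbr); }
            }
        }
    }
    groups
}

fn main() {
    let connected = [
        ("Chengdu", "Yibin"), ("Zigong", "Chengdu"),
        ("Guangzhou", "Shenzhen"),
        ("Wuhan", "Jingzhou"),
    ];
    println!("province number: {}", count_groups(&connected));
    // province number: 3
}
```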
In terms of complexity, it is similar to BFS since it also needs to process n city nodes, and each city
node may have a relationship with all the remaining cities in the graph (when there is only one province).
Therefore, the time complexity is still O(n²). Depth-first search uses a stack that can hold up to n city
nodes, so the space complexity is O(n).
9.9 Shortest Path Problem

[Figure: devices (Equipment A-D) connected through Router A and Router B to the Internet]
The diagram above illustrates the process of internet communication. When a user requests a web-
page from a server using a browser, the request first travels through the local area network and then
through a router to access the internet. The request then propagates through the internet until it reaches
the local area network router where the server is located. The requested webpage is then sent back to
the user’s browser via the same router. The tracepath command can be used to identify the path from a
user’s computer to a specific link. For instance, tracing the website xxx.cn may involve passing through
13 routers, with the first two being gateway routers of the user’s network group.
1?: [LOCALHOST] pmtu 1500
1: _gateway 4.523 ms
1: _gateway 3.495 ms
2: 10.253.0.22 2.981 ms
3: no response
4: ??? 6.166 ms
5: 202.115.254.237 558.609 ms
6: no response
7: no response
8: 101.4.117.54 48.822 ms asymm 16
9: no response
10: 101.4.112.37 48.171 ms asymm 14
11: no response
12: 101.4.114.74 44.981 ms
13: 202.97.15.89 49.560 ms
Each router on the internet is connected to one or more routers. Thus, running tracepath at differ-
ent times of the day may yield different results since the connections between routers have a cost that
varies depending on network traffic. The network connections can be viewed as a weighted graph, with
connections being adjusted based on the network situation.
The goal is to find the path with the minimum total weight to transmit the message. This problem is
similar to the word ladder problem discussed earlier since both involve finding the minimum value, but
the weights in the word ladder problem are all the same.
[Figure: a weighted directed graph with vertices V1-V7; edges V1->V4 (7), V1->V2 (13), V4->V3 (1), V4->V5 (14), V3->V2 (3), V3->V6 (9), V3->V5 (10), V2->V6 (5), V6->V5 (2), V6->V7 (30), V5->V7 (20)]
In the figure above, we want to find the shortest path from V1 to V7. By visually exploring the
graph, we can see that there are two shortest paths: [V1->V4->V3->V2->V6->V5->V7] and
[V1->V4->V3->V5->V7], both with a total distance of 38. If we were to calculate this using Dijkstra's
algorithm, we would need to track and calculate various distances and then add them up.
To keep track of the total distance from the starting node to each target node, we will use the dist
instance variable in the graph vertex. This variable contains the total weight of the path from the start
to the target node. Dijkstra's algorithm iterates over each node in the graph, with the iteration order
controlled by a priority queue, and the dist value is used to determine the order of objects in the queue.
When a node is first created, dist is set to 0. Theoretically, dist should be initialized to infinity; in
practice it is enough either to start from 0 and fill in distances as nodes are discovered, or to use a
sentinel larger than any real distance — for example, the distance light travels in one second, roughly
the distance between the earth and the moon, since no two points on earth are that far apart.
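In Rust, a common alternative sentinel is the integer type's maximum value, with saturating_add guarding the relaxation step against overflow. A small illustration, not part of the book's listing:

```rust
fn main() {
    // usize::MAX stands in for an "infinite" distance.
    let mut dist = vec![usize::MAX; 4];
    dist[0] = 0; // the source

    // Relaxing an edge out of an unreached vertex must not
    // overflow: saturating_add clamps at usize::MAX.
    let relaxed = dist[3].saturating_add(7);
    assert_eq!(relaxed, usize::MAX);

    // A normal relaxation from the source behaves as expected.
    let candidate = dist[0].saturating_add(7);
    if candidate < dist[1] {
        dist[1] = candidate;
    }
    assert_eq!(dist[1], 7);
    println!("dist: {dist:?}");
}
```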
10
11 impl<'a> Vertex<'a> {
12 fn new(name: &'a str) -> Vertex<'a> {
13 Vertex {
14 name
15 }
16 }
17 }
18
19 // Vertices visited
20 #[derive(Debug)]
21 struct Visited<V> {
22 vertex: V,
23 distance: usize,
24 }
25
26 // Add total ordering functionality to Visited
27 impl<V> Ord for Visited<V> {
28 fn cmp(&self, other: &Self) -> Ordering {
29 other.distance.cmp(&self.distance)
30 }
31 }
32 impl<V> PartialOrd for Visited<V> {
33 fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
34 Some(self.cmp(other))
35 }
36 }
37
38 impl<V> Eq for Visited<V> {}
39 impl<V> PartialEq for Visited<V> {
40 fn eq(&self, other: &Self) -> bool {
41 self.distance.eq(&other.distance)
42 }
43 }
44
45 // Shortest path algorithm
46 fn dijkstra<'a>(
47 start: Vertex<'a>,
48 adj_list: &HashMap<Vertex<'a>,
49 Vec<(Vertex<'a>, usize)>>) -> HashMap<Vertex<'a>, usize>
50 {
51 let mut distances = HashMap::new();
52 let mut visited = HashSet::new(); // Visited vertices
53 let mut to_visit = BinaryHeap::new(); // Unvisited vertices
54
55 // Set the starting point and
56 // initial distances to all points
57 distances.insert(start, 0);
58 to_visit.push(Visited {
59 vertex: start,
60 distance: 0,
61 });
62
63 while let Some(Visited { vertex, distance }) =
64 to_visit.pop() {
65 // If visited, continue with the next point
66 if !visited.insert(vertex) { continue; }
67
68 if let Some(nbrs) = adj_list.get(&vertex) {
69 for (nbr, cost) in nbrs {
70 let new_dist = distance + cost;
71 let is_shorter =
72 distances.get(nbr)
73 .map_or(true,
74 |&curr| new_dist < curr);
75 // If the distance is shorter, insert the new
76 // distance and neighbor
77 if is_shorter {
78 distances.insert(*nbr, new_dist);
79 to_visit.push(Visited {
80 vertex: *nbr,
81 distance: new_dist,
82 });
83 }
84 }
85 }
86 }
87
88 distances
89 }
90
91 fn main() {
92 let v1 = Vertex::new("V1");
93 let v2 = Vertex::new("V2");
94 let v3 = Vertex::new("V3");
95 let v4 = Vertex::new("V4");
96 let v5 = Vertex::new("V5");
97 let v6 = Vertex::new("V6");
98 let v7 = Vertex::new("V7");
99
100 let mut adj_list = HashMap::new();
101 adj_list.insert(v1, vec![(v4, 7), (v2, 13)]);
102 adj_list.insert(v2, vec![(v6, 5)]);
103 adj_list.insert(v3, vec![(v2, 3), (v6, 9), (v5, 10)]);
104 adj_list.insert(v4, vec![(v3, 1), (v5, 14)]);
105 adj_list.insert(v5, vec![(v7, 20)]);
106 adj_list.insert(v6, vec![(v5, 2), (v7, 30)]);
107
108 // Find the shortest path from V1 to any point
109 let distances = dijkstra(v1, &adj_list);
110 for (v, d) in &distances {
111 println!("{}-{}, min distance: {d}", v1.name, v.name);
112 }
113 }
The following output shows the shortest paths from V1 to every vertex (including itself).
V1-V5, min distance: 18
V1-V2, min distance: 11
V1-V6, min distance: 16
V1-V3, min distance: 8
V1-V7, min distance: 38
V1-V1, min distance: 0
V1-V4, min distance: 7
To solve this problem, other algorithms such as the Distance Vector Routing Protocol [15] and Link
State Routing Protocol [16] are used for network information transmission. These algorithms allow routers
to discover the network maps saved by other routers, which contain information about interconnected
nodes. This approach allows for real-time updates of the network map content and is more efficient,
greatly reducing network capacity requirements.
9.10 Summary
In this chapter, we have explored the abstract data type of graphs and their implementation. Graphs
are widely used in various fields such as course scheduling, networks, transportation, computers, knowl-
edge graphs, and databases. Graphs can be utilized to solve many problems, as long as we can transform
the original problem into a graph representation. Graphs have numerous applications in the following
areas:
• Strongly connected components can be used to simplify graphs.
• Depth-first search can be utilized to explore deep branches of a graph.
• Topological sorting is useful in clarifying complex graph connections.
• Dijkstra’s algorithm can be employed to search for the shortest path in a weighted graph.
• Breadth-first search can be used to search for the shortest path in an unweighted graph.
Chapter 10
Practices
10.1 Objectives
• Utilize Rust data structures and algorithms to accomplish diverse practical projects.
• Comprehend and implement data structures and algorithms employed in practical projects.
10.2 Edit Distance

[Figure: the strings "trust" and "rrost" differ in their first and third characters, so their Hamming distance is 2]
The Hamming distance is commonly used for error correction in coding. In Hamming codes [17] ,
the algorithm for computing the distance is the Hamming distance. To simplify the code, separate algo-
rithms are implemented for calculating the Hamming distance of numbers and characters. Computing
the Hamming distance of numbers is straightforward as bitwise operations can compare numbers for
similarity and difference. The code for calculating the Hamming distance of numbers is shown below.
1 // hamming_distance.rs
2
3 fn hamming_distance1(source: u64, target: u64) -> u32 {
4 let mut count = 0;
5 let mut xor = source ^ target;
6 while xor != 0 {
7 count += xor & 1;
8 xor >>= 1;
9 }
10 count as u32
11 }
12
13 fn main() {
14 let source = 1;
15 let target = 2;
16 let distance = hamming_distance1(source, target);
17 println!("the hamming distance is {distance}");
18 // the hamming distance is 2
19
20 let source = 3;
21 let target = 4;
22 let distance = hamming_distance1(source, target);
23 println!("the hamming distance is {distance}");
24 // the hamming distance is 3
25 }
The XOR operation sets the bits where source and target agree to 0 and the bits where they differ
to 1. If the result is nonzero, differing bits exist. Thus, we count the differing bits one by one from
the last bit using an AND with 1, then right-shift to move on to the next bit. Note that this
implementation counts the 1 bits in the binary representation manually. However, Rust provides a
count_ones() method for exactly this purpose, which simplifies the code to the following concise form.
1 // hamming_distance.rs
2
3 fn hamming_distance2(source: u64, target: u64) -> u32 {
4 (source ^ target).count_ones()
5 }
With this foundation, we now implement the character version of the Hamming distance.
1 // hamming_distance.rs
2
3 fn hamming_distance_str(source: &str, target: &str) -> u32 {
4 let mut count = 0;
5 let mut source = source.chars();
6 let mut target = target.chars();
7
8 // Compare two strings character by character
9 loop {
10 match (source.next(), target.next()) {
11 // Four situations may arise
12 (Some(cs), Some(ct)) if cs != ct => count += 1,
Where i and j are the lengths of the two strings, min(i, j) = 0 means that there is no common
substring.
      s  i  t  t  i  n  g
   0  1  2  3  4  5  6  7
k  1
i  2
t  3
t  4
e  5
n  6
Except for cases where the strings have no common substrings, the edit distance can be increased
through three types of editing operations, each of which increases the edit distance by 1. Thus, we can
calculate the edit distance obtained by each of the three editing operations and then take the minimum
value.
\[
\mathrm{edi}_{a,b}(i,j) = \min \begin{cases}
\mathrm{edi}_{a,b}(i-1,\,j) + 1 \\
\mathrm{edi}_{a,b}(i,\,j-1) + 1 \\
\mathrm{edi}_{a,b}(i-1,\,j-1) + 1_{(a \neq b)}
\end{cases} \tag{10.2}
\]
The term edi_{a,b}(i-1, j) + 1 means that one character must be deleted to transform string a into
string b, increasing the edit distance by 1. The term edi_{a,b}(i, j-1) + 1 means that one character
must be inserted, also increasing the edit distance by 1. The term edi_{a,b}(i-1, j-1) + 1_{(a≠b)}
means that one character must be replaced, increasing the edit distance by 1 only if the two characters
are not equal. The definition is recursive, and the naive recursion has time complexity O(3^{m+n-1}),
where m and n are the lengths of the strings.
We have learned that dynamic programming is a good way to solve a big problem by splitting it into
several smaller ones, so dynamic programming is used here to replace the recursion. The algorithm
works by storing the edit distances in a matrix, which represents the state of the problem after various
operations. The most basic case is transforming an empty string into strings of varying lengths, as
illustrated in the figure (10.2.2) on the next page.
The edit distance between character k and s is calculated next, which can be divided into three cases:
• The accumulated edit distance of 1 deletion above the red cell, plus a deletion operation, results in
an edit distance of 2.
• The accumulated edit distance of 1 insertion to the left of the red cell, plus an insertion operation,
results in an edit distance of 2.
• The accumulated edit distance of 0 substitutions along the diagonal of the red cell, plus a substitu-
tion operation, results in an edit distance of 1.
All the values being processed in this step are from the yellow region in the diagram. Starting from
the top left corner, the edit distance in the red cell is computed by taking the minimum of the three values
in the yellow region and adding 1. This yields the new edit distance.
      s  i  t  t  i  n  g
   0  1  2  3  4  5  6  7
k  1  1
i  2
t  3
t  4
e  5
n  6
Based on the above description and diagram, we can develop the following algorithm to calculate
the edit distance between two strings.
1 // edit_distance.rs
2
3 use std::cmp::min;
4
5 fn edit_distance(source: &str, target: &str) -> usize {
6 // Extreme case: transformation from empty string
7 // to a non-empty string
8 if source.is_empty() {
9 return target.len();
10 } else if target.is_empty() {
11 return source.len();
12 }
13
14 // Establish a matrix to store process values
15 let source_c = source.chars().count();
16 let target_c = target.chars().count();
17 let mut distance = vec![vec![0;target_c+1]; source_c+1];
18 (1..=source_c).for_each(|i| {
19 distance[i][0] = i;
20 });
21 (1..=target_c).for_each(|j| {
22 distance[0][j] = j;
23 });
24
25 // Save the minimum number of steps(insert, delete, modify)
26 for (i, cs) in source.chars().enumerate() {
27 for (j, ct) in target.chars().enumerate() {
      s  i  t  t  i  n  g
   0  1  2  3  4  5  6  7
k  1  1  2  3  4  5  6  7
i  2  2  1  2  3  4  5  6
t  3  3  2  1  2  3  4  5
t  4  4  3  2  1  2  3  4
e  5  5  4  3  2  2  3  4
n  6  6  5  4  3  3  2  3
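For reference, the full matrix-based version can be sketched as follows. This is a simplified stand-alone variant, not the book's exact listing.

```rust
use std::cmp::min;

// Classic two-dimensional DP: distance[i][j] is the edit distance
// between the first i characters of source and the first j
// characters of target.
fn edit_distance_matrix(source: &str, target: &str) -> usize {
    let s: Vec<char> = source.chars().collect();
    let t: Vec<char> = target.chars().collect();
    let (m, n) = (s.len(), t.len());

    let mut distance = vec![vec![0; n + 1]; m + 1];
    for i in 1..=m { distance[i][0] = i; } // deletions only
    for j in 1..=n { distance[0][j] = j; } // insertions only

    for i in 1..=m {
        for j in 1..=n {
            // Substitution costs 1 only when the characters differ.
            let sub = (s[i - 1] != t[j - 1]) as usize;
            distance[i][j] = min(
                min(distance[i - 1][j], distance[i][j - 1]) + 1,
                distance[i - 1][j - 1] + sub,
            );
        }
    }
    distance[m][n]
}

fn main() {
    let d = edit_distance_matrix("kitten", "sitting");
    println!("distance between kitten and sitting: {d}");
    // distance between kitten and sitting: 3
}
```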
The completed state transition matrix gives the overall edit distance in its bottom-right cell. It is
important to note that the matrix is two-dimensional and requires careful use of subscripts. One
approach is to place each row of the matrix in a single array, forming one large array of m*n values
with fewer dimensions. However, this still stores many intermediate values that waste memory. To
optimize the algorithm, we can repeatedly reuse a single array to calculate and save values. This
approach reduces the matrix to an array with a length of m + 1 or n + 1. The optimized code for the
edit distance algorithm is shown below.
1 // edit_distance.rs
2
3 fn edit_distance2(source: &str, target: &str) -> usize {
4 if source.is_empty() {
5 return target.len();
6 } else if target.is_empty() {
7 return source.len();
8 }
9
10 // The "distances" variable stores the edit distances
11 // to various strings
12 let target_c = target.chars().count();
13 let mut distances = (0..=target_c).collect::<Vec<_>>();
14 for (i, cs) in source.chars().enumerate() {
15 let mut substt = i;
16 distances[0] = substt + 1;
17 // Combinations are continuously calculated
18 // to obtain the distances
19 for (j, ct) in target.chars().enumerate() {
20 let dist = min(
21 min(distances[j],distances[j+1]) + 1,
22 substt + (cs != ct) as usize);
23 substt = distances[j+1];
24 distances[j+1] = dist;
25 }
26 }
27
28 // The last distance value is the answer
29 distances.pop().unwrap()
30 }
31
32 fn main() {
33 let source = "abced";
34 let target = "adcf";
35 let dist = edit_distance2(source, target);
36 println!("distance between {source} and {target}: {dist}");
37 // distance between abced and adcf: 3
38 }
The optimized edit distance algorithm has a worst-case time complexity of O(mn) and a worst-case
space complexity that has been reduced from O(mn) for the matrix to O(min(m, n)) for the array. This
represents a significant improvement.
Microsoft Word software uses a spell check function that does not rely on edit distance. Instead,
it uses a hash table to store commonly used words, allowing it to quickly look up each word as it is
typed. If the word is not found in the hash table, an error is reported. Because hash tables are very fast,
and several hundred thousand words only require a few megabytes of memory, this approach is highly
efficient.
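That hash-table approach can be sketched in a few lines; the dictionary words here are illustrative.

```rust
use std::collections::HashSet;

fn main() {
    // A tiny stand-in dictionary; real ones hold hundreds of
    // thousands of words in a few megabytes.
    let dictionary: HashSet<&str> =
        ["hello", "world", "rust"].into_iter().collect();

    // Each lookup is an O(1) hash probe on average.
    for word in ["hello", "wrold"] {
        if dictionary.contains(word) {
            println!("{word}: ok");
        } else {
            println!("{word}: not found");
        }
    }
}
```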
10.3 Trie
Trie, also known as a radix tree or prefix tree, is a tree data structure used to search for words or pre-
fixes. It has a wide range of applications, including auto-completion, spell checking, typing prediction,
and more.
While balanced trees and hash tables can also be used to search for words, they cannot quickly find
all words with the same prefix or enumerate all stored words in lexicographical order. A hash table
can find a word in O(1) time, but as the number of words grows, the table may become too large and
collisions can push the complexity toward O(n). In contrast, a Trie uses less space to store multiple
words with the same prefix, and its lookup time is only O(m), where m is the length of the word.
Searching for a word in a balanced tree costs O(m log n).
The structure of a Trie is shown in the figure below. Storing words only requires handling 26 letters,
and words with a common prefix share the nodes of that prefix, which saves storage space. For
example, "apple" and "appeal" share "app," and "boom" and "box" share "bo."
[Figure: a Trie rooted at "root" storing words such as "apple", "appeal", "boom", and "box"; words with a common prefix share nodes]
To implement a Trie, we first abstract the node, similar to the nodes in the figure above. A node
stores references to its child nodes and the state of the current node. The state indicates whether this
node ends a word (end), which is used to decide whether a word terminates during a search. In
addition, the root node is the entry point of the Trie and represents the entire Trie.
1 // trie.rs
2
3 // Definition of Trie
4 #[derive(Default, Debug)]
5 struct Trie {
6 root: Node,
7 }
8
9 // Definition of Trie node
10 #[derive(Default)]
11 struct Node {
12 end: bool,
13 children: [Option<Box<Node>>; 26], // List of letter nodes
14 }
15
16 impl Trie {
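The methods on Trie — insertion, word search, and prefix search — can be sketched as follows, assuming lowercase ASCII words. This is a stand-alone sketch that repeats the type definitions so it compiles on its own; it is not the book's exact listing.

```rust
#[derive(Default, Debug)]
struct Trie { root: Node }

#[derive(Default, Debug)]
struct Node {
    end: bool,
    children: [Option<Box<Node>>; 26], // one slot per letter
}

impl Trie {
    fn new() -> Self { Self::default() }

    // Walk down the tree, creating missing nodes, then mark the
    // last node as the end of a word.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in word.bytes() {
            let i = (c - b'a') as usize;
            node = node.children[i]
                .get_or_insert_with(|| Box::new(Node::default()));
        }
        node.end = true;
    }

    // A word exists only if the walk succeeds AND ends on an
    // end-of-word node.
    fn search(&self, word: &str) -> bool {
        self.walk(word).map_or(false, |n| n.end)
    }

    // A prefix exists if the walk succeeds at all.
    fn start_with(&self, prefix: &str) -> bool {
        self.walk(prefix).is_some()
    }

    // Follow the characters down the tree, if possible.
    fn walk(&self, s: &str) -> Option<&Node> {
        let mut node = &self.root;
        for c in s.bytes() {
            node = node.children[(c - b'a') as usize].as_deref()?;
        }
        Some(node)
    }
}

fn main() {
    let mut trie = Trie::new();
    trie.insert("apple");
    println!("{} {} {}",
        trie.search("apple"),   // true
        trie.search("app"),     // false
        trie.start_with("app")  // true
    );
}
```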
10.4 Filters
In many software projects, determining whether an element exists in a collection is a common task.
For instance, in word processing software, it is essential to verify if an English word is spelled correctly
by checking if it exists in a known dictionary. Similarly, in the FBI, it is necessary to determine if a
suspect’s name appears on the suspect list to issue an FBI Warning. Likewise, in a web crawler, it is
crucial to determine if a URL has been visited. To achieve this, the most straightforward approach is to
store all the elements in the collection in the computer and compare them directly with the new element
encountered. Usually, set types use a hash table to store the collection, which is fast and accurate
but consumes more space.
For small data collections, this approach is adequate. However, when the collection is extensive, the
low storage efficiency of hash tables becomes apparent. For example, email providers such as Yahoo or
Gmail need to filter spam regularly. One way to accomplish this is to keep a record of the email addresses
of those who send spam. However, since spammers keep registering new addresses, storing all of them
would require numerous network servers. Storing 100 million email addresses in a hash table requires
about 1.6 GB of memory, which is a considerable amount. Furthermore, hash tables have a load factor
that may not allow the full utilization of space. Additionally, if the data set is stored on a remote server
and input is accepted locally, it may not be feasible to construct a hash table since the dataset is too large
to read into memory at once.
[Figure: a Bloom filter bit array in which the bits selected for x and y have been set to 1]
Initially, all positions in the Bloom Filter are set to 0. When data is inserted, k hash functions are
used to determine the position of the data in the filter and set the corresponding positions to 1. For
instance, when k = 3, three hash values are calculated as indices, and their corresponding positions are
set to 1. When querying, k hash functions are also used to generate k hash values as indices. If all indices
correspond to values that are 1, then the element may exist in the set. In the given figure, x and y are
stored in the Bloom Filter, but the last hash value for z is 0, so it definitely does not exist in the filter. To
experience a Bloom Filter, visit bloomfilter.
To determine the required length m (in bits) of a Bloom filter that stores n elements with a
tolerable error rate of ϵ, use the formula:

m = − n ln ϵ / (ln 2)²    (10.3)
To calculate the required number of hash functions k, use the formula:
k = − ln ϵ / ln 2 = − log₂ ϵ    (10.4)
For example, if the tolerable error rate ϵ is 8%, then k is about 3.6 (our implementation rounds it
up to 4); the smaller the tolerable error rate, the more hash functions are required. It is possible to
simulate k hash functions by combining two basic hash functions with an iteration count, without
materially changing the error rate, using the following formula:

g_i(x) = h1(x) + i·h2(x)    (10.5)
To implement a Bloom filter, a struct can be used to encapsulate all the necessary information,
including the bit set and the hash functions. Since a Bloom filter only determines the existence of
a value and never returns the value itself, it can accommodate data of any type through generics. The
bits can be represented as Boolean values and stored in a Vec for easy implementation; during a query,
the Boolean value at each index indicates whether the corresponding bit is set.
// bloom_filter.rs

use std::collections::hash_map::DefaultHasher;

// Definition of bloom filter
struct BloomFilter<T> {
    bits: Vec<bool>,             // Bit bucket
    hash_fn_count: usize,        // Number of hash functions
    hashers: [DefaultHasher; 2], // Two hash functions
}
The code above cannot compile: the generic type T is not used by any field, which violates Rust's
rules. To make it compile, we can use Rust's PhantomData to mark T as used. PhantomData occupies
0 bytes but pretends to own a T, which satisfies the compiler. Additionally, to support data whose size
is not known at compile time, we add the ?Sized bound so the filter also works with such types. We
prefix the field with an underscore (_phantom) to indicate that it is never read. Finally, we simulate
the k hash functions using two randomly seeded hash functions.
// bloom_filter.rs

use std::collections::hash_map::DefaultHasher;
use std::marker::PhantomData;

// Definition of a new bloom filter
struct BloomFilter<T: ?Sized> {
    bits: Vec<bool>,
    hash_fn_count: usize,
    hashers: [DefaultHasher; 2],
    // T is a placeholder for persuading the compiler
    _phantom: PhantomData<T>,
}
To implement the filter's functionality, we need three methods: new for initialization, insert for
adding elements, and contains for membership queries, plus a few auxiliary functions. The new function
calculates the bit-bucket size m from the tolerable error rate and the approximate number of elements
to store, and initializes the filter.
// bloom_filter.rs

use std::hash::{BuildHasher, Hash, Hasher};
use std::collections::hash_map::RandomState;

impl<T: ?Sized + Hash> BloomFilter<T> {
    fn new(cap: usize, ert: f64) -> Self {
        let ln22 = std::f64::consts::LN_2.powf(2f64);

        // Calculate the size of the bit bucket and
        // the number of hash functions
        let bits_count = -1f64 * cap as f64 * ert.ln() / ln22;
        let hash_fn_count = -1f64 * ert.log2();

        // Random hash functions
        let hashers = [
            RandomState::new().build_hasher(),
            RandomState::new().build_hasher(),
        ];

        Self {
            bits: vec![false; bits_count.ceil() as usize],
            hash_fn_count: hash_fn_count.ceil() as usize,
            hashers: hashers,
            _phantom: PhantomData,
        }
    }

    // Set the corresponding bits of the bit bucket to true
    // according to hash_fn_count
    fn insert(&mut self, elem: &T) {
        let hashes = self.make_hash(elem);
        for fn_i in 0..self.hash_fn_count {
            let index = self.get_index(hashes, fn_i as u64);
            self.bits[index] = true;
        }
    }

    // Data query
    fn contains(&self, elem: &T) -> bool {
        let hashes = self.make_hash(elem);
        (0..self.hash_fn_count).all(|fn_i| {
            let index = self.get_index(hashes, fn_i as u64);
            self.bits[index]
        })
    }

    // Calculate hash values
    fn make_hash(&self, elem: &T) -> (u64, u64) {
        let hasher1 = &mut self.hashers[0].clone();
        let hasher2 = &mut self.hashers[1].clone();
        elem.hash(hasher1);
        elem.hash(hasher2);

        (hasher1.finish(), hasher2.finish())
    }

    // Get the index of a certain bit in the bit bucket
    fn get_index(&self, (h1, h2): (u64, u64), fn_i: u64) -> usize {
        let ih2 = fn_i.wrapping_mul(h2);
        let h1pih2 = h1.wrapping_add(ih2);
        (h1pih2 % self.bits.len() as u64) as usize
    }
}
fn main() {
    let mut bf = BloomFilter::new(100, 0.08);
    (0..20).for_each(|i| bf.insert(&i));
    let res1 = bf.contains(&2);
    let res2 = bf.contains(&200);
    println!("2 in bf: {res1}, 200 in bf: {res2}");
    // 2 in bf: true, 200 in bf: false
}
By analyzing the Bloom filter, we can conclude that its space complexity is O(m), while the time
complexity of insert and contains is O(k). However, k is typically small, so we can consider the time
complexity as O(1).
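The configured error rate can also be sanity-checked empirically. The following self-contained simulation (independent of the listing above) uses two seeded base hashes in place of the RandomState hashers and sizes the filter by Equations (10.3)–(10.5); the concrete parameters are illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Two independent base hashes, distinguished by a seed byte
fn base_hash<T: Hash>(seed: u8, elem: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    elem.hash(&mut hasher);
    hasher.finish()
}

// Build a filter sized by Equations (10.3)-(10.4), fill it with
// `cap` integers, then probe `trials` values that were never
// inserted and report the observed false positive rate.
fn false_positive_rate(cap: usize, ert: f64, trials: i64) -> f64 {
    let ln22 = std::f64::consts::LN_2.powi(2);
    let m = (-1.0 * cap as f64 * ert.ln() / ln22).ceil() as u64;
    let k = (-1.0 * ert.log2()).ceil() as u64;

    let mut bits = vec![false; m as usize];
    // g_i(x) = h1(x) + i*h2(x), as in Equation (10.5)
    let index = |i: u64, x: &i64| {
        let (h1, h2) = (base_hash(1, x), base_hash(2, x));
        (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize
    };

    for x in 0..cap as i64 {
        for i in 0..k {
            bits[index(i, &x)] = true;
        }
    }

    // Values cap.. were never inserted; any hit is a false positive
    let fp = (cap as i64..cap as i64 + trials)
        .filter(|x| (0..k).all(|i| bits[index(i, x)]))
        .count();
    fp as f64 / trials as f64
}

fn main() {
    let rate = false_positive_rate(1000, 0.08, 10_000);
    println!("observed false positive rate: {rate:.3}");
}
```

With ϵ = 8%, the observed rate should come out close to the configured 0.08.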
A Bloom filter, however, cannot delete items. The Cuckoo filter is an alternative that stores short
fingerprints of items in a cuckoo hash table and additionally supports deletion.

[Figure: cuckoo hashing with eight buckets. Inserting x at bucket 6 relocates the resident a to its
alternate bucket 4, which in turn relocates c to its alternate bucket 1.]
When inserting an element into the Cuckoo filter, if both of its candidate buckets are already
occupied, the filter randomly picks one of the residents, kicks it out to that resident's alternate
position, and stores the new element in the freed slot; the kicked-out resident may in turn displace
another, up to a relocation limit. The filter supports insertion, deletion, and lookup operations.
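The kick-out loop can be illustrated with a deliberately tiny, self-contained sketch: eight buckets holding one 8-bit fingerprint each (0 meaning empty). All names and sizes here are made up for illustration and are unrelated to the library built later in this section:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BUCKETS: usize = 8; // power of two, so XOR + mask round-trips
const MAX_RELOCATION: usize = 100;

fn h<T: Hash>(x: &T) -> u64 {
    let mut s = DefaultHasher::new();
    x.hash(&mut s);
    s.finish()
}

// Toy fingerprint: low byte of the hash, forced nonzero so that
// 0 can mean "empty bucket"
fn fingerprint(x: &str) -> u8 {
    (h(&x) as u8) | 1
}

fn insert(table: &mut [u8; BUCKETS], x: &str) -> bool {
    let mut fp = fingerprint(x);
    let mut i = h(&x) as usize & (BUCKETS - 1);
    for _ in 0..MAX_RELOCATION {
        if table[i] == 0 {
            table[i] = fp;
            return true;
        }
        // Kick the resident out, take its slot, and move the
        // resident toward its alternate bucket
        std::mem::swap(&mut table[i], &mut fp);
        i = (i ^ h(&fp) as usize) & (BUCKETS - 1);
    }
    false // give up after too many relocations
}

fn contains(table: &[u8; BUCKETS], x: &str) -> bool {
    let fp = fingerprint(x);
    let i1 = h(&x) as usize & (BUCKETS - 1);
    let i2 = (i1 ^ h(&fp) as usize) & (BUCKETS - 1);
    table[i1] == fp || table[i2] == fp
}

fn main() {
    let mut table = [0u8; BUCKETS];
    for word in ["apple", "pear", "plum"] {
        insert(&mut table, word);
    }
    println!("apple? {}", contains(&table, "apple")); // apple? true
}
```

Because a kicked fingerprint only ever bounces between its own two candidate buckets, a stored item is always found in one of them.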
In the standard Cuckoo hash table shown in the left diagram, inserting a new item into the existing
hash table requires accessing the original item to determine its position and make room for the new item.
However, since the Cuckoo filter only stores fingerprints, there is no way to rehash the original item to
find its alternate position. To overcome this limitation, a technique called partial-key Cuckoo hashing
can be used to obtain an item’s alternate position based on its fingerprint. For item x, the two candidate
bucket indices are calculated using the same hashing function as follows:
h1(x) = hash(x)
h2(x) = h1(x) ⊕ hash(fingerprint(x))    (10.6)

The XOR operation ⊕ ensures that h1(x) and h2(x) can each be computed from the other and the
fingerprint alone, so it is not necessary to know x itself to determine the alternate bucket, using
Equation (10.7):

j = i ⊕ hash(fingerprint(x))    (10.7)
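This involution can be checked directly: with a power-of-two table size, applying the XOR-and-mask step of Equation (10.7) twice returns the original index. The item name and table size below are arbitrary:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(x: &T) -> u64 {
    let mut s = DefaultHasher::new();
    x.hash(&mut s);
    s.finish()
}

// Equation (10.7): j = i XOR hash(fingerprint), masked to the
// table size n (a power of two)
fn partner(i: usize, fp: u8, n: usize) -> usize {
    (i ^ hash_of(&fp) as usize) & (n - 1)
}

fn main() {
    let n = 64;                    // table size
    let item = "user-42";          // arbitrary item
    let fp = hash_of(&item) as u8; // toy 8-bit fingerprint

    let i1 = hash_of(&item) as usize & (n - 1);
    let i2 = partner(i1, fp, n);

    // Starting from either bucket, the other is recovered from the
    // fingerprint alone -- the item itself is never re-read
    assert_eq!(partner(i2, fp, n), i1);
    println!("candidate buckets: {i1} and {i2}");
}
```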
The lookup method is straightforward: first use Equation (10.6) to calculate the fingerprint of the
item to be searched and the positions of the two alternate buckets, then read the values in those buckets. If
any of the buckets contains a value that matches the fingerprint of the item being searched, then it exists
in the filter. The deletion method is similarly simple: check the values in the two alternate buckets, and
if there is a match, delete the copy of the fingerprint in that bucket. It is important to ensure that the item
is inserted before attempting to delete it, otherwise it may accidentally delete other values with the same
fingerprint.
Extensive experimentation has shown that a bucket size of 4 gives the best performance in practice.
The Cuckoo filter has four main advantages over Bloom filters:
(1) Supports dynamic addition and deletion of items.
(2) Higher lookup performance than Bloom filters, even when approaching full capacity.
(3) Easier to implement than other Bloom filter alternatives, such as quotient filters.
(4) In practical applications, it uses less space than Bloom filters if the false positive rate ϵ is less
than 3%.
In addition to Bloom filters and Cuckoo filters, there are many other filters with various characteris-
tics that interested readers can search for and refer to.
To implement the Cuckoo filter, we extend the Bloom filter idea with two new data structures:
FingerPrint to store fingerprints, and Bucket to store FingerPrints. Since the implementation involves
random numbers and hashing, the code uses the rand and serde crates. We implement the Cuckoo filter as
a Rust library: bucket.rs contains the definitions and operations of FingerPrint and Bucket, and util.rs
contains the struct FaI used for calculating fingerprints and bucket indexes. The code structure is as
follows.
shieber@Kew:cuckoofilter/ tree
.
|- Cargo.toml
|- src
   |- bucket.rs
   |- lib.rs
   |- util.rs
The Cuckoo Filter code is quite lengthy, so we will only list the lib.rs file here. For the remaining
code, please refer to the source code provided with the book.
// lib.rs
mod bucket;
mod util;

use std::fmt;
use std::cmp::max;
use std::iter::repeat;
use std::error::Error;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;
use std::collections::hash_map::DefaultHasher;

// Support serialization
use rand::Rng;
#[cfg(feature = "serde_support")]
use serde_derive::{Serialize, Deserialize};

use crate::util::FaI;
use crate::bucket::{Bucket, FingerPrint,
    BUCKET_SIZE, FIGERPRINT_SIZE};

const MAX_RELOCATION: u32 = 100;
const DEFAULT_CAPACITY: usize = (1 << 20) - 1;

// Error handling
#[derive(Debug)]
enum CuckooError {
    NotEnoughSpace,
}

// add print function
impl fmt::Display for CuckooError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("NotEnoughSpace")
    }
}

impl Error for CuckooError {
    fn description(&self) -> &str {
        "Not enough space to save element, operation failed!"
    }
}

// Definition of cuckoo filter
struct CuckooFilter<H> {
    buckets: Box<[Bucket]>, // Buckets
    len: usize,             // length
    _phantom: PhantomData<H>,
}

// add default value
impl Default for CuckooFilter<DefaultHasher> {
    fn default() -> Self {
        Self::new()
    }
}

impl CuckooFilter<DefaultHasher> {
    fn new() -> Self {
        Self::with_capacity(DEFAULT_CAPACITY)
    }
}

impl<H: Hasher + Default> CuckooFilter<H> {
    // ... (the construction and insertion/relocation routines are
    // omitted here; see the source code provided with the book)

    // add fingerprint
    fn put(&mut self, fp: FingerPrint, i: usize) -> bool {
        if self.buckets[i % self.len].insert(fp) {
            self.len += 1;
            true
        } else {
            false
        }
    }

    fn remove(&mut self, fp: FingerPrint, i: usize) -> bool {
        if self.buckets[i % self.len].delete(fp) {
            self.len -= 1;
            true
        } else {
            false
        }
    }

    fn contains<T: ?Sized + Hash>(&self, elem: &T) -> bool {
        let FaI { fp, i1, i2 } = FaI::from_data::<_, H>(elem);
        self.buckets[i1 % self.len]
            .get_fp_index(fp)
            .or_else(|| {
                self.buckets[i2 % self.len]
                    .get_fp_index(fp)
            })
            .is_some()
    }
}
As we can see from the code, the Cuckoo Filter supports insertion, deletion, and querying operations.
10.5 Least Recently Used (LRU) Algorithm
When a client queries user 5, the data is not in the hash-linked list, so it needs to be read from the
database and inserted into the cache. At this point, the rightmost node of the linked list represents the
most recently accessed user 5, and the leftmost represents the least recently accessed user 1.
Next, when another client accesses user 2, the data is found in the hash-linked list, so user 2 is simply
removed from its current position and reinserted into the rightmost of the linked list.
[Figure: the hash-linked list after the accesses described above]
After these accesses, the hash-linked list looks like the figure shown.
Later, a client accesses user 6, but the data is not in the cache, so it also needs to be inserted.
However, since the cache has reached its limit, the least recently accessed data, user 1, needs to be
deleted first. Then, user 6 can be inserted at the rightmost end of the hash-linked list.
The LRU algorithm is a common cache eviction algorithm that frees up memory by removing the
least recently used data. To implement this algorithm, we need to define the data structure and operations
abstracted from the graph. This includes managing keys, entries, and front and back pointers. The
necessary operation functions include insert, remove, contains, and auxiliary functions.
// lru.rs
use std::collections::HashMap;

// Definition for entry of LRU
struct Entry<K, V> {
    key: K,
    val: Option<V>,
    next: Option<usize>,
    prev: Option<usize>,
}

struct LRUCache<K, V> {
    cap: usize,
    head: Option<usize>,
    tail: Option<usize>,
    map: HashMap<K, usize>,
    entries: Vec<Entry<K, V>>,
}
To store the keys, we will use a HashMap, and a Vec to store the entries. The head and tail pointers can
be simplified to the index of the Vec. To customize the cache capacity, we can implement a with_capacity
function, with new setting the capacity to 100 by default.
// lru.rs
use std::hash::Hash;

const CACHE_SIZE: usize = 100;

impl<K: Clone + Hash + Eq, V> LRUCache<K, V> {
    fn new() -> Self {
        Self::with_capacity(CACHE_SIZE)
    }

    fn len(&self) -> usize {
        self.map.len()
    }

    fn is_empty(&self) -> bool {
        self.map.is_empty()
    }

    fn is_full(&self) -> bool {
        self.map.len() == self.cap
    }

    fn with_capacity(cap: usize) -> Self {
        LRUCache {
            cap: cap,
            head: None,
            tail: None,
            map: HashMap::with_capacity(cap),
            entries: Vec::with_capacity(cap),
        }
    }
}
When inserting data, if the key already exists, the value is updated and the original value is
returned; if the key does not exist, the return value is None, hence Option as the return type. The
access method moves an entry to the head of the list when it is used, and ensure_room evicts the least
recently used entry when the cache is at capacity.
// lru.rs

impl<K: Clone + Hash + Eq, V> LRUCache<K, V> {
    fn insert(&mut self, key: K, val: V) -> Option<V> {
        if self.map.contains_key(&key) {
            // Update if key exists
            self.access(&key);
            let entry = &mut self.entries[self.head.unwrap()];
            let old_val = entry.val.take();
            entry.val = Some(val);
            old_val
        } else {
            // Insert if key does not exist
            self.ensure_room();

            // Update the original head pointer
            let index = self.entries.len();
            self.head.map(|e| {
                self.entries[e].prev = Some(index);
            });

            // The new head node
            self.entries.push(Entry {
                key: key.clone(),
                val: Some(val),
                prev: None,
                next: self.head,
            });

            self.head = Some(index);
            self.tail = self.tail.or(self.head);
            self.map.insert(key, index);

            None
        }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if self.contains(key) { self.access(key); }

        let entries = &self.entries;
        self.map.get(key).and_then(move |&i| {
            entries[i].val.as_ref()
        })
    }

    fn get_mut(&mut self, key: &K) -> Option<&mut V> {
        if self.contains(key) { self.access(key); }

        let entries = &mut self.entries;
        self.map.get(key).and_then(move |&i| {
            entries[i].val.as_mut()
        })
    }

    fn contains(&mut self, key: &K) -> bool {
        self.map.contains_key(key)
    }

    // Ensure there is enough capacity, remove the
    // least recently used item if full
    fn ensure_room(&mut self) {
        if self.cap == self.len() {
            self.remove_tail();
        }
    }

    fn remove_tail(&mut self) {
        if let Some(index) = self.tail {
            self.remove_from_list(index);
            let key = &self.entries[index].key;
            self.map.remove(key);
        }
        if self.tail.is_none() {
            self.head = None;
        }
    }

    // Access a key: unlink its entry from the old
    // position and relink it at the head
    fn access(&mut self, key: &K) {
        let i = *self.map.get(key).unwrap();
        self.remove_from_list(i);
        // Relink the entry in front of the old head
        if let Some(h) = self.head {
            self.entries[h].prev = Some(i);
        }
        self.entries[i].prev = None;
        self.entries[i].next = self.head;
        self.head = Some(i);
        if self.tail.is_none() {
            self.tail = Some(i);
        }
    }

    fn remove(&mut self, key: &K) -> Option<V> {
        self.map.remove(&key).map(|index| {
            self.remove_from_list(index);
            self.entries[index].val.take().unwrap()
        })
    }

    fn remove_from_list(&mut self, i: usize) {
        let (prev, next) = {
            let entry = self.entries.get_mut(i).unwrap();
            (entry.prev, entry.next)
        };

        match (prev, next) {
            // The data item is in the middle of the cache
            (Some(j), Some(k)) => {
                self.entries[j].next = Some(k);
                self.entries[k].prev = Some(j);
            }
            // The data item is at the tail
            (Some(j), None) => {
                self.entries[j].next = None;
                self.tail = Some(j);
            }
            // The data item is at the head
            (None, Some(k)) => {
                self.entries[k].prev = None;
                self.head = Some(k);
            }
            // The data item is the only one in the cache
            (None, None) => {
                self.head = None;
                self.tail = None;
            }
        }
    }
}
10.6 Consistent Hashing
When the data volume is small, a single cache server addressed with a simple modulo hash is
sufficient. However, when data volumes and access rates grow, a cluster must be created to distribute
data across multiple machines. For example, with 5 machines, the position of an image would be
index = hash(key) % 5, where key is an identifier of the image. However, adding or removing machines
changes N, and previously calculated indices become invalid. This is where consistent hashing comes in:
it arranges the hash space as a circle ranging from 0 to 2³² − 1 and maps each piece of data to a
position on the circle.
[Figure: a hash ring from 0 to 2³² − 1 with Node1, Node2 and Node3 placed on it]
To achieve caching, data must be added to a specific segment of the circle based on its hash value.
The corresponding node located just clockwise from the segment will store the data.
[Figure: each piece of data is stored on the first node found clockwise from its position]
Suppose Node3 fails, causing data between Node2 and Node3 to be transferred to Node1.
[Figure: after Node3 fails, the data between Node2 and Node3 is taken over by Node1]
When a new machine, Node4, is added, the data originally stored on Node3 will now be stored on
Node4.
312
10.6. CONSISTENT HASHING CHAPTER 10. PRACTICES
[Figure: the hash ring after the new node Node4 is added]
The Consistent Hashing algorithm is fault-tolerant and scalable since it only relocates a small portion
of the data when nodes are added or removed. This feature enables the algorithm to achieve consistency.
To implement Consistent Hashing, a Ring is needed to store nodes representing machines.
// conshash.rs

use std::fmt::Debug;
use std::string::ToString;
use std::hash::{Hash, Hasher};
use std::collections::{BTreeMap, hash_map::DefaultHasher};

// Ring node definition, used for storing the host, ip and port
#[derive(Clone, Debug)]
struct Node {
    host: &'static str,
    ip: &'static str,
    port: u16,
}

// implement a to_string function
impl ToString for Node {
    fn to_string(&self) -> String {
        self.ip.to_string() + &self.port.to_string()
    }
}

// Definition for the Ring
struct Ring<T: Clone + ToString + Debug> {
    replicas: usize,        // number of virtual nodes per node
    ring: BTreeMap<u64, T>, // store data
}
Replicas in the Ring are virtual nodes, used to avoid node clustering and to ensure even data
distribution: multiple virtual nodes are created on the ring for each physical node. The default hasher
provided by the standard library is used for hash calculation, and the number of virtual nodes per node
defaults to 10, though a custom count can be supplied. A consistent-hashing implementation must support
at least node insertion, node deletion, and lookup; batch versions of insertion and deletion can also
be provided for batch processing.
// conshash.rs

const DEFAULT_REPLICAS: usize = 10;

// Hash calculation function
fn hash<T: Hash>(val: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    val.hash(&mut hasher);

    hasher.finish()
}

impl<T> Ring<T> where T: Clone + ToString + Debug {
    fn new() -> Self {
        Self::with_capacity(DEFAULT_REPLICAS)
    }

    // new with a replicas parameter
    fn with_capacity(replicas: usize) -> Self {
        Ring {
            replicas: replicas,
            ring: BTreeMap::new()
        }
    }

    // Batch insertion of nodes
    fn add_multi(&mut self, nodes: &[T]) {
        if !nodes.is_empty() {
            for node in nodes.iter() {
                self.add(node);
            }
        }
    }

    fn add(&mut self, node: &T) {
        for i in 0..self.replicas {
            let key = hash(&(node.to_string()
                + &i.to_string()));
            self.ring.insert(key, node.clone());
        }
    }

    // Batch deletion of nodes
    fn remove_multi(&mut self, nodes: &[T]) {
        if !nodes.is_empty() {
            for node in nodes.iter() {
                self.remove(node);
            }
        }
    }

    // ... (the remaining methods, including remove and the ring
    // lookup, are omitted here; see the source code provided with
    // the book)
}
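The key operation on the ring is the clockwise lookup: hash the data's key and find the first virtual node at or after that hash, wrapping around to the smallest hash if none exists. A minimal self-contained sketch of this search on a BTreeMap (with made-up node addresses, independent of the Ring type above):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash<T: Hash>(val: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    val.hash(&mut hasher);
    hasher.finish()
}

// Clockwise lookup: the first virtual node at or after the key's
// hash owns the data; wrap around to the smallest hash if none
fn locate<'a>(ring: &'a BTreeMap<u64, String>, key: &str) -> Option<&'a String> {
    let h = hash(&key);
    ring.range(h..)
        .next()
        .or_else(|| ring.iter().next())
        .map(|(_, node)| node)
}

fn main() {
    let mut ring = BTreeMap::new();
    for node in ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"] {
        // 10 virtual nodes per machine, matching DEFAULT_REPLICAS
        for i in 0..10 {
            ring.insert(hash(&format!("{node}{i}")), node.to_string());
        }
    }
    let owner = locate(&ring, "cat.jpg").unwrap();
    println!("cat.jpg is stored on {owner}");
}
```

The BTreeMap's ordered keys make the "first node clockwise" query a single range scan, which is exactly why the Ring stores its virtual nodes in a BTreeMap rather than a HashMap.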
10.7 Base58 Encode and Decode
To avoid ambiguous characters, Base58 removes 0 (zero), O (capital O), I (capital i), l (lowercase
L), plus sign (+), and slash (/) from the Base64 table. The remaining 58 characters are used as encoding
characters. These characters are easily distinguishable and also prevent problems with line breaks when
copying the code.
Base58 encoding is essentially a base conversion on a large number. The string's characters are
taken as ASCII bytes and interpreted as one big base-256 number, that number is converted to base 58,
and each base-58 digit selects a character from the encoding table to form the Base58 string. Because
of the base conversion, Base58 encoding is relatively slow. Algorithm (10.1) illustrates the encoding
principle.
Decoding is the reverse process, again a base conversion on a large number: the characters of the
Base58 string are first mapped back to their digit values via the table, the resulting base-58 number
is converted to base 256, and the bytes are read as ASCII characters. The decoding process is described
in detail in Algorithm (10.2).
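Before the full byte-oriented implementation, the encoding idea can be shown on a short string whose bytes fit in a u128. This toy version is a simplification (it ignores leading-zero handling), but it reproduces the base-256 to base-58 conversion:

```rust
// Base58 character table, the same 58 characters used later in
// this section
const ALPHABET: &[u8; 58] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Toy encoder for short strings whose bytes fit in a u128;
// it ignores leading zero bytes, which the full version handles
fn encode_small(s: &str) -> String {
    // Interpret the bytes as one big base-256 number
    let mut n: u128 = 0;
    for &b in s.as_bytes() {
        n = n * 256 + b as u128;
    }
    // Repeatedly divide by 58, collecting remainders as digits
    let mut out = Vec::new();
    while n > 0 {
        out.push(ALPHABET[(n % 58) as usize]);
        n /= 58;
    }
    out.reverse();
    String::from_utf8(out).unwrap()
}

fn main() {
    println!("{}", encode_small("abc")); // ZiCa
}
```

The production code below performs the same division digit by digit over a byte buffer, so it works for inputs of any length.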
Encoding and decoding are essentially two-way string transformations between different spaces,
similar to a mapping in the encoding space. To implement a Base58 encoder and decoder, we first prepare
the encoding characters ALPHABET and the encoding conversion table DIGITS_MAP. Additionally, we
define the maximum conversion base 58 and substitute 1 for the leading 0 as constants. Using constants
for operations is preferable to using magic numbers directly, as it improves code readability.
// base58.rs

// Conversion base 58
const BIG_RADIX: u32 = 58;

// Substitute 1 for the leading 0
const ALPHABET_INDEX_0: char = '1';

// Base58 encoding characters
const ALPHABET: &[u8; 58] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Mapping relationship between the bases
const DIGITS_MAP: &'static [u8] = &[
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
    255,  0,  1,  2,  3,  4,  5,  6,  7,  8,255,255,255,255,255,255,
    255,  9, 10, 11, 12, 13, 14, 15, 16,255, 17, 18, 19, 20, 21,255,
     22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,255,255,255,255,255,
    255, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,255, 44, 45, 46,
     47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,255,255,255,255,255,
];
To handle potential errors during encoding and decoding, we have implemented a custom error type
for Base58 encoding to manage illegal characters, length errors, and other cases. Two traits, Encoder
and Decoder, have been created for encoding and decoding, respectively. Encoder and Decoder contain
the methods encode_to_base58 and decode_from_base58, respectively.
// base58.rs

// Decoding error type
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Invalid,
    InvalidLength,
    InvalidCharacter(char, usize),
}

// Encoding and decoding traits
pub trait Encoder {
    // Encoding method
    fn encode_to_base58(&self) -> String;
}

pub trait Decoder {
    // Decoding method
    fn decode_from_base58(&self) -> Result<String, DecodeError>;
}
In the next step, we implement the methods of the two traits. Although the traits are implemented
for str, it is better to perform the internal calculations on u8 bytes, since a single character in the
string may occupy multiple u8 values.
// base58.rs

// Implement Base58 encoding
impl Encoder for str {
    fn encode_to_base58(&self) -> String {
        // Convert to bytes for processing
        let str_u8 = self.as_bytes();

        // Count the number of leading zeros
        let zero_count = str_u8.iter()
            .take_while(|&&x| x == 0)
            .count();

        // Space required after conversion is log(256)/log(58),
        // which is approximately 1.38 times the original data.
        // We don't need leading zeros
        let size = (str_u8.len() - zero_count) * 138 / 100 + 1;

        // Convert characters from one base to another
        let mut i = zero_count;
        let mut high = size - 1;
        let mut buffer = vec![0u8; size];
        while i < str_u8.len() {
            // j is the decreasing index, corresponding to
            // counting from the back
            let mut j = size - 1;

            // carry is the character read from the front
            let mut carry = str_u8[i] as u32;

            // Store the converted data from back to front
            // in turn
            while j > high || carry != 0 {
                carry += 256 * buffer[j] as u32;
                buffer[j] = (carry % BIG_RADIX) as u8;
                carry /= BIG_RADIX;

                if j > 0 {
                    j -= 1;
                }
            }
            i += 1;
            high = j;
        }

        // Handle multiple leading zeros
        let mut b58_str = String::new();
        for _ in 0..zero_count {
            b58_str.push(ALPHABET_INDEX_0);
        }

        // Get the encoded characters and concatenate
        // them into a string
        let mut j = buffer.iter()
            .take_while(|&&x| x == 0)
            .count();
        while j < size {
            b58_str.push(ALPHABET[buffer[j] as usize] as char);
            j += 1;
        }

        // Return the encoded string
        b58_str
    }
}
Decoding is the process of converting Base58 encoded data back to its original form, which is es-
sentially a conversion between numeral systems. The specific implementation of the decoding method
is as follows:
// base58.rs

// Implement Base58 decoding
impl Decoder for str {
    fn decode_from_base58(&self) -> Result<String, DecodeError> {
        // Store conversion characters
        let mut bin = [0u8; 132];
        let mut out = [0u32; (132 + 3) / 4];

        // Number of remaining bits after processing data
        // in units of 4
        let bytes_left = (bin.len() % 4) as u8;
        let zero_mask = match bytes_left {
            0 => 0u32,
            _ => 0xffffffff << (bytes_left * 8),
        };

        // Count leading zeros
        let zero_count = self.chars()
            .take_while(|&x| x == ALPHABET_INDEX_0)
            .count();

        let mut i = zero_count;
        let b58: Vec<u8> = self.bytes().collect();
        while i < self.len() {
            // Invalid characters
            if (b58[i] & 0x80) != 0 {
                return Err(DecodeError::InvalidCharacter(
                    b58[i] as char, i));
            }
            if DIGITS_MAP[b58[i] as usize] == 255 {
                return Err(DecodeError::InvalidCharacter(
                    b58[i] as char, i));
            }

            // Number system conversion
            let mut j = out.len();
            let mut c = DIGITS_MAP[b58[i] as usize] as u64;
            while j != 0 {
                j -= 1;
                let t = out[j] as u64 * (BIG_RADIX as u64) + c;
                c = (t & 0x3f00000000) >> 32;
                out[j] = (t & 0xffffffff) as u32;
            }

            // Data is too long
            if c != 0 {
                return Err(DecodeError::InvalidLength);
            }

            if (out[0] & zero_mask) != 0 {
                return Err(DecodeError::InvalidLength);
            }

            i += 1;
        }

        // Handle remaining bits
        let mut i = 1;
        let mut j = 0;
        // ... (the remaining steps, which assemble the decoded
        // bytes in `out` into the result string, fall on a page
        // not reproduced in this excerpt; see the source code
        // provided with the book)
    }
}
The following are the results after Base58 encoding and decoding.
"ZiCa"
"abc"
"K8xdoM2VJtK"
Ok(
"loverust",
)
"Jdjxs3pmuK2"
Ok(
"iloveyou",
)
With the completion of the entire Base58 algorithm, similar methods can be used to implement en-
coding and decoding algorithms for other encoding schemes of interest, such as Base32, Base36, Base62,
Base64, Base85, Base92, and so on. In Chapter 1, we used Base64 to create a password generator, but it
can also be replaced with Base58. I have already made the replacement and included the specific code
in the corresponding repository for this chapter.
10.8 Blockchain
Blockchain is a digital technology that has gained significant attention in recent years. This attention
has been due, in part, to the rise in the price of Bitcoin, which is closely tied to blockchain technology,
as well as other related concepts such as Ethereum, virtual currencies, and the digital economy. The
endorsement of virtual currencies by influential figures such as Tesla CEO Elon Musk has further fueled
the development of the field. Additionally, governments around the world have been formulating poli-
cies related to blockchain technology, adding to the momentum of the industry. Ultimately, blockchain
technology is a tool closely tied to economic development and the trade of commodities.
Online transactions rely heavily on financial institutions as third-party intermediaries to process pay-
ment information. However, this credit-based model has inherent weaknesses that limit the feasibility of
small daily payment transactions, increase the cost of transactions, and prevent completely irreversible
transactions due to the involvement of financial intermediaries to coordinate disputes. This limits the
potential of internet trade as both parties in a transaction must trust each other, and merchants must guard
against potential fraud. In contrast, physical cash transactions do not require third-party intermediaries
and thus do not face such limitations.
To address these issues, we need an electronic payment system based on cryptographic principles that
enables two parties to make payments directly without intermediaries. This eliminates the possibility of
rolling back payment transactions, protecting sellers from fraud. We propose a method for generating
electronic transaction proofs using a peer-to-peer distributed timestamp server that records transactions
in chronological order. The system is secure as long as the total computing power controlled by honest
nodes exceeds that of cooperating attackers.
The passage above is from the introduction of the Bitcoin whitepaper (Bitcoin: A Peer-to-Peer
Electronic Cash System [19]), written by Satoshi Nakamoto, the inventor of Bitcoin. It outlines the
motivation for Bitcoin: the 2008 global financial crisis, inflation, and their impact on countries
around the world. Dissatisfied with the existing financial environment, Nakamoto used their
professional knowledge to invent blockchain technology to address these problems.
Blockchain technology uses a chain data structure to store and verify data, distributed node
consensus algorithms to generate and update data, cryptography to ensure data security, and smart
contracts to program and operate on data. Bitcoin, in contrast, is a digital currency that was built
using blockchain technology.
To put it simply, blockchain is a decentralized distributed ledger: data is not only stored on each node
but also replicated and shared across the entire network. Recording transactions in this ledger consumes
considerable resources, and as transactions are recorded, rewards and transaction fees are generated in
the form of digital currency, which is used to keep the system running. Bitcoin is the first digital
currency created with blockchain technology and serves both as a guarantee of transactions and as an
incentive.
In summary, blockchain is a distributed transaction medium, and Bitcoin is a digital currency that
serves as a transaction guarantee and an incentive. A blockchain consists of components such as blocks,
chains, transactions, accounts, miners, transaction fees, and rewards. To implement a blockchain system,
one must start by implementing these basic components.
As the structure discussed above shows, hash values play a vital role in the blockchain. Thus, the first
step is to compute hash values, and serializing the block structure beforehand makes this computation
straightforward.
The basic blockchain's first task is serialization and hash computation, demonstrated in the following
code. To handle blocks with varying numbers of transactions, we relax the size requirement with the
"?Sized" bound. Serialization uses the "bincode" library, while the "Sha3" hasher from the "crypto"
crate computes the hash. For better readability, we convert all hashes to strings. The serialized data
has type "&[u8]", and "hash_str" takes such data and returns its hash as a string.
// serializer.rs

use bincode;
use serde::Serialize;
use crypto::digest::Digest;
use crypto::sha3::Sha3;

// Data serialization
pub fn serialize<T: ?Sized>(value: &T) -> Vec<u8>
    where T: Serialize
{
    bincode::serialize(value).unwrap()
}

// Calculate the hash value and return it as a string
pub fn hash_str(value: &[u8]) -> String {
    let mut hasher = Sha3::sha3_256();
    hasher.input(value);
    hasher.result_str()
}

// block.rs

use std::thread;
use std::time::Duration;
use chrono::Utc;
use serde::Serialize;
use crate::serializer::{serialize, hash_str};

// Block header: timestamp, transaction hash, previous block hash
#[derive(Serialize, Debug)]
pub struct BlockHeader {
    pub time: i64,
    pub txs_hash: String,
    pub pre_hash: String,
}

// Block: header, transaction data, and this block's own hash
#[derive(Debug)]
pub struct Block {
    pub header: BlockHeader,
    pub tranxs: String,
    pub hash: String,
}

impl Block {
    pub fn new(txs: String, pre_hash: String) -> Self {
        // Introduce a 3-second delay to simulate mining
        println!("Start mining .... ");
        thread::sleep(Duration::from_secs(3));

        // Prepare timestamp and calculate transaction hash
        let time = Utc::now().timestamp();
        let txs_ser = serialize(&txs);
        let txs_hash = hash_str(&txs_ser);

        let mut block = Block {
            header: BlockHeader {
                time: time,
                txs_hash: txs_hash,
                pre_hash: pre_hash,
            },
            tranxs: txs,
            hash: "".to_string(),
        };

        block.set_hash();
        println!("Produce a new block!\n");

        block
    }

    // Calculate and set block hash value
    fn set_hash(&mut self) {
        let header = serialize(&(self.header));
        self.hash = hash_str(&header);
    }
}
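The chain's tamper evidence rests on the fact that the block hash depends on every header field. The idea can be checked with a small standard-library-only sketch; here "DefaultHasher" stands in for SHA-3 purely so the example needs no external crates, and "hash_value" and "Header" are illustrative names mirroring the header above, not part of the book's code:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for hash_str above: hash any Hash-able value and return
// the result as a hex string (DefaultHasher replaces SHA-3 here
// only so this sketch runs with the standard library alone).
fn hash_value<T: Hash>(value: &T) -> String {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

// Fields mirror the BlockHeader shown above
#[derive(Hash, Clone)]
struct Header {
    time: i64,
    txs_hash: String,
    pre_hash: String,
}

fn main() {
    let header = Header {
        time: 1_700_000_000,
        txs_hash: "abc".to_string(),
        pre_hash: "genesis".to_string(),
    };
    let original = hash_value(&header);

    // Changing any header field changes the block hash, which is
    // what makes tampering with a recorded block detectable
    let mut tampered = header.clone();
    tampered.time += 1;
    assert_ne!(original, hash_value(&tampered));
    println!("original: {}", original);
    println!("tampered: {}", hash_value(&tampered));
}
```

The same property holds for the SHA-3 version: any edit to a stored block's timestamp, transactions, or pre_hash produces a different block hash.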
To form a blockchain, we must link the created blocks together, which can be accomplished by storing
the blocks in a Vec. The system should support generating the first block (the genesis block) and
appending new blocks to the chain. Since the genesis block has no predecessor, its pre_hash cannot be
derived and must be set manually; here we use a fixed placeholder string. Lastly, we require a function
that creates a new block whose pre_hash is the hash of the current last block.
// blockchain.rs

use crate::block::Block;

// Genesis block pre_hash
const PRE_HASH: &str = "22caaf24ef0aea3522c13d133912d2b722caaf24ef0aea3522c13d133912d2b7";

pub struct BlockChain {
    pub blocks: Vec<Block>,
}

impl BlockChain {
    pub fn new() -> Self {
        // Seed the chain with a manually created genesis block
        let genesis = Block::new("Genesis Block".to_string(), PRE_HASH.to_string());
        BlockChain { blocks: vec![genesis] }
    }

    // Create a new block and link it to the last block via its hash
    pub fn add_block(&mut self, txs: String) {
        let pre_hash = self.blocks.last().unwrap().hash.clone();
        self.blocks.push(Block::new(txs, pre_hash));
    }
}
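The genesis-plus-append behaviour described above can be sketched end to end with the standard library alone. As before, "DefaultHasher" stands in for SHA-3 and a plain string stands in for transaction data; the method names "new", "add_block", and "block_hash" are chosen here for illustration and follow the text rather than the crate-based code:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder genesis pre_hash, as in the text
const PRE_HASH: &str = "genesis-pre-hash";

#[derive(Hash)]
struct Block {
    txs: String,
    pre_hash: String,
}

impl Block {
    // Compute this block's hash from its contents
    fn block_hash(&self) -> String {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        format!("{:016x}", h.finish())
    }
}

struct BlockChain {
    blocks: Vec<Block>,
}

impl BlockChain {
    // Start the chain with a manually seeded genesis block
    fn new() -> Self {
        let genesis = Block {
            txs: "genesis".to_string(),
            pre_hash: PRE_HASH.to_string(),
        };
        BlockChain { blocks: vec![genesis] }
    }

    // Link a new block to the hash of the current last block
    fn add_block(&mut self, txs: String) {
        let pre_hash = self.blocks.last().unwrap().block_hash();
        self.blocks.push(Block { txs, pre_hash });
    }
}

fn main() {
    let mut chain = BlockChain::new();
    chain.add_block("alice -> bob: 1 btc".to_string());
    chain.add_block("bob -> carol: 0.5 btc".to_string());

    // Each block's pre_hash equals the previous block's hash
    for i in 1..chain.blocks.len() {
        assert_eq!(chain.blocks[i].pre_hash, chain.blocks[i - 1].block_hash());
    }
    println!("chain of {} blocks verified", chain.blocks.len());
}
```

Walking the Vec and re-deriving each pre_hash, as main does here, is exactly the integrity check a real node performs when validating a chain.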
10.9 Summary
In this chapter, we covered a number of useful data structures. We began by exploring the implemen-
tation of the trie, the Bloom filter, and the cuckoo filter. Next, we delved into Hamming and edit
distance, followed by the LRU cache eviction algorithm and the consistent hashing algorithm. Lastly,
we learned the principles of blockchain and implemented a rudimentary blockchain.
Throughout this book, we have covered a wide range of data structures and implemented a consid-
erable amount of Rust code. Although some of the code may not be optimal, it can still serve as a
useful reference for readers. Feedback from readers via GitHub to further improve the content is greatly
appreciated.
Finally, I hope this book proves helpful to readers and serves as a valuable resource for exploring
Rust and its applications in data structures.
Bibliography
[18] Bin Fan and David G. Andersen. Cuckoo filter: Practically better than Bloom. Website, 2014. https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf.
[19] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Website, 2008. https://bitcoin.org/bitcoin.pdf.